RoI Transformer: Learning RoI Transformer for Oriented Object Detection in Aerial Images

September 2019

tl;dr: Learn parameters for rotated position sensitive pooling to eliminate the need for oriented anchors.

Overall impression

Lineage of position-sensitive (PS) RoI pooling: R-FCN (PS RoI Pooling) -> Light-Head R-CNN (PS RoI Align) -> RoI Transformer (Rotated PS RoI Align)

By the same authors as the DOTA dataset. Oriented object detection extends general horizontal object detection. Related fields: remote sensing and scene text detection.

The RoI Learner closely resembles the bbox refinement stage in Faster R-CNN, with orientation added as an extra regression dimension. It removes the need for many oriented anchors. Alternatively, if oriented anchors are used, the RPN yields rotated RoIs (RRoIs) directly, and these RRoIs are fed to oriented PS RoI Align to regress bbox offsets (size, position and orientation).
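To make the "bbox refinement plus orientation" idea concrete, here is a minimal sketch of encoding and decoding a rotated box relative to a RoI. It assumes boxes are given as (x, y, w, h, theta) with theta in radians, that position offsets are measured in the RoI's own rotated frame, and that the angle offset is normalized by 2π. The function names `encode_rroi` and `decode_rroi` are hypothetical, and the paper's exact parameterization may differ in detail.

```python
import math

TWO_PI = 2.0 * math.pi

def encode_rroi(roi, gt):
    """Regression targets of a rotated gt box relative to a RoI.

    Both boxes are (x, y, w, h, theta), theta in radians (assumed convention).
    """
    xr, yr, wr, hr, tr = roi
    xg, yg, wg, hg, tg = gt
    dx, dy = xg - xr, yg - yr
    cos_t, sin_t = math.cos(tr), math.sin(tr)
    # Center shift expressed in the RoI's rotated frame, scaled by RoI size.
    tx = (dx * cos_t + dy * sin_t) / wr
    ty = (dy * cos_t - dx * sin_t) / hr
    # Log-scale size offsets, as in standard bbox regression.
    tw = math.log(wg / wr)
    th = math.log(hg / hr)
    # Angle offset wrapped to [0, 2*pi) and normalized to [0, 1).
    tt = ((tg - tr) % TWO_PI) / TWO_PI
    return tx, ty, tw, th, tt

def decode_rroi(roi, offsets):
    """Inverse of encode_rroi: apply predicted offsets to a RoI."""
    xr, yr, wr, hr, tr = roi
    tx, ty, tw, th, tt = offsets
    cos_t, sin_t = math.cos(tr), math.sin(tr)
    # Rotate the scaled shift back into the image frame.
    x = xr + tx * wr * cos_t - ty * hr * sin_t
    y = yr + tx * wr * sin_t + ty * hr * cos_t
    w = wr * math.exp(tw)
    h = hr * math.exp(th)
    theta = (tr + tt * TWO_PI) % TWO_PI
    return x, y, w, h, theta
```

With a horizontal RoI (theta = 0) the first four targets reduce to the usual Faster R-CNN offsets and only the angle term is new, which matches the note that the learner just adds one extra dimension.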

Key ideas

Technical details