Learning-Deep-Learning

RAR-Net: Reinforced Axial Refinement Network for Monocular 3D Object Detection

August 2020

tl;dr: Use deep reinforcement learning to refine mono3D results.

Overall impression

The proposed RAR-Net is a plug-and-play refinement module that can be used with any mono3D pipeline. This paper comes from the same authors as FQNet. Instead of passively scoring densely generated 3D proposals, RAR-Net uses a DRL agent to actively refine the coarse prediction. Similarly, Shift R-CNN actively learns to regress the difference.
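The active refinement loop can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the 7-parameter box layout, the fixed `STEP` size, the action set (one +/- move per parameter plus a stop action), and the `make_oracle_policy` helper are all assumptions made for the example. In particular, the oracle policy cheats by looking at a known target box; it only stands in for a trained agent, which would pick actions from image evidence instead.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Assumed 7-DoF box parameterization (not specified in these notes).
PARAMS = ("x", "y", "z", "h", "w", "l", "yaw")
STEP = 0.05  # hypothetical fixed step size applied by every move action

Action = Optional[Tuple[int, int]]  # (parameter index, sign), or None for "stop"

# "Stop" comes first so that ties are resolved in favor of terminating.
ACTIONS: List[Action] = [None] + [
    (i, sign) for i in range(len(PARAMS)) for sign in (+1, -1)
]


def apply_action(box: Sequence[float], action: Action) -> List[float]:
    """Return a copy of the box with one parameter moved by one step."""
    new_box = list(box)
    if action is not None:
        idx, sign = action
        new_box[idx] += sign * STEP
    return new_box


def refine(box: Sequence[float],
           policy: Callable[[Sequence[float]], Action],
           max_steps: int = 50) -> List[float]:
    """Repeatedly ask the policy for one single-parameter move until it stops."""
    current = list(box)
    for _ in range(max_steps):
        action = policy(current)
        if action is None:
            break
        current = apply_action(current, action)
    return current


def make_oracle_policy(target: Sequence[float]):
    """Stand-in for a trained agent: greedily minimizes L1 distance to a known target."""
    def l1(b: Sequence[float]) -> float:
        return sum(abs(a - t) for a, t in zip(b, target))

    def policy(box: Sequence[float]) -> Action:
        return min(ACTIONS, key=lambda a: l1(apply_action(box, a)))

    return policy


coarse = [1.0, 1.5, 20.0, 1.5, 1.6, 4.0, 0.3]   # made-up coarse prediction
target = [1.2, 1.5, 19.8, 1.5, 1.6, 4.0, 0.1]   # made-up ground truth
refined = refine(coarse, make_oracle_policy(target))
```

Each step changes exactly one parameter, so the agent only ever has to decide which single direction to nudge the box, rather than regress all parameters at once.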

RAR-Net encodes the 3D results as a 2D rendering with color coding. The idea is very similar to that of FQNet, which encodes the 2D projection of the 3D bbox as a wireframe rendered directly on top of the input patch. This is the “direct projection” baseline in RAR-Net. Instead, RAR-Net uses a parameter-aware data enhancement that also encodes the semantic meaning of the surfaces (each surface of the box is painted in a specific color).
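To make the per-surface color coding concrete, here is a small sketch that projects the six faces of a 3D box into the image and tags each with its own color. Everything specific here is an assumption for illustration: the camera convention (x right, y down, z forward, yaw about the y axis), the box being centered at `(x, y, z)`, the face names, the palette in `FACE_COLORS`, and the far-to-near ordering. The notes do not give the actual rendering details used by RAR-Net.

```python
import math
from itertools import product

# Hypothetical palette: one fixed color per box face.
FACE_COLORS = {
    "+x": (255, 0, 0), "-x": (0, 255, 0),
    "+y": (0, 0, 255), "-y": (255, 255, 0),
    "+z": (255, 0, 255), "-z": (0, 255, 255),
}

# Corner indices of each face, listed in order around the quad.
# Corner i has signs (sx, sy, sz) with i = 4*(sx>0) + 2*(sy>0) + (sz>0).
FACE_CORNERS = {
    "+x": (4, 5, 7, 6), "-x": (0, 1, 3, 2),
    "+y": (2, 3, 7, 6), "-y": (0, 1, 5, 4),
    "+z": (1, 3, 7, 5), "-z": (0, 2, 6, 4),
}


def box_corners(x, y, z, h, w, l, yaw):
    """Eight corners of a box centered at (x, y, z), rotated by yaw about the y axis."""
    c, s = math.cos(yaw), math.sin(yaw)
    corners = []
    for sx, sy, sz in product((-1, 1), repeat=3):
        dx, dy, dz = sx * l / 2, sy * h / 2, sz * w / 2
        corners.append((x + c * dx + s * dz, y + dy, z - s * dx + c * dz))
    return corners


def project(point, fx, fy, cx, cy):
    """Pinhole projection; assumes the point lies in front of the camera (Z > 0)."""
    X, Y, Z = point
    return (fx * X / Z + cx, fy * Y / Z + cy)


def colored_faces(box, fx, fy, cx, cy):
    """Return the six faces as colored 2D polygons, sorted far to near."""
    corners = box_corners(*box)
    faces = []
    for name, idx in FACE_CORNERS.items():
        faces.append({
            "name": name,
            "color": FACE_COLORS[name],
            "polygon": [project(corners[i], fx, fy, cx, cy) for i in idx],
            "depth": sum(corners[i][2] for i in idx) / 4,
        })
    # Drawing far faces first lets nearer faces paint over them.
    return sorted(faces, key=lambda f: f["depth"], reverse=True)


faces = colored_faces((0.0, 0.0, 10.0, 2.0, 2.0, 4.0, 0.0),
                      fx=100.0, fy=100.0, cx=0.0, cy=0.0)
```

Filling these polygons in order with any 2D drawing library would give a color-coded overlay in which a change to any box parameter visibly changes the rendering, in contrast to a plain wireframe.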

The idea of training a DRL agent to do object detection or refinement is not new. It is very similar to the idea of Active Object Localization with Deep Reinforcement Learning (ICCV 2015).

Key ideas

Technical details

Notes