Learning-Deep-Learning

RTM3D: Real-time Monocular 3D Detection from Object Keypoints for Autonomous Driving

January 2020

tl;dr: CenterNet-based method that directly detects the 2D projections of the 3D cuboid vertices.

Overall impression

This paper uses virtual keypoints. It uses CenterNet to directly detect the 2D projections of all 8 cuboid vertices plus the cuboid center (9 keypoints in total). The network also directly regresses distance, orientation, and size. These values are not used to form the cuboid directly. Instead, they serve as initial values for a separate post-processing optimizer that generates the final 3D bbox.
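The sketch below shows how the 9 keypoint targets (8 vertices plus the center) could be generated from a 3D box. It assumes a KITTI-style camera frame (x right, y down, z forward), dims ordered as (h, w, l), yaw about the camera Y axis, and that `center` is the 3D box center. The function names and the vertex ordering are hypothetical and are not taken from the paper.

```python
import numpy as np

def box_corners_3d(dims, yaw, center):
    """Return the 8 cuboid vertices (3x8) in camera coordinates.

    Assumptions (hypothetical convention): dims = (h, w, l), yaw rotates
    about the camera Y axis, and `center` is the 3D box center (x, y, z).
    """
    h, w, l = dims
    # Vertex offsets in the object frame: x along length, y along height, z along width
    x = np.array([1, 1, -1, -1, 1, 1, -1, -1]) * l / 2
    y = np.array([1, 1, 1, 1, -1, -1, -1, -1]) * h / 2
    z = np.array([1, -1, -1, 1, 1, -1, -1, 1]) * w / 2
    c, s = np.cos(yaw), np.sin(yaw)
    R = np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])  # rotation about Y
    return R @ np.vstack([x, y, z]) + np.asarray(center, dtype=float).reshape(3, 1)

def project_keypoints(dims, yaw, center, K):
    """Project the 8 vertices plus the box center into the image; returns (9, 2) pixels."""
    pts = np.hstack([box_corners_3d(dims, yaw, center),
                     np.asarray(center, dtype=float).reshape(3, 1)])
    uvw = K @ pts                   # homogeneous image coordinates
    return (uvw[:2] / uvw[2]).T     # perspective divide
```

Each of the 9 projected points would then serve as a CenterNet-style heatmap target, so the network never has to reason about 3D geometry directly at the detection stage.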

The predicted keypoints are very noisy. However, the optimization can still recover a stable 3D bbox from them.
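A simplified sketch can illustrate this optimization step. It uses Gauss-Newton to refine the box center and yaw so that the projected cuboid matches the noisy detected keypoints. The dimensions stay fixed at their regressed values, and an optional quadratic prior (`prior_weight`) pulls the estimate toward the regressed initial values. This is a stand-in built on several assumptions, not the paper's exact energy or its g2o setup. The function names are hypothetical, and the Jacobian is computed by forward finite differences for brevity.

```python
import numpy as np

def project_box(params, dims, K):
    """Project 8 vertices + center for params = [x, y, z, yaw] (box center and yaw); returns (9, 2)."""
    x0, y0, z0, yaw = params
    h, w, l = dims
    # Object-frame offsets; the last column (all zeros) is the box center
    xs = np.array([1, 1, -1, -1, 1, 1, -1, -1, 0]) * l / 2
    ys = np.array([1, 1, 1, 1, -1, -1, -1, -1, 0]) * h / 2
    zs = np.array([1, -1, -1, 1, 1, -1, -1, 1, 0]) * w / 2
    c, s = np.cos(yaw), np.sin(yaw)
    R = np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])
    pts = R @ np.vstack([xs, ys, zs]) + np.array([[x0], [y0], [z0]])
    uvw = K @ pts
    return (uvw[:2] / uvw[2]).T

def refine_box(kps_2d, init_params, dims, K, prior_weight=1.0, iters=20, eps=1e-6):
    """Gauss-Newton on ||proj(p) - kps||^2 + prior_weight * ||p - p0||^2 (hypothetical energy)."""
    p0 = np.asarray(init_params, dtype=float)
    p = p0.copy()

    def residual(q):
        r_proj = (project_box(q, dims, K) - kps_2d).ravel()  # 18 reprojection residuals
        r_prior = np.sqrt(prior_weight) * (q - p0)            # 4 prior residuals
        return np.concatenate([r_proj, r_prior])

    for _ in range(iters):
        r = residual(p)
        # Forward-difference Jacobian, shape (22, 4)
        J = np.empty((r.size, p.size))
        for j in range(p.size):
            dp = np.zeros_like(p)
            dp[j] = eps
            J[:, j] = (residual(p + dp) - r) / eps
        # Gauss-Newton step: solve the normal equations J^T J step = -J^T r
        step = np.linalg.solve(J.T @ J, -J.T @ r)
        p = p + step
        if np.linalg.norm(step) < 1e-8:
            break
    return p
```

Levenberg-Marquardt would add an adaptively scaled damping term λI to `J.T @ J`. This trades some convergence speed for robustness when the regressed initialization is poor.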

The architecture is easy to implement. The post-processing, however, seems quite heavy. It solves a multivariate nonlinear least-squares system using the Gauss-Newton or Levenberg-Marquardt algorithm in the g2o library. This needs more investigation. Naively, the system could be solved with the pseudo-inverse of an overdetermined linear system. The later work KM3D-Net improves on this.
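One possible formulation of the pseudo-inverse route is sketched below. It is an assumption of this sketch, not necessarily the formulation used by the paper or by KM3D-Net. It fixes dims and yaw to the regressed values and solves only for the translation T. Let K_i denote the i-th row of the intrinsics K. Each keypoint (u, v) with rotated object-frame offset X then gives two equations that are linear in T: (K_0 - u K_2)·(X + T) = 0 and (K_1 - v K_2)·(X + T) = 0. Stacking all 9 keypoints yields an 18x3 overdetermined system. The helper names are hypothetical.

```python
import numpy as np

def rotated_offsets(dims, yaw):
    """Object-frame offsets of the 8 vertices + center (3x9), rotated by yaw about Y."""
    h, w, l = dims
    xs = np.array([1, 1, -1, -1, 1, 1, -1, -1, 0]) * l / 2
    ys = np.array([1, 1, 1, 1, -1, -1, -1, -1, 0]) * h / 2
    zs = np.array([1, -1, -1, 1, 1, -1, -1, 1, 0]) * w / 2
    c, s = np.cos(yaw), np.sin(yaw)
    R = np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])
    return R @ np.vstack([xs, ys, zs])

def solve_translation_linear(kps_2d, dims, yaw, K):
    """Least-squares translation T from 9 keypoints via the pseudo-inverse of an 18x3 system."""
    A_rows, b_rows = [], []
    for (u, v), X in zip(kps_2d, rotated_offsets(dims, yaw).T):
        # (K_row - coord * K_2) . (X + T) = 0  ->  a . T = -a . X
        for coord, k_row in ((u, K[0]), (v, K[1])):
            a = k_row - coord * K[2]
            A_rows.append(a)
            b_rows.append(-a @ X)
    A = np.array(A_rows)           # shape (18, 3)
    b = np.array(b_rows)           # shape (18,)
    return np.linalg.pinv(A) @ b   # closed-form least-squares solution
```

This closed form needs no iterations or initialization. It only recovers T, however, and errors in the regressed dims and yaw propagate straight into the translation.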

Key ideas

Technical details

Notes