Deep3dBox: 3D Bounding Box Estimation Using Deep Learning and Geometry

July 2019

tl;dr: Monocular 3d object detection (3dod) by using 2d bbox and geometry constraints.

Overall impression

This paper proposed the well-known discrete-continuous loss (also called multi-bin loss, or hybrid classification/regression loss), which has become standard for regressing targets with a large range and for multi-modal regression problems. In retrospect, it is the same idea as using anchors in object detection.
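To make the classification/regression split concrete, here is a minimal sketch of how an angle can be turned into a bin index (the classification target) plus a residual from the bin center (the regression target), and decoded back. The names `encode_angle`, `decode_angle` and `num_bins` are hypothetical, and the non-overlapping bins and raw-angle residual are simplifying assumptions for illustration, not necessarily the paper's exact parameterization.

```python
import math

def encode_angle(theta, num_bins=4):
    """Split an angle in [-pi, pi) into (bin index, residual from bin center).

    Assumption: equal-width, non-overlapping bins covering the full circle.
    """
    bin_width = 2 * math.pi / num_bins
    shifted = (theta + math.pi) % (2 * math.pi)   # map to [0, 2*pi)
    bin_idx = int(shifted // bin_width) % num_bins  # classification target
    center = (bin_idx + 0.5) * bin_width
    residual = shifted - center                   # regression target
    return bin_idx, residual

def decode_angle(bin_idx, residual, num_bins=4):
    """Invert encode_angle: bin center plus residual, mapped back to [-pi, pi)."""
    bin_width = 2 * math.pi / num_bins
    shifted = (bin_idx + 0.5) * bin_width + residual
    return (shifted % (2 * math.pi)) - math.pi
```

At training time the bin index would typically get a classification loss and the residual a regression loss applied only to the relevant bin; at inference the most confident bin is picked and its residual added to the bin center. The anchor analogy is that each bin center plays the role of an anchor and the residual plays the role of the anchor offset.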

This is not end-to-end. A NN estimates the 2D bbox and the dimensions and orientation of the 3D bbox. The distance (translation vector) is then obtained by solving linear equations posed by the constraint that the corners of the projected 3D bbox touch the four sides of the 2D bbox.
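The geometric step can be sketched as a small least-squares problem. Assuming a pinhole camera with intrinsics `K`, a known rotation `R` and known object-frame corners, requiring a projected corner to lie on one side of the 2D bbox gives one equation that is linear in the translation `T`. Four sides give four equations for three unknowns. The function name `solve_translation` and the input layout are hypothetical, and the sketch assumes the corner-to-side assignment is already given; how to choose that assignment is left out.

```python
import numpy as np

def solve_translation(K, R, corners_obj, box2d):
    """Least-squares translation T so that four chosen 3D corners project
    onto the four sides of a 2D box.

    K: (3, 3) pinhole intrinsics. R: (3, 3) object-to-camera rotation.
    corners_obj: (4, 3) object-frame corners assumed to touch the
                 left, top, right and bottom sides, in that order.
    box2d: (xmin, ymin, xmax, ymax) in pixels.
    """
    xmin, ymin, xmax, ymax = box2d
    # Each side fixes one image coordinate: (row of K to use, pixel value).
    sides = [(0, xmin), (1, ymin), (0, xmax), (1, ymax)]
    A, b = [], []
    for X, (row, value) in zip(corners_obj, sides):
        # u = K[row].(R X + T) / K[2].(R X + T) = value
        # => (K[row] - value * K[2]) . (R X + T) = 0, which is linear in T.
        m = K[row] - value * K[2]
        A.append(m)
        b.append(-m @ (R @ X))
    T, *_ = np.linalg.lstsq(np.array(A), np.array(b), rcond=None)
    return T
```

With noise-free inputs and the correct corner assignment, the system is consistent and the true translation is recovered; with predicted dimensions and orientation, the residual of the fit indicates how well a given assignment explains the 2D bbox.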

A simpler version for 3d proposal generation based on 2d bbox and viewpoint classification is in semantic 3d slam.

Key ideas

Technical details