Monocular 3D Object Detection with Decoupled Structured Polygon Estimation and Height-Guided Depth Estimation

March 2020

tl;dr: Decouple 2D projection estimation from depth estimation.

Overall impression

The paper is the first to state clearly that estimating the depth dimension can be decoupled from predicting the 2D projection of the 3D cuboid. Essentially, it says we can first estimate the 3D bbox position on the normalized image plane, and then estimate the depth.
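
To make the decoupling concrete, here is a minimal sketch of the second step under a standard pinhole camera model: once the projected cuboid corners are known in pixels, they are mapped to the normalized image plane (z = 1) and scaled by a depth value. The function name `backproject`, the intrinsics `K`, and the sample numbers are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def backproject(points_px, K, depth):
    """Lift 2D pixel points to 3D camera coordinates.

    Assumes a pinhole camera with intrinsics K (hypothetical values below).
    points_px: (N, 2) pixel coordinates, e.g. projected cuboid corners.
    depth:     scalar or (N,) depth along the camera z-axis.
    """
    points_px = np.asarray(points_px, dtype=float)
    ones = np.ones((points_px.shape[0], 1))
    homog = np.hstack([points_px, ones])        # (N, 3) homogeneous pixels
    normalized = homog @ np.linalg.inv(K).T     # (N, 3) on the plane z = 1
    depth = np.broadcast_to(np.asarray(depth, dtype=float), (points_px.shape[0],))
    return normalized * depth[:, None]          # scale each ray by its depth

# Illustrative intrinsics (assumed, not from the paper).
K = np.array([[700.0, 0.0, 600.0],
              [0.0, 700.0, 180.0],
              [0.0, 0.0, 1.0]])

corners_px = [[600.0, 180.0], [670.0, 250.0]]
points_3d = backproject(corners_px, K, depth=[10.0, 20.0])
```

The 2D estimate fixes the viewing ray through each corner, and depth only picks a point along that ray, which is why the two sub-problems can be handled separately.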

This is the correct way to formulate the problem, and I am surprised that no previous work formulated it this way. MonoDIS tried to disentangle the losses, but that is a more general framework from the training perspective.

Key ideas

Technical details