MonoGRNet 2: Monocular 3D Object Detection via Geometric Reasoning on Keypoints

October 2019

tl;dr: Regress keypoints in 2D images and use a 3D CAD model to infer depth.

Overall impression

Training is based on 3D CAD models with minimal keypoint annotation. This is valuable because it avoids much of the manual keypoint annotation on 2D images, which is inefficient and inaccurate. It also seems to annotate 2D keypoints semi-automatically, as in Deep MANTA.

It is related to Deep MANTA in that it relies on keypoint regression for monocular 3DOD. The idea of using keypoints to estimate depth can also be found in GS3D. Despite the name, it is not actually that closely related to MonoGRNet.

It follows the Mono3DOD tradition of regressing local yaw and dimension offsets from image patches and inferring depth from these results.
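
To make the "infer depth from these results" step concrete, here is a minimal sketch of the general geometric idea, not the paper's exact formulation. It assumes a pinhole camera, a regressed metric object height, and a pair of 2D keypoints (for example the top and bottom of a vertical edge) whose pixel distance is known. The function names and the KITTI-style yaw convention (global yaw = local yaw + atan2(x, z)) are assumptions made for illustration.

```python
import math


def depth_from_height(focal_px, height_m, pixel_height):
    """Pinhole estimate: an object of height H at depth z spans f * H / z pixels,
    so z = f * H / h. Assumes the measured edge is roughly parallel to the image plane."""
    if pixel_height <= 0:
        raise ValueError("pixel_height must be positive")
    return focal_px * height_m / pixel_height


def backproject(u, v, z, fx, fy, cx, cy):
    """Lift pixel (u, v) at depth z to camera coordinates (x, y, z)."""
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return x, y, z


def global_yaw(local_yaw, x, z):
    """Convert local (observation) yaw to global yaw using the viewing-ray angle.
    Assumed KITTI-style convention: rotation_y = alpha + atan2(x, z)."""
    return local_yaw + math.atan2(x, z)


if __name__ == "__main__":
    # Hypothetical numbers: 720 px focal length, 1.5 m regressed height,
    # 54 px between the top and bottom keypoints.
    z = depth_from_height(720.0, 1.5, 54.0)
    x, y, _ = backproject(780.0, 200.0, z, 720.0, 720.0, 600.0, 180.0)
    print(z, x, y, global_yaw(0.1, x, z))
```

The sketch shows why accurate dimension and keypoint regression matter: any error in the regressed height or in the keypoint pixel distance translates proportionally into depth error.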

Key ideas

Technical details