3D-RCNN: Instance-level 3D Object Reconstruction via Render-and-Compare

October 2019

tl;dr: Mono 3DOD by estimating the pose and shape of vehicles, trained with a render-and-compare loss.

Overall impression

3DOD is critical for prediction and path planning. However, 3D ground truth is hard to obtain. 3D-RCNN only needs 2D annotations (depth and semantic segmentation). It also needs accurate intrinsics/extrinsics to work.

This work seems to stem from the concept shown in a video on PASCAL 3D.

First, learn a low-dimensional shape space from CAD models for each subtype. PCA is used, but an autoencoder also seems OK, as in RoI10D, which is heavily inspired by this work and seems more practical.
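
A minimal sketch of the PCA shape space, assuming each CAD model has already been converted to a fixed-length vector (for example a flattened voxel grid). The function names, the 512-value vector length and the 5 components are illustrative assumptions, and the CAD data is replaced by synthetic random data.

```python
import numpy as np


def fit_shape_space(shapes, n_components):
    """Fit a PCA basis to shape vectors of size (n_models, n_features)."""
    mean = shapes.mean(axis=0)
    centered = shapes - mean
    # Rows of vt are the principal directions, sorted by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:n_components]


def encode(shape, mean, basis):
    """Project one shape vector onto the low-dimensional shape code."""
    return basis @ (shape - mean)


def decode(code, mean, basis):
    """Reconstruct a shape vector from its low-dimensional code."""
    return mean + basis.T @ code


# Synthetic stand-in for CAD data (assumption): 50 models, 512 values each,
# generated from 5 latent factors so that 5 components suffice.
rng = np.random.default_rng(0)
shapes = rng.normal(size=(50, 5)) @ rng.normal(size=(5, 512))

mean, basis = fit_shape_space(shapes, n_components=5)
code = encode(shapes[0], mean, basis)
recon = decode(code, mean, basis)
```

With a basis like this, the network only has to regress the few entries of `code` per car instead of a full 3D shape.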

Analysis by synthesis: Estimate the shape, pose and size parameters of the cars, and render (synthesize) the scene. Then the rendered mask and depth map are compared with ground truth to compute the loss.
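
The compare step can be sketched as follows, assuming the renderer has already produced a mask and a depth map for one instance. The specific terms (1 - IoU for the mask, mean absolute depth error on the overlap) and the `depth_weight` parameter are illustrative assumptions, not necessarily the paper's exact loss.

```python
import numpy as np


def render_and_compare_loss(rendered_mask, rendered_depth,
                            gt_mask, gt_depth, depth_weight=1.0):
    """Compare a rendered instance with 2D ground truth (illustrative loss)."""
    rendered_mask = rendered_mask.astype(bool)
    gt_mask = gt_mask.astype(bool)
    overlap = np.logical_and(rendered_mask, gt_mask)
    union = np.logical_or(rendered_mask, gt_mask)

    # Mask term: 1 - IoU, zero when the two silhouettes coincide.
    mask_loss = 1.0 - overlap.sum() / union.sum() if union.any() else 0.0

    # Depth term: mean absolute error where both masks agree.
    if overlap.any():
        depth_loss = np.abs(rendered_depth - gt_depth)[overlap].mean()
    else:
        depth_loss = 0.0

    return float(mask_loss + depth_weight * depth_loss)


# Toy 4x4 example: ground-truth car occupies a 2x2 block at depth 10 m.
gt_mask = np.zeros((4, 4), dtype=bool)
gt_mask[1:3, 1:3] = True
gt_depth = np.where(gt_mask, 10.0, 0.0)

# Rendered hypothesis is shifted one pixel right and 0.5 m too far.
rendered_mask = np.zeros((4, 4), dtype=bool)
rendered_mask[1:3, 2:4] = True
rendered_depth = np.where(rendered_mask, 10.5, 0.0)

perfect = render_and_compare_loss(gt_mask, gt_depth, gt_mask, gt_depth)
shifted = render_and_compare_loss(rendered_mask, rendered_depth,
                                  gt_mask, gt_depth)
```

A wrong pose or shape raises both terms, which is what lets 2D annotations alone supervise the 3D parameters.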

The shape and pose are weakly supervised and arise from end-to-end training.

Key ideas

Technical details