CubifAE-3D: Monocular Camera Space Cubification on Autonomous Vehicles for Auto-Encoder based 3D Object Detection

October 2020

tl;dr: Use depth pretraining with an auto-encoder (AE) on synthetic data to help mono3D.

Overall impression

The idea of using depth pretraining for mono3D is similar to Geometric pretraining for monoDepth. The pretraining can be done on a synthetic dataset. Self-supervised pretraining might also work.

The idea of cubifying 3D space is to densely sample 3D space, and is similar to the idea of 3D anchors in M3D-RPN. The idea of pooling 2D image features into 3D voxels resembles that of OFT.
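To make the cubification idea concrete, here is a minimal sketch of mapping an object center in camera space to a cell of a regular 3D grid. The function name `cuboid_index`, the camera-space bounds, and the grid resolution are all hypothetical choices for illustration; they are not taken from the paper.

```python
import math

def cuboid_index(center, lo, hi, grid):
    """Return the (ix, iy, iz) grid cell containing `center`.

    center: (x, y, z) in camera space.
    lo, hi: lower / upper bounds of the cubified volume (assumed values).
    grid:   number of cuboids along each axis (assumed values).
    Returns None when the point lies outside the volume.
    """
    idx = []
    for c, l, h, n in zip(center, lo, hi, grid):
        size = (h - l) / n               # edge length of one cuboid on this axis
        i = math.floor((c - l) / size)   # which cuboid the coordinate falls in
        if i < 0 or i >= n:
            return None                  # outside the cubified volume
        idx.append(i)
    return tuple(idx)

# Hypothetical volume: x in [-40, 40), y in [-5, 5), z in [0, 80), split 4 x 2 x 4.
LO, HI, GRID = (-40.0, -5.0, 0.0), (40.0, 5.0, 80.0), (4, 2, 4)
print(cuboid_index((0.0, 0.0, 10.0), LO, HI, GRID))
print(cuboid_index((50.0, 0.0, 10.0), LO, HI, GRID))
```

Each cuboid then acts like a dense 3D anchor: predictions are made per cell rather than per 2D image location.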

The main contribution of this work seems to be the improved 3D detection of faraway objects. In a way, it eliminates the dependency of prediction error on depth. -> How is this done?

The GT assignment looks interesting: it predicts up to 10 cars in each cuboid, sorted by increasing depth. Anchors are a way of implicitly sorting the predictions and GT. DETR is quite radical in eliminating the ordering and sorting of GT and predictions altogether, replacing them with a Hungarian matching loss.
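The depth-sorted assignment described above can be sketched as follows. This is only an illustration of the "up to 10 objects per cuboid, sorted by increasing depth" rule; the function name `build_targets`, the `(cuboid_id, depth, payload)` tuple layout, and the truncation of extra objects are assumptions, not the paper's actual target encoding.

```python
from collections import defaultdict

def build_targets(objects, max_objects=10):
    """Group GT objects by cuboid and order them by increasing depth.

    objects: iterable of (cuboid_id, depth, payload) tuples (hypothetical layout).
    Returns {cuboid_id: [(depth, payload), ...]} with at most `max_objects`
    entries per cuboid, nearest object first.
    """
    per_cuboid = defaultdict(list)
    for cuboid_id, depth, payload in objects:
        per_cuboid[cuboid_id].append((depth, payload))

    targets = {}
    for cuboid_id, members in per_cuboid.items():
        members.sort(key=lambda m: m[0])             # increasing depth
        targets[cuboid_id] = members[:max_objects]   # assumed: drop the farthest extras
    return targets

gt = [(0, 30.0, "car_a"), (0, 10.0, "car_b"), (0, 20.0, "car_c"), (1, 5.0, "car_d")]
print(build_targets(gt, max_objects=2))
```

Because slot k of a cuboid always corresponds to its k-th nearest object, predictions and GT line up by position, so no explicit matching step is needed. DETR's Hungarian matching removes this fixed ordering entirely.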

Key ideas

Technical details