Monocular 3D Object Detection with Pseudo-LiDAR Point Cloud

August 2019

tl;dr: End-to-end pseudo-lidar training with 2D/3D bbox consistency loss.

Overall impression

This paper’s main idea largely overlaps with that of pseudo-lidar. The main problem with pseudo-lidar is the noise in the reprojected 3D point cloud (i.e., depth inaccuracies and long tails) caused by blurry depth boundaries. Pseudo-lidar++ proposes to use sparse depth measurements to alleviate this problem, while this study uses 2D and 3D bounding box consistency (similar to deep3DBox).
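To make the reprojection step concrete, here is a minimal sketch of how a predicted depth map can be lifted into a pseudo-lidar point cloud. It assumes a plain pinhole camera; the function name `depth_to_points` and the intrinsics `fx`, `fy`, `cx`, `cy` are illustrative and not taken from the paper.

```python
import numpy as np

def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project an (H, W) depth map into an (N, 3) camera-frame point cloud.

    Assumes a pinhole camera with focal lengths (fx, fy) and principal
    point (cx, cy), all in pixels. Pixels with non-positive depth are dropped.
    """
    h, w = depth.shape
    # u indexes columns (image x), v indexes rows (image y).
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]
```

Since x and y are both scaled by z, any depth error at a blurry object boundary moves the point along the camera ray, which is one way to see where the long tails come from.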

However, there is a major problem with the current approach. Trying to predict a correct 3D bbox from a noisy point cloud is not optimal, and the 3D box prediction can even get “contaminated” by the 2D-3D bbox consistency. A better way is to finetune the point cloud generation process as well. This requires propagating the gradient back to the depth net. See depth coeff for a solution!
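The 2D/3D bounding box consistency mentioned above can be sketched roughly as follows: project the eight corners of the predicted 3D box into the image, take their tightest enclosing 2D box, and penalize its distance to the 2D detection. This is an illustrative sketch only, not the paper's exact loss. The function names, the plain L1 penalty, and the box parameterization (camera frame, dims = length along x, height along y, width along z, yaw about the y axis) are all assumptions.

```python
import numpy as np

def box3d_corners(center, dims, yaw):
    """Return the (8, 3) corners of a 3D box in the camera frame.

    Assumed convention: dims = (l, h, w) along local x, y, z; yaw rotates about y.
    """
    signs = np.array(
        [[sx, sy, sz] for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1)],
        dtype=float,
    )
    local = signs * np.asarray(dims, dtype=float) / 2.0
    c, s = np.cos(yaw), np.sin(yaw)
    rot_y = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
    return local @ rot_y.T + np.asarray(center, dtype=float)

def enclosing_box2d(corners, K):
    """Project corners with 3x3 intrinsics K and return [u_min, v_min, u_max, v_max]."""
    uvw = corners @ K.T
    uv = uvw[:, :2] / uvw[:, 2:3]
    return np.concatenate([uv.min(axis=0), uv.max(axis=0)])

def bbox_consistency_loss(center, dims, yaw, K, box2d):
    """Mean absolute difference between the projected 3D box and a 2D box."""
    projected = enclosing_box2d(box3d_corners(center, dims, yaw), K)
    return float(np.abs(projected - np.asarray(box2d, dtype=float)).mean())
```

Note that this loss only has the 3D box parameters as inputs. In a differentiable framework its gradient would reach the box regressor but not the depth net, which is the limitation discussed above.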

Pseudo-lidar++ tackles this fundamental problem and achieves better performance, but it requires supervision from sparse depth measurements.

Note that DORN’s training data overlaps with the object detection validation data, so the depth estimates on the validation set are overfitted. Both pseudo-lidar and pseudo-lidar e2e suffer from this problem. According to the ForeSeE paper, if the validation data is excluded from the training of the depth network, then PL’s performance drops from 18.5 to 5.4 AP_3D at IoU=0.7.

Key ideas

Technical details