CaDDN: Categorical Depth Distribution Network for Monocular 3D Object Detection

March 2021

tl;dr: Lift the perspective image to BEV for mono3D, with direct depth supervision. It can be seen as an extension of Pseudo-lidar.

Overall impression

CaDDN focuses on accurate depth prediction to improve mono3D performance. The depth prediction method is based on an improved version of ordinal regression.

The idea of projecting the perspective image to a BEV representation and then performing object detection is quite similar to that of OFT and Lift Splat Shoot. These implicit methods transform feature maps to BEV space and suffer from feature smearing. CaDDN instead leverages probabilistic depth estimation via a categorical depth distribution.
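The lifting step can be sketched as an outer product between each pixel's feature vector and its categorical depth distribution, which places the feature along a camera frustum weighted by depth probability. This is a minimal NumPy sketch under assumed tensor shapes, not the paper's exact implementation:

```python
import numpy as np

def lift_to_frustum(features, depth_logits):
    """Lift image features into a frustum grid via a categorical
    depth distribution (CaDDN / Lift-Splat-Shoot style lifting).

    features:     (H, W, C) per-pixel image features.
    depth_logits: (H, W, D) unnormalized scores over D depth bins.
    Returns frustum features of shape (H, W, D, C)."""
    # Softmax over depth bins -> per-pixel categorical depth distribution.
    e = np.exp(depth_logits - depth_logits.max(axis=-1, keepdims=True))
    depth_probs = e / e.sum(axis=-1, keepdims=True)            # (H, W, D)
    # Outer product: each depth bin receives the pixel's feature vector
    # scaled by the probability that the pixel lies in that bin.
    return depth_probs[..., :, None] * features[..., None, :]  # (H, W, D, C)

# Toy example with assumed sizes.
H, W, C, D = 4, 6, 8, 5
feats = np.random.rand(H, W, C)
logits = np.random.rand(H, W, D)
frustum = lift_to_frustum(feats, logits)
print(frustum.shape)  # (4, 6, 5, 8)
```

Because the depth probabilities sum to 1 per pixel, summing the frustum over the depth axis recovers the original feature map; a sharp (near one-hot) distribution concentrates the feature at one depth, which is what reduces the feature smearing of fully implicit lifting.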

Previous depth prediction was separated from 3D detection during training, preventing depth map estimates from being optimized for the detection task. In other words, not all pixels are created equal: accurate depth should be prioritized for pixels belonging to objects of interest, and is less important for background pixels. CaDDN focuses on generating an interpretable depth representation for mono3D. --> cf ForeSeE
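Direct depth supervision requires turning a continuous ground-truth depth map (e.g. from projected lidar) into per-pixel bin labels for the categorical distribution. A hedged sketch, assuming uniform bin edges and illustrative range values (the paper's actual discretization scheme and range may differ):

```python
import numpy as np

def depth_to_bin_labels(depth, d_min=2.0, d_max=46.0, num_bins=80):
    """Discretize metric depth into categorical bin indices.

    depth: (H, W) ground-truth depth in meters.
    Returns (H, W) integer bin index per pixel, clipped to valid range,
    suitable as a cross-entropy target for the depth distribution head."""
    bin_size = (d_max - d_min) / num_bins
    idx = np.floor((depth - d_min) / bin_size).astype(int)
    return np.clip(idx, 0, num_bins - 1)

gt_depth = np.array([[2.0, 10.0],
                     [45.9, 100.0]])  # last value lies beyond d_max
print(depth_to_bin_labels(gt_depth))
```

Supervising these labels jointly with the detection loss is what lets the network sharpen depth where it matters for objects of interest rather than uniformly over all pixels.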

Key ideas

Technical details