TLNet: Triangulation Learning Network: from Monocular to Stereo 3D Object Detection

October 2019

tl;dr: As the mono baseline, place 3D anchors inside the frustum subtended by the 2D object detection. The stereo branch then reweights the feature maps channel by channel based on their left-right coherence score.
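A minimal sketch of coherence-based reweighting follows. It assumes that the coherence score of a channel is the cosine similarity between the left and right RoI features of that channel, and that both views are scaled by it. This is our reading of the idea, not the paper's implementation. The name `coherence_reweight` and the flattened-channel layout are hypothetical, chosen so the example runs with the standard library alone.

```python
import math
from typing import List, Tuple

Channel = List[float]  # one flattened (H*W) channel of an RoI feature map (assumed layout)


def coherence_reweight(
    left: List[Channel], right: List[Channel], eps: float = 1e-8
) -> Tuple[List[Channel], List[Channel], List[float]]:
    """Hypothetical helper: scale each channel by its left-right coherence.

    Coherence is taken to be the cosine similarity of the paired channels
    (an assumption), so it lies in [-1, 1].
    """
    scores = []
    for cl, cr in zip(left, right):
        dot = sum(a * b for a, b in zip(cl, cr))
        norm = math.sqrt(sum(a * a for a in cl)) * math.sqrt(sum(b * b for b in cr))
        scores.append(dot / (norm + eps))  # eps guards against all-zero channels
    # Channels that agree across views keep their magnitude; others are damped.
    new_left = [[v * s for v in ch] for ch, s in zip(left, scores)]
    new_right = [[v * s for v in ch] for ch, s in zip(right, scores)]
    return new_left, new_right, scores


# Usage: channel 0 agrees across views, channel 1 is anti-correlated.
left = [[1.0, 1.0, 1.0, 1.0], [1.0, 2.0, 3.0, 4.0]]
right = [[1.0, 1.0, 1.0, 1.0], [-1.0, -2.0, -3.0, -4.0]]
new_left, new_right, scores = coherence_reweight(left, right)
```

With this scoring, a channel whose left and right responses line up, as they should when the 3D anchor sits on a real object, passes through nearly unchanged. Inconsistent channels are suppressed.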

Overall impression

Pixel-level depth maps are too expensive for 3DOD; object-level depth should be good enough. –> This is similar to MonoGRNet.

The paper provides a solid mono baseline. –> This could perhaps be improved with heuristics such as a vehicle-size prior, to avoid densely sampling 3D anchors.
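The size-prior idea can be sketched with the pinhole relation z ≈ f_y · H / h, where H is an assumed real object height in metres and h is the 2D box height in pixels. Instead of covering the whole depth range, anchors are sampled only in a narrow band around that estimate, and the 2D box center is back-projected to each candidate depth. This is the note's own suggestion, not what TLNet does. The function names, the 1.5 m car height, and the margin and step values are hypothetical. Placing one anchor center per depth along the box-center ray is also a simplification of filling the frustum.

```python
from typing import List, Tuple

Box2D = Tuple[float, float, float, float]  # (u1, v1, u2, v2) in pixels
Point3D = Tuple[float, float, float]  # (x, y, z) in camera coordinates, metres


def size_prior_depths(box: Box2D, fy: float, obj_height_m: float = 1.5,
                      margin_m: float = 2.0, step_m: float = 0.5) -> List[float]:
    """Hypothetical heuristic: estimate z = fy * H / h, sample only near it."""
    h_px = box[3] - box[1]
    z_est = fy * obj_height_m / h_px
    n = int(round(2 * margin_m / step_m))
    return [z_est - margin_m + i * step_m for i in range(n + 1)]


def frustum_anchor_centers(box: Box2D, fx: float, fy: float, cx: float, cy: float,
                           depths: List[float]) -> List[Point3D]:
    """Back-project the 2D box center to each candidate depth (pinhole model)."""
    u = 0.5 * (box[0] + box[2])
    v = 0.5 * (box[1] + box[3])
    return [((u - cx) * z / fx, (v - cy) * z / fy, z) for z in depths]


# Usage with made-up KITTI-like intrinsics and a 70-pixel-tall car box.
box = (550.0, 150.0, 650.0, 220.0)
depths = size_prior_depths(box, fy=700.0)  # band around z_est = 15 m
centers = frustum_anchor_centers(box, fx=700.0, fy=700.0, cx=600.0, cy=180.0,
                                 depths=depths)
```

In this example, 9 candidate depths replace a dense sweep over tens of metres. The trade-off is that an error in the height prior, for example from a truncated box, moves the whole band away from the true depth.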

The paper still requires 3D bbox GT and stereo intrinsics to train the monocular detection network. –> Maybe annotate directly on 2D images?

Key ideas

Technical details