monoDR: Monocular Differentiable Rendering for Self-Supervised 3D Object Detection

September 2020

tl;dr: Use differentiable rendering for monocular 3D object detection, without any 3D labels.

Overall impression

The gist of the paper is how to perform 3D object detection without 3D annotation. The answer is to use differentiable rendering to form self-supervised constraints with 2D annotations.

The scale ambiguity due to the projective entanglement of depth and scale is handled by explicitly predicting the metric size of objects. monoDR uses a size loss that penalizes the reconstructed object size for deviating too much from the average size of the object class. The concurrent work SDFLabel instead uses lidar to recover the absolute scale.
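To make the ambiguity and the size prior concrete, here is a minimal sketch under a simple pinhole-camera assumption. The function names, the squared-hinge form of the penalty, the `tolerance` parameter, and the example car dimensions are all illustrative assumptions, not the paper's exact formulation.

```python
def projected_extent(focal_px, size_m, depth_m):
    """Image-plane extent (pixels) of an object under a pinhole camera."""
    return focal_px * size_m / depth_m


def size_prior_loss(pred_size, mean_size, tolerance=0.0):
    """Penalize predicted (l, w, h) for deviating from the class mean.

    Hypothetical squared-hinge form: deviations within `tolerance`
    metres are free, larger ones are penalized quadratically.
    """
    return sum(
        max(abs(p - m) - tolerance, 0.0) ** 2
        for p, m in zip(pred_size, mean_size)
    )


# Placeholder class-average size in metres (illustrative, not from the paper).
MEAN_CAR_SIZE = (4.0, 1.6, 1.5)

# Doubling both size and depth leaves the projection unchanged,
# so the 2D evidence alone cannot separate the two.
near_small = projected_extent(1000.0, 4.0, 20.0)
far_large = projected_extent(1000.0, 8.0, 40.0)

# The size prior breaks the tie: the oversized hypothesis is penalized.
loss_plausible = size_prior_loss((4.0, 1.6, 1.5), MEAN_CAR_SIZE)
loss_oversized = size_prior_loss((8.0, 3.2, 3.0), MEAN_CAR_SIZE)
```

Both hypotheses project to the same image extent, but only the plausible one has zero size loss, which is what lets a 2D-only training signal settle on a metric scale.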

It uses an analysis-by-synthesis methodology, similar to 3D RCNN, RoI10D, and SDFLabel.

The paper also provides an interesting descent-and-explore approach to avoid local minima, most likely by using the so-called hindsight loss.
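For reference, the generic hindsight (min-of-K) loss can be sketched as follows. This is the common multi-hypothesis form, assumed here for illustration; it is not confirmed to be the exact mechanism behind the paper's descent-and-explore step, and the function name is hypothetical.

```python
def hindsight_loss(hypothesis_losses):
    """Return (loss, index) of the best of K hypotheses.

    Only the lowest-loss hypothesis contributes to the training signal,
    so the remaining hypotheses are free to explore other parts of the
    solution space instead of being pulled toward the same local minimum.
    """
    best_index = min(
        range(len(hypothesis_losses)),
        key=hypothesis_losses.__getitem__,
    )
    return hypothesis_losses[best_index], best_index


# Example: three pose hypotheses with made-up per-hypothesis losses.
loss, winner = hindsight_loss([0.9, 0.2, 0.5])
```

In a differentiable setting, gradients would flow only through the winning hypothesis, which is why poor initial guesses do not drag the others down.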

Key ideas

Technical details