MonoLoco: Monocular 3D Pedestrian Localization and Uncertainty Estimation

October 2019

tl;dr: BEV localization for pedestrians with uncertainty.

Overall impression

Uses an off-the-shelf human detector and 2D joint (keypoint) detector (Mask R-CNN and PifPaf). It exploits the relatively fixed height of pedestrians, in particular the shoulder-hip segment (~50 cm), to infer depth.
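MonoLoco learns the keypoint-to-distance mapping with a network rather than applying a closed-form rule, but the pinhole intuition behind "known segment length implies depth" can be sketched as below. This is a minimal sketch assuming an upright pedestrian whose shoulder-hip segment is roughly parallel to the image plane; the function name, the 0.5 m constant and the example focal length are illustrative assumptions.

```python
import math

# Assumed real-world shoulder-hip length in metres (~50 cm, as in the notes above).
SHOULDER_HIP_M = 0.5

def depth_from_segment(shoulder_px, hip_px, focal_px, segment_m=SHOULDER_HIP_M):
    """Pinhole estimate z ~= f * H / h (hypothetical helper, not MonoLoco's method).

    h is the shoulder-hip length in pixels, H its assumed metric length and
    f the focal length in pixels. Assumes the segment is roughly parallel
    to the image plane.
    """
    h = math.dist(shoulder_px, hip_px)  # segment length in pixels
    if h <= 0:
        raise ValueError("degenerate keypoints: zero-length segment")
    return focal_px * segment_m / h

# Focal length 720 px, shoulder-hip segment spanning 36 px vertically.
z = depth_from_segment((640.0, 300.0), (640.0, 336.0), focal_px=720.0)
print(f"{z:.1f} m")  # -> 10.0 m
```

Halving the pixel length doubles the estimated depth, which is also why a few pixels of keypoint noise matter much more for distant pedestrians.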

The paper also provides realistic uncertainty estimates by modeling both aleatoric and epistemic uncertainty. This helps mitigate the high-risk cases where the ground-truth distance is smaller than the predicted one, i.e., the pedestrian is closer than estimated and an accident is more likely.
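A common way to learn aleatoric uncertainty is to have the network output a location mu and a spread b and train it with a Laplace negative log-likelihood; epistemic uncertainty is often approximated by the spread of predictions over several stochastic (e.g. MC-dropout) forward passes. The sketch below illustrates both ideas plus a conservative lower bound for the high-risk case. The paper's exact loss and sampling scheme may differ (for instance, it may use a relative rather than absolute residual); the function names and the `mu - k*b` rule are illustrative assumptions.

```python
import math
import statistics

def laplace_nll(d_true, mu, b):
    """Negative log-likelihood of the true distance under Laplace(mu, b).

    Training on this lets a network predict a per-sample spread b (aleatoric
    uncertainty): a large residual costs less when b is large, while the
    log(2b) term stops b from growing without bound.
    """
    if b <= 0:
        raise ValueError("scale b must be positive")
    return abs(d_true - mu) / b + math.log(2.0 * b)

def mc_dropout_summary(mus):
    """Mean and standard deviation of mu over several stochastic forward passes.

    The spread across passes is a common proxy for epistemic uncertainty.
    """
    return statistics.mean(mus), statistics.stdev(mus)

def conservative_distance(mu, b, k=1.0):
    """Hypothetical safety rule: plan as if the pedestrian were at mu - k*b."""
    return max(0.0, mu - k * b)

# Same 2 m residual, different predicted spread:
print(round(laplace_nll(10.0, 12.0, 0.5), 3))  # confident and wrong -> 4.0
print(round(laplace_nll(10.0, 12.0, 2.0), 3))  # honest spread -> 2.386
print(mc_dropout_summary([11.8, 12.3, 12.1, 11.9]))
print(conservative_distance(12.0, 2.0))  # -> 10.0
```

Acting on the lower bound rather than on mu is one simple way an uncertainty estimate can cover the case where the true distance is smaller than the prediction.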

This idea can be readily applied to monocular 3D object detection (mono 3DOD) of cars, which are rigid bodies with known shapes.

This paper is well written, and the quality of the open-sourced code is amazing! They even have a webcam demo.

The approach is quite similar to DisNet, which feeds different bbox features into a simple MLP to estimate the distance of an object.
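To make the "features in, distance out" idea concrete, here is a minimal NumPy sketch of a keypoint-to-distance regressor. The layer sizes, the class name `TinyDistanceMLP` and the `(mu, b)` output head are illustrative assumptions, not the released MonoLoco architecture, and the weights are untrained, so the outputs only demonstrate the data flow. Normalizing keypoints with the inverse intrinsic matrix before the MLP is shown as one plausible way to make the input independent of the camera.

```python
import numpy as np

def normalize_keypoints(kps_px, K):
    """Map (N, 2) pixel keypoints to normalized image coordinates via K^-1.

    For a standard intrinsic matrix the homogeneous coordinate stays 1,
    so only the first two columns are kept.
    """
    ones = np.ones((kps_px.shape[0], 1))
    homog = np.hstack([kps_px, ones])          # (N, 3)
    norm = (np.linalg.inv(K) @ homog.T).T      # (N, 3)
    return norm[:, :2]

class TinyDistanceMLP:
    """Hypothetical 2-layer MLP: flattened keypoints -> (mu, b)."""

    def __init__(self, n_kps=17, hidden=256, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.1, (2 * n_kps, hidden))
        self.b1 = np.zeros(hidden)
        self.W2 = rng.normal(0.0, 0.1, (hidden, 2))
        self.b2 = np.zeros(2)

    def __call__(self, kps_norm):
        x = kps_norm.reshape(-1)                    # (2 * n_kps,)
        h = np.maximum(0.0, x @ self.W1 + self.b1)  # ReLU hidden layer
        mu, log_b = h @ self.W2 + self.b2
        return float(mu), float(np.exp(log_b))      # exp keeps the spread b positive

# Usage with an assumed intrinsic matrix and 17 COCO-style keypoints.
K = np.array([[720.0, 0.0, 640.0],
              [0.0, 720.0, 360.0],
              [0.0, 0.0, 1.0]])
kps = np.random.default_rng(1).uniform([0.0, 0.0], [1280.0, 720.0], (17, 2))
mu, b = TinyDistanceMLP()(normalize_keypoints(kps, K))
print(mu, b)
```

Predicting `log_b` and exponentiating is a common trick to keep the spread strictly positive without constraining the linear layer.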

The paper is further extended by Perceiving Humans, which simultaneously predicts orientation and 2D bbox, for social distancing.

Key ideas

The main criterion is that the dimensions of any object projected onto the image plane depend only on the norm of the vector D = (x, y, z), not on the particular combination of its components. Hence the network regresses the distance d = ||D|| rather than the depth z.
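Once the scalar distance d is predicted, the 3D position follows from the viewing ray: the pixel fixes the direction, and d fixes how far along it to go. A minimal sketch, assuming a pinhole camera with intrinsic matrix K; the function name, the example K, and the choice of pixel (e.g. some reference point of the pedestrian such as a keypoint centroid) are illustrative assumptions.

```python
import numpy as np

def location_from_distance(d, uv, K):
    """Place a point at Euclidean distance d along the camera ray through pixel uv.

    The ray direction is fixed by (u, v) and K, so the scalar d = ||D||
    is enough to recover D = (x, y, z).
    """
    ray = np.linalg.inv(K) @ np.array([uv[0], uv[1], 1.0])
    return d * ray / np.linalg.norm(ray)

K = np.array([[720.0, 0.0, 640.0],
              [0.0, 720.0, 360.0],
              [0.0, 0.0, 1.0]])

# Pixel 720 px right of the principal point with f = 720 px: 45 degrees off-axis.
D = location_from_distance(10.0, (1360.0, 360.0), K)
print(D)                  # x and z are equal here
print(np.linalg.norm(D))  # the predicted distance, up to float rounding
```

Note that z equals d only on the optical axis; off-axis, z < d, which is why regressing z directly would mix the object's apparent size with its position in the image.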

Technical details