ForeSeE: Task-Aware Monocular Depth Estimation for 3D Object Detection

October 2019

tl;dr: Train a depth estimator that focuses on foreground moving objects to improve pseudo-lidar-based 3DOD.

Overall impression

This paper follows the line of work on pseudo-lidar (pseudo-lidar, pseudo-lidar++, pseudo-lidar e2e).

Two overall issues with the pseudo-lidar idea: 1) inaccuracies in depth estimation and 2) blurry edges in the depth map leading to edge bleeding. Like pseudo-lidar e2e, ForeSeE recognizes the drawbacks of using an off-the-shelf depth estimator, but instead of finetuning it end-to-end, it focuses on the more important foreground moving objects for 3DOD.
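
To make the edge bleeding issue concrete, here is a minimal sketch of the standard pinhole back-projection that turns a depth map into a pseudo-lidar point cloud. The function name, the toy scene, and the intrinsics (`fx`, `fy`, `cx`, `cy`) are illustrative assumptions, not values from the paper.

```python
import numpy as np

def depth_to_pseudo_lidar(depth, fx, fy, cx, cy):
    """Back-project an (H, W) depth map into an (H*W, 3) point cloud (camera frame)."""
    h, w = depth.shape
    # u indexes columns, v indexes rows; both arrays have shape (H, W)
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)

# Toy scene (hypothetical): a car at 10 m in front of a wall at 30 m.
depth = np.full((4, 6), 30.0)
depth[:, :3] = 10.0

# A blurry depth edge: the boundary column gets an in-between depth.
blurred = depth.copy()
blurred[:, 3] = 20.0

points = depth_to_pseudo_lidar(blurred, fx=700.0, fy=700.0, cx=3.0, cy=2.0)
# Points with z == 20 float in empty space between the car and the wall.
floating = points[points[:, 2] == 20.0]
```

The `floating` points belong to neither the car nor the wall, which is what edge bleeding looks like once the depth map is lifted to 3D.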

The paper has a good introduction and background section.

However, the model seems to have much lower performance (even lower than pseudo-lidar). Email sent to the authors to inquire about this. -> The answer turned out to be a game changer for pseudo-lidar:

Note that DORN’s training data overlaps with the object detection validation data, so DORN is overfit to those validation images. Both pseudo-lidar and pseudo-lidar e2e suffer from this problem. According to the ForeSeE paper, if the validation data is excluded from the training of the depth estimator, then PL’s performance drops from 18.5 to 5.4 AP_3D at IoU=0.7.

Key ideas

The same depth estimation error has a very different impact on 3DOD when it falls on a car than when it falls on a building.
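
One way to express this idea is a depth loss that averages foreground and background errors separately and weights them differently. The sketch below is an illustration under stated assumptions, not necessarily the loss used in the paper: the function name, the L1 error, and the `fg_weight` value are all hypothetical.

```python
import numpy as np

def foreground_weighted_l1(pred, target, fg_mask, fg_weight=0.8):
    """L1 depth error averaged separately over foreground and background pixels.

    fg_weight is a hypothetical knob: the share of the loss given to foreground.
    """
    err = np.abs(pred - target)
    fg = fg_mask.astype(bool)
    fg_err = err[fg].mean() if fg.any() else 0.0
    bg_err = err[~fg].mean() if (~fg).any() else 0.0
    return fg_weight * fg_err + (1.0 - fg_weight) * bg_err

# Toy example: 10 pixels at 20 m, of which 2 are foreground (e.g. a car).
target = np.full(10, 20.0)
fg_mask = np.zeros(10, dtype=bool)
fg_mask[:2] = True

pred_fg_wrong = target.copy()
pred_fg_wrong[:2] += 1.0    # 1 m error on the 2 foreground pixels

pred_bg_wrong = target.copy()
pred_bg_wrong[2:4] += 1.0   # 1 m error on 2 background pixels

plain_fg = np.abs(pred_fg_wrong - target).mean()
plain_bg = np.abs(pred_bg_wrong - target).mean()
loss_fg = foreground_weighted_l1(pred_fg_wrong, target, fg_mask)
loss_bg = foreground_weighted_l1(pred_bg_wrong, target, fg_mask)
```

A plain mean L1 scores both predictions identically (`plain_fg == plain_bg`), while the weighted version penalizes the foreground mistake far more (`loss_fg` is much larger than `loss_bg`).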

Technical details