Learning-Deep-Learning

Parse Geometry from a Line: Monocular Depth Estimation with Partial Laser Observation

December 2019

tl;dr: Predict dense depth from a one-line lidar scan, guided by an RGB image.

Overall impression

This paper proposes to “sculpt” the entire depth image from a reference, whereas the original depth prediction task is to create depth values from the unknown. This makes the problem more tractable.

Monocular depth estimation is an ill-posed problem. Sparse depth information helps resolve the scale ambiguity. Refer to Deep Depth Completion of a Single RGB-D Image, deep lidar, and sparse to dense for depth completion from unstructured sparse data.

A similar idea has been used in Camera radar fusion Net.

Key ideas

For each point in the imputed laser scan, generate a line along the gravity direction in 3D, then project it back to 2D. --> generating a vertical line directly in the image should yield largely the same result.
Add the reference depth map to the network output to predict depth. This means the network only has to learn the residual depth.
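
The two ideas above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the paper's implementation: the laser line is assumed to arrive as one depth value per image column (NaN where there is no return), the gravity-aligned 3D line is replaced by a plain vertical line in the image, and `residual` stands in for a network output. The function names are hypothetical.

```python
import numpy as np

def build_reference_depth(scan, height):
    """Turn a one-line scan (one depth per column, NaN = no return)
    into a dense (height, width) reference depth map."""
    scan = np.asarray(scan, dtype=float)
    cols = np.arange(scan.size)
    valid = ~np.isnan(scan)
    # Horizontal step: linearly interpolate the missing columns.
    filled = np.interp(cols, cols[valid], scan[valid])
    # Vertical step (assumption): copy each column's depth to every row,
    # i.e. a vertical image line instead of a projected 3D gravity line.
    return np.tile(filled, (height, 1))

def predict_depth(reference, residual):
    """Final depth is the reference map plus the learned residual."""
    return reference + residual

scan = np.array([2.0, np.nan, np.nan, 8.0])
reference = build_reference_depth(scan, height=3)
residual = np.zeros_like(reference)  # placeholder for a network output
depth = predict_depth(reference, residual)
```

With a zero residual the prediction is just the reference map, so the network starts from a sensible estimate and only has to correct it. The two interpolated columns here take depths 4 and 6, values that no laser return actually measured.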

Technical details

Interpolation is used to fill in the blanks in the horizontal direction before populating the vertical one. This is potentially dangerous as it introduces spurious data points in mid-air, e.g. in the gap between a foreground object and the background.
Mixed classification and regression loss
- multibin cls: the predicted value is the weighted average of all bins.
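- Softmax loss: when prediction falls into the correct bin, cls loss vanishes. This can be extended to cross entropy loss used in DC. \(L_c = -\sum_{i=1}^{M}\sum_{k=1}^{K} \delta([y_i] - k) \log(p^k_i) = -\sum_{i=1}^{M} \log p^{[y_i]}_i\), where \([y_i]\) is the ground-truth bin index of pixel \(i\) and \(p^k_i\) is the predicted probability of bin \(k\) at that pixel.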
- regression with L1 loss.
- for improved regression, see SMWA or DC
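
The mixed loss above can be sketched as follows. This is a minimal NumPy illustration, not the paper's code: assigning the ground-truth bin by nearest bin center, averaging over pixels instead of summing, and the `reg_weight` balancing factor are all assumptions, and the function names are hypothetical.

```python
import numpy as np

def softmax(logits):
    # Subtract the row max for numerical stability.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def mixed_loss(logits, target, bin_centers, reg_weight=1.0):
    """logits: (M, K) per-pixel bin scores, target: (M,) depths,
    bin_centers: (K,) representative depth of each bin."""
    probs = softmax(logits)
    # Assumption: the ground-truth bin is the one with the nearest center.
    gt_bin = np.abs(target[:, None] - bin_centers[None, :]).argmin(axis=1)
    # Classification term: negative log-probability of the correct bin,
    # averaged over pixels (the formula above sums instead).
    cls = -np.log(probs[np.arange(target.size), gt_bin]).mean()
    # Predicted depth: probability-weighted average of all bin centers.
    pred = probs @ bin_centers
    # Regression term: L1 distance between prediction and target.
    reg = np.abs(pred - target).mean()
    return cls + reg_weight * reg, pred

bin_centers = np.array([1.0, 2.0, 3.0, 4.0])
logits = np.zeros((2, 4))          # uniform probabilities over 4 bins
target = np.array([2.5, 2.5])
loss, pred = mixed_loss(logits, target, bin_centers)
```

With uniform probabilities the weighted average lands exactly on the target, so the L1 term is zero and only the classification term remains. This shows why the two terms are complementary: one pushes probability mass into the right bin, the other refines the continuous value.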

Notes

This idea of using complementary sensor information can be extended to depth prediction from radar and RGB images.