Learning-Deep-Learning

ContFuse: Deep Continuous Fusion for Multi-Sensor 3D Object Detection

June 2019

tl;dr: Uses parametric continuous convolutions to fuse camera features with point cloud features. It projects camera features into BEV space.

Overall impression

ContFuse finds the corresponding point in the camera image for each point in the point cloud (and its K nearest neighbors), then concatenates the interpolated camera features to each point. This leads to better 3D object detection results.
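The fusion step above can be sketched roughly as follows. This is a simplified, hypothetical numpy illustration (not the paper's implementation): for each BEV query location, find the k nearest lidar points, project them into the image with a pinhole model, gather the image features at those pixels, and concatenate the 3D offsets as the geometric cue. The MLP that ContFuse applies over [offset, feature] pairs is omitted for brevity; all function and variable names are my own.

```python
import numpy as np

def project_to_image(points_3d, K_cam):
    """Pinhole projection of (N, 3) lidar points into (N, 2) pixel coords."""
    uvw = points_3d @ K_cam.T           # homogeneous image coords
    return uvw[:, :2] / uvw[:, 2:3]     # perspective divide by depth

def continuous_fusion(bev_queries, lidar_points, image_feats, K_cam, k=1):
    """Gather image features of the k nearest lidar points for each BEV
    query and concatenate the 3D offsets (sketch of continuous fusion;
    the learned MLP over [offset, feature] is left out)."""
    H, W, C = image_feats.shape
    pix = np.round(project_to_image(lidar_points, K_cam)).astype(int)
    pix[:, 0] = np.clip(pix[:, 0], 0, W - 1)   # clamp u to image width
    pix[:, 1] = np.clip(pix[:, 1], 0, H - 1)   # clamp v to image height
    out = []
    for q in bev_queries:                       # one (3,) query at a time
        d2 = np.sum((lidar_points - q) ** 2, axis=1)
        nn = np.argsort(d2)[:k]                 # k nearest lidar points
        feats = image_feats[pix[nn, 1], pix[nn, 0]]   # (k, C) image features
        offsets = lidar_points[nn] - q                # (k, 3) geometric cue
        out.append(np.concatenate([feats, offsets], axis=1).reshape(-1))
    return np.stack(out)                        # (M, k * (C + 3))
```

With k=1 this degenerates to looking up the image feature of the single closest lidar point, which matches the ablation regime discussed below.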

However, the ablation results show that only one nearest neighbor is needed to achieve the best performance, which is surprising and makes me doubt the effectiveness of the proposed method (the idea is good, but the engineering details could be improved).

Improved by MMF (CVPR 2019) from the same group (Uber ATG), which uses multi-task learning to boost the performance even further.

This method projects camera features into BEV. In this sense it is related to, and perhaps inspired, pseudo-lidar and pseudo-lidar++.

Key ideas

Technical details

Notes