
CRF-Net: A Deep Learning-based Radar and Camera Sensor Fusion Architecture for Object Detection

January 2020

tl;dr: Paint each radar detection as a vertical line in the image plane and fuse it with the camera image.

Overall impression

This is one of the first papers to investigate radar/camera fusion on the nuScenes dataset. It is inspired by Distant object detection with camera and radar.

Both CRF-Net and Distant radar object transform the unstructured radar pins into a pseudo-image and then process it together with the camera image. An alternative approach to unstructured radar pins (a point cloud) is PointNet, but PointNet usually works best for classification or semantic segmentation once the RoI has been extracted. The pseudo-image method is used in many other works, such as PointPillars for lidar data and many works that incorporate camera intrinsics.

Works that embed metadata into convolutions:

meta data fusion for TL2LA
fusing radar pins with camera
cam conv to fuse focal length into convs.
camera radar fusion net

Key ideas

The architecture is RetinaNet with a VGG backbone, with radar data fed in at multiple levels.
Paint each radar point as a vertical line. The line starts at the ground and extends 3 meters upward, so radar data is not painted uniformly over the full image height. cf Parse Geometry from a Line.
Accumulate radar detections over the past 13 frames (~1 s) for denser data.
Radar information includes RCS and distance.
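
The line painting described above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation. It assumes the radar points are already transformed into the camera frame (x right, y down, z forward), a pinhole intrinsic matrix `K`, and a flat ground at a hypothetical camera height `cam_height`. The function name and all parameters are made up for this example.

```python
import numpy as np

def paint_radar_lines(points_cam, rcs, K, image_hw, cam_height=1.5, line_height=3.0):
    """Rasterize radar detections as vertical lines into a 2-channel pseudo-image.

    points_cam: (N, 3) detections in the camera frame, metres (assumed input format).
    rcs:        (N,) radar cross-section per detection.
    K:          (3, 3) pinhole intrinsics.
    Returns an (H, W, 2) float32 array: channel 0 = distance, channel 1 = RCS.
    """
    H, W = image_hw
    out = np.zeros((H, W, 2), dtype=np.float32)
    dist = np.linalg.norm(points_cam, axis=1)
    # Paint far detections first so that nearer ones overwrite them.
    for i in np.argsort(-dist):
        x, _, z = points_cam[i]
        if z <= 0:  # behind the camera
            continue
        # y points down, so the ground sits at y = +cam_height (flat-ground assumption)
        bottom = K @ np.array([x, cam_height, z])
        top = K @ np.array([x, cam_height - line_height, z])
        u = int(round(bottom[0] / bottom[2]))
        v_bottom = int(round(bottom[1] / bottom[2]))
        v_top = int(round(top[1] / top[2]))
        if not 0 <= u < W:
            continue
        v0, v1 = max(v_top, 0), min(v_bottom, H - 1)
        if v0 > v1:
            continue
        out[v0:v1 + 1, u, 0] = dist[i]
        out[v0:v1 + 1, u, 1] = rcs[i]
    return out
```

Because the line has a fixed metric height, its pixel length shrinks with distance, so only part of each image column carries radar data. The resulting array can be concatenated with the RGB image along the channel axis before being fed to the network.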

Technical details

Training uses BlackIn, which is essentially input dropout. A similar technique is used in Qualcomm’s radar camera early fusion to increase the robustness of the network.
It removes radar pins that fall outside the 3D GT bboxes. This removes a lot of noise from the radar returns.
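
A minimal sketch of BlackIn-style input dropout, assuming the camera channels are zeroed for a random fraction of training samples so the network cannot rely on the image alone. The function name, the rate `p=0.2`, and the separate `camera`/`radar` arrays are illustrative assumptions, not the authors' code.

```python
import numpy as np

def black_in(camera, radar, p=0.2, rng=None):
    """With probability p, replace the camera input with zeros; radar is untouched.

    camera: (H, W, 3) image array.
    radar:  (H, W, C) radar pseudo-image.
    p:      hypothetical blackout rate, chosen here only for illustration.
    """
    rng = np.random.default_rng() if rng is None else rng
    if rng.random() < p:
        camera = np.zeros_like(camera)
    return camera, radar
```

This would be applied per sample in the training data loader only, and skipped at inference time.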

Notes

Background

In heavy rain or fog, the visibility is reduced, and safe driving might not be guaranteed. In addition, camera sensors get increasingly affected by noise in sparsely lit conditions. The camera can also be rendered unusable if water droplets stick to the camera lens.

Filtering out stationary radar objects is common, but this may also filter out cars stopped under traffic lights or bridges.
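
To make the trade-off concrete, here is a small sketch of a speed-based stationary filter. It assumes each detection comes with an ego-motion-compensated velocity vector. The function name and the `min_speed` threshold are hypothetical.

```python
import numpy as np

def drop_stationary(points, velocities, min_speed=0.5):
    """Keep only detections whose compensated speed is at least min_speed (m/s).

    points:     (N, D) radar detections.
    velocities: (N, 2) ego-motion-compensated velocity per detection (assumed input).
    Returns the kept points and the boolean keep mask.
    """
    speed = np.linalg.norm(velocities, axis=1)
    keep = speed >= min_speed
    return points[keep], keep
```

A car waiting at a red light has a compensated speed close to zero, so a filter like this discards it together with the static clutter.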