FISHING Net: Future Inference of Semantic Heatmaps In Grids

September 2020

tl;dr: Convert lidar, radar and camera inputs into a common BEV space and fuse them there.

Overall impression

Perception in autonomous driving involves building a representation that captures the geometry and semantics of the surrounding scenes.

The BEV (top-down) representation across modalities has multiple benefits:

BEV representation may be the ultimate goal for perception. The authors also note that the concept of instance still needs to be added. This may be necessary to make the output directly consumable by downstream modules.

Fishing Net tackles the problem of predicting deterministic future BEV semantic segmentation.

Fishing Net uses BEV grid resolutions of 10 cm/pixel and 20 cm/pixel. Lift Splat Shoot uses 50 cm/pixel. Both are coarser than the 4 cm or 5 cm per pixel resolution typically used for mapping purposes, such as in DAGMapper.
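To get a feel for what these resolutions mean in practice, here is a rough sketch of the grid-size arithmetic. The 100 m square extent is a hypothetical value chosen for illustration, not a range taken from any of the papers, and the helper names are made up for this note.

```python
# Resolutions quoted in the notes above, in cm per pixel.
RESOLUTIONS_CM = {
    "Fishing Net (fine)": 10,
    "Fishing Net (coarse)": 20,
    "Lift Splat Shoot": 50,
    "mapping, 5 cm": 5,
    "mapping, 4 cm": 4,
}

# ASSUMPTION: side length of a square BEV region, in meters.
# Hypothetical value for illustration only, not from the paper.
EXTENT_M = 100


def cells_per_side(extent_m: int, res_cm: int) -> int:
    """Number of pixels along one side of a square BEV grid."""
    extent_cm = extent_m * 100
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-extent_cm // res_cm)


def grid_cells(extent_m: int, res_cm: int) -> int:
    """Total number of pixels in the square BEV grid."""
    n = cells_per_side(extent_m, res_cm)
    return n * n


if __name__ == "__main__":
    for name, res_cm in RESOLUTIONS_CM.items():
        n = cells_per_side(EXTENT_M, res_cm)
        print(f"{name:>22}: {n} x {n} = {grid_cells(EXTENT_M, res_cm):,} cells")
```

The cell count grows with the inverse square of the resolution, so for the same extent a 5 cm/pixel grid holds 100 times as many cells as a 50 cm/pixel grid.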

Key ideas

Technical details