MotionNet: Joint Perception and Motion Prediction for Autonomous Driving Based on Bird’s Eye View Maps

December 2020

tl;dr: Predict the semantic class and motion of each cell in a BEV occupancy grid, serving as a backup for 3D object detection.

Overall impression

Data-driven lidar 3D object detection cannot handle unseen corner cases. An occupancy grid map (OGM), on the other hand, carries no semantic information and no correspondence of cells across time, which makes it hard to reason about object-level dynamics. (Monocular BEV semantic segmentation has the same drawbacks.)

MotionNet proposes a BEV representation that extends the OGM with occupancy, motion, and category information. Motion is encoded by associating each cell with displacement vectors, as in CenterPoint.
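
To make the representation concrete, here is a minimal sketch of a BEV grid whose cells carry occupancy, a category, and one displacement vector per future time step. This is illustrative only and not the paper's implementation: the names `BevCell`, `make_grid`, and `future_cell`, the category list, the cell size, and the conventions (x maps to columns, y maps to rows, displacements are measured from the current frame) are all assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical category labels; the actual label set is not given in these notes.
CATEGORIES = ("background", "vehicle", "pedestrian")


@dataclass
class BevCell:
    """One BEV cell: occupancy plus category plus per-step motion."""
    occupied: bool = False
    category: int = 0  # index into CATEGORIES
    # One (dx, dy) displacement in metres per future time step,
    # assumed to be measured relative to the current frame.
    displacements: List[Tuple[float, float]] = field(default_factory=list)


def make_grid(height: int, width: int) -> List[List[BevCell]]:
    """Build an empty height x width grid of independent cells."""
    return [[BevCell() for _ in range(width)] for _ in range(height)]


def future_cell(row: int, col: int, cell: BevCell, step: int,
                cell_size_m: float) -> Tuple[int, int]:
    """Grid index the cell's content is predicted to occupy at `step`.

    Assumes x maps to columns and y maps to rows.
    """
    dx, dy = cell.displacements[step]
    return (row + round(dy / cell_size_m), col + round(dx / cell_size_m))


grid = make_grid(4, 4)
# A vehicle cell predicted to move 0.5 m, then 1.0 m, along +x.
grid[1][1] = BevCell(occupied=True, category=1,
                     displacements=[(0.5, 0.0), (1.0, 0.0)])
print(future_cell(1, 1, grid[1][1], step=1, cell_size_m=0.5))  # (1, 3)
```

Because every cell carries its own displacement, the correspondence of cells across time that a plain OGM lacks is stored directly in the map, without requiring an object-level bounding box.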

Key ideas

Technical details