Fast and Furious: Real Time End-to-End 3D Detection, Tracking and Motion Forecasting with a Single Convolutional Net

July 2019

tl;dr: A single network to do detection, tracking and prediction.

Overall impression

The oral presentation is quite impressive. Modern approaches to autonomous driving have four steps: detection, tracking, motion forecasting and planning.

The paper's premise is that tracking and prediction can help object detection by reducing both false positives and false negatives.

The resulting model is more robust to occlusion and to sparse data at long range. It also runs in real time at 33 FPS.

IntentNet is heavily inspired by Fast and Furious (FaF, also from Uber ATG). Both combine perception, tracking and prediction by generating bounding boxes with waypoints. In comparison, IntentNet extends the prediction horizon from 1 s to 3 s, predicts discrete high-level behaviors, and uses map information.
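To make "bounding boxes with waypoints" concrete, here is a minimal Python sketch of one detection that carries its current box plus forecast boxes at fixed future steps. The class names, the bird's-eye-view box fields and the 0.1 s step are illustrative assumptions, not taken from the FaF or IntentNet code. Under that assumed step, 10 future boxes cover a 1 s horizon and 30 would cover 3 s.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class BEVBox:
    """Hypothetical bird's-eye-view box: center (x, y) and size (w, l) in meters, heading in radians."""
    x: float
    y: float
    w: float
    l: float
    heading: float


@dataclass
class ForecastDetection:
    """One detected object: boxes[0] is the current frame, boxes[k] is k steps ahead."""
    score: float
    boxes: List[BEVBox]
    dt: float = 0.1  # assumed seconds between steps (illustrative, not from the paper)

    def waypoints(self) -> List[Tuple[float, float]]:
        # The forecast trajectory is the sequence of future box centers.
        return [(b.x, b.y) for b in self.boxes[1:]]

    def horizon_s(self) -> float:
        # Forecast horizon = number of future steps * step duration.
        return (len(self.boxes) - 1) * self.dt


if __name__ == "__main__":
    # A car moving along +x at 1 m per step (10 m/s with dt = 0.1 s), 10 future steps.
    det = ForecastDetection(
        score=0.9,
        boxes=[BEVBox(x=float(k), y=0.0, w=1.8, l=4.5, heading=0.0) for k in range(11)],
    )
    print(det.horizon_s(), det.waypoints()[-1])
```

Keeping the current box and the future boxes in one object mirrors the idea that a single network output covers detection and forecasting together; extending the horizon only changes how many boxes are stored.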

In FaF, tracking is done as a post-processing step. PnPNet later brings tracking into the perception-and-prediction (PnP) loop.
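As a rough illustration of what post-processing tracking can look like, here is a greedy IoU matcher that links each current detection to the box an existing track forecast for the current timestamp in an earlier frame, and starts a new track for anything left unmatched. This is a generic sketch, not FaF's exact procedure: the axis-aligned boxes (FaF predicts oriented boxes), the `associate` helper and the 0.5 threshold are all assumptions made to keep the example short.

```python
from typing import Dict, List, Tuple

# Hypothetical axis-aligned BEV box: (x_min, y_min, x_max, y_max) in meters.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def associate(
    predicted: Dict[int, Box],   # track_id -> box forecast for the current frame
    detections: List[Box],       # boxes detected in the current frame
    iou_threshold: float = 0.5,  # hypothetical matching threshold
) -> Dict[int, int]:
    """Greedily map each detection index to a track id, highest IoU first."""
    pairs = sorted(
        ((iou(box, det), tid, di)
         for tid, box in predicted.items()
         for di, det in enumerate(detections)),
        reverse=True,
    )
    assigned: Dict[int, int] = {}
    used_tracks = set()
    for score, tid, di in pairs:
        if score < iou_threshold:
            break  # remaining pairs overlap even less
        if tid in used_tracks or di in assigned:
            continue  # each track and each detection is matched at most once
        assigned[di] = tid
        used_tracks.add(tid)
    # Unmatched detections start new tracks with fresh ids.
    next_id = max(predicted, default=-1) + 1
    for di in range(len(detections)):
        if di not in assigned:
            assigned[di] = next_id
            next_id += 1
    return assigned


if __name__ == "__main__":
    predicted = {7: (0.0, 0.0, 4.0, 2.0)}
    detections = [(0.5, 0.0, 4.5, 2.0), (20.0, 20.0, 24.0, 22.0)]
    print(associate(predicted, detections))
```

Here the first detection overlaps track 7's forecast with an IoU of about 0.78 and joins that track, while the distant second detection has no overlap and opens a new track. Because this matching sits outside the network, it cannot feed information back into detection or prediction, which is the gap PnPNet addresses.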

Key ideas

Technical details