IntentNet: Learning to Predict Intention from Raw Sensor Data

April 2020

tl;dr: A boosted version of Fast and Furious that uses map information.

Overall impression

IntentNet is heavily inspired by Fast and Furious (also from Uber ATG). Both combine perception, tracking and prediction by generating bounding boxes with waypoints. In comparison, IntentNet extends the prediction horizon from 1 s to 3 s, predicts discrete high-level behaviors, and uses map information. Note that neither IntentNet nor Fast and Furious performs motion planning.
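The dual output (discrete high-level intention plus continuous future waypoints) can be sketched as a per-vehicle prediction head. This is a minimal illustration with numpy: the intention class list, feature size, and horizon discretization below are assumptions for the sketch, not the paper's exact parameterization.

```python
import numpy as np

# Hypothetical IntentNet-style output head: for each detected vehicle,
# emit (a) a softmax over discrete high-level intentions and (b) a
# sequence of future waypoints over the 3 s horizon.
# Class names and tensor sizes here are illustrative assumptions.
INTENTIONS = ["keep_lane", "turn_left", "turn_right", "stop", "other"]
HORIZON_STEPS = 6  # e.g. 3 s sampled every 0.5 s

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def predict_head(feature, w_cls, w_reg):
    """feature: (D,) per-vehicle feature; w_cls: (D, C); w_reg: (D, T*2)."""
    intent_probs = softmax(feature @ w_cls)                  # discrete intention
    waypoints = (feature @ w_reg).reshape(HORIZON_STEPS, 2)  # (x, y) offsets
    return intent_probs, waypoints

rng = np.random.default_rng(0)
D = 16
feat = rng.normal(size=D)
probs, wps = predict_head(feat,
                          rng.normal(size=(D, len(INTENTIONS))),
                          rng.normal(size=(D, HORIZON_STEPS * 2)))
print(probs.shape, wps.shape)  # (5,) (6, 2)
```

In the paper the two outputs are trained jointly (classification loss for intention, regression loss for waypoints), so the intention estimate conditions and regularizes the trajectory regression.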

The concatenation of input frames may introduce more computation than an LSTM, but by itself it does not preclude real-time performance: Fast and Furious also concatenates its input and runs in real time. IntentNet was later extended into the neural motion planner.
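The concatenation-based early fusion mentioned above amounts to stacking the past BEV frames along the channel axis so one feed-forward pass consumes the whole history. A minimal numpy sketch, with toy grid sizes that are assumptions, not the paper's:

```python
import numpy as np

# Early fusion by concatenation: T past BEV LiDAR frames (each H x W x C)
# are stacked along the channel axis into one (H, W, T*C) tensor, versus
# feeding them one at a time to a recurrent model such as an LSTM.
# All sizes below are illustrative.
T, H, W, C = 5, 8, 8, 2  # 5 past sweeps over a toy 8x8 grid
frames = [np.random.rand(H, W, C) for _ in range(T)]
stacked = np.concatenate(frames, axis=-1)
print(stacked.shape)  # (8, 8, 10)
```

The cost grows with the number of stacked channels in the first convolution, but the rest of the network runs once per cycle, which is why this design stays real-time.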

IntentNet somewhat inspired MultiPath, but IntentNet's loss only predicts one path instead of one path per intention, making IntentNet unsuitable for multimodal prediction.
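The unimodal-vs-multimodal distinction can be made concrete with a toy sketch: an IntentNet-style head commits to a single trajectory, while a MultiPath-style head keeps one weighted trajectory per intention. The numbers below are made up for illustration.

```python
import numpy as np

# Toy contrast between single-path and per-intention multimodal output.
K, T = 3, 4                       # 3 intentions, 4 future steps (illustrative)
probs = np.array([0.7, 0.2, 0.1])
trajs = np.random.rand(K, T, 2)   # one candidate path per intention

unimodal = trajs[np.argmax(probs)]    # single path: top intention only
multimodal = list(zip(probs, trajs))  # keep every (weight, path) pair
print(unimodal.shape, len(multimodal))  # (4, 2) 3
```

Keeping the full set of (probability, trajectory) pairs is what lets a downstream planner reason about low-probability but high-risk maneuvers, which a single predicted path cannot express.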

IntentNet uses intention estimation to help trajectory prediction, similar to pedestrian intention prediction.

Key ideas

Technical details