Multimodal Trajectory Predictions for Autonomous Driving using Deep Convolutional Networks

April 2020

tl;dr: Multimodal behavioral prediction from Uber ATG with a 6-second horizon.

Overall impression

Very similar in spirit to Waymo's MultiPath. Uber's approach uses a multiple trajectory prediction (MTP) loss, while Waymo's approach uses a fixed set of anchor trajectories. The two are largely equivalent: both select a best-matching mode first and mask out the regression loss for all other modes.
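A minimal sketch of the masking idea behind the MTP loss (the matching here uses average displacement for simplicity, whereas the paper matches modes by trajectory angle; all names and shapes are my own):

```python
import numpy as np

def mtp_loss(pred_trajs, pred_logits, gt_traj):
    """Sketch of a multiple-trajectory-prediction (MTP) style loss.

    pred_trajs:  (K, T, 2) K candidate trajectories over T timesteps
    pred_logits: (K,)      unnormalized mode scores
    gt_traj:     (T, 2)    ground-truth trajectory
    """
    # Pick the mode closest to the ground truth (average L2 displacement).
    dists = np.linalg.norm(pred_trajs - gt_traj, axis=-1).mean(axis=-1)  # (K,)
    k_star = int(np.argmin(dists))

    # Regression loss only on the best-matching mode; all others are masked out.
    reg_loss = dists[k_star]

    # Cross-entropy pushes the mode probability toward the matched mode.
    log_probs = pred_logits - np.log(np.exp(pred_logits).sum())
    cls_loss = -log_probs[k_star]

    return reg_loss + cls_loss
```

Because only the matched mode receives a regression gradient, the other modes are free to specialize on different maneuvers instead of collapsing to the mean.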

It uses a rasterized image to encode map information (a BEV semantic map), very close to MultiPath and previous work such as RoR, ChauffeurNet and IntentNet.
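As a rough illustration of the input encoding (the specific layer names and channel count are my own, not from the paper), the BEV raster can be thought of as stacked per-class semantic channels fed to the CNN like an image:

```python
import numpy as np

# Hypothetical BEV raster: each semantic map element gets its own binary
# channel, rendered in the target actor's frame, then stacked channel-wise.
H, W = 224, 224
lanes      = np.zeros((H, W), dtype=np.float32)  # lane geometry
crosswalks = np.zeros((H, W), dtype=np.float32)  # crosswalk polygons
target     = np.zeros((H, W), dtype=np.float32)  # target actor's past states
others     = np.zeros((H, W), dtype=np.float32)  # surrounding actors

raster = np.stack([lanes, crosswalks, target, others], axis=-1)  # (H, W, 4)
```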

It is quite interesting to see that a unimodal model will just predict the average of the modes. In general, if it is hard for humans to label a scene deterministically, the underlying distribution is multimodal.
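A toy illustration of this mode averaging (numbers are mine): under an L2 loss, the best single prediction for a dataset that turns left half the time and right half the time is the mean of the two, i.e. going straight, which matches neither demonstrated behavior.

```python
import numpy as np

# Toy bimodal ground truth: half the demonstrations turn left, half turn right.
left  = np.array([-1.0, 1.0])   # final displacement of a left turn
right = np.array([1.0, 1.0])    # final displacement of a right turn
targets = np.stack([left, right] * 50)  # (100, 2)

# The single point minimizing mean squared error over the dataset is the mean
# of the targets: straight ahead, a path no demonstration actually followed.
best_unimodal = targets.mean(axis=0)
print(best_unimodal)  # -> [0. 1.]
```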

Key ideas

Technical details