MultiPath++: Efficient Information Fusion and Trajectory Aggregation for Behavior Prediction

December 2021

tl;dr: Extension of MultiPath that takes structured (vectorized) input.

Overall impression

Prediction needs to fuse a highly heterogeneous world state (static and dynamic), in the form of rich perception signals and map information, and infer a highly multi-modal distribution over possible futures.

This paper uses a sparse encoding of heterogeneous scene elements and is essentially a VectorNet version of MultiPath. At a high level it is similar to MultiPath: the model consists of 1) an encoding step and 2) a predictor head which conditions on anchors and 3) outputs a Gaussian Mixture Model (GMM) distribution over the possible agent position at each future time step.
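To make the GMM output concrete, here is a minimal sketch of how the density of an observed position under a per-time-step mixture could be evaluated. It is not the paper's implementation: the function names, the diagonal-covariance assumption, and the example numbers are all illustrative choices of mine.

```python
import math

def gaussian_2d_logpdf(point, mean, std):
    """Log-density of a 2D Gaussian with diagonal covariance (an assumption of this sketch)."""
    log_p = 0.0
    for x, mu, sigma in zip(point, mean, std):
        log_p += -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma) - 0.5 * math.log(2 * math.pi)
    return log_p

def gmm_logpdf(point, weights, means, stds):
    """Log-density of a Gaussian mixture at one future time step.

    weights: mixture probabilities, one per mode (should sum to 1)
    means, stds: one (x, y) pair per mode
    Uses log-sum-exp for numerical stability.
    """
    terms = [
        math.log(w) + gaussian_2d_logpdf(point, m, s)
        for w, m, s in zip(weights, means, stds)
    ]
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))

# Hypothetical two-mode prediction for a single time step (made-up values).
weights = [0.7, 0.3]
means = [(10.0, 0.0), (8.0, 3.0)]
stds = [(1.0, 1.0), (1.5, 1.5)]
nll = -gmm_logpdf((9.5, 0.2), weights, means, stds)  # negative log-likelihood of the observed position
```

Each mode here plays the role of one candidate future; the mixture weights express how likely the model considers each of them.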

The paper also proposes a multi-context gating (MCG) mechanism, which is highly similar to cross attention. It also has a context vector which looks quite similar to what Andrej Karpathy presented at Tesla AI Day for their transformer architecture.
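As a rough illustration of the gating idea, the sketch below applies one context-gating step: every element is transformed, multiplied element-wise by a transformed context vector, and the gated elements are pooled into an updated context. This reflects my reading of the mechanism rather than the paper's code. The single bias-free linear maps stand in for MLPs, max pooling is an assumed choice, and all names are hypothetical.

```python
def linear(vec, weight):
    """Apply a bias-free linear map; weight is a list of output rows."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weight]

def context_gating(elements, context, w_elem, w_ctx):
    """One context-gating step (sketch).

    elements: list of feature vectors (e.g. encoded agents or road segments)
    context:  a single context vector shared by all elements
    Returns the gated elements and a new context obtained by max pooling.
    """
    gate = linear(context, w_ctx)  # the same gate is applied to every element
    gated = [
        [e * g for e, g in zip(linear(s, w_elem), gate)]
        for s in elements
    ]
    new_context = [max(column) for column in zip(*gated)]  # pool over elements
    return gated, new_context

# Toy usage with made-up 2D features and weights.
identity = [[1.0, 0.0], [0.0, 1.0]]
scale = [[2.0, 0.0], [0.0, 0.5]]
gated, new_context = context_gating([[1.0, 2.0], [3.0, 4.0]], [1.0, 1.0], identity, scale)
```

Unlike cross attention, there are no per-element attention weights in this sketch: the context modulates all elements with one shared gate, and information flows back to the context only through pooling.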

The paper proposes using separate encoders for each input modality. Wayformer later improves on this.

The paper also gives a great overview of past SOTA methods for behavior prediction, which makes it a great starting point for the topic.

Key ideas

Technical details