MILE: Model-Based Imitation Learning for Urban Driving

July 2023

tl;dr: An end-to-end method that jointly learns a model of the world and a driving policy for autonomous driving.

Overall impression

MILE looks quite similar to Gato in that both take in observations and directly predict actions. Both are based on imitation learning, with the dataset generated by an RL expert in a virtual environment. MILE takes one extra step and predicts the evolution of the environment, instead of getting it from the simulation engine as Gato does (this is the future direction of Gato v2, as described in the Gato paper).

MILE uses 3D geometry as an inductive bias and learns a highly compact latent space directly from raw videos. The environment dynamics are reasoned about and predicted in this compact latent space. The learned latent state is the input to the driving policy (which outputs control signals) and can be decoded to BEV segmentation for visualization and supervision.

Trajectory forecasting explicitly estimates the future trajectories of dynamic agents given their past trajectories and scene context (HD map, etc.). World models instead build a latent representation that explains the observations from the ego vehicle's sensory input and its actions. The future trajectory is implicitly encoded in the latent variable.
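
To make the data flow above concrete, here is a minimal NumPy sketch of a latent world model rollout: encode an observation into a compact latent state, let the policy pick an action from that latent, decode the latent to a BEV class map, and predict the next latent from the current latent and action. Everything in it is an assumption for illustration and not MILE's actual architecture: the dimensions, the function names (`encode`, `policy`, `dynamics`, `decode_bev`, `imagine`), the random linear maps standing in for learned networks, and the deterministic transition used in place of whatever latent dynamics the paper defines.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only to keep the example small.
OBS_DIM, LATENT_DIM, ACTION_DIM = 64, 8, 2
BEV_CLASSES, BEV_H, BEV_W = 3, 4, 4

# Random linear maps stand in for learned networks (assumption).
W_enc = rng.normal(scale=0.1, size=(LATENT_DIM, OBS_DIM))
W_dyn = rng.normal(scale=0.1, size=(LATENT_DIM, LATENT_DIM + ACTION_DIM))
W_pol = rng.normal(scale=0.1, size=(ACTION_DIM, LATENT_DIM))
W_bev = rng.normal(scale=0.1, size=(BEV_CLASSES * BEV_H * BEV_W, LATENT_DIM))


def encode(obs):
    """Compress a flattened observation into a compact latent state."""
    return np.tanh(W_enc @ obs)


def policy(z):
    """Map the latent state to a bounded control signal."""
    return np.tanh(W_pol @ z)


def dynamics(z, a):
    """Predict the next latent state from the current latent and action."""
    return np.tanh(W_dyn @ np.concatenate([z, a]))


def decode_bev(z):
    """Decode the latent state into a per-cell BEV class map."""
    logits = (W_bev @ z).reshape(BEV_CLASSES, BEV_H, BEV_W)
    return logits.argmax(axis=0)


def imagine(obs, horizon):
    """Roll the model forward in latent space from a single observation."""
    z = encode(obs)
    actions, bevs = [], []
    for _ in range(horizon):
        a = policy(z)
        actions.append(a)
        bevs.append(decode_bev(z))
        z = dynamics(z, a)  # no new observation is consumed here
    return np.stack(actions), np.stack(bevs)


obs = rng.normal(size=OBS_DIM)
actions, bevs = imagine(obs, horizon=5)
print(actions.shape, bevs.shape)  # (5, 2) (5, 4, 4)
```

Note that no agent trajectory is ever predicted explicitly: the future shows up only through the sequence of latent states, which is the contrast with trajectory forecasting drawn above.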


Key ideas

Technical details


TODO: to ask the author


To read