Learning-Deep-Learning

ChauffeurNet: Learning to Drive by Imitating the Best and Synthesizing the Worst

April 2020

tl;dr: Imitation learning (behavior cloning) for motion planning by synthesizing corner cases.

Overall impression

Typical building blocks of autonomous driving and other robotics systems include perception (including sensor fusion), prediction (behavior prediction of other cars), and planning (motion planning for the ego car). The rest of the engineering stack (such as motion control, which produces steering and braking commands) is more of a mechanical engineering problem.

Motion planning can be approached with reinforcement learning (RL), imitation learning (IL), or conventional motion planning. The difference between IL and RL is that IL uses offline data alone, while RL learns online and therefore needs to simulate the environment.

ChauffeurNet takes in the results from perception and directly outputs the planned trajectory. Behavior prediction for other agents is downplayed and is just one auxiliary task. This is not exactly end-to-end but is more flexible and scalable and can leverage the power of synthetic data. (cf Gen-LaneNet)

The hardest part of motion planning is the need for closed-loop tests, in which each planned step is fed back into the loop as the input for planning the next step. This tests the system's ability to self-correct and recover from adverse scenarios. In open-loop tests, the next step is predicted/planned but is not fed back to produce the next state. The paper showed that a model that performs well on open-loop tests does not necessarily perform well on closed-loop tests.
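The gap between the two kinds of test can be illustrated with a toy example. The sketch below is not from the paper: it assumes a hypothetical 1D lane-keeping setup where the state `x` is the lateral offset, the action is added directly to the offset, and the logged expert always stays centered. The two policies and all constants are made up for illustration.

```python
def open_loop_error(policy, logged_states, logged_actions):
    """Mean absolute action error on logged expert states (no feedback)."""
    errors = [abs(policy(x) - a) for x, a in zip(logged_states, logged_actions)]
    return sum(errors) / len(errors)


def closed_loop_rollout(policy, x0=0.0, steps=100):
    """Feed each action back into the state and return the visited offsets."""
    x, visited = x0, []
    for _ in range(steps):
        x = x + policy(x)  # assumed toy dynamics: the action shifts the offset
        visited.append(x)
    return visited


def policy_a(x):
    """Hypothetical policy: small constant bias, ignores the state."""
    return 0.05


def policy_b(x):
    """Hypothetical policy: larger bias, but steers back toward the center."""
    return -0.5 * x + 0.1


# Assumed expert log: the expert stays centered and never needs to steer.
logged_states = [0.0] * 100
logged_actions = [0.0] * 100

open_a = open_loop_error(policy_a, logged_states, logged_actions)
open_b = open_loop_error(policy_b, logged_states, logged_actions)
closed_a = abs(closed_loop_rollout(policy_a)[-1])
closed_b = abs(closed_loop_rollout(policy_b)[-1])

print(f"open-loop error:    A={open_a:.3f}  B={open_b:.3f}")
print(f"closed-loop offset: A={closed_a:.3f}  B={closed_b:.3f}")
```

Policy A looks better in the open-loop metric, yet its bias accumulates at every step once its own actions drive the state, while policy B settles at a small bounded offset because it reacts to the deviation it causes.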

In imitation learning (behavior cloning), naively applying supervised learning (which assumes i.i.d. input) to an MDP leads to distributional drift, because the action taken at one step affects the observations at the next. The typical way to address this compounding error/distributional drift is Dataset Aggregation (DAgger). (DAgger keeps humans in the loop: an expert labels the states observed while following the learned policy in closed-loop tests, which provides supervision on how to correct mistakes.) The perturbation method in ChauffeurNet essentially addresses the same issue with a synthetic DAgger pipeline.
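A minimal sketch of the DAgger loop on a toy problem may make the idea concrete. Everything here is an assumption for illustration and not taken from the paper: the 1D lateral offset `x`, the hypothetical `expert` controller, the linear policy `a = w * x`, and the additive dynamics. The point is only the loop structure: roll out the learned policy, have the expert label the states it actually visits, aggregate, and refit.

```python
def expert(x):
    """Hypothetical expert: steer back toward the lane center at x = 0."""
    return -0.5 * x


def fit_gain(states, actions):
    """Least-squares fit of a linear policy a = w * x (w = 0 if no signal)."""
    denom = sum(x * x for x in states)
    if denom == 0.0:
        return 0.0
    return sum(x * a for x, a in zip(states, actions)) / denom


def rollout(w, x0, steps=20):
    """Run the learned policy in closed loop and return the visited states."""
    x, visited = x0, []
    for _ in range(steps):
        visited.append(x)
        x = x + w * x  # assumed toy dynamics: the action shifts the offset
    return visited


# Iteration 0: plain behavior cloning on expert-only data.
# The expert never leaves the center, so the data has no recovery examples.
states = [0.0] * 20
actions = [expert(x) for x in states]
w = fit_gain(states, actions)
gains = [w]
bc_offset = abs(rollout(w, x0=1.0)[-1])  # start off-center, see if it recovers

# DAgger iterations: label the states the learned policy visits, then refit.
for _ in range(3):
    visited = rollout(w, x0=1.0)
    states += visited
    actions += [expert(x) for x in visited]  # expert labels the visited states
    w = fit_gain(states, actions)
    gains.append(w)

dagger_offset = abs(rollout(w, x0=1.0)[-1])
print("learned gains per iteration:", gains)
print("final offset, behavior cloning only:", bc_offset)
print("final offset, after DAgger:", dagger_offset)
```

The cloned policy never sees an off-center state, so it learns nothing about recovery. Once the expert labels the states reached under the learned policy, the fit picks up the corrective gain. In this framing, synthesizing perturbed examples plays the role of the expert-labeled off-distribution states without a human in the loop.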

Key ideas

Technical details

Notes