BC-SAC: Imitation Is Not Enough: Robustifying Imitation with Reinforcement Learning for Challenging Driving Scenarios

June 2023

tl;dr: IL + RL » IL. IL lays the foundation and RL solidifies it.

Overall impression

The paper combines IL and RL to get the best of both worlds.

This paper optimizes actions rather than trajectories. The continuous controls are discretized into a grid of 31 steering values x 7 acceleration values = 217 discrete actions.
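A minimal sketch of such a discretized action space, assuming uniform bins; the bin ranges below are illustrative assumptions, and only the counts (31 x 7 = 217) come from the paper.

```python
import itertools

# Counts from the paper: 31 steering bins x 7 acceleration bins = 217 actions.
N_STEER, N_ACCEL = 31, 7

def make_action_grid(steer_range=(-0.5, 0.5), accel_range=(-4.0, 2.0)):
    """Enumerate all (steer, accel) pairs as one flat discrete action set.

    The ranges are placeholder assumptions, not values from the paper.
    """
    linspace = lambda lo, hi, n: [lo + (hi - lo) * i / (n - 1) for i in range(n)]
    steers = linspace(*steer_range, N_STEER)
    accels = linspace(*accel_range, N_ACCEL)
    return list(itertools.product(steers, accels))

actions = make_action_grid()
print(len(actions))  # 217
```

A flat discrete action set like this lets the policy head be a single 217-way categorical distribution instead of a continuous controller.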

A previous approach to improving safety is to augment a learned planner with a non-learned fallback layer that guarantees safety.

BC-SAC still requires heuristically choosing the trade-off between the IL and RL objectives.
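The trade-off can be sketched as a weighted mix of the SAC actor loss and the BC imitation loss. The weight `lam` below stands in for the heuristically chosen coefficient; the scalar loss values are placeholders, not the paper's actual computations.

```python
# Hedged sketch of a combined IL + RL objective, assuming a simple
# convex combination; the real BC-SAC training loop computes these
# losses from Q-values and expert log-likelihoods.
def bc_sac_loss(sac_actor_loss: float, bc_loss: float, lam: float = 0.5) -> float:
    """Combined objective: lam weights imitation (BC) against RL (SAC)."""
    return (1.0 - lam) * sac_actor_loss + lam * bc_loss

total = bc_sac_loss(sac_actor_loss=1.2, bc_loss=0.8, lam=0.5)
print(total)
```

Sweeping `lam` recovers pure RL at one extreme and pure behavior cloning at the other, which is exactly the knob that must be tuned by hand.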

Key ideas

Technical details