March 2024
tl;dr: A fully data-driven way to SOLVE autonomous driving.
Overall impression
Primary hurdles to overcome for fully autonomous driving are:
1) Technical scalability
2) safety critical engineering efforts
3) unit economics (BOM cost and util price)
4) regulation.
Out of the four, technical scalability is the key, or the ability of the decision-making SW to generalize to new situation quickly with sufficient perf for deployment.
Previous efforts (AV 1.0) focus on solving specialized general intelligence by combining components of even narrower intelligence.
Key ideas
- AV1.0: sense-plan-act
- Sensing:
- not a limiting factor, especially for urban driving, the crown jewel of AD.
- Scene representation
- Localization and mapping: simplifies online decision making by shifting some info offline to a map with clean data. Bad: maintainance cost, and nothing is truly stationary.
- Perception: limitation is the hand-crafted representation itself. Do we have the necessary info for the decision?
- Behavior prediction: sensitive to upstream errors, and ha a dependency with planning. This means an isolated prediction system will ALWAYS have some representation error.
- Planning
- Behavior planning: hard to separate from behavior prediction, and the highly engineered expert system is known for being brittle. –> Upgrading expert system of Go gameplaying to AlphaGo.
- Motion planning: works OK with limited challenges.
- Control: works OK with limited challenges.
- From AV1.0 to AV2.0
- Bottleneck of AV1.0 is in prediction and planning. Solving behavior prediction and planning as defined by these boundaries will enable self-driving.
- Holistic learned driver. The driving policy can be thought of learning to estimate the motion the vehicle should conduct given some conditioning goals.
- Difficult to apply increasing amount of learning to AV1.0 where the handcrafted interface limit the effectiveness of data.
- AV2.0 architecture: joint sensing and planning, end-to-end.
- Framing the problem as one that can be solved by data.
- Curate a data source, at sufficient scale and diversity.
- Build a data engine. Train and iterate effectively.
- Build testing infra to validate the system, virtually and in reality.
- Iterate. Through data and algo.
- HMI and interpretability
- The representation can be decoded for human interpretation. –> This may be the future of perception, merely as a friendly and reassuring HMI.
- Algo of AV2.0
- Learn a multimodal foundation model or world model
- Model prediction and planning jointly, and which is also conditioned on action.
- Finetune a model-based policy
- Need effective off-policy learning and eval (?) to compare driving decisions.
Technical details
- Summary of technical details, such as important training details, or bugs of previous benchmarks.
Notes
- Questions and notes on how to improve/revise the current work