NEAT: Neural Attention Fields for End-to-End Autonomous Driving

October 2021

tl;dr: transformers to learn an interpretable BEV representation for end-to-end autonomous driving.

Overall impression

The goal of the paper is interpretable, high-performance, end-to-end autonomous driving, and the way it generates the interpretable intermediate representation is quite interesting.

Both NEAT and PYVA use the idea of transformers in the task of lifting images to BEV; however, upon closer inspection, the transformers are not used specifically for the view transformation itself.
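The general transformer-based lifting idea shared by such works can be illustrated with a minimal single-head cross-attention, where learned BEV grid queries attend to flattened image features. This is a hedged sketch of the generic mechanism, not the exact architecture of NEAT or PYVA; all shapes and names here are illustrative assumptions.

```python
import numpy as np

def cross_attention(bev_queries, img_feats):
    """Single-head cross-attention: each BEV grid cell (query) attends to
    all flattened image locations (keys/values). Illustrative only."""
    d_k = bev_queries.shape[-1]
    # scaled dot-product attention scores: (n_bev, n_img)
    scores = bev_queries @ img_feats.T / np.sqrt(d_k)
    # softmax over image locations for each BEV cell
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # weighted sum of image features -> one feature per BEV cell
    return weights @ img_feats

rng = np.random.default_rng(0)
img = rng.standard_normal((64, 32))    # 64 image locations, 32-d features (hypothetical sizes)
bev = rng.standard_normal((100, 32))   # 10x10 BEV grid of learned queries
out = cross_attention(bev, img)
print(out.shape)  # (100, 32): one aggregated image feature per BEV cell
```

In a real model the queries would be learned embeddings (possibly conditioned on BEV coordinates) and the attention would be multi-head with projections, but the core lift is this query-over-image aggregation.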

The paper has many esoteric details regarding planning and control; I am not sure I fully understand those parts, so their discussion is omitted here.

Key ideas

Technical details