Translating Images into Maps

December 2021

tl;dr: Use axial transformers to lift images to bird's-eye view (BEV).

Overall impression

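The paper assumes a one-to-one correspondence between each vertical scanline (image column) and a ray passing through the camera location in the overhead map. For an upright pinhole camera, every 3D point that projects onto a given column lies in the same vertical plane through the camera center, and that plane appears as a single ray in the overhead view. The correspondence therefore holds regardless of the depth of the pixels being lifted to 3D.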
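The column-to-ray correspondence can be checked with a few lines of pinhole geometry. This is a minimal sketch for illustration, not the paper's code: it assumes an ideal pinhole camera with no roll, pitch, or lens distortion, and the function names and the intrinsics `fx`, `cx` are hypothetical values chosen for the example.

```python
import math


def column_to_bev_azimuth(u, fx, cx):
    """Azimuth (radians) of the BEV ray for image column u.

    fx is the horizontal focal length in pixels and cx the principal
    point column; 0 rad points straight ahead along the optical axis.
    """
    return math.atan2(u - cx, fx)


def project_to_column(x, z, fx, cx):
    """Image column of a 3D point with lateral offset x and forward depth z.

    The point's height does not appear: it only changes the image row,
    which is why a whole vertical scanline shares one BEV ray.
    """
    return fx * x / z + cx


# Hypothetical intrinsics and column, for illustration only.
fx, cx = 1000.0, 640.0
u = 900.0
theta = column_to_bev_azimuth(u, fx, cx)

# Points at different depths along the same BEV ray all land on column u.
for depth in (5.0, 20.0, 50.0):
    x = depth * math.tan(theta)
    assert abs(project_to_column(x, depth, fx, cx) - u) < 1e-6
```

Because the azimuth depends only on `u`, the lifting problem reduces to deciding how far along each ray the content of a column should be placed, which is the part left to the learned model.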

The paper uses unnecessarily cumbersome mathematical notation; many of its concepts can be explained in plain language using standard transformer terminology.

Key ideas

Technical details