VectorMapNet: End-to-end Vectorized HD Map Learning

July 2022

tl;dr: BEV perception of road layout with directly vectorized output.

Overall impression

Map elements need a compact (vectorized) representation so that they can be used for downstream tasks like prediction and planning.

This paper proposes a top-down approach to road layout prediction. This is quite different from most previous bottom-up approaches (such as HDMapNet from the same group). Similar top-down approaches include STSU, but the performance of VectorMapNet is significantly better and sets a new SOTA on BEV perception of road layout.

Overall the paper divides the detection of geometrically accurate map elements into two steps: first detecting each map element in the BEV space, then predicting the local geometric details inside each element. This gives a strong incentive for future researchers to directly predict the lane line segments (referred to as map elements in this paper) with a CenterNet-like one-stage detector (detecting the instance center and then the offsets).
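To make the one-stage idea above concrete, here is a minimal sketch of what CenterNet-style decoding could look like for map elements. This is NOT VectorMapNet's method and not from the paper; it only illustrates the speculation in these notes. The function name `decode_polylines`, the tensor layout (a center heatmap plus K point offsets per cell), and the threshold are all assumptions made for illustration.

```python
import numpy as np


def decode_polylines(center_heatmap, point_offsets, score_thresh=0.5):
    """Decode polylines from a center heatmap and per-cell point offsets.

    Hypothetical layout (an assumption, not taken from the paper):
      center_heatmap: (H, W) scores in [0, 1], ideally one peak per map element.
      point_offsets:  (H, W, K, 2) offsets (dx, dy) from a cell to K polyline points.
    Returns a list of (score, points) with points of shape (K, 2) in (x, y) order.
    """
    h, w = center_heatmap.shape
    # Pad with -inf so border cells can be compared against a full 3x3 window.
    padded = np.pad(center_heatmap, 1, mode="constant", constant_values=-np.inf)
    polylines = []
    for y in range(h):
        for x in range(w):
            score = center_heatmap[y, x]
            if score < score_thresh:
                continue
            # Keep only local maxima of the 3x3 neighbourhood (ties are kept).
            if score < padded[y:y + 3, x:x + 3].max():
                continue
            center = np.array([x, y], dtype=float)
            # Broadcasting adds the center to each of the K offsets.
            polylines.append((float(score), center + point_offsets[y, x]))
    return polylines


# Toy example: one element centered at (x=2, y=4) with three vertical points.
heatmap = np.zeros((8, 8))
heatmap[4, 2] = 0.9
offsets = np.zeros((8, 8, 3, 2))
offsets[4, 2] = [[0, -2], [0, 0], [0, 2]]
lines = decode_polylines(heatmap, offsets)
```

In this toy case `lines` holds a single polyline running from (2, 2) to (2, 6). A real detector would also have to handle elements with a variable number of points, which a fixed K does not cover.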

The paper is full of great insights and opens up new possibilities in the field of BEV perception of road layout. However, the writing is a bit hard to follow, especially the description of the two-stage detection method. The math notation is quite sloppy, especially in Section 2.2, where many letters are reused with different meanings. The ablation study could have been more thorough. For example, the design choices in Figure 3 were not ablated (see Notes for details).

Key ideas

Technical details