STSU: Structured Bird’s-Eye-View Traffic Scene Understanding from Onboard Images

October 2021

tl;dr: DETR-like structure for structured BEV perception of lane lines and objects.

Overall impression

The paper focuses on a structured representation of the road network and instance-wise identification of traffic agents. This is a follow-up work to BEV feature stitching.

This paper follows DETR-style end-to-end object detection (extended to structured lane detection), which uses sparse queries in BEV space. This is actually one direction of Tesla’s future work, as they mentioned at AI Day. The same idea is also used in DETR3D, although the results for dynamic objects here do not look as good as those in DETR3D.

Previous work focuses on semantic segmentation, whereas this paper focuses on instance-level detection and does so directly in BEV.

The output results actually do not look super impressive, but the paper provides a brand new direction for BEV perception.

The idea was significantly improved by VectorMapNet. It was also developed further by the authors in TPLR: Topology Preserving Local Road Network Estimation from Single Onboard Camera Image (CVPR 2022) [STSU, Luc Van Gool].

Key ideas

Technical details