PersFormer: 3D Lane Detection via Perspective Transformer and the OpenLane Benchmark

July 2022

tl;dr: Joint 2D and 3D lane line detection with transformer.

Overall impression

The paper largely follows the idea of BEVFormer and shares many authors with it, yet interestingly does not cite it.
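
As in BEVFormer, the core mechanism is to let BEV-grid queries gather features from the front-view image; in PersFormer the reference points for this cross-attention come from an IPM-style projection of BEV locations onto the image plane. Below is a minimal sketch of that projection, assuming a standard pinhole camera model (the function and variable names are my own, not the paper's):

```python
import numpy as np

def bev_to_image(points_bev, K, T_cam_from_ego):
    """Project BEV ground-plane points (x, y, z=0 in ego frame) to pixel coords.

    points_bev: (N, 2) array of (x, y) ego-frame ground coordinates.
    K: (3, 3) camera intrinsics. T_cam_from_ego: (4, 4) extrinsics.
    """
    n = len(points_bev)
    pts = np.hstack([points_bev, np.zeros((n, 1)), np.ones((n, 1))])  # (N, 4) homogeneous, z=0
    cam = (T_cam_from_ego @ pts.T).T[:, :3]   # ego frame -> camera frame
    uv = (K @ cam.T).T                        # pinhole projection
    return uv[:, :2] / uv[:, 2:3]             # divide by depth -> (u, v) pixels
```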

3D lane line vs BEV lane line: 3D lane line detection is formulated as predicting 3D lane lines (and 2D ones as well) in front of the ego vehicle from a single onboard camera. This differs slightly from the recent wave of BEV static road structure detection (such as HDMapNet), which aims to predict 360-degree lane lines from multiple onboard cameras. –> This is one area where I see a gap; it would be great to come up with a unified metric to benchmark both tasks.
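
On the unified-metric point: both the 3D-lane F-score and HDMapNet-style map mAP ultimately rest on point-distance matching between predicted and ground-truth polylines, so something like a symmetric Chamfer distance could be a common starting point. An illustrative sketch only, not an established benchmark metric:

```python
import numpy as np

def chamfer_distance(pred, gt):
    """Symmetric Chamfer distance between two lane polylines sampled as point sets.

    pred: (N, 3) and gt: (M, 3) arrays of lane points in ego coordinates (meters).
    Lower is better; could be thresholded for matching in a precision/recall setup.
    """
    d = np.linalg.norm(pred[:, None, :] - gt[None, :, :], axis=-1)  # (N, M) pairwise dists
    return d.min(axis=1).mean() + d.min(axis=0).mean()
```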

There are two ways to do structured lane line detection: one is a top-down approach through anchors (PersFormer), and the other is a bottom-up approach with polyline decoders (VectorMapNet). However, how the anchor-based method handles intersections is still not clear. The bottom-up method seems much more flexible, yet it depends heavily on the accuracy of the binary segmentation in the first stage, which can fail in scenarios such as extreme weather or lighting conditions. Tesla's method used to be bottom-up (a bag of points) and recently switched to predicting directly in vector space (FSD Beta 10.11.2).
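
To make the top-down formulation concrete: in the 3D-LaneNet/PersFormer family, each anchor is a line at a fixed lateral position, and the network regresses per-row lateral offsets, heights, and visibility at predefined longitudinal positions. A minimal decoding sketch; the shapes and names are my assumptions, not the paper's exact interface:

```python
import numpy as np

# Predefined longitudinal sample positions (meters ahead of ego), an assumption here.
Y_STEPS = np.array([5, 10, 15, 20, 30, 40, 50, 60, 80, 100], dtype=np.float32)

def decode_anchor(anchor_x, x_offsets, z_heights, vis_logits, vis_thresh=0.5):
    """Turn one anchor's regression outputs into a 3D lane polyline.

    anchor_x: fixed lateral position of this anchor line (meters).
    x_offsets, z_heights, vis_logits: per-row predictions, shape (len(Y_STEPS),).
    """
    visible = 1.0 / (1.0 + np.exp(-vis_logits)) > vis_thresh  # sigmoid -> bool mask
    xs = anchor_x + x_offsets[visible]                        # lateral position
    ys = Y_STEPS[visible]                                     # fixed longitudinal samples
    zs = z_heights[visible]                                   # predicted height
    return np.stack([xs, ys, zs], axis=1)                     # (num_visible, 3) polyline
```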

The paper also proposes a new dataset (OpenLane) on top of the Waymo Open Dataset. This is quite a significant contribution to the community. Not sure how the Waymo dataset organizers would react to it though :) I would love to see this officially incorporated into the Waymo Open Dataset as an official competition track (if Waymo allows annotation of the reserved test set).

The engineering work seems quite solid, but the writing of this paper needs some improvement.

Key ideas

Technical details

Notes