PETRv2: A Unified Framework for 3D Perception from Multi-Camera Images

July 2022

tl;dr: An improved PETR (with temporal fusion and data-dependent PE) for both 3D detection and road layout estimation.

Overall impression

This paper explores the joint BEV perception of dynamic vehicles and static road layout. This is similar to BEVFusion and M2BEV.

BEVFormer defines each point on the BEV map as one BEV query. The number of BEV queries becomes very large when the BEV map resolution is high (e.g., 256x256 = 65536 queries). PETRv2 instead defines a much smaller number of segmentation queries (e.g., 256), each of which predicts the semantic map of its corresponding BEV patch.
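
To make the query-count comparison concrete, here is a minimal sketch of the arithmetic and of stitching per-patch predictions back into a full BEV map. It is not the PETRv2 implementation: the function names are hypothetical, the 16x16 patch size is only what follows if 256 queries evenly tile a 256x256 map, and the row-major patch ordering is an assumption.

```python
def dense_query_count(bev_h, bev_w):
    """One query per BEV cell (the dense, BEVFormer-style layout)."""
    return bev_h * bev_w


def patch_query_count(bev_h, bev_w, patch):
    """One query per non-overlapping patch x patch tile of the BEV map."""
    if bev_h % patch or bev_w % patch:
        raise ValueError("patch size must evenly divide the BEV map")
    return (bev_h // patch) * (bev_w // patch)


def assemble_bev(patch_preds, bev_h, bev_w, patch):
    """Stitch per-query patch predictions into one bev_h x bev_w map.

    patch_preds[q] is a patch x patch nested list predicted by query q.
    Assumption: queries are ordered row-major over the patch grid.
    """
    cols = bev_w // patch
    bev = [[0] * bev_w for _ in range(bev_h)]
    for q, pred in enumerate(patch_preds):
        r0 = (q // cols) * patch  # top row of this patch in the full map
        c0 = (q % cols) * patch   # left column of this patch in the full map
        for i in range(patch):
            for j in range(patch):
                bev[r0 + i][c0 + j] = pred[i][j]
    return bev


if __name__ == "__main__":
    print(dense_query_count(256, 256))      # one query per cell
    print(patch_query_count(256, 256, 16))  # one query per 16x16 patch
```

Under these assumptions the decoder attends over 256 queries instead of 65536, and each query is responsible for 16 x 16 = 256 BEV cells rather than a single one.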

Key ideas

Technical details