BEVDet4D: Exploit Temporal Cues in Multi-camera 3D Object Detection

June 2022

tl;dr: Lift BEVDet to 4D with limited temporal information.

Overall impression

The paper builds on top of BEVDet and introduces a temporal component. This boosts the accuracy of velocity estimation and thus NDS.

The spatiotemporal alignment module is very similar to that of BEVFormer, yet with a much simpler fusion module. Temporal fusion does not go through a ConvLSTM or a similar recurrent structure; the current BEV feature is simply concatenated with the aligned feature of the previous frame T-1. The time window is not adjustable, which could limit the performance of BEVDet4D.
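To make the align-then-concat idea concrete, here is a minimal NumPy sketch, not the paper's implementation. It assumes a BEV feature of shape (C, H, W) centered on the ego vehicle, a 3x3 homogeneous 2D rigid transform that maps current ego-frame coordinates to the previous ego frame, and nearest-neighbor sampling. The names align_prev_bev, fuse_two_frames, cur_to_prev and cell_size are hypothetical and chosen for illustration only.

```python
import numpy as np


def align_prev_bev(prev_feat, cur_to_prev, cell_size):
    """Resample the previous BEV feature onto the current ego-centered grid.

    prev_feat:   (C, H, W) feature map of frame T-1.
    cur_to_prev: 3x3 homogeneous 2D rigid transform (assumed input) that maps
                 metric coordinates in the current ego frame to the previous one.
    cell_size:   size of one BEV cell in meters.
    """
    C, H, W = prev_feat.shape

    # Metric coordinates of every cell center in the current ego frame,
    # with the origin at the center of the grid.
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    x_m = (xs - (W - 1) / 2.0) * cell_size
    y_m = (ys - (H - 1) / 2.0) * cell_size
    pts = np.stack([x_m.ravel(), y_m.ravel(), np.ones(H * W)])

    # Where was each current cell located in the previous ego frame?
    prev_pts = cur_to_prev @ pts

    # Convert back to grid indices (nearest neighbor, an assumption of this sketch).
    px = np.rint(prev_pts[0] / cell_size + (W - 1) / 2.0).astype(int)
    py = np.rint(prev_pts[1] / cell_size + (H - 1) / 2.0).astype(int)
    valid = (px >= 0) & (px < W) & (py >= 0) & (py < H)

    # Cells that fall outside the previous grid stay zero.
    out = np.zeros((C, H * W), dtype=prev_feat.dtype)
    out[:, valid] = prev_feat[:, py[valid], px[valid]]
    return out.reshape(C, H, W)


def fuse_two_frames(cur_feat, prev_feat, cur_to_prev, cell_size):
    """Concatenate the current feature with the aligned T-1 feature along channels."""
    aligned = align_prev_bev(prev_feat, cur_to_prev, cell_size)
    return np.concatenate([cur_feat, aligned], axis=0)


# Toy usage: the ego moved 1 m forward (+x) between T-1 and T, no rotation.
C, H, W = 3, 9, 9
prev = np.zeros((C, H, W), dtype=np.float32)
prev[:, 4, 6] = 1.0                      # static object 1 m ahead at T-1
cur = np.zeros((C, H, W), dtype=np.float32)
cur_to_prev = np.eye(3)
cur_to_prev[0, 2] = 1.0                  # p_prev = p_cur + (1 m, 0)
fused = fuse_two_frames(cur, prev, cur_to_prev, cell_size=0.5)
```

After alignment, the static object sits at the grid center of the current frame, so any residual offset between the two halves of the fused tensor is due to object motion rather than ego motion. The fused tensor has 2C channels, and the fixed two-frame window is visible in the function signature.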

The engineering work is still excellent, but the writing unfortunately lacks clarity and requires some guesswork. The math equations in this paper are really unnecessary.

This work can be compared with the concurrent BEVerse. In comparison, BEVDet4D has slightly better performance in BEV detection.

Key ideas

Technical details