Kinematic 3D Object Detection in Monocular Video

July 2020

tl;dr: Mono3D with EKF to form temporally consistent tracks.

Overall impression

The paper is one of the first studies to leverage monocular video for 3D object detection (video-based 3D object detection). It proposes several improvements over the M3D-RPN baseline. The model can also predict ego motion and object motion separately.

The performance boost based on kinematics is not huge, but it makes the tracks temporally coherent.

The EKF is a postprocessing module after the mono3D object detector.
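To make the filtering step concrete, here is a minimal sketch of how a Kalman filter can smooth per-frame detections into a coherent track. It is not the paper's implementation. It assumes a single object, a single axis (e.g. depth), a constant-velocity motion model, and a linear measurement, in which case the EKF reduces to a plain Kalman filter. The names `Track1D`, `predict`, `update`, and `run_filter`, as well as the noise values `q` and `r`, are hypothetical and chosen only for illustration; `dt=0.1` reflects KITTI's 10 Hz capture rate.

```python
from dataclasses import dataclass


@dataclass
class Track1D:
    """Hypothetical per-object state along one axis (illustration only)."""
    pos: float   # estimated position, e.g. depth in meters
    vel: float   # estimated velocity in m/s
    p_pp: float  # var(pos)
    p_pv: float  # cov(pos, vel)
    p_vv: float  # var(vel)


def predict(t: Track1D, dt: float, q: float) -> Track1D:
    """Constant-velocity prediction: P' = F P F^T + Q with F = [[1, dt], [0, 1]].

    q is an assumed process-noise variance added to the velocity term.
    """
    return Track1D(
        pos=t.pos + t.vel * dt,
        vel=t.vel,
        p_pp=t.p_pp + 2.0 * dt * t.p_pv + dt * dt * t.p_vv,
        p_pv=t.p_pv + dt * t.p_vv,
        p_vv=t.p_vv + q,
    )


def update(t: Track1D, z: float, r: float) -> Track1D:
    """Fuse a detector position measurement z with variance r (H = [1, 0])."""
    s = t.p_pp + r       # innovation variance
    k_p = t.p_pp / s     # Kalman gain for position
    k_v = t.p_pv / s     # Kalman gain for velocity
    y = z - t.pos        # innovation (measurement residual)
    return Track1D(
        pos=t.pos + k_p * y,
        vel=t.vel + k_v * y,
        p_pp=(1.0 - k_p) * t.p_pp,
        p_pv=(1.0 - k_p) * t.p_pv,
        p_vv=t.p_vv - k_v * t.p_pv,
    )


def run_filter(measurements, dt=0.1, q=0.01, r=0.25):
    """Filter a sequence of per-frame position measurements into a track."""
    # Initialize from the first detection; velocity is unknown, so its
    # variance starts large.
    track = Track1D(pos=measurements[0], vel=0.0, p_pp=r, p_pv=0.0, p_vv=100.0)
    history = [track]
    for z in measurements[1:]:
        track = update(predict(track, dt, q), z, r)
        history.append(track)
    return history


if __name__ == "__main__":
    # Synthetic object receding at 1 m/s, observed at 10 Hz.
    zs = [10.0 + 0.1 * i for i in range(50)]
    final = run_filter(zs)[-1]
    print(f"pos={final.pos:.3f} m, vel={final.vel:.3f} m/s")
```

The filter recovers a velocity estimate even though the detector only reports positions, which is what lets a track stay consistent across frames. A full 3D version would extend the state to the box center, dimensions, and orientation, and would need data association between detections and tracks.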

The KITTI dataset seems to provide 4 temporally adjacent frames for each annotated frame. Kinematic mono3D uses 4 frames for inference.

Key ideas

Technical details