DF-VO: Visual Odometry Revisited: What Should Be Learnt? [Depth and Flow for VO]

May 2020

tl;dr: Solving for relative pose change from 2D-2D matches obtained with optical flow is better than relying on unreliable depth prediction. Better than ORB-SLAM2 on some metrics.

Overall impression

There are two ways to solve for the relative pose change between frames: 2D-2D matching followed by solving for the essential matrix, or lifting 2D points to 3D with predicted dense depth, then 2D-3D matching followed by PnP.
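To make the 2D-2D route concrete, here is a minimal sketch of the constraint that essential-matrix estimation relies on. It is not the paper's code: the pose, the 3D points, and all helper names are made up for illustration, and it assumes the convention X2 = R X1 + t with normalized (calibrated) image coordinates. It only checks that true correspondences satisfy x2^T E x1 = 0 for E = [t]x R; a real solver goes the other way and recovers E from many noisy matches.

```python
import math

def skew(t):
    """Cross-product matrix [t]x, so that skew(t) @ v == t x v."""
    tx, ty, tz = t
    return [[0.0, -tz, ty], [tz, 0.0, -tx], [-ty, tx, 0.0]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def matvec(a, v):
    return [sum(a[i][k] * v[k] for k in range(len(v))) for i in range(len(a))]

def rot_y(angle):
    """Rotation about the camera y axis (hypothetical yaw between frames)."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def normalize(p):
    """Perspective division to normalized image coordinates (x, y, 1)."""
    return [p[0] / p[2], p[1] / p[2], 1.0]

def epipolar_residual(E, x1, x2):
    """Value of x2^T E x1; zero for a perfect correspondence."""
    Ex1 = matvec(E, x1)
    return sum(x2[i] * Ex1[i] for i in range(3))

# Illustrative relative pose (assumed values): X2 = R X1 + t
R = rot_y(0.05)
t = [0.1, 0.0, 1.0]
E = matmul(skew(t), R)

# Illustrative 3D points expressed in the first camera frame
points = [[1.0, 0.5, 8.0], [-2.0, 1.0, 12.0], [0.5, -1.0, 6.0], [3.0, 0.2, 15.0]]

residuals = []
for X1 in points:
    X2 = [a + b for a, b in zip(matvec(R, X1), t)]
    residuals.append(epipolar_residual(E, normalize(X1), normalize(X2)))

max_residual = max(abs(r) for r in residuals)
print("max |x2^T E x1| =", max_residual)
```

Note that no depth is needed to evaluate the constraint, only the 2D matches, which is why this route does not depend on the quality of a depth prediction.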

Monocular VO suffers from scale drift, which is one motivation for VIO. This paper builds on SC-sfm-learner, which also uses a geometric loss to enforce depth consistency and thus scale consistency.
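The root of the scale problem can be shown in a few lines. The sketch below is illustrative only: the intrinsics, pose, and points are assumed values, and the helper names are hypothetical. It scales the scene and the translation by the same factor and checks that the projected pixels do not change, which is why a single camera cannot observe absolute scale and why the estimated scale can drift over a sequence.

```python
import math

def rot_y(angle):
    """Rotation about the camera y axis."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def project(X, R, t, f=700.0, cx=600.0, cy=180.0):
    """Pinhole projection of X after the rigid transform (R, t).

    f, cx, cy are assumed intrinsics chosen only for illustration.
    """
    Xc = [sum(R[i][k] * X[k] for k in range(3)) + t[i] for i in range(3)]
    return (f * Xc[0] / Xc[2] + cx, f * Xc[1] / Xc[2] + cy)

# Illustrative relative pose and scene (assumed values)
R = rot_y(0.03)
t = [0.2, 0.0, 1.0]
points = [[1.0, 0.5, 8.0], [-2.0, 1.0, 12.0], [0.5, -1.0, 6.0]]

def reproject_scaled(scale):
    """Project the scene after scaling both the points and the translation."""
    scaled_t = [scale * v for v in t]
    return [project([scale * v for v in X], R, scaled_t) for X in points]

base = reproject_scaled(1.0)
scaled = reproject_scaled(3.7)

# Largest pixel difference between the two worlds
max_pixel_diff = max(abs(a - b)
                     for p, q in zip(base, scaled)
                     for a, b in zip(p, q))
print("max pixel difference:", max_pixel_diff)
```

Because the images are identical for any scale factor, something outside the 2D measurements (an IMU, or depth predictions that are consistent across frames) has to pin the scale down.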

DL-based methods enable camera tracking in challenging conditions, but they are less reliable and less accurate than geometry-based algorithms in favorable conditions (sufficient illumination and texture, sufficient overlap between frames).

All learning-based methods after SfM-learner don’t explicitly account for multi-view geometry constraints during inference. Hybrid methods such as DF-VO, D3VO, and KP3D achieve SOTA.

Key ideas

Technical details