D3VO: Deep Depth, Deep Pose and Deep Uncertainty for Monocular Visual Odometry

May 2020

tl;dr: Add depth uncertainty estimation and a brightness transformation to Monodepth2, and incorporate the deep-predicted pose as a regularization term in the backend.
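The brightness transformation and the predicted uncertainty both act on the self-supervised photometric loss. The following is a minimal NumPy sketch of how they could combine, under stated assumptions: grayscale images already warped into the target frame, affine brightness parameters (a, b) given per source frame (in the paper they are predicted by the network), an L1 residual only (the SSIM term Monodepth2 also uses is omitted), and the common aleatoric-uncertainty form residual / σ + log σ. The function names are hypothetical.

```python
import numpy as np


def affine_brightness(img, a, b):
    """Affine brightness model I' = a * I + b (hypothetical helper)."""
    return a * img + b


def uncertainty_weighted_photometric_loss(target, warped_sources, ab_params, sigma, eps=1e-6):
    """Hypothetical sketch of a brightness-corrected, uncertainty-weighted photometric loss.

    target:         (H, W) target image I_t
    warped_sources: list of (H, W) source images already warped into frame t
    ab_params:      list of (a, b) tuples, one per source image
    sigma:          (H, W) predicted per-pixel uncertainty, expected > 0
    """
    residuals = []
    for warped, (a, b) in zip(warped_sources, ab_params):
        corrected = affine_brightness(warped, a, b)
        # L1 residual only; the SSIM term is omitted to keep the sketch short.
        residuals.append(np.abs(target - corrected))
    # Per-pixel minimum over source frames, as in Monodepth2.
    r = np.min(np.stack(residuals, axis=0), axis=0)
    # Clamp sigma so the division and the log stay finite.
    sigma = np.maximum(sigma, eps)
    # Large sigma down-weights the residual; log(sigma) keeps sigma from growing without bound.
    return float(np.mean(r / sigma + np.log(sigma)))
```

Applying (a, b) after warping rather than before is equivalent under bilinear sampling, because the interpolation weights sum to one, so the affine map commutes with the interpolation.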

Overall impression

Monocular VO suffers from scale drift and low robustness. Poses from the PoseNet in SfMLearner and Monodepth2 are robust, but they are not as accurate as those from geometry-based methods. This paper explores how to combine geometric and deep learning approaches (aka "hybrid").

Hybrid methods combine deep learning with geometry-based methods.

As repeatedly demonstrated by other hybrid methods, D3VO outperforms all end-to-end methods by a large margin.

VO lacks robustness in low-texture areas and under fast motion. VIO is more robust, but the IMU cannot recover metric scale when the camera moves at constant velocity, since the accelerometer then observes no motion-induced acceleration.
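A short worked equation makes the constant-velocity case concrete. Assume, for illustration, that monocular VO recovers positions only up to an unknown scale $s$, i.e. $\mathbf p_t = s\,\hat{\mathbf p}_t$, and that the accelerometer measures specific force $\tilde{\mathbf a}_t$ in the body frame with rotation $\mathbf R_t$ and gravity $\mathbf g$ (bias and noise ignored). Matching the two gives

$$
\tilde{\mathbf a}_t = \mathbf R_t^\top\left(\ddot{\mathbf p}_t - \mathbf g\right)
\;\Rightarrow\;
s\,\ddot{\hat{\mathbf p}}_t = \mathbf R_t\,\tilde{\mathbf a}_t + \mathbf g .
$$

At constant velocity $\ddot{\hat{\mathbf p}}_t = \mathbf 0$, so the left-hand side is $\mathbf 0$ for every $s$, and the IMU measurements carry no information about the scale.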

Both KP3D and D3VO use DSO as the backend. KP3D performs on par with DVSO, while D3VO beats DVSO and even achieves performance comparable to stereo/LiDAR methods on KITTI Odometry.
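In a DSO-style windowed optimization, a deep-predicted pose can enter the backend as a prior on the relative pose between consecutive keyframes. A generic form of such a regularized energy is sketched below; the weight $w$, covariance $\boldsymbol\Sigma_i$, and pose conventions are illustrative assumptions, and the paper's exact weighting may differ. Here $\mathbf T_i$ is the estimated pose of keyframe $i$, $\tilde{\mathbf T}_{i-1,i}$ the relative pose predicted by the network, and $\operatorname{Log}$ maps $SE(3)$ to a 6-vector in $\mathfrak{se}(3)$.

$$
E_{\text{total}} = E_{\text{photo}} + w \sum_{i} \mathbf r_i^\top \boldsymbol\Sigma_i^{-1} \mathbf r_i,
\qquad
\mathbf r_i = \operatorname{Log}\!\left(\tilde{\mathbf T}_{i-1,i}^{-1}\,\mathbf T_{i-1}^{-1}\,\mathbf T_i\right).
$$

When the estimated relative pose $\mathbf T_{i-1}^{-1}\mathbf T_i$ agrees with the prediction, $\mathbf r_i = \mathbf 0$ and only the photometric term remains; otherwise the prior pulls the estimate toward the network output, which is where the robustness of the deep pose helps.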

Key ideas

Technical details