Unsupervised Scale-consistent Depth and Ego-motion Learning from Monocular Video

September 2019

tl;dr: First paper to demonstrate scale-consistent predictions over long videos, with VO performance that rivals models trained on stereo video.

The follow-up paper is DF-VO, which predicts dense optical flow and uses 2D-2D matching to recover ego-motion, achieving even more accurate VO.

Overall impression

Enforcing depth scale consistency across frames is the key to the good relative pose estimation, and this in turn makes the model usable for VO over long sequences.
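To make the idea of depth scale consistency concrete, here is a minimal sketch of a normalized depth difference between two depth estimates of the same scene points, of the kind used for geometry consistency. It is an illustration under assumptions, not the paper's implementation: the function names are hypothetical, depths are plain Python lists of positive numbers, and the warping of one frame's depth into the other view is assumed to have been done already.

```python
def depth_inconsistency(projected_depth, sampled_depth):
    """Per-point normalized difference |a - b| / (a + b).

    projected_depth: depths from frame A, already warped into frame B (assumed).
    sampled_depth:   depths predicted for frame B at the same points (assumed).
    Each value lies in [0, 1) for positive depths and is symmetric in a and b.
    """
    return [abs(a - b) / (a + b) for a, b in zip(projected_depth, sampled_depth)]


def consistency_loss(projected_depth, sampled_depth):
    """Mean of the per-point inconsistency (hypothetical helper)."""
    diffs = depth_inconsistency(projected_depth, sampled_depth)
    return sum(diffs) / len(diffs)


# Identical depth estimates: no penalty.
same = consistency_loss([1.0, 2.0, 4.0], [1.0, 2.0, 4.0])

# Second frame predicted at twice the scale: every point contributes
# |d - 2d| / (d + 2d) = 1/3, independent of the absolute depth d.
scaled = consistency_loss([1.0, 2.0, 4.0], [2.0, 4.0, 8.0])
```

Because the difference is divided by the sum of the two depths, near and far points are penalized equally for the same relative scale mismatch, which is what pushes consecutive predictions toward a shared scale.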

The original SfM-Learner actually performs poorly on VO: its scale and rotation drift are large. The scale-consistent SfM-Learner (this paper) gives better VO performance.

Key ideas

Technical details