SfMLearner: Unsupervised Learning of Depth and Ego-Motion from Video

June 2019

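tl;dr: An unsupervised framework that jointly learns monocular depth and camera motion (a 6-DoF transformation between frames). View synthesis serves as the supervision signal: nearby frames are warped into the target view, and the photometric consistency of the result is enforced (similar in spirit to unsupervised stereo depth estimation).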
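The supervision can be summarized in two formulas, written here as I read them from the paper (the full objective also adds a smoothness term and an explainability mask, both omitted here). Each target pixel $p_t$ is projected into a source frame $s$ using the predicted depth $\hat{D}_t$, the predicted relative pose $\hat{T}_{t \to s}$ and the known intrinsics $K$. Sampling the source image at those locations gives the synthesized view $\hat{I}_s$, which is compared to the target $I_t$ with an L1 loss:

$$
p_s \sim K \, \hat{T}_{t \to s} \, \hat{D}_t(p_t) \, K^{-1} p_t
$$

$$
\mathcal{L}_{vs} = \sum_{s} \sum_{p} \left| I_t(p) - \hat{I}_s(p) \right|
$$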

Overall impression

Unsupervised depth learning can use either stereo pairs or monocular video frames. This paper takes the monocular route and enforces consistency with very few assumptions (only the camera intrinsic matrix is assumed known).
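Because only the intrinsic matrix is assumed, the whole supervision reduces to a geometric warp plus an image comparison. Below is a minimal NumPy sketch of that inverse warp for a single grayscale frame pair, assuming depth and pose are already given. `inverse_warp` and `photometric_loss` are hypothetical helper names, not the authors' code; a real training setup would run the same steps with differentiable bilinear sampling inside a deep learning framework so that gradients reach the depth and pose networks.

```python
import numpy as np


def inverse_warp(src_img, tgt_depth, R, t, K):
    """Synthesize the target view by sampling the source image (sketch).

    src_img:   (H, W) grayscale source image
    tgt_depth: (H, W) depth of the target frame
    R, t:      (3, 3) rotation and (3,) translation mapping target-camera
               coordinates into source-camera coordinates (T_{t->s})
    K:         (3, 3) camera intrinsic matrix, assumed known
    Returns the synthesized target image and a mask of pixels that
    project inside the source image with positive depth.
    """
    H, W = tgt_depth.shape
    # Homogeneous pixel grid of the target frame, shape (3, H*W)
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u.ravel(), v.ravel(), np.ones(H * W)])
    # Back-project to 3D: X = D(p) * K^{-1} p
    cam = (np.linalg.inv(K) @ pix) * tgt_depth.ravel()
    # Move points into the source camera and project: p_s ~ K (R X + t)
    proj = K @ (R @ cam + t.reshape(3, 1))
    z = proj[2]
    valid = z > 1e-6
    z_safe = np.where(valid, z, 1.0)
    us, vs = proj[0] / z_safe, proj[1] / z_safe
    valid &= (us >= 0) & (us <= W - 1) & (vs >= 0) & (vs <= H - 1)
    # Bilinear sampling of the source image at (us, vs)
    us_c, vs_c = np.clip(us, 0, W - 1), np.clip(vs, 0, H - 1)
    x0, y0 = np.floor(us_c).astype(int), np.floor(vs_c).astype(int)
    x1, y1 = np.minimum(x0 + 1, W - 1), np.minimum(y0 + 1, H - 1)
    wx, wy = us_c - x0, vs_c - y0
    out = ((1 - wx) * (1 - wy) * src_img[y0, x0]
           + wx * (1 - wy) * src_img[y0, x1]
           + (1 - wx) * wy * src_img[y1, x0]
           + wx * wy * src_img[y1, x1])
    return out.reshape(H, W), valid.reshape(H, W)


def photometric_loss(tgt_img, synth, valid):
    """Mean L1 difference over pixels that land inside the source image."""
    return np.abs(tgt_img - synth)[valid].mean()


# Usage: with an identity pose the synthesized view reproduces the source.
H, W = 24, 32
K = np.array([[50.0, 0.0, 16.0], [0.0, 50.0, 12.0], [0.0, 0.0, 1.0]])
src = np.random.default_rng(0).random((H, W))
depth = np.full((H, W), 10.0)
synth, valid = inverse_warp(src, depth, np.eye(3), np.zeros(3), K)
print(photometric_loss(src, synth, valid))  # approximately 0
```

With a nonzero pose the same call gives a warped image, and the loss is small only when depth and pose are consistent with the actual motion between the frames; that is what drives both networks during training.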

The idea is also similar to the cycle consistency used in CycleGAN.

SfMLearner's visual odometry (VO) performance is actually not that good: both scale drift and rotation drift are large. See scale-consistent SfMLearner for better VO performance.
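One way to see where scale drift can come from: with a monocular camera, scaling all depths and the translation by the same positive factor leaves every reprojected pixel, and therefore the view-synthesis loss, unchanged. The small NumPy check below (a hypothetical `project` helper with a made-up camera and points) illustrates this ambiguity. It is my illustration of the underlying geometry, not an analysis from the paper. Because nothing ties the scale chosen for one frame pair to the next, chaining per-pair poses into a trajectory can drift.

```python
import numpy as np


def project(points, R, t, K):
    """Project 3D points (3, N) from the target camera into a second camera."""
    p = K @ (R @ points + t.reshape(3, 1))
    return p[:2] / p[2]


rng = np.random.default_rng(0)
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
points = rng.uniform([-2, -2, 4], [2, 2, 10], size=(20, 3)).T  # depths in [4, 10]
theta = 0.05  # small rotation about the y axis
R = np.array([[np.cos(theta), 0.0, np.sin(theta)],
              [0.0, 1.0, 0.0],
              [-np.sin(theta), 0.0, np.cos(theta)]])
t = np.array([0.3, 0.0, 0.5])

s = 3.7  # arbitrary global scale factor
uv = project(points, R, t, K)
uv_scaled = project(s * points, R, s * t, K)  # scale depth AND translation
print(np.allclose(uv, uv_scaled))  # True: the pixels cannot reveal s
```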

Key ideas

Technical details