Geometric Pretraining for Monocular Depth Estimation

September 2020

tl;dr: Use a self-supervised optical flow loss to pretrain the structure encoder of a monocular depth estimator on uncalibrated videos.

Overall impression

The paper proposes a new direction for improving monocular depth estimation. Conventional ImageNet pretraining benefits classification more than location-aware tasks such as object detection and depth estimation, because classification training largely discards spatial information.

The core of the algorithm is still the photometric loss, which has inherent limitations: it assumes brightness constancy and breaks down under occlusion and in textureless regions.
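For intuition, a self-supervised flow loss of this kind can be sketched as: backward-warp the second frame with the predicted flow and penalize the per-pixel difference to the first frame. This is a minimal sketch, assuming grayscale images, a plain L1 error and border clamping; the function names are hypothetical, and the paper's actual loss may include extra terms (e.g. SSIM, occlusion masking, smoothness).

```python
import numpy as np


def bilinear_sample(img, x, y):
    """Sample a grayscale image (H, W) at float coords with bilinear interpolation.
    Coordinates outside the image are clamped to the border."""
    h, w = img.shape
    x = np.clip(x, 0, w - 1)
    y = np.clip(y, 0, h - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    wx, wy = x - x0, y - y0
    top = img[y0, x0] * (1 - wx) + img[y0, x1] * wx
    bottom = img[y1, x0] * (1 - wx) + img[y1, x1] * wx
    return top * (1 - wy) + bottom * wy


def warp_with_flow(target, flow):
    """Backward warp: out[y, x] = target[y + v, x + u], with flow[..., 0] = u, flow[..., 1] = v."""
    h, w = target.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    return bilinear_sample(target, xs + flow[..., 0], ys + flow[..., 1])


def photometric_loss(source, target, flow):
    """Mean L1 photometric error between the source frame and the flow-warped target frame."""
    return float(np.mean(np.abs(source - warp_with_flow(target, flow))))
```

If the target frame is the source shifted one pixel to the right, a flow of u = 1 reconstructs the source everywhere except the clamped last column, so its loss is much lower than that of zero flow. Note that nothing here needs camera intrinsics.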

One key difference between optical flow estimation and depth estimation is that optical flow estimation does not require camera calibration (intrinsics).
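To see where calibration enters on the depth side, the sketch below computes the rigid flow induced by a depth map and a relative camera pose using the standard pinhole reprojection (back-project with K^-1, apply the pose, re-project with K), as in SfM-Learner-style view synthesis. It is a minimal sketch, assuming a static scene, the pose convention X_2 = R X_1 + t, and a hypothetical function name; a flow network instead predicts the (u, v) field directly and never touches K.

```python
import numpy as np


def rigid_flow_from_depth(depth, K, R, t):
    """Flow (H, W, 2) induced by camera motion (R, t) over a static scene with per-pixel depth.

    Requires the intrinsics K: without it, pixels cannot be lifted to 3D.
    """
    h, w = depth.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    # Homogeneous pixel coordinates, shape (3, H*W).
    pix = np.stack([xs, ys, np.ones_like(xs)], axis=0).reshape(3, -1).astype(float)
    # Back-project to 3D points in the first camera frame.
    cam = (np.linalg.inv(K) @ pix) * depth.reshape(1, -1)
    # Move the points into the second camera frame.
    cam2 = R @ cam + np.asarray(t, dtype=float).reshape(3, 1)
    # Re-project with the same intrinsics and apply the perspective divide.
    proj = K @ cam2
    uv = proj[:2] / proj[2:3]
    flow = (uv - pix[:2]).reshape(2, h, w)
    return np.moveaxis(flow, 0, -1)
```

With identity rotation, t = (0.1, 0, 0), constant depth 5 and f_x = 100, every pixel moves u = f_x t_x / Z = 2 px horizontally; with f_x = 200 the same depth and motion give 4 px. The same scene and motion thus produce different flow under different intrinsics, which is why depth supervision via reprojection needs calibration while flow supervision does not.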

The idea of encoding the motion between two frames into a latent vector is very similar to the PoseNet in SfM-Learner and Struct2Depth, but more flexible, since the latent vector is not constrained to a 6-DoF camera pose.

The idea of using geometric pretraining to improve monocular depth estimation (monoDepth) or monocular 3D object detection (mono3D) is similar to that of CubifAE-3D.

Key ideas

Technical details