GLNet: Self-supervised Learning with Geometric Constraints in Monocular Video: Connecting Flow, Depth, and Camera

July 2020

tl;dr: Combine monocular depth estimation with optical flow, trained jointly with geometric and photometric losses.

Overall impression

The paper proposes two online refinement strategies, one finetuning the model parameters and one finetuning the output predictions directly (cf. Struct2Depth and Consistent Video Depth).

It also predicts camera intrinsics, so it can be applied to videos in the wild (cf. Depth from Videos in the Wild).

The paper has several interesting ideas, but some of them conflict. The main issue is that it uses FlowNet to handle dynamic regions, yet it still enforces epipolar constraints on the optical flow, and those constraints only hold for static scene points observed by a moving camera. It also does not handle the depth of dynamic regions well.
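To make the conflict concrete, here is a minimal NumPy sketch of an epipolar residual evaluated on a flow correspondence. It is not the paper's loss. The function names, the toy intrinsics `K`, and the pose convention `X2 = R @ X1 + t` are assumptions made for illustration only.

```python
import numpy as np

def skew(t):
    """Cross-product matrix [t]_x such that skew(t) @ v == np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

def fundamental_from_pose(K, R, t):
    """F = K^-T [t]_x R K^-1, assuming points transform as X2 = R @ X1 + t."""
    K_inv = np.linalg.inv(K)
    return K_inv.T @ skew(t) @ R @ K_inv

def project(K, X):
    """Pinhole projection of a 3D point (camera frame) to pixel coordinates."""
    x = K @ X
    return x[:2] / x[2]

def epipolar_residual(F, p1, flow):
    """Algebraic residual p2^T F p1, where p2 = p1 + flow (pixels)."""
    h1 = np.append(p1, 1.0)
    h2 = np.append(p1 + flow, 1.0)
    return float(h2 @ F @ h1)

# Hypothetical toy setup: intrinsics and a sideways camera motion.
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)
t = np.array([0.5, 0.0, 0.0])
F = fundamental_from_pose(K, R, t)

X1 = np.array([1.0, 0.5, 10.0])                # 3D point in frame 1
p1 = project(K, X1)

# Static point: its flow is explained by camera motion alone.
flow_static = project(K, R @ X1 + t) - p1
# Moving point: the object also moves by itself between frames.
object_motion = np.array([0.0, 0.3, 0.0])
flow_moving = project(K, R @ X1 + t + object_motion) - p1

res_static = epipolar_residual(F, p1, flow_static)
res_moving = epipolar_residual(F, p1, flow_moving)
print(res_static, res_moving)
```

The static correspondence satisfies the constraint up to floating-point error, while the independently moving point does not. Penalizing this residual everywhere therefore pushes the flow in dynamic regions toward the rigid-scene solution. Note that object motion along the epipolar line would leave the residual at zero, so the constraint is also not a reliable detector of dynamic regions.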

The takeaway is that geometric constraints provide a stronger supervision signal than photometric constraints.

Key ideas

Technical details