TrianFlow: Towards Better Generalization: Joint Depth-Pose Learning without PoseNet

May 2020

tl;dr: Use optical flow to obtain dense 2D-2D correspondences, solve for the local relative pose from them, and align the result with the depth prediction.
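Below is a minimal sketch of the first step in the tl;dr: it turns dense optical flow into 2D-2D correspondences and solves for the epipolar geometry between the two frames. It assumes NumPy, a flow field stored as an (H, W, 2) array of (dx, dy) pixel offsets from frame 1 to frame 2, and the textbook normalized 8-point algorithm as the solver. The function names are hypothetical, and this is not TrianFlow's own implementation. A practical pipeline would also filter out unreliable flow and use a robust estimator such as RANSAC. Both are omitted here.

```python
import numpy as np


def flow_to_correspondences(flow, stride=1):
    """Turn a dense flow field (H, W, 2) into matched pixel arrays p1, p2 of shape (N, 2).

    Assumes flow[y, x] = (dx, dy) maps pixel (x, y) in frame 1 to (x + dx, y + dy)
    in frame 2. `stride` subsamples the pixel grid.
    """
    h, w, _ = flow.shape
    ys, xs = np.mgrid[0:h:stride, 0:w:stride]
    ys, xs = ys.ravel(), xs.ravel()
    p1 = np.stack([xs, ys], axis=1).astype(np.float64)
    p2 = p1 + flow[ys, xs]
    return p1, p2


def _normalize(pts):
    """Hartley normalization: zero mean, mean distance sqrt(2). Returns homogeneous points and T."""
    mean = pts.mean(axis=0)
    dist = np.sqrt(((pts - mean) ** 2).sum(axis=1)).mean()
    s = np.sqrt(2.0) / dist
    T = np.array([[s, 0.0, -s * mean[0]],
                  [0.0, s, -s * mean[1]],
                  [0.0, 0.0, 1.0]])
    pts_h = np.column_stack([pts, np.ones(len(pts))])
    return (T @ pts_h.T).T, T


def eight_point_fundamental(p1, p2):
    """Normalized 8-point estimate of F (unit Frobenius norm) with x2^T F x1 = 0."""
    x1, T1 = _normalize(p1)
    x2, T2 = _normalize(p2)
    # Each correspondence contributes one row of A f = 0, f = F flattened row-major.
    A = np.column_stack([
        x2[:, 0] * x1[:, 0], x2[:, 0] * x1[:, 1], x2[:, 0],
        x2[:, 1] * x1[:, 0], x2[:, 1] * x1[:, 1], x2[:, 1],
        x1[:, 0], x1[:, 1], np.ones(len(x1)),
    ])
    _, _, vt = np.linalg.svd(A)
    F = vt[-1].reshape(3, 3)
    # Enforce the rank-2 constraint of a fundamental matrix.
    u, s, vt = np.linalg.svd(F)
    s[2] = 0.0
    F = u @ np.diag(s) @ vt
    # Undo the normalization: x2^T (T2^T Fn T1) x1 = 0.
    F = T2.T @ F @ T1
    return F / np.linalg.norm(F)
```

With known intrinsics K, the essential matrix E = K^T F K can be decomposed into a rotation and a unit-norm translation, after which the correspondences can be triangulated. That triangulated structure is only defined up to scale, so it has to be brought to a common scale with the depth prediction.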

Overall impression

The name seems to come from “triangulate-flow”.

PoseNet lacks generalization ability: it performs poorly on long sequences, where the relative pose varies greatly across the sequence, and on sped-up video, and it barely beats an image-retrieval baseline.

The idea of using optical flow to calculate relative pose is very similar to DF-VO. The main difference

Because correspondences (matching) come from optical flow, this knowledge does not have to be learned by a PoseNet, which improves the network's generalization ability.

Key ideas

Technical details


The central idea of existing self-supervised depth-pose learning methods is to train two separate networks, one for monocular depth estimation and one for relative pose estimation, by enforcing geometric constraints on image pairs.
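Below is a minimal sketch of the geometric constraint such methods typically use. This is the view-synthesis (photometric reprojection) loss popularized by SfMLearner, and it is assumed here to be the constraint meant. The target frame's depth and the relative pose reproject every target pixel into the source frame. The source image is then bilinearly sampled at those locations, and an L1 photometric error is measured against the target.

The sketch assumes NumPy, grayscale images, a pinhole intrinsics matrix K, and the pose convention X_s = R X_t + t. The function names are hypothetical. Practical losses usually add SSIM terms and occlusion handling.

```python
import numpy as np


def reproject(depth, K, R, t):
    """Project every target pixel into the source frame.

    depth: (H, W) depth of the target frame; (R, t) maps target-camera points to
    the source camera (assumed convention X_s = R @ X_t + t).
    Returns (H, W, 2) source pixel coordinates (x, y).
    """
    h, w = depth.shape
    ys, xs = np.mgrid[0:h, 0:w]
    pix = np.stack([xs, ys, np.ones_like(xs)]).reshape(3, -1).astype(np.float64)
    cam = (np.linalg.inv(K) @ pix) * depth.reshape(1, -1)  # back-project with depth
    src = K @ (R @ cam + np.asarray(t, dtype=np.float64).reshape(3, 1))  # move and project
    uv = src[:2] / src[2:3]
    return uv.T.reshape(h, w, 2)


def bilinear_sample(img, uv):
    """Bilinearly sample a grayscale image (H, W) at float coords uv (..., 2).

    Returns the sampled values and a mask of coordinates that fell inside the image.
    """
    h, w = img.shape
    x, y = uv[..., 0], uv[..., 1]
    valid = (x >= 0) & (x <= w - 1) & (y >= 0) & (y <= h - 1)
    x = np.clip(x, 0, w - 1)
    y = np.clip(y, 0, h - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    wx, wy = x - x0, y - y0
    top = (1 - wx) * img[y0, x0] + wx * img[y0, x1]
    bottom = (1 - wx) * img[y1, x0] + wx * img[y1, x1]
    return (1 - wy) * top + wy * bottom, valid


def photometric_l1(target, source, depth, K, R, t):
    """Mean L1 error between the target and the source image warped into the target view."""
    uv = reproject(depth, K, R, t)
    warped, valid = bilinear_sample(source, uv)
    return np.abs(target - warped)[valid].mean()
```

In the PoseNet-based setup described above, both the depth map and (R, t) are network outputs. The loss is backpropagated into both networks, which in a training framework requires writing these steps with differentiable operations.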