May 2020
tl;dr: Use optical flow and dense 2D-2D correspondences to solve for local pose, and align it with the depth prediction.
Overall impression
The name seems to come from “triangulate-flow”.
PoseNet lacks generalization ability: it performs badly on long sequences where the relative pose across the sequence differs hugely and when the video is sped up, and it hardly beats an image retrieval baseline.
The idea of using optical flow to calculate relative pose is very similar to DFVO. The main differences:
 DFVO pretrains the depth and flow networks separately, with a PoseNet-like architecture, while TrianFlow gets rid of PoseNet altogether and uses triangulation to perform self-supervision.
 DFVO is based on SC-SfMLearner to ensure scale consistency and aligns pose to depth. TrianFlow aligns depth to pose. (Why?)
Knowledge of correspondence (matching) no longer has to be learned by a PoseNet, which improves the network's generalization ability.
Key ideas
 The flow network is based on PWC-Net.
 Scale is explicitly disentangled at both training and inference.
 Training:
 Optical flow to get dense matching.
 Forward-backward consistency to generate the score map Ms.
 Sample points that survive the occlusion mask Mo and fall in the top 20% of forward-backward scores.
 8-point algorithm in RANSAC + cheirality check to solve for the F matrix and [R|t].
 Based on [R|t] and the correspondences, get the triangulated point depth with midpoint triangulation, which gives an up-to-scale 3D structure. Points around the epipoles (vanishing points) are removed before triangulation.
 The dense predicted depth is aligned to the sparse triangulated depth, whose scale is determined by the scale of the relative pose. The triangulated depth is then used as a pseudo-depth signal to supervise depth prediction.
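The triangulation step above can be sketched in NumPy as follows. This is a minimal illustration, not the TrianFlow code. It assumes the cv2.recoverPose convention X2 = R X1 + t, pinhole intrinsics K, and pixel correspondences uv1/uv2 already read off the flow. The helper names (`backproject`, `midpoint_triangulate`, `epipole_angle_mask`) and the 1-degree threshold are hypothetical. The epipole filter measures the angle between each frame-1 ray and the baseline, which is one possible way to drop points near the epipole.

```python
import numpy as np


def backproject(K, uv):
    """Map (N, 2) pixel coordinates to (N, 3) rays K^-1 [u, v, 1]^T (not normalized)."""
    homog = np.hstack([uv, np.ones((uv.shape[0], 1))])
    return homog @ np.linalg.inv(K).T


def midpoint_triangulate(K, uv1, uv2, R, t, eps=1e-12):
    """Up-to-scale depth in frame 1 via midpoint triangulation (hypothetical helper).

    Assumes the cv2.recoverPose convention X2 = R @ X1 + t. Returns NaN where
    the two rays are (numerically) parallel.
    """
    t = np.asarray(t, dtype=float).reshape(3)
    c2 = -R.T @ t                    # camera-2 center expressed in frame 1
    d1 = backproject(K, uv1)         # rays from camera 1 (at the origin)
    d2 = backproject(K, uv2) @ R     # rows are R^T @ ray: camera-2 rays in frame 1
    # Closest points s*d1 and c2 + u*d2: solve the 2x2 normal equations per point.
    a = np.sum(d1 * d1, axis=1)
    b = np.sum(d1 * d2, axis=1)
    c = np.sum(d2 * d2, axis=1)
    e = d1 @ c2
    f = d2 @ c2
    det = b * b - a * c
    det = np.where(np.abs(det) < eps, np.nan, det)
    s = (b * f - c * e) / det
    u = (a * f - b * e) / det
    midpoint = 0.5 * (s[:, None] * d1 + c2 + u[:, None] * d2)
    return midpoint[:, 2]


def epipole_angle_mask(K, uv1, R, t, min_angle_deg=1.0):
    """Hypothetical filter: keep points whose frame-1 ray is at least
    min_angle_deg away from the baseline, i.e. not too close to the epipole."""
    t = np.asarray(t, dtype=float).reshape(3)
    c2 = -R.T @ t
    d1 = backproject(K, uv1)
    cos = np.abs(d1 @ c2) / (np.linalg.norm(d1, axis=1) * np.linalg.norm(c2))
    angle = np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))
    return angle > min_angle_deg
```

Because t is only known up to scale, doubling t doubles every triangulated depth. This is why the dense prediction has to be aligned to the sparse depth before a loss can be computed.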
 Inference (same as DFVO)
 Calculate the fundamental matrix from optical flow.
 When optical flow is too small, use PnP to solve for relative pose.
 TrianFlow can generalize to unseen ego motion.
 On 3x sped-up sequences, ORB-SLAM2 frequently fails and reinitializes under the fast motion.
 The results are better than most other end-to-end methods, but not as good as DFVO.
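Both training and inference start from a fundamental matrix estimated on flow correspondences. Below is a minimal sketch of the normalized eight-point algorithm: Hartley normalization, a linear solve via SVD, and a rank-2 projection. It is illustrative only. The RANSAC loop, the cheirality check, and the decomposition into [R|t] mentioned above are omitted. In practice one would more likely call OpenCV, for example cv2.findFundamentalMat, followed by cv2.recoverPose on the corresponding essential matrix. The function names are hypothetical.

```python
import numpy as np


def hartley_normalize(pts):
    """Translate (N, 2) points to their centroid and scale to mean distance sqrt(2)."""
    centroid = pts.mean(axis=0)
    mean_dist = np.mean(np.linalg.norm(pts - centroid, axis=1))
    scale = np.sqrt(2.0) / mean_dist
    T = np.array([[scale, 0.0, -scale * centroid[0]],
                  [0.0, scale, -scale * centroid[1]],
                  [0.0, 0.0, 1.0]])
    homog = np.hstack([pts, np.ones((pts.shape[0], 1))])
    return homog @ T.T, T


def eight_point_fundamental(uv1, uv2):
    """Estimate F with [uv2, 1] F [uv1, 1]^T = 0 from N >= 8 correspondences (no RANSAC)."""
    x1, T1 = hartley_normalize(uv1)
    x2, T2 = hartley_normalize(uv2)
    # One row of A f = 0 per correspondence, with f = F.ravel() in row-major order.
    A = np.column_stack([
        x2[:, 0] * x1[:, 0], x2[:, 0] * x1[:, 1], x2[:, 0],
        x2[:, 1] * x1[:, 0], x2[:, 1] * x1[:, 1], x2[:, 1],
        x1[:, 0], x1[:, 1], np.ones(len(x1)),
    ])
    _, _, vt = np.linalg.svd(A)
    F = vt[-1].reshape(3, 3)
    # Enforce rank 2 by zeroing the smallest singular value.
    u, s, vt = np.linalg.svd(F)
    F = u @ np.diag([s[0], s[1], 0.0]) @ vt
    # Undo the normalization and fix the arbitrary overall scale.
    F = T2.T @ F @ T1
    return F / np.linalg.norm(F)
```

The normalization step matters because raw pixel coordinates make the linear system badly conditioned. Without it, the estimate degrades quickly once the correspondences are noisy.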
Technical details
 Occlusion map, Mo
 Flow consistency score map, Ms
 The pose recovered from optical flow is obtained with cv2.recoverPose, so t has unit length.
 Inlier score map Mr: computed from the distance of each pixel to its corresponding epipolar line. Implementation of inlier mask in code.
 Angle mask: filters out points close to the epipoles. Implementation of angle mask.
 During training, the sparse triangulated depth is only up to scale. The dense prediction is aligned to it, and the depth difference is normalized by the sparse depth value, so the depth loss is scale invariant.
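The masks listed above can be approximated with a few NumPy functions. This is a sketch under stated assumptions, not the repository's implementation:
- Flows are (H, W, 2) arrays in pixels.
- The backward flow is looked up with nearest-neighbor sampling.
- The mapping from forward-backward error to score, exp(-error), is a hypothetical choice.
- The occlusion mask Mo is taken as a given boolean input.
- The epipolar term returns the plain point-to-line distance, from which an inlier score Mr could be derived.

All function names are hypothetical.

```python
import numpy as np


def fb_consistency_score(flow_fw, flow_bw):
    """Score map in (0, 1]: exp(-||f_fw(x) + f_bw(x + f_fw(x))||).

    The exp(-error) mapping is a hypothetical choice; flows are (H, W, 2) in pixels.
    """
    H, W, _ = flow_fw.shape
    ys, xs = np.mgrid[0:H, 0:W]
    # Nearest-neighbor lookup of the backward flow at each forward target.
    xt = np.clip(np.rint(xs + flow_fw[..., 0]).astype(int), 0, W - 1)
    yt = np.clip(np.rint(ys + flow_fw[..., 1]).astype(int), 0, H - 1)
    err = np.linalg.norm(flow_fw + flow_bw[yt, xt], axis=-1)
    return np.exp(-err)


def sample_top_fraction(score, valid, frac=0.2):
    """(row, col) indices of the top `frac` of valid pixels, ranked by score."""
    rows, cols = np.nonzero(valid)
    k = max(1, int(round(frac * rows.size)))
    order = np.argsort(score[rows, cols])[::-1][:k]
    return rows[order], cols[order]


def epipolar_distance(F, uv1, uv2):
    """Distance (pixels) from each uv2 to the epipolar line F @ [uv1, 1] in image 2."""
    x1 = np.hstack([uv1, np.ones((len(uv1), 1))])
    x2 = np.hstack([uv2, np.ones((len(uv2), 1))])
    lines = x1 @ F.T                 # row i is (F @ x1_i)^T = (a, b, c)
    num = np.abs(np.sum(lines * x2, axis=1))
    return num / np.sqrt(lines[:, 0] ** 2 + lines[:, 1] ** 2)
```

To mirror the training step, `sample_top_fraction(fb_consistency_score(fw, bw), Mo, 0.2)` picks the points that survive Mo and rank in the top 20% by consistency. Thresholding or exponentiating `epipolar_distance` would give an inlier score in the spirit of Mr; the exact mapping used in the code is not reproduced here.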
Notes
 Q: The scale normalization ensures a consistent scale between depth and flow, but what ensures scale consistency across frames? –> This seems to be learned implicitly by the depth network. The depth network now only has to focus on learning relative depth, and the scale consistency seems to come from the continuity assumption of the network: a continuous change in the image leads to a continuous change in the depth prediction. Still, adding the scale consistency loss proposed in SC-SfMLearner does not seem like it would hurt?
 The paper
 During inference, the code actually assumes depth predictions have consistent scale and thus aligns pose to depth.
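The two alignment directions discussed in these notes can be illustrated with a median-ratio scale factor, and so can the scale-invariant depth loss from the technical details. The median ratio is an assumption made here, since it is a common choice in monocular depth evaluation. It is not necessarily what TrianFlow or DFVO use. In the code, `pred_depth` is the dense prediction sampled at the triangulated pixels and `tri_depth` is the up-to-scale triangulated depth. All function names are hypothetical.

```python
import numpy as np


def median_scale(target, source):
    """Scale s such that s * source matches target in a median-ratio sense (assumption)."""
    return np.median(target / source)


def align_depth_to_pose(pred_depth, tri_depth):
    # TrianFlow-style direction (training): rescale the predicted depth
    # to the scale of the triangulated, up-to-scale depth.
    return median_scale(tri_depth, pred_depth) * pred_depth


def align_pose_to_depth(t_unit, pred_depth, tri_depth):
    # DFVO-style direction (inference): rescale the unit-length translation so
    # that triangulation agrees with a depth network assumed to be scale consistent.
    return median_scale(pred_depth, tri_depth) * t_unit


def scale_invariant_depth_loss(pred_depth, tri_depth):
    # Align first, then normalize the difference by the sparse depth itself.
    aligned = align_depth_to_pose(pred_depth, tri_depth)
    return np.mean(np.abs(aligned - tri_depth) / tri_depth)
```

Multiplying `tri_depth` by any positive constant leaves `scale_invariant_depth_loss` unchanged. The scale factor, the numerator, and the denominator all scale together.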
The central idea of existing self-supervised depth-pose learning methods is to learn two separated networks on the estimation of monocular depth and relative pose by enforcing geometric constraints on image pairs.