CC: Competitive Collaboration: Joint Unsupervised Learning of Depth, Camera Motion, Optical Flow and Motion Segmentation

August 2020

tl;dr: Train a motion segmentation network to moderate the data fed into the depth predictor and the optical flow estimator.
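Below is a minimal sketch of that moderation. It makes two assumptions. First, the motion segmentation network outputs a per-pixel probability `static_prob` that a pixel belongs to the static scene. Second, the photometric error is plain L1. The function names and the error term are illustrative and are not the paper's exact losses. The mask routes each pixel's reconstruction loss either to the depth + camera-motion branch or to the optical flow branch:

```python
import numpy as np


def photometric_error(target, warped):
    """Per-pixel L1 photometric error, averaged over color channels (H x W x 3 inputs)."""
    return np.abs(target - warped).mean(axis=-1)


def competitor_losses(target, warped_rigid, warped_flow, static_prob):
    """Split the reconstruction loss between the two branches (illustrative only).

    static_prob: H x W map, assumed to come from the motion segmentation network,
    close to 1 for static pixels and close to 0 for independently moving ones.
    """
    err_rigid = photometric_error(target, warped_rigid)  # image warped with depth + camera motion
    err_flow = photometric_error(target, warped_flow)    # image warped with optical flow
    # Static pixels supervise depth/pose; moving pixels supervise optical flow.
    loss_rigid = float((static_prob * err_rigid).mean())
    loss_flow = float(((1.0 - static_prob) * err_flow).mean())
    return loss_rigid, loss_flow
```

The two weights are complementary, so each pixel's error is charged to whichever branch the mask deems responsible. This keeps moving objects from corrupting the depth and camera-motion training.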

Overall impression

Like GeoNet and GLNet, the paper combines depth prediction and optical flow prediction. However, it argues that the two are connected by a third task: motion segmentation.

Still, CC relies purely on low-level geometric constraints for self-supervised depth prediction. By design, it cannot solve the infinite depth issue: objects moving at the same velocity as the camera show no image motion and are predicted at infinite depth. SGDepth takes the idea further by incorporating semantic information into motion segmentation.
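A simplified case shows why the infinite depth issue arises. Assume a pinhole camera translating sideways by t_x with no rotation. A static point at depth Z then shifts by f * t_x / Z pixels, where f is the focal length in pixels. A car moving with the ego camera has zero image motion, so a static-scene model can only explain it as Z → ∞. The helper below is a hypothetical illustration of this single-axis case, not part of CC:

```python
import math


def implied_depth(focal_px, cam_tx_m, observed_flow_px):
    """Depth a static-scene model must assign to explain an observed horizontal flow.

    Assumes a pinhole camera translating sideways by cam_tx_m with no rotation,
    so a static point at depth Z moves by |u| = focal_px * |cam_tx_m| / Z pixels.
    """
    if observed_flow_px == 0:
        return math.inf  # zero parallax can only be explained by a point at infinity
    return focal_px * abs(cam_tx_m) / abs(observed_flow_px)
```

For example, with f = 700 px and a 0.5 m sideways step, a static point showing 35 px of flow is placed at 10 m. A car keeping pace with the camera (0 px of flow) is pushed to infinity.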

The overall training strategy is very complex, with 6 steps. The model’s performance has since been surpassed by newer techniques such as SGDepth.

In GeoNet, the optical flow network predicts a “residual” flow on top of the rigid flow, so there is no coupling between depth and optical flow. This cascaded design prevents the exploitation of inter-task dependencies. DFNet exploits the consistency between depth and flow, but does not account for moving objects.
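The depth-flow coupling can be made concrete. For a static scene, depth and camera motion imply a rigid flow that can be compared directly with the optical flow prediction. Where the two disagree, the pixel is likely moving independently. The NumPy sketch below assumes a pinhole camera with intrinsics `K` and a pose `(R, t)` that maps target-frame points into the source frame. That convention and both function names are chosen here for illustration only:

```python
import numpy as np


def rigid_flow(depth, K, R, t):
    """Flow induced on a static scene by camera motion (R, t), given per-pixel depth.

    depth: H x W, K: 3 x 3 intrinsics, R: 3 x 3 rotation, t: length-3 translation.
    Returns an H x W x 2 array of (du, dv) pixel displacements.
    """
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w].astype(np.float64)
    pix = np.stack([u, v, np.ones_like(u)], axis=0).reshape(3, -1)  # 3 x HW homogeneous pixels
    points = (np.linalg.inv(K) @ pix) * depth.reshape(1, -1)       # back-project to 3D
    points_src = R @ points + np.asarray(t, dtype=np.float64).reshape(3, 1)  # into source frame
    proj = K @ points_src
    proj = proj[:2] / proj[2:3]                                      # perspective divide
    return (proj - pix[:2]).T.reshape(h, w, 2)


def flow_discrepancy(rigid, optical):
    """Per-pixel endpoint distance between rigid flow and optical flow (H x W)."""
    return np.linalg.norm(rigid - optical, axis=-1)
```

Suppose the rotation is the identity, the translation is 0.1 along x, the depth is 2 everywhere, and the focal length is 100 px. Then every pixel gets a rigid flow of 5 px to the right. A large `flow_discrepancy` flags pixels that the static-scene model cannot explain.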

Not all data in the unlabeled training set conform to SfMLearner’s static-scene assumption, and some of it may corrupt training. Thus a key question is how to exclude such data, e.g., independently moving regions.
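One way to exclude such data, roughly in the spirit of CC's moderator, is to label a pixel static under either of two conditions. The first is that the rigid (depth + camera motion) reconstruction explains the pixel better than optical flow does. The second is that rigid flow and optical flow nearly agree. All other pixels are dropped from the depth/pose loss. This is a minimal sketch, and the threshold `agree_thresh_px` and the function names are assumptions:

```python
import numpy as np


def static_mask(err_rigid, err_flow, flow_discrepancy, agree_thresh_px=1.0):
    """Boolean H x W mask: True = treated as static, False = independently moving.

    err_rigid / err_flow: per-pixel photometric errors of the depth+pose and the
    optical-flow reconstructions; flow_discrepancy: per-pixel distance between
    rigid flow and optical flow.
    """
    rigid_explains_best = err_rigid < err_flow
    flows_agree = flow_discrepancy < agree_thresh_px
    return rigid_explains_best | flows_agree


def static_only_loss(err_rigid, mask):
    """Mean rigid photometric error over static pixels only; moving pixels are excluded."""
    if not mask.any():
        return 0.0
    return float(err_rigid[mask].mean())
```

In a two-pixel example, the second pixel is reconstructed far better by optical flow, and its rigid flow disagrees with the optical flow by 5 px. It is therefore masked out, and only the first pixel's error reaches the depth/pose loss.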

Key ideas

Technical details