KP3D: Self-Supervised 3D Keypoint Learning for Ego-motion Estimation

March 2020

tl;dr: Predict keypoints and depth from monocular videos simultaneously, in a self-supervised fashion.

Overall impression

This paper builds on two streams of self-supervised research on video. The first is depth estimation, starting from SfM Learner, depth in the wild, and scale-consistent SfM Learner; the second is self-supervised keypoint learning, starting from SuperPoint, UnsuperPoint, and UnsuperPoint with outlier rejection.

The two major enablers of this work are scale-consistent SfM Learner and UnsuperPoint.

The main idea is to use sparse matched keypoint pairs to perform more accurate (relative) pose estimation. Previously, ego-motion was directly regressed from two stacked neighboring images; the keypoint-based formulation leads to much better ego-motion estimation.
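To make this concrete, here is a minimal sketch (not the paper's exact implementation) of how relative pose can be recovered from matched keypoints once predicted depth lifts them to 3D: back-project each 2D keypoint with its depth, then solve for the rigid transform in closed form via the Kabsch/Procrustes algorithm. Function names and the intrinsics matrix `K` below are illustrative assumptions.

```python
import numpy as np

def backproject(kps, depths, K):
    """Lift 2D keypoints (N, 2) to 3D camera coordinates using
    per-keypoint predicted depths (N,) and camera intrinsics K (3, 3)."""
    ones = np.ones((kps.shape[0], 1))
    rays = np.linalg.inv(K) @ np.hstack([kps, ones]).T  # (3, N) unit-depth rays
    return (rays * depths).T                            # scale rays by depth -> (N, 3)

def rigid_transform(P, Q):
    """Closed-form least-squares R, t such that Q ~= P @ R.T + t
    (Kabsch / orthogonal Procrustes), for matched 3D point sets (N, 3)."""
    cp, cq = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cp).T @ (Q - cq)            # 3x3 cross-covariance of centered points
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cq - R @ cp
    return R, t
```

In the actual pipeline such a solve is made differentiable (and robustified against outlier matches) so that gradients flow back into both the keypoint and depth networks; the sketch above only shows the geometric core.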

Both KP3D and D3VO use DSO as the backend; KP3D reaches performance on par with DVSO, while D3VO beats DVSO.

Key ideas

Technical details