SuperPoint: Self-Supervised Interest Point Detection and Description

January 2020

tl;dr: Learn a real-time feature detector from self-generated data. “Super”point because it is a superset of the earlier work “MagicPoint”.

Overall impression

A local feature consists of an interest point (key point, salient point) and a descriptor. Multiview geometry studies how to recover the transformation between the views and infer the 3D positions of these key points, based on the assumption that the points are matched across multiview images. How do we learn a feature detector?

From the standpoint of generative modeling, if we know the key points of one image, we can apply a homographic transformation to the image together with its key points. This generates plenty of training data for learning the descriptor. (Yet another example of transferring knowledge of a well-defined math problem to neural nets.)
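As a rough illustration of the paragraph above, here is a minimal NumPy sketch of warping known key points with a homography. The helper names `warp_points` and `random_homography`, the corner-jitter sampling scheme, and the `max_shift` parameter are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def warp_points(points_xy, H):
    """Apply a 3x3 homography H to an (N, 2) array of (x, y) points."""
    pts = np.asarray(points_xy, dtype=np.float64)
    ones = np.ones((pts.shape[0], 1))
    homog = np.hstack([pts, ones]) @ H.T      # (N, 3) homogeneous coordinates
    return homog[:, :2] / homog[:, 2:3]       # divide by the last coordinate

def random_homography(rng, width, height, max_shift=0.15):
    """Hypothetical sampler: jitter the four image corners and solve for H."""
    src = np.array([[0, 0], [width, 0], [width, height], [0, height]],
                   dtype=np.float64)
    jitter = rng.uniform(-max_shift, max_shift, size=(4, 2)) * [width, height]
    dst = src + jitter
    # Direct linear transform with h33 fixed to 1: 8 equations, 8 unknowns.
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y]); b.append(v)
    h = np.linalg.solve(np.array(A), np.array(b))
    return np.append(h, 1.0).reshape(3, 3)

# Usage: one annotated image yields a new (warped key points, H) training pair.
rng = np.random.default_rng(0)
H = random_homography(rng, width=640, height=480)
keypoints = np.array([[10.0, 20.0], [320.0, 240.0]])
warped_keypoints = warp_points(keypoints, H)
```

The image itself would be warped with the same matrix using any perspective-warp routine (for example OpenCV's `cv2.warpPerspective`), so that the point correspondences between the two views are known exactly.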

How do we learn the detector in the first place? We can render 2D projections of 3D objects with known interest points. To bridge the sim2real gap when moving from synthetic to real images, test-time augmentation (TTA) is used to accumulate interest point features. This TTA is called “homographic adaptation”.
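The aggregation step can be sketched as follows. This is a simplified NumPy version under several assumptions: `detect` is a hypothetical callable that maps a 2D image to a heatmap of the same shape, warping uses nearest-neighbour sampling, the homographies are mild enough that no pixel maps to infinity, and the aggregate is a plain masked average. None of these choices is claimed to match the paper's implementation.

```python
import numpy as np

def warp_image(img, H, fill=0.0):
    """Nearest-neighbour warp so that output(p) = img(H^-1 p).

    Returns the warped image and a boolean mask of pixels that had a source.
    """
    h, w = img.shape
    ys, xs = np.mgrid[0:h, 0:w]
    dst = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)], axis=0)  # (3, h*w)
    src = np.linalg.inv(H) @ dst
    sx = np.rint(src[0] / src[2]).astype(int)
    sy = np.rint(src[1] / src[2]).astype(int)
    valid = (sx >= 0) & (sx < w) & (sy >= 0) & (sy < h)
    out = np.full(h * w, fill, dtype=np.float64)
    out[valid] = img[sy[valid], sx[valid]]
    return out.reshape(h, w), valid.reshape(h, w)

def homographic_adaptation(image, detect, homographies):
    """Average detector heatmaps over warped copies of `image`."""
    acc = detect(image).astype(np.float64)   # response on the unwarped image
    count = np.ones_like(acc)
    for H in homographies:
        warped, _ = warp_image(image, H)
        heat = detect(warped)
        # Bring the heatmap back into the original frame with the inverse warp.
        heat_back, mask = warp_image(heat, np.linalg.inv(H))
        acc += heat_back * mask
        count += mask                        # only average where data exists
    return acc / count
```

Points that fire consistently across many warps keep a high averaged response, and these aggregated responses serve as pseudo-GT on real images.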

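The three steps above largely summarize the main idea of this paper: pretrain a detector on synthetic renderings with known interest points, apply homographic adaptation on real images, and learn the descriptor from image pairs related by a known homography.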

The catch-all “dustbin” channel used to recalibrate the softmax heatmap is an interesting design; both SuperPoint and VPGNet use the same trick.
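A minimal NumPy sketch of the dustbin idea, assuming the detector head emits one logit per pixel of an 8x8 cell plus one extra "no interest point" channel (65 channels per cell). The function name `logits_to_heatmap` and the channel-last layout are assumptions for illustration.

```python
import numpy as np

def logits_to_heatmap(logits, cell=8):
    """Convert (Hc, Wc, cell*cell + 1) logits to an (Hc*cell, Wc*cell) heatmap.

    The last channel is the dustbin: it absorbs probability mass when a cell
    contains no interest point, so the softmax is not forced to pick a pixel.
    """
    hc, wc, c = logits.shape
    assert c == cell * cell + 1
    z = logits - logits.max(axis=-1, keepdims=True)   # numerical stability
    prob = np.exp(z)
    prob /= prob.sum(axis=-1, keepdims=True)          # softmax over 65 channels
    prob = prob[..., :-1]                             # drop the dustbin channel
    prob = prob.reshape(hc, wc, cell, cell)           # channel -> (row, col) in cell
    prob = prob.transpose(0, 2, 1, 3)                 # (hc, cell, wc, cell)
    return prob.reshape(hc * cell, wc * cell)

# Usage: a 2x3 grid of cells where only one cell contains a confident point.
logits = np.zeros((2, 3, 65))
logits[..., 64] = 10.0        # dustbin dominates in every cell...
logits[1, 2, 19] = 20.0       # ...except here, where channel 19 wins
heatmap = logits_to_heatmap(logits)
```

Without the dustbin, a softmax over 64 channels would always sum to one per cell and so would place an interest point in every cell, even in textureless regions.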

In that keypoint detection and representation are shared across two tasks, SuperPoint is similar to associative embedding.

This paper also inspired UnsuperPoint, which does not require pseudo-GT to train.

Key ideas

Technical details