DSNT: Numerical Coordinate Regression with Convolutional Neural Networks

March 2021

tl;dr: A differentiable way to transform spatial heatmaps to numerical coordinates.

Overall impression

The SOTA approach to coordinate prediction is still heatmap-based keypoint regression (heatmap matching) rather than direct numerical coordinate regression.

Previous methods obtain numerical coordinates by applying argmax to the heatmaps, which is a non-differentiable operation. Gradient flow therefore starts from the heatmap, not from the numerical coordinates. The handcrafted design of the target heatmaps and the postprocessing step needed to obtain numerical coordinates lead to a sub-optimal system design.

DSNT proposes a way to back-propagate all the way from the predicted numerical coordinates to the input image.
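To make the contrast with argmax concrete, here is a minimal NumPy sketch of the idea: normalize the heatmap into a probability distribution, then take the expected pixel coordinate. The function name `dsnt`, the choice of a spatial softmax as the normalization, and the toy 5x5 input are assumptions for illustration, not the authors' reference implementation.

```python
import numpy as np

def dsnt(logits):
    """Map an (H, W) array of unnormalized scores to (x, y) in (-1, 1)."""
    h, w = logits.shape
    # Spatial softmax (assumed normalization): heatmap becomes a distribution.
    z = np.exp(logits - logits.max())
    p = z / z.sum()
    # Pixel-center coordinates, scaled so the image spans (-1, 1).
    xs = (2.0 * np.arange(1, w + 1) - (w + 1)) / w
    ys = (2.0 * np.arange(1, h + 1) - (h + 1)) / h
    # Expected coordinate under p: a weighted sum, so gradients can flow
    # from (x, y) back into every heatmap pixel.
    x = float((p.sum(axis=0) * xs).sum())
    y = float((p.sum(axis=1) * ys).sum())
    return x, y

# Toy heatmap with one sharp peak at row 1, column 3.
logits = np.full((5, 5), -20.0)
logits[1, 3] = 20.0

# Hard decoding: integer indices, no useful gradient.
row, col = np.unravel_index(np.argmax(logits), logits.shape)

# Soft decoding: continuous coordinates close to the peak's pixel center.
x, y = dsnt(logits)
```

With a sharp single peak, the soft coordinates land on the same pixel that argmax picks, but they are produced by sums and products only, which is what lets a coordinate loss train the whole network.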

DSNT can only handle one keypoint per heatmap. Extending it to multiple keypoints per heatmap is still an open research question. A possible direction: impose a local neighborhood.
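A small sketch of why a single expectation breaks down with several keypoints, assuming the coordinate is computed as the mean of a normalized heatmap. The 5x5 heatmap with two equal peaks is a made-up example.

```python
import numpy as np

# Already-normalized heatmap with two equal peaks on the middle row,
# one at the left edge and one at the right edge.
p = np.zeros((5, 5))
p[2, 0] = 0.5
p[2, 4] = 0.5

# Column centers scaled to (-1, 1): -0.8, -0.4, 0.0, 0.4, 0.8.
xs = (2.0 * np.arange(1, 6) - 6) / 5

# The expected x coordinate averages the two peaks.
x_mean = float((p.sum(axis=0) * xs).sum())
```

The mean falls in the center column, where the heatmap has no mass at all, so neither keypoint is recovered.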

Key ideas

Technical details