ROLO: Spatially Supervised Recurrent Convolutional Neural Networks for Visual Object Tracking

January 2020

tl;dr: Summary of the main idea.

Overall impression

There is a series of paper on recurrent single stage method for object detection. The main idea is to add RNN layer directly on top of the entire image feature.

The conversion of bbox to heatmap is also another example of transforming unstructured information to pseudo-image.

K = 6 frames

Key ideas

Technical details