Detect to Track and Track to Detect

January 2020

tl;dr: Predict the shift of each bbox across frames for better tracking.

Overall impression

Conventionally, each frame is passed independently through an object detector to get a list of bboxes. The bboxes are then linked across frames by data association (e.g., the Hungarian algorithm) to form tracklets. The Hungarian matching usually uses the IoU between boxes in consecutive frames as the matching criterion.
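A minimal sketch of this conventional IoU-based association between two frames, assuming boxes are `(x1, y1, x2, y2)` tuples. The names `iou`, `hungarian`, `associate` and the `iou_threshold=0.3` default are hypothetical illustrations, not taken from the paper. The Hungarian step is a compact Kuhn-Munkres implementation on a square `1 - IoU` cost matrix, padded with dummy entries so that unequal box counts work.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def hungarian(cost: List[List[float]]) -> List[int]:
    """Kuhn-Munkres on a square n x n cost matrix; returns the column for each row."""
    n = len(cost)
    inf = float("inf")
    u = [0.0] * (n + 1)  # row potentials
    v = [0.0] * (n + 1)  # column potentials
    p = [0] * (n + 1)  # p[j] = row matched to column j (1-based, 0 = none)
    way = [0] * (n + 1)  # previous column on the alternating path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (n + 1)
        used = [False] * (n + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(n + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:  # augment along the alternating path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assignment = [0] * n
    for j in range(1, n + 1):
        assignment[p[j] - 1] = j - 1
    return assignment


def associate(prev: Sequence[Box], curr: Sequence[Box],
              iou_threshold: float = 0.3) -> List[Tuple[int, int]]:
    n = max(len(prev), len(curr))
    if n == 0:
        return []
    # Cost = 1 - IoU; padded dummy rows/columns cost 1.0 (i.e. IoU 0).
    cost = [[1.0] * n for _ in range(n)]
    for i, a in enumerate(prev):
        for j, b in enumerate(curr):
            cost[i][j] = 1.0 - iou(a, b)
    matches = []
    for i, j in enumerate(hungarian(cost)):
        # Drop dummy matches and pairs whose overlap is too small.
        if i < len(prev) and j < len(curr) and iou(prev[i], curr[j]) >= iou_threshold:
            matches.append((i, j))
    return matches


prev = [(0, 0, 10, 10), (20, 20, 30, 30)]
curr = [(21, 21, 31, 31), (1, 1, 11, 11)]
small_motion = associate(prev, curr)  # [(0, 1), (1, 0)]
large_motion = associate([(0, 0, 10, 10)], [(15, 0, 25, 10)])  # [], IoU is 0
```

The second call shows the failure mode the paper targets: once a box moves farther than its own width between frames, IoU drops to 0 and the match is lost.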

This paper mainly addresses cases where the video contains large shifts (global shifts) and IoU across frames is no longer reliable. It predicts the movement of each bbox from one frame to the next using correlation maps computed between the feature maps of the two frames.
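The correlation between frames can be sketched as a local correlation layer: for each position in frame t's feature map, take dot products with frame t+τ's features inside a (2d+1)×(2d+1) neighbourhood. This NumPy sketch assumes `(C, H, W)` feature maps; `local_correlation`, `estimate_shift` and the argmax readout are hypothetical. In the paper, the box displacement is regressed by a learned layer on top of correlation features, so the argmax here is only a crude stand-in to show that the correlation peak encodes the motion.

```python
import numpy as np


def local_correlation(feat_t, feat_tk, max_disp):
    # feat_t, feat_tk: (C, H, W) feature maps of frames t and t + tau.
    # Returns corr of shape (2d+1, 2d+1, H, W) with
    # corr[p + d, q + d, i, j] = <feat_t[:, i, j], feat_tk[:, i + p, j + q]>,
    # treating positions outside feat_tk as zeros.
    _, h, w = feat_t.shape
    d = max_disp
    padded = np.pad(feat_tk, ((0, 0), (d, d), (d, d)))  # zero-pad H and W
    corr = np.zeros((2 * d + 1, 2 * d + 1, h, w), dtype=feat_t.dtype)
    for p in range(-d, d + 1):
        for q in range(-d, d + 1):
            shifted = padded[:, d + p : d + p + h, d + q : d + q + w]
            corr[p + d, q + d] = (feat_t * shifted).sum(axis=0)
    return corr


def estimate_shift(corr, box, max_disp):
    # box = (x1, y1, x2, y2) in feature-map cells (hypothetical convention).
    x1, y1, x2, y2 = box
    region = corr[:, :, y1:y2, x1:x2].sum(axis=(2, 3))  # pool over the box
    p, q = np.unravel_index(np.argmax(region), region.shape)
    return int(p) - max_disp, int(q) - max_disp  # (dy, dx)


rng = np.random.default_rng(0)
feat_t = rng.standard_normal((8, 16, 16))
# Simulate motion: content moves 2 cells down and 1 cell left.
feat_tk = np.roll(feat_t, shift=(2, -1), axis=(1, 2))
corr = local_correlation(feat_t, feat_tk, max_disp=3)
shift = estimate_shift(corr, box=(4, 4, 12, 12), max_disp=3)
print(shift)  # (2, -1)
```

Because the search is restricted to a window of ±d cells, the cost grows with (2d+1)² rather than with the full image size, and a shift of d cells on a stride-s feature map covers d·s pixels in the input image.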

Using correlation for tracking stems from traditional CV, e.g., correlation trackers.
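For comparison, here is the traditional idea in its simplest form: cross-correlate a template with a search patch via the FFT and read the translation off the correlation peak. Practical correlation-filter trackers (e.g. MOSSE, KCF) learn the filter instead of using the raw template; this sketch uses plain mean-subtracted cross-correlation, and `correlation_shift` is a hypothetical name.

```python
import numpy as np


def correlation_shift(template, search):
    # template, search: 2-D arrays of the same shape (H, W).
    # Returns (dy, dx) such that search is roughly template translated by (dy, dx).
    t = template - template.mean()
    s = search - search.mean()
    # Circular cross-correlation via the FFT: corr[k] = sum_n t[n] * s[n + k].
    corr = np.fft.ifft2(np.conj(np.fft.fft2(t)) * np.fft.fft2(s)).real
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    h, w = corr.shape
    # Peak indices past the midpoint correspond to negative shifts.
    if dy > h // 2:
        dy -= h
    if dx > w // 2:
        dx -= w
    return int(dy), int(dx)


rng = np.random.default_rng(1)
template = rng.random((32, 32))
search = np.roll(template, shift=(3, -5), axis=(0, 1))
estimated = correlation_shift(template, search)
print(estimated)  # (3, -5)
```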

The tracking method here is offline. According to Zhihu:

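There are many offline data association methods, such as max-flow/min-cut, k-partite graphs, multicut, CRFs and MRFs, mostly drawn from combinatorial optimization, graph theory or probabilistic graphical models. For online algorithms, as far as I know, there are only the Hungarian algorithm (paper: DeepSORT) and formulating multi-object tracking as a Markov decision process (MDP) in reinforcement learning (paper: MDP_tracking).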
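Offline linking can be sketched as a Viterbi-style dynamic program over the whole video: choose one detection per frame to maximize the sum of pairwise link scores, remove that path, and repeat. As far as I recall, D&T's link score has the form p_t + p_{t+1} + ψ, with ψ = 1 when the tracker's prediction agrees with the detections. Here ψ is simplified to "IoU between the tracker-predicted box and the next-frame detection exceeds 0.5", and each detection is assumed to carry its tracker-predicted box for the next frame. All names are hypothetical, and a path is only extracted while every frame still has at least one detection.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
# Hypothetical detection record: (box, score, tracker-predicted box in next frame).
Detection = Tuple[Box, float, Box]


def iou(a: Box, b: Box) -> float:
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def link_score(d_t: Detection, d_next: Detection, iou_thr: float = 0.5) -> float:
    # Simplified score: both detection scores, plus 1 if the tracker's
    # prediction from frame t overlaps the detection in frame t+1.
    _, p_t, pred_next = d_t
    box_next, p_next, _ = d_next
    psi = 1.0 if iou(pred_next, box_next) > iou_thr else 0.0
    return p_t + p_next + psi


def best_path(frames: Sequence[Sequence[Detection]]) -> List[int]:
    # Viterbi: score[j] = best total link score of a path ending at detection j.
    score = [0.0] * len(frames[0])
    back = []  # back[t-1][j] = best predecessor of detection j in frame t
    for t in range(1, len(frames)):
        new_score, ptr = [], []
        for d_next in frames[t]:
            cands = [score[i] + link_score(d, d_next) for i, d in enumerate(frames[t - 1])]
            i_best = max(range(len(cands)), key=cands.__getitem__)
            new_score.append(cands[i_best])
            ptr.append(i_best)
        score = new_score
        back.append(ptr)
    j = max(range(len(score)), key=score.__getitem__)
    path = [j]
    for ptr in reversed(back):  # backtrack from the last frame
        j = ptr[j]
        path.append(j)
    return path[::-1]


def link_tubes(frames: Sequence[Sequence[Detection]]) -> List[List[Box]]:
    # Repeatedly extract the best path, then remove its detections.
    frames = [list(f) for f in frames]
    tubes = []
    while frames and all(frames):
        path = best_path(frames)
        tubes.append([frames[t][j][0] for t, j in enumerate(path)])
        for t, j in enumerate(path):
            frames[t].pop(j)
    return tubes


# Object A jumps 10 px per frame (frame-to-frame IoU is 0); object B is static.
a0, a1, a2, a3 = (0, 0, 10, 10), (10, 0, 20, 10), (20, 0, 30, 10), (30, 0, 40, 10)
b = (50, 50, 60, 60)
frames = [
    [(a0, 0.9, a1), (b, 0.8, b)],
    [(b, 0.8, b), (a1, 0.9, a2)],
    [(a2, 0.9, a3), (b, 0.8, b)],
]
tubes = link_tubes(frames)  # [[a0, a1, a2], [b, b, b]]
```

Object A is linked correctly even though its boxes never overlap across frames, because the tracker's predicted box carries the association. This is also why the method is offline: the best path depends on scores from every frame of the video.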

Key ideas

Technical details