Tracktor: Tracking without bells and whistles

July 2020

tl;dr: Use a detector to predict both object bboxes and tracklets. A detector is all you need.

Overall impression

It should be compared with "Detect to Track and Track to Detect". The offset regression method proposed in Tracktor directly inspired CenterTrack.

It beats "Detect to Track and Track to Detect" by almost 10 MOTA on the MOT17 challenge.

However, it is still based on a two-stage detector (Faster R-CNN), with region proposal followed by bbox refinement (regression). In comparison, CenterTrack moves this to the single-stage domain, with being anchor-free as a bonus.
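A minimal sketch of how the bbox refinement stage can double as a tracker, written in plain Python so it runs without a detector. The boxes from the previous frame are treated as proposals on the current frame and refined; tracks whose refined score drops too low are ended, and detections that overlap no surviving track start new ones. Everything named here is an assumption for illustration: `refine` is a hypothetical stand-in for the regression and classification head of a two-stage detector, `tracktor_step`, `Track`, `score_keep` and `iou_new` are made-up names, and the threshold defaults are placeholders. NMS between surviving tracks and any motion or re-identification extensions are left out.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


@dataclass
class Track:
    track_id: int
    box: Box


def tracktor_step(
    tracks: List[Track],
    refine: Callable[[Box], Tuple[Box, float]],
    detections: List[Box],
    next_id: int,
    score_keep: float = 0.5,  # placeholder threshold (assumption)
    iou_new: float = 0.3,     # placeholder threshold (assumption)
) -> Tuple[List[Track], int]:
    """Advance all tracks by one frame.

    refine(box) is a hypothetical callable that returns the box regressed
    on the current frame and its classification score.
    """
    # 1. Regress each previous-frame box on the current frame; the identity
    #    carries over for free. Low-scoring tracks are ended.
    active: List[Track] = []
    for t in tracks:
        new_box, score = refine(t.box)
        if score >= score_keep:
            active.append(Track(t.track_id, new_box))

    # 2. Start new tracks from detections not already explained by an
    #    active track (detections are assumed to be NMS-ed already).
    new_tracks: List[Track] = []
    for det in detections:
        if all(iou(det, t.box) < iou_new for t in active):
            new_tracks.append(Track(next_id, det))
            next_id += 1

    return active + new_tracks, next_id
```

Calling `tracktor_step` once per frame, feeding its returned track list and `next_id` back in, gives a complete (if bare) online tracker; the only learned component is whatever sits behind `refine`.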

Video object detection is essentially multi-object tracking without frame-to-frame identity prediction.

The paper is not easy to understand.

Key ideas

Technical details