Learning-Deep-Learning

Actions as Moving Points

January 2020

tl;dr: CenterNet for video object detection.

Overall impression

This extends CenterNet as Recurrent SSD extends SSD.

However, it still uses a box-based method to generate bboxes and then links them into action tubelets. This is more of a bottom-up approach compared to Recurrent SSD.
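To make the "detect boxes, then link" idea concrete, here is a minimal sketch of linking per-frame boxes into tracks. The notes do not spell out the linking rule, so greedy IoU matching between consecutive frames is swapped in as a common stand-in; `iou`, `link_boxes`, and the `iou_threshold` value are hypothetical names and settings, not the paper's.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Track = List[Tuple[int, Box]]            # list of (frame index, box)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def link_boxes(per_frame_boxes: List[List[Box]],
               iou_threshold: float = 0.5) -> List[Track]:
    """Greedily extend each track with the best-overlapping box of the next frame."""
    tracks: List[Track] = []
    for t, boxes in enumerate(per_frame_boxes):
        unmatched = list(boxes)
        for track in tracks:
            last_t, last_box = track[-1]
            # Only tracks that ended on the previous frame can be extended.
            if last_t != t - 1 or not unmatched:
                continue
            best = max(unmatched, key=lambda b: iou(last_box, b))
            if iou(last_box, best) >= iou_threshold:
                track.append((t, best))
                unmatched.remove(best)
        # Boxes that matched no existing track start new tracks.
        tracks.extend([[(t, b)] for b in unmatched])
    return tracks
```

Greedy matching is order-dependent and ignores detection scores; it is only meant to show the bottom-up flavor of building longer structures out of per-frame boxes.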

Drawbacks and limitations: The main drawback is that it takes in K frames (K=7) at the same time, so it is not suitable for fast online inference. It does support detecting multiple objects at the same time, same as CenterNet.
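The K-frame input constraint can be illustrated with a small sketch. It assumes a causal sliding window over a frame stream (the notes do not say how clips are sampled); `clip_windows` is a hypothetical helper, and the detector itself is left out.

```python
from collections import deque
from typing import Iterable, Iterator, List, Tuple, TypeVar

Frame = TypeVar("Frame")

K = 7  # clip length quoted in the notes


def clip_windows(frames: Iterable[Frame], k: int = K) -> Iterator[Tuple[int, List[Frame]]]:
    """Yield (index of newest frame, last k frames) once k frames are buffered."""
    buffer: deque = deque(maxlen=k)
    for index, frame in enumerate(frames):
        buffer.append(frame)
        if len(buffer) == k:
            # A K-frame model can only run here, never on a single new frame.
            yield index, list(buffer)


# Usage: with 10 incoming frames, the first clip is ready only at frame index 6.
windows = list(clip_windows(range(10)))
first_index, first_clip = windows[0]
```

Under this assumption nothing can be emitted for the first K-1 frames, and every later step has to process a full K-frame clip, which is where the online-inference cost comes from.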

Key ideas

Technical details

Notes