Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset

Feb 2019

tl;dr: Inflated 3D CNN (I3D) with weights bootstrapped from 2D ImageNet-pretrained weights.

Overall impression

An important drawback of 3D CNNs has been the lack of a good initialization strategy and of large datasets on which to pretrain their weights. This paper demonstrates that 2D weights pretrained on ImageNet are a good initialization for 3D CNNs as well.
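
To make the bootstrapping concrete, here is a minimal NumPy sketch of kernel inflation, assuming the scheme described in the I3D paper: the 2D filter is repeated along a new time axis and divided by the temporal extent. The function name `inflate_conv2d_kernel` and the `(out, in, kh, kw)` weight layout are illustrative assumptions, not taken from any particular codebase.

```python
import numpy as np


def inflate_conv2d_kernel(w2d, time_dim):
    """Inflate a 2D conv kernel (out, in, kh, kw) into a 3D kernel
    (out, in, t, kh, kw) by repeating it along time and rescaling.

    Hypothetical helper for illustration; the layout is an assumption.
    """
    # Insert a time axis, then copy the 2D filter time_dim times along it.
    w3d = np.repeat(w2d[:, :, np.newaxis, :, :], time_dim, axis=2)
    # Divide by time_dim so the filter response on a static video
    # (the same frame repeated) matches the original 2D response.
    return w3d / time_dim


rng = np.random.default_rng(0)
w2d = rng.standard_normal((4, 3, 3, 3))   # stand-in for pretrained 2D weights
w3d = inflate_conv2d_kernel(w2d, time_dim=3)
```

Because of the rescaling, summing the inflated kernel over the time axis recovers the original 2D kernel, so the 3D network starts out behaving like the 2D network on a video made of one repeated frame.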

Key ideas

Technical details