A Multigrid Method for Efficiently Training Video Models

February 2020

tl;dr: An efficient training technique that varies the spatial and temporal dimensions of video clips during training.
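The core mechanic can be sketched as follows. When the clip's temporal length T and spatial size H×W shrink, the mini-batch size B grows so that B·T·H·W, and thus the compute per iteration, stays roughly constant. This is a minimal sketch, not the authors' implementation. The base shape `BASE`, the factor list `LONG_CYCLE`, and the helper `grid_shape` are hypothetical and chosen for illustration only.

```python
import math

# Hypothetical base (test-time) mini-batch shape: (batch, frames, height, width).
BASE = (8, 16, 224, 224)

# Illustrative (temporal, spatial) shrink factors, ordered coarse to fine.
# These values are an assumption for this sketch, not taken from the notes.
LONG_CYCLE = [(0.25, 1 / math.sqrt(2)), (0.5, 1 / math.sqrt(2)), (0.5, 1.0), (1.0, 1.0)]


def grid_shape(base, t_factor, s_factor):
    """Shrink T, H, W and grow the batch so B*T*H*W stays roughly constant."""
    b, t, h, w = base
    new_t = max(1, round(t * t_factor))
    new_h = round(h * s_factor)
    new_w = round(w * s_factor)
    # The batch grows by the inverse of the spatio-temporal volume reduction.
    new_b = round(b / (t_factor * s_factor ** 2))
    return new_b, new_t, new_h, new_w


for t_f, s_f in LONG_CYCLE:
    shape = grid_shape(BASE, t_f, s_f)
    print(shape, "elements per iteration:", math.prod(shape))
```

The coarse shapes process many more clips per iteration at roughly the same cost. The final shape in the list is the base shape, which is also the shape used at test time.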

Overall impression

The paper is from FAIR and, as usual, well written. Lots of experiments, and lots of GPUs (128)! That said, they also validated the method on a single GPU, with a 3x speedup.

Recent SOTA video models: I3D, SlowFast, Non-Local

It draws inspiration from FixRes in that it requires a fine-tuning stage at the end to fix the train/test discrepancy.
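A FixRes-style ending can be pictured as a schedule that cycles through coarse-to-fine shapes for most of training and then spends the last few epochs only at the test-time shape. This is a hypothetical sketch: the function `multigrid_schedule` and its parameters (`cycle_len`, `finetune_epochs`) are assumptions for illustration and do not reproduce the paper's exact schedule.

```python
def multigrid_schedule(total_epochs, cycle_len, num_shapes, finetune_epochs):
    """Map each epoch to a shape index; index num_shapes - 1 is the test-time shape."""
    schedule = []
    for epoch in range(total_epochs - finetune_epochs):
        # Walk coarse -> fine within each cycle of cycle_len epochs.
        pos = epoch % cycle_len
        schedule.append(pos * num_shapes // cycle_len)
    # Final stage: train only at the test-time shape (FixRes-style fine-tuning).
    schedule.extend([num_shapes - 1] * finetune_epochs)
    return schedule


print(multigrid_schedule(total_epochs=10, cycle_len=4, num_shapes=4, finetune_epochs=2))
```

Ending on the test-time shape means the model's final weights (and any normalization statistics) are adapted to the same input distribution it will see at inference.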

Key ideas

Technical details