MAE: Masked Autoencoders Are Scalable Vision Learners

November 2021

tl;dr: Scalable unsupervised pretraining of vision models via masked image modeling.
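The core of masked image modeling in MAE is simple: split the image into patches, randomly mask a large fraction (75% in the paper), and encode only the visible patches. Below is a minimal sketch of that masking step, assuming patch embeddings are already laid out as a NumPy array; the function name and shapes are illustrative, not the paper's code.

```python
import numpy as np

def random_masking(patches, mask_ratio=0.75, seed=None):
    """MAE-style random masking: keep a random subset of patches.

    patches: (num_patches, dim) array of patch embeddings.
    Returns (visible, keep_idx, mask), where mask is True for
    the patches that were removed (reconstruction targets).
    """
    rng = np.random.default_rng(seed)
    n = patches.shape[0]
    n_keep = int(n * (1 - mask_ratio))          # e.g. 25% of patches survive
    perm = rng.permutation(n)                   # random shuffle of patch indices
    keep_idx = np.sort(perm[:n_keep])           # indices fed to the encoder
    mask = np.ones(n, dtype=bool)
    mask[keep_idx] = False                      # False = visible, True = masked
    return patches[keep_idx], keep_idx, mask

# Example: a 14x14 grid of 196 patches, each embedded to dim 768
patches = np.zeros((196, 768))
visible, keep_idx, mask = random_masking(patches, mask_ratio=0.75, seed=0)
# visible has 49 rows (25% of 196); mask flags the other 147 patches
```

The encoder runs only on `visible`, which is what makes the scheme cheap at high mask ratios; a lightweight decoder later fills in mask tokens at the positions flagged by `mask` and regresses the masked pixels.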

Overall impression

This paper is very enlightening.

This paper prompted the rushed publication of contemporary work such as SimMIM and iBOT. The clarity of its message, the depth of its insights, the craft of its engineering considerations, and the coverage of its ablation studies are significantly superior to those of the others.

Key ideas

Technical details