FixRes: Fixing the train-test resolution discrepancy

February 2020

tl;dr: Conventional ImageNet classification pipelines have a train/test resolution discrepancy (a form of domain shift).

Overall impression

CNNs do not guarantee scale invariance or equivariance (only shift invariance). The same model fed inputs at a different test-time resolution yields very different activation statistics: the distribution of activations shifts at test time, so the values fall outside the range the final classification layers were trained on.

In ImageNet training, the conventional augmentation is a random resized crop, while test time always uses a center crop. Objects therefore appear at different apparent sizes in training and at test time, which leads to a discrepancy between training and test statistics.

Simple solution: as the final stage of training, fine-tune the last layer at the test-time scale and resolution.
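A minimal PyTorch sketch of this last-layer fine-tuning step follows. It is not the paper's implementation: the toy network, the random tensors standing in for images preprocessed at test resolution, and the helper names `build_toy_model` and `finetune_head` are all hypothetical, chosen only to show the freeze-then-adapt pattern.

```python
import torch
from torch import nn


def build_toy_model(num_classes: int = 10) -> nn.Sequential:
    # Hypothetical stand-in for a pretrained CNN:
    # conv backbone, global average pooling, linear classifier.
    return nn.Sequential(
        nn.Conv2d(3, 8, kernel_size=3, padding=1),
        nn.ReLU(),
        nn.AdaptiveAvgPool2d(1),  # accepts any input resolution
        nn.Flatten(),
        nn.Linear(8, num_classes),
    )


def finetune_head(model, images, labels, steps=5, lr=0.1):
    """Fine-tune only the last layer on images at test-time resolution."""
    # Freeze every parameter, then re-enable gradients for the last layer.
    for p in model.parameters():
        p.requires_grad_(False)
    head = model[-1]
    for p in head.parameters():
        p.requires_grad_(True)

    optimizer = torch.optim.SGD(head.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    losses = []
    for _ in range(steps):
        optimizer.zero_grad()
        loss = loss_fn(model(images), labels)
        loss.backward()
        optimizer.step()
        losses.append(loss.item())
    return losses


torch.manual_seed(0)
model = build_toy_model()
# Random tensors standing in for a batch preprocessed exactly as at test time.
test_res_images = torch.randn(16, 3, 64, 64)
labels = torch.randint(0, 10, (16,))

backbone_before = model[0].weight.clone()
head_before = model[-1].weight.clone()
losses = finetune_head(model, test_res_images, labels)
```

Only the classifier weights move; the backbone stays as trained, so the step is cheap compared with full training. In a real setting the batch would come from the training set passed through the test-time preprocessing.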

Key ideas

Technical details