CBN: Cross-Iteration Batch Normalization

May 2020

tl;dr: Improve batch normalization when minibatch size is small.

Overall impression

Like GroupNorm, CBN improves performance when batch size is small. It does so by accumulating statistics over several mini-batches. However, because the weights change in each iteration, statistics collected under old weights become inaccurate under the new weights, so a naive average will be wrong. Fortunately, weights change gradually. Cross-Iteration Batch Normalization (CBN) therefore estimates the statistics of the k previous iterations as they would be under the current weights, using a Taylor-expansion-based adjustment, and then aggregates them.
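
The compensation idea can be sketched on a toy linear layer. This is a minimal NumPy illustration, not the authors' implementation: the layer `x = a @ W.T`, the helper names `batch_stats`, `compensate` and `aggregate`, and the closed-form gradients are all assumptions made for the example. The old batch's first moment `mu` and second moment `nu` are shifted by a first-order Taylor term in the weight change, and the variance is then formed from the averaged moments. The `max(nu, mu^2)` clamp follows my recollection of the paper and should be checked against it.

```python
import numpy as np


def batch_stats(a, W):
    """Per-channel first and second moments of x = a @ W.T over the batch."""
    x = a @ W.T
    return x.mean(axis=0), (x ** 2).mean(axis=0)


def compensate(a_old, W_old, W_new):
    """First-order Taylor estimate of an old batch's moments under W_new.

    Only quantities available at the old iteration are used
    (a_old, W_old), plus the weight change W_new - W_old.
    """
    mu_old, nu_old = batch_stats(a_old, W_old)
    dW = W_new - W_old
    a_mean = a_old.mean(axis=0)            # d mu_j / d W_j for a linear layer
    M = a_old.T @ a_old / a_old.shape[0]   # input second-moment matrix
    mu_hat = mu_old + dW @ a_mean
    # d nu_j / d W_j = 2 * M @ W_j; row j of (W_old @ M) is (M @ W_j)^T
    nu_hat = nu_old + np.sum(2.0 * (W_old @ M) * dW, axis=1)
    return mu_hat, nu_hat


def aggregate(mus, nus):
    """Average compensated moments over iterations and form the variance."""
    mu_bar = np.mean(mus, axis=0)
    # Clamp each second moment from below by mu^2 so the variance stays >= 0.
    nu_bar = np.mean(np.maximum(nus, np.square(mus)), axis=0)
    return mu_bar, nu_bar - mu_bar ** 2
```

In this toy setting the mean is linear in the weights, so the compensated mean matches a full recomputation exactly. The second moment is quadratic in the weights, so its compensated value is only approximate, but the error is second order in the weight change rather than first order as with the naive reuse of old statistics.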

Key ideas

Technical details