Universally Slimmable Networks and Improved Training Techniques

September 2019

tl;dr: Extend slimmable networks to arbitrary widths, in the context of channel-wise residual learning.

Overall impression

This paper is gold! Lots of insights into the training of neural networks.

The practice of calculating the batch norm statistics after training is quite enlightening. Refer to the presentation by Yuxin Wu from FAIR.
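A minimal sketch of what computing batch norm statistics after training could look like, in plain Python. It assumes a single weight-sharing linear layer where a sub-network of a given width uses only the first channels; the names `slim_linear` and `calibrate_bn_stats` are hypothetical, and pooling all calibration samples into one exact mean and variance is a simplification for illustration, not necessarily the paper's exact procedure. The point it shows is that each width sees different activations, so each width needs its own statistics.

```python
import random


def slim_linear(x, weight, in_ch, out_ch):
    """Apply only the first out_ch rows and in_ch columns of a shared weight matrix."""
    return [sum(weight[o][i] * x[i] for i in range(in_ch)) for o in range(out_ch)]


def calibrate_bn_stats(batches, weight, in_ch, out_ch):
    """Per-channel mean and (biased) variance of the pre-BN activations at one width.

    Statistics are exact averages over all calibration samples,
    not an exponential moving average collected during training.
    """
    n = 0
    total = [0.0] * out_ch
    total_sq = [0.0] * out_ch
    for batch in batches:
        for x in batch:
            y = slim_linear(x, weight, in_ch, out_ch)
            for c, v in enumerate(y):
                total[c] += v
                total_sq[c] += v * v
            n += 1
    mean = [s / n for s in total]
    var = [sq / n - m * m for sq, m in zip(total_sq, mean)]
    return mean, var


# Hypothetical setup: an 8-channel layer with random "trained" weights
# and a few batches of random calibration inputs.
rng = random.Random(0)
channels = 8
weight = [[rng.uniform(-1, 1) for _ in range(channels)] for _ in range(channels)]
batches = [
    [[rng.gauss(1.0, 1.0) for _ in range(channels)] for _ in range(16)]
    for _ in range(4)
]

# Calibrate separately for every width we intend to deploy.
stats = {}
for width in (0.25, 0.5, 1.0):
    k = int(channels * width)
    stats[width] = calibrate_bn_stats(batches, weight, in_ch=k, out_ch=k)
```

Because the calibration pass only needs forward passes, it can be run for any width chosen at deployment time without storing a separate set of statistics per width during training.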

The universally slimmable network can be trained to achieve similar or slightly better performance compared with slimmable networks that have a fixed number of switches.

Essentially, training progressively narrower sub-networks serves as deep supervision for the entire network.
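To make the deep-supervision reading concrete, here is a toy sketch in plain Python. It assumes a linear model whose sub-network of width k uses only the first k shared weights, and it accumulates the loss gradients from several widths before a single update. The function names and the width choice (smallest, largest, plus a few random widths in between) are assumptions for illustration, not a reproduction of the paper's training code.

```python
import random


def forward(w, x, k):
    """Prediction of the sub-network that uses only the first k shared weights."""
    return sum(w[i] * x[i] for i in range(k))


def multi_width_step(w, data, widths, lr=0.01):
    """One SGD step whose gradient is the sum of the MSE gradients at every width."""
    grad = [0.0] * len(w)
    for k in widths:
        for x, y in data:
            err = forward(w, x, k) - y
            # Weights beyond index k receive no gradient from this width,
            # so the leading weights are supervised by every sub-network.
            for i in range(k):
                grad[i] += 2 * err * x[i] / len(data)
    return [wi - lr * g for wi, g in zip(w, grad)]


def sample_widths(n_ch, min_ch, n, rng):
    """Smallest width, largest width, and n - 2 random widths in between."""
    return [min_ch, n_ch] + [rng.randint(min_ch, n_ch) for _ in range(n - 2)]


# Hypothetical toy regression problem with four input channels.
rng = random.Random(0)
true_w = [1.0, -2.0, 0.5, 3.0]
data = []
for _ in range(32):
    x = [rng.gauss(0, 1) for _ in range(4)]
    data.append((x, sum(a * b for a, b in zip(true_w, x))))

w = [0.0] * 4
for _ in range(200):
    w = multi_width_step(w, data, sample_widths(4, 1, 4, rng))
```

In this toy, the first weight receives a loss signal from every sampled width, while the last weight is updated only by the widest sub-network. That is the sense in which the narrower networks act as extra supervision on the shared weights.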

Key ideas

Technical details