Slimmable Neural Networks

September 2019

tl;dr: Train a single network once, then deploy it at different widths depending on the hardware budget.

Overall impression

The idea is actually quite similar to that of the universal model. In both cases, the different models share the convolutional layers and have switchable/selectable BatchNorm layers.

In slimmable networks, switchable BatchNorm (S-BN) records separate feature means and variances for each width switch. For universal models, the BatchNorm contains the stats for different datasets (which can be vastly different, such as medical imaging datasets and ImageNet).
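A minimal sketch of the S-BN idea (class name, switch list, and the flattened `(batch, channels)` layout are illustrative, not the paper's implementation): one set of running statistics is kept per switch, and only the active switch's stats are updated, since feature distributions differ across widths.

```python
import numpy as np

class SwitchableBatchNorm:
    """Sketch of switchable BatchNorm: independent running statistics
    per width switch over shared channels."""

    def __init__(self, num_features, switches=(0.25, 0.5, 0.75, 1.0),
                 momentum=0.1, eps=1e-5):
        self.switches = switches
        self.momentum = momentum
        self.eps = eps
        # One set of running stats per switch, sized for the full width.
        self.running_mean = {s: np.zeros(num_features) for s in switches}
        self.running_var = {s: np.ones(num_features) for s in switches}

    def __call__(self, x, switch):
        # x: (batch, channels), where channels = switch * num_features,
        # i.e. the leading slice of the full channel set.
        c = x.shape[1]
        mean, var = x.mean(axis=0), x.var(axis=0)
        # Update only the active switch's stats, only on active channels.
        m = self.momentum
        self.running_mean[switch][:c] = (1 - m) * self.running_mean[switch][:c] + m * mean
        self.running_var[switch][:c] = (1 - m) * self.running_var[switch][:c] + m * var
        return (x - mean) / np.sqrt(var + self.eps)

bn = SwitchableBatchNorm(8)
x = np.random.randn(16, 4)   # 0.5x switch uses the first 4 of 8 channels
y = bn(x, 0.5)               # normalized with the 0.5x switch's stats
```

Note that the convolutional weights are shared across switches; only the BN statistics (a negligible number of parameters) are duplicated.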

Because every switch uses the leading channels of the shared weights, the training procedure implicitly prioritizes the first 25% of channels (trained by all switches) over later channels (trained only by the wider switches). This is quite similar to the channel-ranking idea in the pruning method LeGR. Maybe they can be combined?
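A toy illustration of this prioritization with a single shared linear layer (shapes, the squared loss, and the manual gradient are illustrative assumptions): gradients from all four switches accumulate on the top-left weight slice, while the trailing channels receive a gradient from the full-width switch only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Shared weight for one layer: up to 8 input and 8 output channels.
W = rng.standard_normal((8, 8))
switches = (0.25, 0.5, 0.75, 1.0)

x = rng.standard_normal((16, 8))
target = rng.standard_normal((16, 8))

grad = np.zeros_like(W)
count = np.zeros_like(W)  # how many switches touch each weight
for s in switches:
    c_out, c_in = int(8 * s), int(8 * s)
    y = x[:, :c_in] @ W[:c_out, :c_in].T        # forward at this width
    err = y - target[:, :c_out]
    # Gradient of the mean squared error w.r.t. the active weight slice.
    grad[:c_out, :c_in] += 2 * err.T @ x[:, :c_in] / len(x)
    count[:c_out, :c_in] += 1

W -= 0.01 * grad  # one shared update covering all switches
```

The `count` matrix makes the asymmetry explicit: the first 25% of channels accumulate gradients from all four switches per batch, the last 25% from only one.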

The authors also showed that the 4-switch and 8-switch variants lie on the same Pareto front of accuracy vs. compute.

This work is extended by universally slimmable networks.

Key ideas

Technical details