MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications

Mar 2019

tl;dr: Factorize standard 2D convolutions into depthwise separable convolutions (a depthwise convolution followed by a pointwise convolution) to reduce latency and model size.

Overall impression

A standard 2D conv handles channel information in an almost fully connected fashion: every input channel contributes to every output channel, each through its own 2D slice of the conv kernel. A depthwise separable conv splits this into two steps. The depthwise conv applies a single 2D kernel to each input channel independently, and a pointwise conv (1x1 conv) then combines the filtered channels into the output channels.
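To make the saving concrete, here is a minimal sketch that counts multiply-adds for one layer under each scheme. It assumes stride 1 and an output feature map of the same spatial size as the input. The function names and the example layer shape (3x3 kernel, 512 input and output channels, 14x14 feature map) are illustrative choices, not taken from these notes.

```python
def standard_conv_cost(k, m, n, f):
    """Mult-adds for a k x k conv with m input channels, n output channels,
    and an f x f output feature map (assumes stride 1)."""
    return k * k * m * n * f * f


def depthwise_separable_cost(k, m, n, f):
    """Mult-adds for a depthwise conv followed by a pointwise (1x1) conv."""
    depthwise = k * k * m * f * f  # one k x k filter per input channel
    pointwise = m * n * f * f      # 1x1 conv mixing m channels into n
    return depthwise + pointwise


# Illustrative layer shape (an assumption for this example).
k, m, n, f = 3, 512, 512, 14
std = standard_conv_cost(k, m, n, f)
sep = depthwise_separable_cost(k, m, n, f)
ratio = sep / std

print(f"standard:  {std:,} mult-adds")
print(f"separable: {sep:,} mult-adds")
print(f"ratio:     {ratio:.4f}")
```

Algebraically the ratio reduces to 1/n + 1/k^2, so with 3x3 kernels and a large number of output channels the separable version needs roughly 8 to 9 times fewer multiply-adds.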

This work was followed up and improved by MobileNets v2 and MobileNets v3.

Key ideas

Technical details