EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks

June 2019

tl;dr: Scaling a network up jointly in resolution, depth and width is a wiser way to spend the inference budget.

Overall impression

The paper proposes a simple yet principled method to scale up networks. The main difference from previous work is that it explores image resolution in addition to network architecture (width and depth).
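The joint scaling idea can be sketched in a few lines. This is only an illustration, assuming the compound-scaling form from the paper (depth, width and resolution multipliers are alpha^phi, beta^phi and gamma^phi, with alpha = 1.2, beta = 1.1, gamma = 1.15) and the usual approximation that convolutional FLOPs grow linearly with depth and quadratically with width and resolution. The helper names `compound_scale` and `relative_flops` are hypothetical, not from the official code, and the per-model coefficients listed under Technical details are rounded and need not match these powers exactly.

```python
# Base multipliers reported in the EfficientNet paper (grid-searched on B0).
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15  # depth, width, resolution


def compound_scale(phi):
    """Return (depth, width, resolution) multipliers for compound coefficient phi."""
    return ALPHA ** phi, BETA ** phi, GAMMA ** phi


def relative_flops(depth_mult, width_mult, resolution_mult):
    """Approximate FLOPs multiplier, assuming cost ~ depth * width^2 * resolution^2."""
    return depth_mult * width_mult ** 2 * resolution_mult ** 2


if __name__ == "__main__":
    for phi in range(4):
        d, w, r = compound_scale(phi)
        print(f"phi={phi}: depth x{d:.2f}, width x{w:.2f}, "
              f"resolution x{r:.2f}, FLOPs x{relative_flops(d, w, r):.2f}")
```

Under these assumptions alpha * beta^2 * gamma^2 is about 1.92, so each unit increase in phi roughly doubles the FLOPs budget.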

EfficientNet-B0 is similar to MnasNet. Scaling this network up pushes the Pareto frontier for ImageNet significantly, achieving similar accuracy with a 10x reduction in FLOPs and parameters. In other words, a beefed-up MobileNet beats SOTA such as ResNet and ResNeXt. EfficientNet studies how to spend additional resources wisely. The depth, width and resolution scaling factors are usually larger than 1.

On the other hand, the MobileNets papers (v1, v2 and v3) go the other way round. They start with an efficient network and scale it down further. The channel and resolution scaling factors are usually smaller than 1. Note that MobileNetv3-Large is optimized starting from MnasNet. Therefore EfficientNet-B* is really all about how to scale up MobileNet, and tells us that a beefed-up MobileNet is better than ResNet. In the original MobileNetsv1

This paper inspired the follow-up work EfficientDet, also by Quoc Le’s team.

Key ideas

Technical details


params_dict = {
      # (width_coefficient, depth_coefficient, resolution, dropout_rate)
      'efficientnet-b0': (1.0, 1.0, 224, 0.2),
      'efficientnet-b1': (1.0, 1.1, 240, 0.2),
      'efficientnet-b2': (1.1, 1.2, 260, 0.3),
      'efficientnet-b3': (1.2, 1.4, 300, 0.3),
      'efficientnet-b4': (1.4, 1.8, 380, 0.4),
      'efficientnet-b5': (1.6, 2.2, 456, 0.4),
      'efficientnet-b6': (1.8, 2.6, 528, 0.5),
      'efficientnet-b7': (2.0, 3.1, 600, 0.5),