Learning-Deep-Learning

ResNeSt: Split-Attention Networks

May 2020

tl;dr: A new drop-in replacement for ResNet for object detection and segmentation tasks.

Overall impression

It is almost a combination of ResNeXt and SKNet, with an improved implementation (switching from a cardinality-major to a radix-major layout).

I do feel that the paper uses too many tricks (MixUp, AutoAugment, distributed training, etc.) and is too similar to SKNet, especially since the final hyperparameter selection nearly reduces this work to SKNet. Engineering contribution > innovation.

Key ideas

The cardinality concept is the same as in ResNeXt.
The split-attention module is very similar to SKNet, but all splits use the same kernel size.
The change from cardinality-major to radix-major was implemented for better efficiency (how much?).
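
As I understand it, the two layouts differ only in how the K x R feature-map groups are ordered along the channel axis. In radix-major order, all groups sharing a radix index sit next to each other, so the R splits can be recovered as contiguous chunks (for example after a single grouped convolution with K*R groups). Below is a minimal sketch of just the index ordering; the function names and the K=2, R=2 values are illustrative assumptions, not taken from the paper's code.

```python
def cardinality_major(K, R):
    """Order groups so the R splits of one cardinal group are adjacent."""
    return [(k, r) for k in range(K) for r in range(R)]


def radix_major(K, R):
    """Order groups so the K cardinal groups sharing a radix index are adjacent."""
    return [(k, r) for r in range(R) for k in range(K)]


def split_by_radix(layout, K):
    """In radix-major order, each radix split is one contiguous chunk of K groups."""
    return [layout[i:i + K] for i in range(0, len(layout), K)]


K, R = 2, 2  # illustrative values only, not the paper's final setting
print(cardinality_major(K, R))  # [(0, 0), (0, 1), (1, 0), (1, 1)]
print(radix_major(K, R))        # [(0, 0), (1, 0), (0, 1), (1, 1)]
print(split_by_radix(radix_major(K, R), K))
# [[(0, 0), (1, 0)], [(0, 1), (1, 1)]]
```

Each tuple is (cardinal index, radix index). This only illustrates the reordering; it says nothing about how large the speedup is.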

Technical details

The final selected hyperparameters are K=1 (cardinality) and R=2 (radix), which makes the block very similar to SKNet.
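
To make the K=1, R=2 case concrete, here is a rough sketch of split attention for a single cardinal group: sum the splits, global-average-pool, compute per-split channel logits, softmax across splits, then take the weighted sum. The single weight matrix per split is my simplification standing in for the paper's gating layers, and `split_attention`, `splits`, and `weights` are hypothetical names. The sketch assumes R >= 2.

```python
import math


def split_attention(splits, weights):
    """splits: R feature maps, each [C][N] (C channels, N flattened positions).
    weights: R matrices, each [C][C]; a stand-in for the gating layers (assumption).
    """
    R, C, N = len(splits), len(splits[0]), len(splits[0][0])
    # 1. Fuse splits by element-wise sum, then global-average-pool each channel.
    pooled = [sum(sum(splits[r][c]) for r in range(R)) / N for c in range(C)]
    # 2. One logit per (split, channel), computed from the pooled descriptor.
    logits = [[sum(w * s for w, s in zip(weights[r][c], pooled))
               for c in range(C)] for r in range(R)]
    # 3. Softmax across the R splits, separately for each channel.
    attn = [[0.0] * C for _ in range(R)]
    for c in range(C):
        m = max(logits[r][c] for r in range(R))  # subtract max for stability
        exps = [math.exp(logits[r][c] - m) for r in range(R)]
        total = sum(exps)
        for r in range(R):
            attn[r][c] = exps[r] / total
    # 4. Output is the attention-weighted sum of the splits.
    out = [[sum(attn[r][c] * splits[r][c][n] for r in range(R))
            for n in range(N)] for c in range(C)]
    return out, attn


# R=2 splits, C=2 channels, N=2 positions; zero weights give uniform attention.
splits = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]]
zero_w = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(2)]
out, attn = split_attention(splits, zero_w)
print(attn)  # [[0.5, 0.5], [0.5, 0.5]]
print(out)   # [[3.0, 4.0], [5.0, 6.0]]
```

With two same-kernel-size splits competing through a softmax, the structure is essentially a two-branch selective-kernel unit, which is why the final setting looks so close to SKNet.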

Notes

Analysis of radix-major on Zhihu (知乎)
This work shows that, with tricks, ResNet can also be SOTA. This is better than works that reinvent the wheel, such as EfficientDet.
- MobileNet and depthwise convolution only bring speedups on CPU and are better suited for edge devices.