Deformable DETR: Deformable Transformers for End-to-End Object Detection

October 2020

tl;dr: An improved DETR that trains faster and performs better on small objects.

Overall impression

Issues with DETR: it needs many training epochs to converge and performs poorly on small objects. DETR uses low-resolution feature maps to save computation, but this hurts small-object detection.

Deformable DETR first reduces computation by attending to only a small set of key sampling points around a reference point. It then uses a multi-scale deformable attention module to aggregate multi-scale features (without FPN), which helps small object detection.
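To make the computation saving concrete, here is a rough back-of-the-envelope count of query-key pairs. The 32x32 feature map and the 4 sampling points per query are illustrative assumptions, not figures taken from these notes, and the function names are hypothetical.

```python
def dense_attention_pairs(height, width):
    """Query-key pairs when every location attends to every location."""
    num_tokens = height * width
    return num_tokens * num_tokens


def deformable_attention_pairs(height, width, num_points):
    """Query-key pairs when every location attends to num_points samples."""
    return height * width * num_points


# Assumed sizes, for illustration only.
h, w, k = 32, 32, 4
dense = dense_attention_pairs(h, w)
sparse = deformable_attention_pairs(h, w, k)
ratio = dense // sparse
print(dense, sparse, ratio)
```

The dense count grows quadratically with the number of feature-map locations, while the sparse count grows linearly, which is why attending to a few sampled points makes higher-resolution (and multi-scale) feature maps affordable.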

Each object query is restricted to attend to a small set of key sampling points around the reference points instead of all points in the feature map.
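The sampling idea can be sketched in NumPy as follows. This is a minimal single-head, single-scale illustration, not the paper's implementation: it assumes the sampling offsets and attention weights are linear projections of the query, uses pixel coordinates rather than normalized ones, and skips the value projection, multiple heads, and multiple feature levels. All function and parameter names (`bilinear_sample`, `deformable_attention`, `w_offset`, `w_attn`) are hypothetical.

```python
import numpy as np


def bilinear_sample(feat, x, y):
    """Sample feat (H, W, C) at continuous pixel coords; outside reads as zero."""
    h, w, _ = feat.shape
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    out = np.zeros(feat.shape[-1])
    for yi, xi in ((y0, x0), (y0, x0 + 1), (y0 + 1, x0), (y0 + 1, x0 + 1)):
        if 0 <= yi < h and 0 <= xi < w:
            weight = (1 - abs(x - xi)) * (1 - abs(y - yi))
            out += weight * feat[yi, xi]
    return out


def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()


def deformable_attention(query, ref_xy, value, w_offset, w_attn):
    """Attend to K sampled points around ref_xy instead of all H*W locations.

    query: (C,), ref_xy: (2,) as (x, y) in pixels, value: (H, W, C),
    w_offset: (C, 2K) and w_attn: (C, K) are assumed learned projections.
    """
    num_points = w_attn.shape[1]
    offsets = (query @ w_offset).reshape(num_points, 2)  # (dx, dy) per point
    weights = softmax(query @ w_attn)                    # sums to 1 over K
    out = np.zeros(value.shape[-1])
    for (dx, dy), a in zip(offsets, weights):
        out += a * bilinear_sample(value, ref_xy[0] + dx, ref_xy[1] + dy)
    return out


# Usage with random stand-in weights (assumed shapes, for illustration only).
rng = np.random.default_rng(0)
C, K = 8, 4
value = rng.normal(size=(16, 16, C))
query = rng.normal(size=C)
out = deformable_attention(
    query, np.array([7.5, 7.5]), value,
    0.1 * rng.normal(size=(C, 2 * K)), rng.normal(size=(C, K)),
)
print(out.shape)
```

Note that the query never computes dot products against keys here: the cost per query depends on K, not on the size of the feature map.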

Deformable DETR is one of the highest-scoring papers at ICLR 2021.

There are several papers on improving the training speed of DETR.

Key ideas

Technical details