Learning-Deep-Learning

YOLOF: You Only Look One-level Feature

September 2021

tl;dr: Replace the MiMo-style (multiple-in-multiple-out) FPN with a SiSo (single-in-single-out) neck to build a fast and accurate object detector.

Overall impression

This paper digs into why FPN works and proposes that a single feature level can achieve the same level of performance.

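The success of FPN is due to two factors: 1) multi-level feature fusion and 2) divide-and-conquer label assignment. The paper demonstrates that a SiMo (single-in-multiple-out) structure, which keeps the multiple output levels but drops the multi-level input fusion, can also achieve quite good performance. Thus we can conclude that divide-and-conquer, rather than multi-level fusion, is the main contributor to FPN's success.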

The paper still builds on RetinaNet and can achieve better performance with a 57% reduction in FLOPs.

Both ATSS and YOLOF work with top-k anchors. ATSS uses the top-k candidate anchors to dynamically adjust the threshold that separates positive from negative anchors. YOLOF focuses on keeping positive and negative samples balanced by ignoring positive samples beyond the top k.
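The top-k idea can be illustrated with a minimal sketch. It is not the paper's implementation: the function name `topk_positive_anchors`, the use of Euclidean distance between centers as the ranking criterion, and the toy grid are all assumptions made for illustration. It only shows that each ground-truth box ends up with the same number of positive anchors, regardless of its size.

```python
import math

def topk_positive_anchors(anchor_centers, gt_centers, k=4):
    """Return, for each ground-truth box, the indices of its k nearest anchors.

    Hypothetical helper: ranks anchors by center-to-center distance
    (an assumption, not necessarily the criterion used in the paper).
    """
    positives = []
    for gx, gy in gt_centers:
        # Distance from this ground-truth center to every anchor center.
        dists = [math.hypot(ax - gx, ay - gy) for ax, ay in anchor_centers]
        # Keep only the k closest anchors as positives for this box.
        nearest = sorted(range(len(dists)), key=lambda i: dists[i])[:k]
        positives.append(nearest)
    return positives

# Toy single-level feature map: a 4x4 grid of anchor centers with stride 8.
anchors = [(x * 8 + 4, y * 8 + 4) for y in range(4) for x in range(4)]
# Two ground-truth centers (made-up values).
gts = [(5.0, 5.0), (22.0, 23.0)]

matches = topk_positive_anchors(anchors, gts, k=4)
for gt, idx in zip(gts, matches):
    print(gt, "->", sorted(idx))
```

Because the number of positives per box is fixed at k, large objects cannot dominate the positive set the way they can under a pure IoU-threshold rule on a single feature level.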

Key ideas

Technical details

Notes