Sparse R-CNN: End-to-End Object Detection with Learnable Proposals

November 2020

tl;dr: Sparse proposals and iterative refinement for a two-stage end-to-end object detector.

Overall impression

This paper rethinks the necessity of dense priors (either anchor boxes or reference points) in object detection, very similar to TSP. Sparse R-CNN uses a small set of N sparse proposals for object detection, where N ≪ HWk, the number of dense priors obtained by placing k anchors at each of the H×W feature-map locations.
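To make the N ≪ HWk comparison concrete, here is a back-of-the-envelope sketch. The feature-map size, the anchor count per location, and the value of N are illustrative assumptions, not figures taken from these notes, and `dense_prior_count` is a hypothetical helper.

```python
def dense_prior_count(height: int, width: int, k: int) -> int:
    """Count dense priors on a single H x W feature map with k anchors per location."""
    return height * width * k


# Illustrative assumptions only: a stride-8 feature map of an 800x1216 image
# with 9 anchors per location, compared against 100 sparse proposals.
H, W, K = 800 // 8, 1216 // 8, 9
N = 100

dense = dense_prior_count(H, W, K)
ratio = dense / N

print(f"dense priors HWk = {dense}, sparse proposals N = {N}, ratio = {ratio:.0f}x")
```

Under these assumed numbers a single feature level already carries over a thousand times more candidates than the sparse set, which is the gap the paper argues can be closed.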

There are several papers on improving the training speed of DETR.

The iterative head design is quite inefficient at capturing context and relationships with other parts of the image, and thus needs quite a few iterations (~6 cascaded stages). In comparison, the sparse cross attention in Deformable DETR and TSP may be a better way to go.

The authors also wrote OneNet, which is a single-stage, easy-to-deploy, end-to-end object detection model.

Key ideas

Technical details