Anchor DETR: Query Design for Transformer-Based Object Detection

September 2022

tl;dr: Use encoded anchors to explicitly guide the detection queries, enabling more focused attention and detection.

Overall impression

This paper is one of a series of papers aimed at improving the convergence rate of DETR.

The object queries in the original DETR are a set of learned embeddings and have no explicit physical meaning. An individual object query does not focus on a specific region, which makes it hard to optimize. In DETR, object queries from different positions predict objects collaboratively.

Anchor DETR is concurrent with Conditional DETR, and the ideas are roughly the same. Anchor DETR uses encoded anchors as the queries, whereas Conditional DETR uses encoded anchors as positional embeddings while the object queries themselves are still learned.
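To make "encoded anchors as the queries" concrete, here is a minimal pure-Python sketch. It is not the paper's implementation: the uniform grid of anchor points, the DETR-style sinusoidal encoding, the random projection that stands in for learned weights, and all function names (`grid_anchor_points`, `sine_encode`, `encode_anchor`, `make_queries`) are assumptions made for illustration.

```python
import math
import random

def grid_anchor_points(n_per_side):
    """Uniform grid of (x, y) anchor points in normalized image coordinates (assumed layout)."""
    step = 1.0 / n_per_side
    return [((i + 0.5) * step, (j + 0.5) * step)
            for j in range(n_per_side) for i in range(n_per_side)]

def sine_encode(value, num_feats, temperature=10000.0):
    """Sinusoidal encoding of one scalar coordinate in [0, 1]; returns num_feats values."""
    scaled = value * 2.0 * math.pi
    out = []
    for k in range(num_feats // 2):
        freq = temperature ** (2.0 * k / num_feats)
        out.append(math.sin(scaled / freq))
        out.append(math.cos(scaled / freq))
    return out

def encode_anchor(point, num_feats):
    """Concatenate the encodings of x and y into one positional vector."""
    x, y = point
    return sine_encode(x, num_feats) + sine_encode(y, num_feats)

def linear(vec, weight, bias):
    """Plain matrix-vector product plus bias."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weight, bias)]

def make_queries(points, num_feats, d_model, seed=0):
    """Map each anchor point to a query vector of size d_model.

    The random weights are a stand-in (assumption) for a projection
    that a real model would learn during training.
    """
    rng = random.Random(seed)
    in_dim = 2 * num_feats
    weight = [[rng.gauss(0.0, in_dim ** -0.5) for _ in range(in_dim)]
              for _ in range(d_model)]
    bias = [0.0] * d_model
    return [linear(encode_anchor(p, num_feats), weight, bias) for p in points]

anchors = grid_anchor_points(3)                        # 9 anchor points
queries = make_queries(anchors, num_feats=8, d_model=16)
```

Each query here is a deterministic function of its anchor position, so it is tied to a specific image region by construction. In the Conditional DETR arrangement described above, the same kind of encoding would instead serve as the positional part, with a separate learned embedding acting as the object query.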

Key ideas

Technical details