FCOS: Fully Convolutional One-Stage Object Detection

June 2019

tl;dr: FCOS uses every pixel inside a GT bbox as a regression sample, instead of just the center, which alleviates the imbalance between positive and negative samples. Each location in the feature map is responsible for regressing four distances to the four edges of the GT bbox.

Overall impression

This paper is littered with hidden gems! It has lots of tricks and insights on training neural nets and on architecture design. The paper assumes bbox annotations only. If masks are also available, we could use only the pixels inside the mask to perform regression.

The idea is similar to CenterNet. CenterNet uses only the points near the center and regresses the height and width, whereas FCOS uses all the points in the bbox and regresses the distances to all four edges. FCOS does not locate bbox centers with a keypoint heatmap; it regresses the box directly at every location, without anchors.

FCOS regresses distances to the four edges, while CenterNet only regresses width and height. The FCOS formulation is more general because it can handle amodal bbox cases, where the object center may not be the center of the bbox.
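To make the four-distance formulation concrete, here is a minimal NumPy sketch of how the (left, top, right, bottom) targets could be computed for a set of locations and a single GT bbox, and how a box is decoded back from them. The function names `ltrb_targets` and `decode_boxes` are hypothetical, the strict `> 0` inside test is an assumption, and the sketch ignores FPN level assignment and multiple overlapping boxes.

```python
import numpy as np

def ltrb_targets(points, box):
    """Distances from each (x, y) location to the four edges of one GT box.

    points: (N, 2) array of (x, y) locations in image coordinates.
    box:    (x0, y0, x1, y1) corners of the GT bbox.
    Returns the (N, 4) array of (left, top, right, bottom) distances and a
    boolean mask of locations that fall strictly inside the box.
    """
    points = np.asarray(points, dtype=float)
    x, y = points[:, 0], points[:, 1]
    x0, y0, x1, y1 = box
    ltrb = np.stack([x - x0, y - y0, x1 - x, y1 - y], axis=1)
    # Assumption: a location is a positive sample only if all four
    # distances are positive, i.e. it lies strictly inside the box.
    inside = ltrb.min(axis=1) > 0
    return ltrb, inside

def decode_boxes(points, ltrb):
    """Invert ltrb_targets: recover (x0, y0, x1, y1) from locations and distances."""
    points = np.asarray(points, dtype=float)
    x, y = points[:, 0], points[:, 1]
    return np.stack(
        [x - ltrb[:, 0], y - ltrb[:, 1], x + ltrb[:, 2], y + ltrb[:, 3]], axis=1
    )

# Two locations inside the box and one outside it.
pts = np.array([[30.0, 40.0], [10.0, 10.0], [50.0, 60.0]])
targets, inside = ltrb_targets(pts, (20.0, 30.0, 60.0, 70.0))
boxes = decode_boxes(pts[inside], targets[inside])
```

Because left/right and top/bottom are regressed independently, an off-center location decodes to the same box as a centered one, which a single width/height pair around a center point cannot express.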

I personally feel this paper is better than CenterNet in the sense that it does not need too many bells and whistles to achieve the same performance.

FCOS is extended to PolarMask for one-stage instance segmentation.

The paper inspired ATSS, which explains why FCOS can achieve better performance than RetinaNet.

Key ideas

Technical details