BlendMask: Top-Down Meets Bottom-Up for Instance Segmentation

April 2020

tl;dr: Learn a low-resolution attention map to improve YOLACT.

Overall impression

The paper improves upon YOLACT. It not only uses the bbox to crop the prototype maps but also predicts an attention map within the bbox. From this standpoint, it is more similar to Mask RCNN than it is even to YOLACT.

YOLACT predicts just a single number per prototype mask to blend them, but BlendMask predicts a low-res attention map per prototype to blend the masks within the bbox.
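To make the difference concrete, here is a minimal NumPy sketch of the two blending schemes. It is an illustration under simplifying assumptions, not the authors' implementation: the function names are hypothetical, the bases are assumed to be already cropped to the bbox, attention maps are upsampled with nearest-neighbor instead of proper interpolation, and YOLACT's sigmoid and bbox crop are omitted.

```python
import numpy as np

def yolact_blend(prototypes, coeffs):
    """YOLACT-style: one scalar per prototype.
    prototypes: (K, H, W), coeffs: (K,) -> blended map (H, W)."""
    return np.tensordot(coeffs, prototypes, axes=1)

def upsample_nearest(attn, out_h, out_w):
    """Nearest-neighbor resize of (K, M, M) maps to (K, out_h, out_w).
    Assumption: stands in for the smoother interpolation a real model would use."""
    _, m_h, m_w = attn.shape
    rows = np.arange(out_h) * m_h // out_h
    cols = np.arange(out_w) * m_w // out_w
    return attn[:, rows[:, None], cols[None, :]]

def blendmask_blend(bases_crop, attn):
    """BlendMask-style: one low-res attention map per base.
    bases_crop: (K, R, R) bases already cropped to the bbox,
    attn: (K, M, M) attention logits -> blended map (R, R)."""
    _, r_h, r_w = bases_crop.shape
    attn_up = upsample_nearest(attn, r_h, r_w)
    # Softmax across the K bases at every pixel, so weights sum to 1.
    e = np.exp(attn_up - attn_up.max(axis=0, keepdims=True))
    weights = e / e.sum(axis=0, keepdims=True)
    return (weights * bases_crop).sum(axis=0)
```

With K bases, YOLACT spends K numbers per instance while this scheme spends K x M x M, which lets the weighting vary spatially inside the box.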

BlendMask tries to blend with a finer-grained attention map within each bbox, while CondInst tries to blend with deeper convs whose filters are dynamically predicted.
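For contrast, the CondInst side of this comparison can be sketched as a small stack of 1x1 convs whose weights come from a per-instance parameter vector. This is a rough sketch only: `dynamic_mask_head`, the flat parameter layout (weights then biases, layer by layer), and the channel sizes are assumptions made for illustration, not details taken from the CondInst paper.

```python
import numpy as np

def dynamic_mask_head(features, params, channels):
    """Apply per-instance 1x1 convs to shared features.
    features: (C_in, H, W) shared mask features.
    params:   flat vector predicted for ONE instance (hypothetical layout:
              for each layer, weights then biases).
    channels: layer widths, e.g. [C_in, 8, 8, 1].
    Returns mask logits of shape (channels[-1], H, W)."""
    x = features
    offset = 0
    n_layers = len(channels) - 1
    for i in range(n_layers):
        c_in, c_out = channels[i], channels[i + 1]
        w = params[offset:offset + c_in * c_out].reshape(c_out, c_in)
        offset += c_in * c_out
        b = params[offset:offset + c_out]
        offset += c_out
        # A 1x1 conv is a per-pixel matrix multiply over channels.
        x = np.einsum('oc,chw->ohw', w, x) + b[:, None, None]
        if i < n_layers - 1:
            x = np.maximum(x, 0.0)  # ReLU between layers, not after the last
    if offset != params.size:
        raise ValueError("params length does not match channels")
    return x
```

Here the per-instance capacity lives in the filter weights rather than in a spatial attention map, which is the distinction the comparison above draws.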

CenterMask works in almost exactly the same way as BlendMask. See my review on Medium.

Key ideas

Technical details