TensorMask: A Foundation for Dense Object Segmentation

April 2020

tl;dr: Aligned channel2spatial improves instance segmentation performance over direct channel2spatial.

Overall impression

The paper proposes a relatively rigorous formulation based on 4D tensors that unifies DeepMask and InstanceFCN in one framework. The presentation seems overly complicated for conveying two simple ideas: we need to align channel2spatial, and we need large masks for large objects.

The key question for dense instance segmentation: why can't we naively adopt the CenterNet architecture for instance segmentation?

The answer is that training a neural network with $480^2$ output channels (one channel per pixel of a full-resolution mask, at every location) is intractable. Thus a tradeoff has to be made within the $H \times W \times C$ output: either predict a coarse mask at each location and rely on bilinear upsampling and feature alignment to recover better masks, as in TensorMask, or predict full-resolution masks on a coarse grid of locations, as in SOLO.
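To make the size argument concrete, here is a back-of-the-envelope sketch of the cost of a dense mask head. The numbers are assumptions for illustration and are not taken from the paper: a 480x480 input, a stride-4 feature map (120x120), 256 input channels, float32 outputs, and a single 1x1 conv as the head. The helper name `dense_mask_head_cost` is hypothetical.

```python
def dense_mask_head_cost(feat_h, feat_w, mask_side, in_channels=256, bytes_per_value=4):
    """Rough cost of predicting a mask_side x mask_side mask at every feature location.

    Assumes (for illustration only) a single 1x1 conv head and float32 outputs.
    """
    # One output channel per mask pixel.
    out_channels = mask_side ** 2
    # Weights of a 1x1 conv mapping in_channels -> out_channels (bias ignored).
    weights = in_channels * out_channels
    # One value per output channel per spatial location, for a single image.
    output_values = feat_h * feat_w * out_channels
    return {
        "out_channels": out_channels,
        "weights": weights,
        "output_mib": output_values * bytes_per_value / 2**20,
    }


# Full-resolution mask per location (the naive CenterNet-style head).
full = dense_mask_head_cost(120, 120, mask_side=480)
# Coarse mask per location (15 is an illustrative window size).
coarse = dense_mask_head_cost(120, 120, mask_side=15)

for name, cost in (("full", full), ("coarse", coarse)):
    print(name, cost)
```

Under these assumptions the full-resolution head needs 230,400 output channels and roughly 12 GiB for the output tensor of a single image, while the coarse head needs 225 channels and about 12 MiB. That gap is what forces the choice between coarse masks at dense locations and full-resolution masks at coarse locations.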

Key ideas

Technical details