Mask Encoding for Single Shot Instance Segmentation

April 2020

tl;dr: Encode masks into lower-dimensional representations for prediction.

Overall impression

The paper shows that directly predicting the high-dimensional mask (28x28 = 784 values) is tractable, but performs worse than predicting a lower-dimensional vector (N=60) and recovering the mask from it.

However, this is still region-based and depends heavily on the bbox. The binary masks are defined relative to the bbox.

The idea of using PCA to compress shape is very similar to that of RoI10D. This should be compared with CondInst.
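
To make the encode/decode idea concrete, here is a minimal sketch of compressing 28x28 binary masks to a 60-dimensional vector with plain PCA and recovering a mask by thresholding the reconstruction. It is an illustration under assumptions, not the paper's implementation: the function names (fit_pca, encode, decode, random_box_mask), the 0.5 threshold, and the toy rectangle masks are all hypothetical, and the paper's exact encoding and training setup may differ.

```python
import numpy as np

def fit_pca(masks, n_components=60):
    """Fit PCA on flattened masks of shape (num_masks, H, W)."""
    X = masks.reshape(len(masks), -1).astype(np.float64)
    mean = X.mean(axis=0)
    # Rows of vt are the principal directions of the centered data.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def encode(mask, mean, components):
    """Project one mask onto the principal directions (the compact code)."""
    return components @ (mask.reshape(-1).astype(np.float64) - mean)

def decode(code, mean, components, size=28, threshold=0.5):
    """Reconstruct a binary mask from its code (threshold is an assumption)."""
    recon = components.T @ code + mean
    return (recon.reshape(size, size) >= threshold).astype(np.uint8)

def iou(a, b):
    """Intersection over union of two binary masks."""
    union = np.logical_or(a, b).sum()
    return np.logical_and(a, b).sum() / union if union else 1.0

# Toy stand-in for bbox-relative instance masks: random rectangles.
rng = np.random.default_rng(0)

def random_box_mask(size=28):
    m = np.zeros((size, size), dtype=np.uint8)
    y0, x0 = rng.integers(0, size // 2, size=2)
    y1, x1 = rng.integers(size // 2, size, size=2)
    m[y0:y1 + 1, x0:x1 + 1] = 1
    return m

masks = np.stack([random_box_mask() for _ in range(500)])
mean, components = fit_pca(masks, n_components=60)

code = encode(masks[0], mean, components)    # 784 values -> 60 values
recon = decode(code, mean, components)       # 60 values -> 28x28 binary mask
print(code.shape, recon.shape, iou(masks[0], recon))
```

In this framing, a detector head would regress the 60-dimensional code per box instead of 784 mask logits, and the decode step would run at inference time to recover the mask inside the bbox.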

Key ideas

Technical details