PackNet-SG: Semantically-Guided Representation Learning for Self-Supervised Monocular Depth

May 2020

tl;dr: Use a pretrained semantic segmentation model to guide PackNet, and use two rounds of training to solve the infinite-depth issue.

Overall impression

This paper has a different focus from previous methods, which concentrate on accurate visual odometry (VO). This paper focuses on accurate depth estimation, especially for dynamic objects such as cars.

Under the framework of SfM-Learner and SfM, cars moving at the same speed as the ego car exhibit no parallax across frames, so the photometric loss pushes their predicted depth toward infinity. A simple way to avoid this infinite-depth issue is to mask out all dynamic objects and train SfM on static scenes only. But then the network receives no supervision on cars and will not give accurate depth on cars during inference.
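A minimal sketch of the masking idea, under my own assumptions (the class ids, the L1 residual, and the function name are illustrative, not the paper's exact implementation): pixels belonging to potentially dynamic classes are excluded from the photometric loss, so cars moving with the ego vehicle cannot drag the predicted depth toward infinity.

```python
import numpy as np

# Hypothetical label ids for dynamic classes (e.g., person, rider, car).
DYNAMIC_CLASSES = {11, 12, 13}

def masked_photometric_loss(target, warped, seg_labels):
    """L1 photometric loss averaged over static pixels only.

    target, warped: (H, W) float arrays (grayscale for simplicity).
    seg_labels:     (H, W) int array of semantic class ids from the
                    pretrained segmentation model.
    """
    static = ~np.isin(seg_labels, list(DYNAMIC_CLASSES))
    residual = np.abs(target - warped)
    if not static.any():  # degenerate case: every pixel is dynamic
        return 0.0
    return float(residual[static].mean())
```

In a real pipeline `warped` would be the source frame warped into the target view via the predicted depth and pose; here it is just an array, to keep the sketch self-contained.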

So the best way to do this is to train SfM on static scenes that still contain cars (e.g., parked cars in a parking lot); during inference, the depth network generalizes to moving cars, as it only takes in a single image.
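One way the two-round scheme could be sketched, under my own assumptions (the flagging heuristic, the ratio threshold, and the function name are illustrative, not from the paper): after round 1, flag car pixels whose predicted depth is implausibly large relative to the surrounding static scene, then exclude those pixels from the loss in round 2.

```python
import numpy as np

def flag_infinite_depth(depth, car_mask, ratio=3.0):
    """Flag car pixels whose round-1 depth looks like the infinite-depth
    failure mode, i.e. far beyond the static scene's median depth.

    depth:    (H, W) float array of round-1 depth predictions.
    car_mask: (H, W) bool array marking car pixels (from segmentation).
    ratio:    illustrative threshold, an assumption of this sketch.
    Returns a bool mask of pixels to drop from the round-2 loss.
    """
    static_median = np.median(depth[~car_mask])
    return car_mask & (depth > ratio * static_median)
```

Round 2 would then retrain with the flagged pixels removed from the photometric loss, leaving parked cars (whose depth is plausible) as supervision.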

The infinite-depth issue is also tackled in struct2depth.

Key ideas

Technical details