SGDepth: Self-Supervised Monocular Depth Estimation: Solving the Dynamic Object Problem by Semantic Guidance

August 2020

tl;dr: Build a Mannequin-Challenge-like dataset for monodepth; use segmentation masks to filter out genuinely moving objects.

Overall impression

The paper addresses the moving object issue by adaptively filtering out regions with large dynamic movement. The motion segmentation idea was explored before in Competitive Collaboration.

Segmentation techniques are also used in Every Pixel Counts, which proposes an implicit binary segmentation. SGDepth does not extend the image projection model to include cars, but simply excludes the car pixels. This alone, however, would lead to poor performance, as the depth of car pixels would not be learned at all.
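The pixel-exclusion idea can be sketched as masking the per-pixel photometric error with the semantic segmentation before averaging. This is a minimal numpy sketch, not the paper's implementation; the function name and the `dynamic_ids` parameter are assumptions for illustration.

```python
import numpy as np

def masked_photometric_loss(photo_err, seg_mask, dynamic_ids):
    """Average the photometric error over static-scene pixels only.

    photo_err:   (H, W) per-pixel photometric error
    seg_mask:    (H, W) integer semantic class ids
    dynamic_ids: class ids of potentially moving objects (e.g. cars)
    """
    # Keep only pixels NOT belonging to a dynamic class.
    keep = ~np.isin(seg_mask, dynamic_ids)
    if keep.sum() == 0:
        # Whole image is dynamic: no supervision signal from this frame.
        return 0.0
    return float(photo_err[keep].mean())
```

As the note above points out, applying this mask unconditionally means car depth is never supervised, which motivates the frame-level static/moving decision discussed below.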

But this method still seems to suffer from the infinite depth problem. We need to integrate the depth estimation with depth hints. PackNet-SG provides an intuitive way to address this.

SGDepth develops a method to detect frames with non-moving cars, similar to the Mannequin Challenge dataset. In other words, moving cars should be excluded from the loss computation, while stationary cars should still be used.
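One way to realize this static/moving decision is to check whether the car mask, warped into the target frame by ego-motion alone, still overlaps the car mask observed there: good overlap suggests the cars followed the rigid scene and are static. This is a hedged sketch of that idea; the IoU criterion and the `iou_thresh` value are assumptions, not details taken from the paper.

```python
import numpy as np

def cars_are_static(warped_car_mask, target_car_mask, iou_thresh=0.5):
    """Decide whether a frame's cars can stay in the photometric loss.

    warped_car_mask: (H, W) bool, car mask warped by ego-motion only
    target_car_mask: (H, W) bool, car mask observed in the target frame
    iou_thresh:      assumed hyperparameter (not from the paper)
    """
    inter = np.logical_and(warped_car_mask, target_car_mask).sum()
    union = np.logical_or(warped_car_mask, target_car_mask).sum()
    if union == 0:
        # No cars in either frame: nothing to exclude.
        return True
    return inter / union >= iou_thresh
```

Frames passing this check would keep their car pixels in the loss, so car depth is still learned from stationary examples.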

Key ideas

Technical details