MonoResMatch: Learning monocular depth estimation infusing traditional stereo knowledge

August 2020

tl;dr: Inject proxy labels from traditional stereo knowledge into monodepth with stereo.

Overall impression

The paper learns to emulate a binocular setup from a single image. It first hallucinates a disparity map (DispNet-style), then refines it with geometric constraints through a horizontal correlation layer, as in stereo matching, and finally estimates depth. It is similar to, and inspired by, Single View Stereo Matching.
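The horizontal correlation layer compares each left-image feature only with right-image features on the same row, shifted by candidate disparities. Below is a minimal NumPy sketch of such a 1D cost volume. It assumes rectified inputs, a channel-averaged dot product as the similarity, and zeros where the shifted pixel falls outside the image. The function name `horizontal_correlation` and the `max_disp` parameter are hypothetical and not taken from the paper's code.

```python
import numpy as np

def horizontal_correlation(feat_left: np.ndarray, feat_right: np.ndarray, max_disp: int) -> np.ndarray:
    """Hypothetical sketch of a 1D (horizontal-only) correlation cost volume.

    feat_left, feat_right: float arrays of shape (C, H, W) from a rectified pair.
    Returns shape (max_disp, H, W); entry [d, y, x] is the channel-averaged dot
    product between left pixel (y, x) and right pixel (y, x - d).
    Entries where x - d < 0 stay at 0 (assumed padding choice).
    """
    feat_left = np.asarray(feat_left, dtype=float)
    feat_right = np.asarray(feat_right, dtype=float)
    _, h, w = feat_left.shape
    volume = np.zeros((max_disp, h, w))
    for d in range(max_disp):
        if d == 0:
            # No shift: compare pixels at the same column.
            volume[0] = (feat_left * feat_right).mean(axis=0)
        else:
            # Left column x is matched against right column x - d on the same row.
            volume[d, :, d:] = (feat_left[:, :, d:] * feat_right[:, :, :-d]).mean(axis=0)
    return volume
```

With L2-normalized features, taking `volume.argmax(axis=0)` gives a winner-take-all disparity per pixel. In the network, the cost volume would instead be passed to further layers for refinement.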

The paper is similar to Depth Hints in that it also uses proxy labels computed by SGM (semi-global matching) on stereo pairs, which are available only at training time, to guide the monodepth pipeline. Unlike Depth Hints, it adds a self-check on the proxy labels to reduce their noise.
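The self-check on SGM proxy labels is a left-right consistency check. A left pixel x with disparity d_L should land on right pixel x - d_L, and the right disparity map should report roughly the same value there. The following is a minimal sketch that assumes rectified images, disparities in pixels, nearest-pixel lookup, and a 1-pixel threshold. The threshold is an illustrative default, not a value from the paper, and the function name is hypothetical.

```python
import numpy as np

def left_right_consistency_mask(disp_left, disp_right, threshold=1.0):
    """Hypothetical sketch: mark SGM proxy labels that pass a left-right check.

    disp_left, disp_right: (H, W) disparity maps in pixels for a rectified pair.
    threshold: max allowed |d_L - d_R| in pixels (assumed value).
    Returns a boolean (H, W) mask; True = keep the proxy label.
    """
    disp_left = np.asarray(disp_left, dtype=float)
    disp_right = np.asarray(disp_right, dtype=float)
    h, w = disp_left.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Column in the right image that each left pixel maps to (nearest pixel).
    x_right = np.rint(xs - disp_left).astype(int)
    in_bounds = (x_right >= 0) & (x_right < w)
    # Clip only to index safely; out-of-bounds pixels are rejected via in_bounds.
    disp_back = disp_right[ys, np.clip(x_right, 0, w - 1)]
    consistent = np.abs(disp_left - disp_back) <= threshold
    return in_bounds & consistent
```

Pixels that fail the check (typically occlusions and mismatches) are dropped from supervision rather than corrected.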

Both Depth Hints and MonoResMatch use cheap stereo-derived ground truth to build a monodepth training set. Depth Hints combines SGM outputs from multiple parameter settings into one proxy label and uses a soft (hint) supervision scheme. MonoResMatch uses a left-right consistency check to filter out spurious proxy labels and a conventional hard supervision scheme.
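The difference between the two supervision schemes is easiest to see side by side. The sketch below uses plain L1 as a stand-in for whatever distance each paper actually uses. It assumes the per-pixel photometric (reprojection) losses of the prediction and of the hint are already computed. Both function names are hypothetical.

```python
import numpy as np

def hard_proxy_loss(pred, proxy, valid_mask):
    """Hypothetical sketch of hard supervision (MonoResMatch-style):
    L1 to the proxy label on every pixel that survived the left-right check."""
    pred, proxy = np.asarray(pred, float), np.asarray(proxy, float)
    valid = np.asarray(valid_mask, dtype=bool)
    if not valid.any():
        return 0.0
    return float(np.abs(pred - proxy)[valid].mean())

def soft_hint_loss(pred, hint, photo_loss_pred, photo_loss_hint):
    """Hypothetical sketch of soft (hint) supervision (Depth-Hints-style):
    the hint supervises a pixel only if reprojecting with the hint gives a
    lower photometric loss than reprojecting with the current prediction."""
    pred, hint = np.asarray(pred, float), np.asarray(hint, float)
    use_hint = np.asarray(photo_loss_hint) < np.asarray(photo_loss_pred)
    if not use_hint.any():
        return 0.0
    return float(np.abs(pred - hint)[use_hint].mean())
```

The hard scheme trusts every filtered label unconditionally. The soft scheme lets the photometric loss decide, per pixel and per training step, whether the hint is worth following.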

Key ideas

Technical details