Part-level Car Parsing and Reconstruction from a Single Street View

January 2020

tl;dr: Train on synthesized data with part labels plus weakly labeled real data to transfer part knowledge. Directly regress distance.

Overall impression

The main problem this paper is trying to solve is occlusion.

Instance masks can also be ambiguous because of object symmetry: two similar masks can correspond to quite different orientations.

It’s hard to detect accurate landmarks for low-resolution cars.

The car model defines 70 semantic parts, but for training they are grouped into 13 super-parts, which is a more reasonable granularity to learn.
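The grouping of parts into super-parts amounts to a label lookup table. Below is a minimal sketch of that remapping, assuming integer part IDs 0..69 and super-part IDs 0..12. The actual assignment of parts to super-parts is defined by the paper and is not reproduced here; `PART_TO_SUPER` and `to_super_parts` are hypothetical names, and the even split across consecutive IDs is purely illustrative.

```python
NUM_PARTS = 70        # fine-grained part count stated in the notes
NUM_SUPER_PARTS = 13  # super-part count stated in the notes

# Hypothetical assignment: consecutive part IDs are spread evenly over the
# 13 groups. The real grouping comes from the paper's part definition.
PART_TO_SUPER = [p * NUM_SUPER_PARTS // NUM_PARTS for p in range(NUM_PARTS)]


def to_super_parts(part_map):
    """Remap a 2D grid of part IDs (0..69) to super-part IDs (0..12)."""
    return [[PART_TO_SUPER[p] for p in row] for row in part_map]


if __name__ == "__main__":
    # A tiny 2x3 "label map" of part IDs standing in for a segmentation mask.
    demo = [[0, 5, 6], [35, 68, 69]]
    print(to_super_parts(demo))
```

With a table like this, the fine-grained part labels from synthesized data can be collapsed to super-part labels before computing a segmentation loss.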

Key ideas

Technical details