Learning to Look around Objects for Top-View Representations of Outdoor Scenes

September 2020

tl;dr: Hallucinate occluded areas in BEV, and use simulation and map data to help.

Overall impression

This is a seminal paper in the field of BEV semantic segmentation, but it does not seem to have received much attention.

This paper takes a different path from Cam2BEV. In Learning to Look around Objects, the network is explicitly supervised to hallucinate the occluded regions, whereas Cam2BEV eliminates the occluded regions in order to make the problem better posed.

The paper predicts semantic segmentation and depth in order to lift perspective images to BEV. In this sense it is very similar to Lift, Splat and Shoot. It also uses a BEV refinement module to refine the imperfect intermediate BEV map, which is very similar to BEV-seg. This raises the question: “Depth and semantics are all you need?”
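To make the lifting step concrete, here is a minimal sketch of how per-pixel depth and semantic labels could be splatted onto a top-view grid. It assumes a pinhole camera and ignores height, and it is not the paper's implementation. The function name `lift_to_bev`, the intrinsics `fx` and `cx`, and the grid ranges and cell size are all hypothetical choices for illustration.

```python
def lift_to_bev(depth, semantics, fx, cx,
                x_range=(-10.0, 10.0), z_range=(0.0, 20.0), cell=1.0):
    """Splat per-pixel class ids onto a BEV grid (illustrative sketch only).

    depth, semantics: H x W nested lists (metres along the optical axis, class id).
    fx, cx: assumed pinhole intrinsics (focal length and principal point, in pixels).
    Returns a grid indexed as bev[row][col], with -1 marking unobserved cells.
    """
    n_cols = int(round((x_range[1] - x_range[0]) / cell))
    n_rows = int(round((z_range[1] - z_range[0]) / cell))
    bev = [[-1] * n_cols for _ in range(n_rows)]

    for d_row, s_row in zip(depth, semantics):
        for u, (z, cls) in enumerate(zip(d_row, s_row)):
            if z <= 0:
                continue  # skip pixels without a valid depth
            # Pinhole back-projection: lateral offset grows with depth.
            x = (u - cx) * z / fx
            col = int((x - x_range[0]) // cell)
            row = int((z - z_range[0]) // cell)
            if 0 <= col < n_cols and 0 <= row < n_rows:
                # Last write wins if several pixels land in the same cell.
                bev[row][col] = cls
    return bev


# Usage: a single image row of three pixels, all 5 m away.
bev = lift_to_bev([[5.0, 5.0, 5.0]], [[1, 2, 3]], fx=1.0, cx=1.0)
```

Cells that stay at -1 are exactly the occluded or unobserved areas that a refinement stage would have to hallucinate.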

Human supervision in BEV space is hard to procure, so this paper uses an adversarial loss to make sure the predicted BEV layout looks like a real one. The idea is very similar to MonoLayout, but a bit different from BEV-seg.
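As a rough illustration of the adversarial idea, the sketch below computes the standard GAN losses from discriminator scores. These notes do not state the paper's exact loss, so the textbook non-saturating formulation is an assumption. The names `discriminator_loss` and `generator_loss` are hypothetical, and the scores are assumed to be probabilities in [0, 1] that a BEV layout is real.

```python
import math


def discriminator_loss(d_real, d_fake, eps=1e-7):
    """Standard GAN discriminator loss (assumed formulation, not from the paper).

    d_real: discriminator scores on real layouts (e.g. from simulation or maps).
    d_fake: discriminator scores on predicted BEV layouts.
    """
    real_term = -sum(math.log(max(p, eps)) for p in d_real) / len(d_real)
    fake_term = -sum(math.log(max(1.0 - p, eps)) for p in d_fake) / len(d_fake)
    return real_term + fake_term


def generator_loss(d_fake, eps=1e-7):
    """Non-saturating generator loss: low when predictions look real."""
    return -sum(math.log(max(p, eps)) for p in d_fake) / len(d_fake)


# Usage: a prediction that fools the discriminator gets a lower generator loss.
convincing = generator_loss([0.9])
unconvincing = generator_loss([0.1])
```

The point of this setup is that the "real" examples only need to be plausible layouts, not ground truth paired with the input image.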

Key ideas

Technical details