Cam2BEV: A Sim2Real Deep Learning Approach for the Transformation of Images from Multiple Vehicle-Mounted Cameras to a Semantically Segmented Image in Bird’s Eye View

September 2020

tl;dr: Uses spatial transformer modules with a fixed IPM homography to warp perspective features into BEV.

Overall impression

For flat surfaces such as the road, IPM can accurately transform an image to BEV. For 3D objects such as vehicles and VRUs, however, the flat-ground assumption breaks down, and it is hard to estimate their position relative to the sensor.
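A minimal numpy sketch of flat-ground IPM: each BEV cell is mapped to a point on the ground plane, projected into the camera, and filled by nearest-neighbor sampling. All camera parameters below (intrinsics, a straight-down pose at 10 m height) are illustrative assumptions, not values from the paper.

```python
import numpy as np

# --- Hypothetical camera setup (illustrative values, not from the paper) ---
K = np.array([[200.0,   0.0, 160.0],
              [  0.0, 200.0, 120.0],
              [  0.0,   0.0,   1.0]])   # intrinsics
R = np.array([[1.0,  0.0,  0.0],        # world -> camera rotation:
              [0.0, -1.0,  0.0],        # camera looks straight down
              [0.0,  0.0, -1.0]])
h = 10.0                                # camera height above ground [m]
t = np.array([0.0, 0.0, h])             # world -> camera translation

# Ground-plane homography: image point ~ K [r1 r2 t] [X Y 1]^T for Z = 0
H = K @ np.column_stack([R[:, 0], R[:, 1], t])

def ipm_warp(img, H, bev_shape, x_range, y_range):
    """Warp a perspective image onto a BEV grid via inverse perspective mapping."""
    hb, wb = bev_shape
    xs = np.linspace(x_range[0], x_range[1], hb)   # forward coordinate per row
    ys = np.linspace(y_range[0], y_range[1], wb)   # lateral coordinate per col
    X, Y = np.meshgrid(xs, ys, indexing="ij")
    pts = np.stack([X.ravel(), Y.ravel(), np.ones(X.size)])
    uvw = H @ pts                                  # project ground points
    u = (uvw[0] / uvw[2]).round().astype(int)
    v = (uvw[1] / uvw[2]).round().astype(int)
    bev = np.zeros(bev_shape, dtype=img.dtype)
    valid = (u >= 0) & (u < img.shape[1]) & (v >= 0) & (v < img.shape[0])
    bev.ravel()[valid] = img[v[valid], u[valid]]   # nearest-neighbor sampling
    return bev

# Dummy single-channel "semantic" image with one labeled patch
img = np.zeros((240, 320), dtype=np.uint8)
img[100:140, 140:180] = 1
bev = ipm_warp(img, H, bev_shape=(100, 100), x_range=(-5, 5), y_range=(-5, 5))
```

The same math is exact for any point on the ground plane, which is why IPM works for road surfaces but smears anything with height.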

Uses semantically segmented images as input, which helps bridge the sim2real domain gap: computing a semantically segmented camera image strips most of the unnecessary texture from real-world data. The idea of using semantic segmentation to bridge the sim2real gap is explored in many BEV semantic segmentation works such as BEV-Seg, Cam2BEV, and VPN.

The proposed uNetXST architecture transforms four perspective, semantically segmented images into one aggregated BEV semantic segmentation image.
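The spatial transformer units in uNetXST warp not only full-resolution inputs but also intermediate feature maps, so a pixel-space IPM homography has to be adapted to each downsampling stride. A sketch of that rescaling, using an arbitrary illustrative homography (the function and matrix names are mine, not from the paper):

```python
import numpy as np

def scale_homography(T, s):
    """Adapt a pixel-space homography T (BEV px -> image px, full resolution)
    to feature maps downsampled by integer stride s."""
    S = np.diag([1.0 / s, 1.0 / s, 1.0])        # full-res px -> stride-s px
    S_inv = np.diag([float(s), float(s), 1.0])  # stride-s px -> full-res px
    return S @ T @ S_inv

def project(T, p):
    """Apply homography T to a 2D point p with homogeneous normalization."""
    q = T @ np.array([p[0], p[1], 1.0])
    return q[:2] / q[2]

# Arbitrary illustrative homography
T = np.array([[1.2, 0.1,   30.0],
              [0.0, 0.9,   15.0],
              [0.0, 0.002,  1.0]])
T4 = scale_homography(T, 4)

# A point in the stride-4 feature map corresponds to 4x its coordinates in
# the full-resolution image; the rescaled homography must agree with that.
p = np.array([10.0, 20.0])
assert np.allclose(project(T4, p), project(T, 4 * p) / 4)
```

Because the warp is a fixed projective transform derived from camera calibration, it carries no learnable parameters and only resamples the feature grid.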

In Learning to look around objects, the network is explicitly supervised to hallucinate, whereas Cam2BEV eliminates the occluded regions in order to make the problem better posed.

Key ideas

Technical details