BEV-feat-stitching: Understanding Bird’s-Eye View Semantic HD-Maps Using an Onboard Monocular Camera

January 2021

tl;dr: predict BEV semantic maps from a single monocular video.

Overall impression

Previous SOTA methods PyrOccNet and Lift Splat Shoot study how to combine synchronized images from multiple cameras into a coherent 360 deg BEV map. BEV-feat-stitching instead tries to stitch a monocular video into a coherent BEV map. This process also requires knowledge of the camera pose sequence.
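To make concrete why the pose sequence is needed, here is a minimal sketch of stitching per-frame BEV grids into one world-aligned grid. It is not the paper's method: plain averaging of overlapping cells is swapped in for whatever learned stitching the paper uses, and the function name `stitch_bev`, the `(tx, ty, yaw)` pose format, and the grid layout (ego at the bottom-centre, x forward, y lateral) are all assumptions made for illustration.

```python
import numpy as np


def stitch_bev(local_maps, poses, cell_size, out_shape, origin):
    """Average per-frame BEV grids into one world-aligned grid.

    local_maps: list of (H, W) arrays; row index grows with forward distance,
                column index with lateral offset (assumed layout).
    poses:      list of (tx, ty, yaw) ego poses in the world frame
                (hypothetical format, yaw in radians).
    origin:     world (x, y) coordinate of the corner of output cell (0, 0).
    """
    acc = np.zeros(out_shape, dtype=np.float64)  # running sum per world cell
    cnt = np.zeros(out_shape, dtype=np.int64)    # number of observations per cell
    for bev, (tx, ty, yaw) in zip(local_maps, poses):
        h, w = bev.shape
        ii, jj = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
        # Cell centres in the ego frame.
        x = (ii + 0.5) * cell_size
        y = (jj + 0.5 - w / 2.0) * cell_size
        # Rigid 2D transform from ego frame to world frame.
        c, s = np.cos(yaw), np.sin(yaw)
        wx = c * x - s * y + tx
        wy = s * x + c * y + ty
        # World coordinates to output grid indices.
        gi = np.floor((wx - origin[0]) / cell_size).astype(int)
        gj = np.floor((wy - origin[1]) / cell_size).astype(int)
        ok = (gi >= 0) & (gi < out_shape[0]) & (gj >= 0) & (gj < out_shape[1])
        # np.add.at accumulates correctly even when indices repeat.
        np.add.at(acc, (gi[ok], gj[ok]), bev[ok])
        np.add.at(cnt, (gi[ok], gj[ok]), 1)
    out = np.full(out_shape, np.nan)  # NaN marks cells never observed
    seen = cnt > 0
    out[seen] = acc[seen] / cnt[seen]
    return out, cnt
```

With two 2x2 frames taken one cell apart along the driving direction, the overlapping row is averaged and the remaining rows come from a single frame each. Without the poses there would be no way to know that the second frame's near row coincides with the first frame's far row.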

The mapping of the intermediate feature map resembles that of feature-metric mono depth and feature-metric distance in 3DSSD.

To be honest, the results do not look as clean as PyrOccNet's. Future work may combine the two trends: temporal stitching of monocular video as in BEV-feat-stitching and multi-camera fusion as in PyrOccNet.

This paper has a follow-up work, STSU, for structured BEV perception.

Key ideas

Technical details