Learning-Deep-Learning

PointPillars: Fast Encoders for Object Detection from Point Clouds

Mar 2019

tl;dr: Group lidar points into vertical pillars and encode each pillar with PointNet to form a 2D bird's-eye-view (BEV) pseudo-image. An SSD-style detector built on 2D convolutions then processes this pseudo-image for object detection. It achieves state-of-the-art accuracy while running at 115 Hz.
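
A minimal NumPy sketch of the pillar-grouping step, assuming points arrive as an (M, 4) array of (x, y, z, reflectance). The function name `group_into_pillars` and all sizes are hypothetical. The 9-dim point decoration (raw point, offset from the pillar's point mean, x-y offset from the pillar center) follows the paper's description. Truncating surplus pillars and points deterministically is a simplification; the paper samples them randomly.

```python
import numpy as np

def group_into_pillars(points, x_range, y_range, pillar_size, max_pillars, max_points):
    """Group (x, y, z, r) lidar points into vertical pillars on an x-y grid.

    Returns a zero-padded tensor of shape (max_pillars, max_points, 9), the
    (row, col) grid index of each pillar, and the number of pillars kept.
    """
    x_min, x_max = x_range
    y_min, y_max = y_range
    # Keep only points inside the BEV range.
    mask = ((points[:, 0] >= x_min) & (points[:, 0] < x_max)
            & (points[:, 1] >= y_min) & (points[:, 1] < y_max))
    points = points[mask]
    # Only x and y are discretized; z is left continuous, so each cell is a pillar.
    cols = ((points[:, 0] - x_min) // pillar_size).astype(np.int64)
    rows = ((points[:, 1] - y_min) // pillar_size).astype(np.int64)
    n_cols = int(round((x_max - x_min) / pillar_size))
    cell_ids = rows * n_cols + cols

    pillars = np.zeros((max_pillars, max_points, 9), dtype=np.float32)
    coords = np.zeros((max_pillars, 2), dtype=np.int64)
    # Simplification: keep the first max_pillars non-empty cells (paper: random sampling).
    unique_ids = np.unique(cell_ids)[:max_pillars]
    for p, cid in enumerate(unique_ids):
        pts = points[cell_ids == cid][:max_points]  # drop surplus points
        mean = pts[:, :3].mean(axis=0)
        r, c = divmod(int(cid), n_cols)
        center_x = x_min + (c + 0.5) * pillar_size
        center_y = y_min + (r + 0.5) * pillar_size
        feats = np.concatenate([
            pts,                      # x, y, z, r
            pts[:, :3] - mean,        # x_c, y_c, z_c: offset from the pillar's point mean
            pts[:, 0:1] - center_x,   # x_p: offset from the pillar's x center
            pts[:, 1:2] - center_y,   # y_p: offset from the pillar's y center
        ], axis=1)
        pillars[p, :len(pts)] = feats  # unused slots stay zero-padded
        coords[p] = (r, c)
    return pillars, coords, len(unique_ids)
```

Because z is never discretized, every point in a grid cell lands in the same pillar no matter its height, which is what lets the later stages stay 2D.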

Overall impression

This paper follows the line of work of VoxelNet and SECOND and improves the point cloud encoding. Both VoxelNet and SECOND encode the point cloud into 3D voxels and rely on expensive 3D convolutions. The main contribution of this paper is that it encodes (or "sacrifices") the relatively unimportant z dimension into the channels of a 2D pseudo-image, so the rest of the network needs only 2D convolutions. This greatly speeds up inference.
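
To make the "z goes into channels" point concrete, here is a simplified sketch of the pillar encoder and scatter step: a shared linear layer plus ReLU on every point, max-pooling over each pillar's points, and writing the resulting C-dim vectors back to their (row, col) grid cells. BatchNorm is omitted, and `encode_and_scatter` and its `weight` argument are hypothetical stand-ins for a learned layer. Whatever the heights of its points, each pillar ends up as a single C-dim vector, so the output is a (C, H, W) tensor that ordinary 2D convolutions can consume.

```python
import numpy as np

def encode_and_scatter(pillars, coords, num_pillars, num_points, weight, grid_hw):
    """Simplified PointNet per pillar, then scatter into a BEV pseudo-image.

    pillars: (P, N, D) zero-padded point features.
    coords: (P, 2) (row, col) grid index of each pillar.
    num_points: (P,) number of valid points in each pillar.
    weight: (D, C) shared linear layer (BatchNorm omitted in this sketch).
    Returns a pseudo-image of shape (C, H, W).
    """
    _, N, _ = pillars.shape
    # Shared linear layer + ReLU applied to every point independently.
    feats = np.maximum(pillars @ weight, 0.0)            # (P, N, C)
    # Zero out padded slots; ReLU outputs are >= 0, so zeros cannot change the max.
    valid = np.arange(N)[None, :] < num_points[:, None]  # (P, N)
    feats = feats * valid[..., None]
    pillar_feats = feats.max(axis=1)                     # (P, C): max-pool over points
    H, W = grid_hw
    canvas = np.zeros((weight.shape[1], H, W), dtype=pillar_feats.dtype)
    rows, cols = coords[:num_pillars, 0], coords[:num_pillars, 1]
    canvas[:, rows, cols] = pillar_feats[:num_pillars].T  # empty cells stay zero
    return canvas
```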

Note that PointPillars and its successor MVF both still use anchors for prediction.
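
A rough sketch of what anchor-based prediction on the BEV feature map involves: one box per orientation is placed at the center of every feature-map cell, and the head classifies and regresses offsets relative to these boxes. The helper `make_bev_anchors` is hypothetical; the box size and z center used below are illustrative car-like values, and the two default yaws (0 and 90 degrees) reflect my understanding of the PointPillars setup rather than a verified configuration. Anchor matching and box encoding are omitted.

```python
import numpy as np

def make_bev_anchors(grid_hw, cell_size, x_min, y_min, size_lwh, z_center,
                     yaws=(0.0, np.pi / 2)):
    """Place one anchor box per yaw at the center of every BEV feature-map cell.

    Returns an array of shape (H, W, len(yaws), 7) holding (x, y, z, l, w, h, yaw).
    """
    H, W = grid_hw
    l, w, h = size_lwh
    xs = x_min + (np.arange(W) + 0.5) * cell_size  # cell-center x coordinates
    ys = y_min + (np.arange(H) + 0.5) * cell_size  # cell-center y coordinates
    yy, xx = np.meshgrid(ys, xs, indexing="ij")    # both (H, W)
    anchors = np.zeros((H, W, len(yaws), 7))
    for k, yaw in enumerate(yaws):
        anchors[:, :, k] = np.stack(
            [xx, yy, np.full_like(xx, z_center),
             np.full_like(xx, l), np.full_like(xx, w), np.full_like(xx, h),
             np.full_like(xx, yaw)], axis=-1)
    return anchors
```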

Key ideas

Technical details

Notes