Recurrent SSD: Recurrent Multi-frame Single Shot Detector for Video Object Detection

January 2020

tl;dr: Using history to boost object detection on KITTI.

Overall impression

There is a series of papers on recurrent single-stage methods for object detection. The main idea is to add an RNN layer directly on top of the feature map of the entire image.

Another way to look at feature aggregation over time is as data fusion. Instead of fusing information from different sensors, it fuses information from different timestamps. The fusion technique can be element-wise (addition or max), concatenation, or a recurrent layer.
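To make the fusion options concrete, here is a minimal sketch in plain Python. It is not the paper's implementation. It assumes each frame's feature map has been flattened to a short vector, and the function names (`fuse_add`, `fuse_max`, `fuse_concat`, `fuse_recurrent`) and the weights `w_in` and `w_rec` are hypothetical. The recurrent variant is a toy scalar-weight recurrence standing in for a real RNN layer.

```python
import math

def fuse_add(frames):
    """Element-wise sum across frames (order-independent)."""
    return [sum(vals) for vals in zip(*frames)]

def fuse_max(frames):
    """Element-wise max across frames (order-independent)."""
    return [max(vals) for vals in zip(*frames)]

def fuse_concat(frames):
    """Concatenation: output size grows linearly with the number of frames."""
    return [v for frame in frames for v in frame]

def fuse_recurrent(frames, w_in=0.5, w_rec=0.5):
    """Toy recurrence h_t = tanh(w_in * x_t + w_rec * h_{t-1}).

    w_in and w_rec are illustrative scalars, not learned weights.
    The result depends on frame order, unlike add or max.
    """
    h = [0.0] * len(frames[0])
    for x in frames:
        h = [math.tanh(w_in * xi + w_rec * hi) for xi, hi in zip(x, h)]
    return h

# Four frames (oldest first), each flattened to a 3-value feature vector.
frames = [
    [0.1, 0.2, 0.3],
    [0.4, 0.1, 0.0],
    [0.2, 0.5, 0.1],
    [0.3, 0.3, 0.6],
]

fused_add = fuse_add(frames)
fused_max = fuse_max(frames)
fused_cat = fuse_concat(frames)
fused_rnn = fuse_recurrent(frames)
```

Addition and max keep the feature size fixed but discard temporal order. Concatenation keeps order at the cost of a larger feature. The recurrent form keeps a fixed-size state while still being sensitive to order.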

This is perhaps the cleanest solution to the video object detection problem. It is much cleaner than ROLO.

K=4 frames

Key ideas

Technical details