Deep Metadata Fusion for Traffic Light to Lane Assignment

August 2019

tl;dr: Convert metadata to binary masks and use deep metadata fusion to improve traffic light to lane assignment.

Overall impression

This work improves upon the previous vision-only baseline (from the same authors, without any metadata).

The authors convert all metadata to binary masks, which is possible because all of the metadata is spatial information. This method may not work well if some metadata cannot be spatially encoded.
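A minimal sketch of the mask idea, assuming the metadata arrives as a pixel bounding box and that fusion is done by stacking the mask as an extra input channel. The helper names `box_to_mask` and `fuse_by_concat` are hypothetical, and channel concatenation is only one plausible fusion operator, not necessarily the one used in the paper.

```python
import numpy as np

def box_to_mask(box, height, width):
    """Rasterize an (x_min, y_min, x_max, y_max) pixel box into a binary mask."""
    x0, y0, x1, y1 = box
    mask = np.zeros((height, width), dtype=np.float32)
    mask[y0:y1, x0:x1] = 1.0  # 1 inside the box, 0 elsewhere
    return mask

def fuse_by_concat(image, masks):
    """Stack binary metadata masks onto an (H, W, C) image as extra channels."""
    return np.concatenate([image] + [m[..., None] for m in masks], axis=-1)

# Toy example: a 4x6 RGB image and one traffic light box (assumed values).
image = np.zeros((4, 6, 3), dtype=np.float32)
tl_mask = box_to_mask((1, 0, 3, 2), height=4, width=6)
fused = fuse_by_concat(image, [tl_mask])
print(fused.shape)
```

The sketch also shows the limitation noted above: a piece of metadata with no spatial extent has no obvious box or region to rasterize.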

Embedding metadata info into conv:

The traffic light to lane assignment problem can be solved traditionally with heuristic, rule-based methods, but not very reliably, because the relevant traffic lights are not always mounted directly above their associated lane. The vision-only approach (enhanced with enlarged ROIs), together with metadata, performs almost as well as humans in a subjective test.
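To illustrate why a rule-based method is brittle, here is a sketch of one possible heuristic: assign each light to the lane whose center is horizontally closest in the image. The function name `assign_nearest_lane` and the pixel coordinates are made up for illustration; the paper's rule-based baseline may differ.

```python
def assign_nearest_lane(light_x, lane_centers_x):
    """Return the index of the lane whose center is horizontally closest to the light."""
    return min(
        range(len(lane_centers_x)),
        key=lambda i: abs(lane_centers_x[i] - light_x),
    )

# Assumed horizontal lane centers in pixels, left to right.
lane_centers = [100.0, 300.0, 500.0]

# A light mounted roughly above the middle lane is assigned to that lane.
overhead = assign_nearest_lane(310.0, lane_centers)

# A light on a roadside pole to the right is always assigned to the rightmost
# lane, whichever lane it actually governs.
pole_mounted = assign_nearest_lane(560.0, lane_centers)
print(overhead, pole_mounted)
```

The rule uses only geometry, so it cannot recover from a light that is offset from the lane it controls.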

I feel that lane localization is very challenging. How do we know if the current lane is right turn only? Are we going to rely on HD maps or on vision alone for this? At least some vision-based methods should be used.

Key ideas

Technical details