Long-Term Feature Banks for Detailed Video Understanding

Jan 2019

tl;dr: Build a long-term feature bank (a list of short-term features) and attend over it to augment the short-term features used for classification.

Overall impression

The paper builds on existing video understanding work (Non-local net) and addresses how to exploit long-term signal under a limited memory budget. In existing approaches, the same features represent both the present and the long-term context. This work decouples the two, letting the long-term feature bank store flexible features that complement the short-term features.

Previous works

Key ideas

\(\begin{aligned} S_t^{(1)} &= \mathrm{NL}'_{\theta_1}(S_t, \tilde{L}_t) \\ S_t^{(2)} &= \mathrm{NL}'_{\theta_2}(S_t^{(1)}, \tilde{L}_t) \\ &\;\;\vdots \end{aligned}\)
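
The stacked update above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the paper's implementation: it takes \(S_t\) to be an N x d matrix of short-term features and \(\tilde{L}_t\) to be an M x d matrix of long-term bank features (the notes do not define \(\tilde{L}_t\); reading it as the slice of the bank around time t is an assumption). Each NL' layer is modeled as scaled dot-product attention with a residual connection, and any normalization or dropout is omitted. The names `nl_block`, `feature_bank_operator`, and `make_params` are hypothetical helpers introduced here.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng, scale=0.1):
    """Gaussian-initialized matrix; stands in for learned weights or features."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def make_params(dim, rng):
    """One parameter set theta_k: query, key, value, and output projections."""
    return {name: random_matrix(dim, dim, rng) for name in ("q", "k", "v", "out")}


def nl_block(s, bank, params):
    """Simplified NL' (assumption): s (N x d) attends to bank (M x d)."""
    dim = len(s[0])
    q = matmul(s, params["q"])        # queries come from short-term features
    k = matmul(bank, params["k"])     # keys come from the long-term bank
    v = matmul(bank, params["v"])     # values come from the long-term bank
    k_t = [list(col) for col in zip(*k)]
    scores = matmul(q, k_t)           # N x M similarity scores
    attn = [softmax([x / math.sqrt(dim) for x in row]) for row in scores]
    context = matmul(matmul(attn, v), params["out"])
    # Residual connection: the block refines s rather than replacing it.
    return [[a + b for a, b in zip(s_row, c_row)] for s_row, c_row in zip(s, context)]


def feature_bank_operator(s, bank, layers):
    """Apply S^(k) = NL'_{theta_k}(S^(k-1), bank) for each layer in turn."""
    for params in layers:
        s = nl_block(s, bank, params)
    return s


rng = random.Random(0)
dim = 4
s_t = random_matrix(3, dim, rng, scale=1.0)           # 3 short-term features
bank_window = random_matrix(10, dim, rng, scale=1.0)  # 10 long-term bank features
layers = [make_params(dim, rng) for _ in range(2)]    # theta_1, theta_2
out = feature_bank_operator(s_t, bank_window, layers)
```

Note that every layer attends to the same bank features while only the short-term side is updated, which mirrors how \(\tilde{L}_t\) appears unchanged on the right-hand side of each line of the equation. The output keeps the shape of \(S_t\), so it can be fed to a classifier in place of, or alongside, the original short-term features.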

Technical details


Things to follow up