Non-local Neural Networks

Jan 2019

tl;dr: Use a non-local module to capture long range dependencies (spatial, time or spatialtime).

Overall impression

The design of non-local module is modular, which does not change the input shape. Therefore it can be plugged into existing network design easily. This is similar to the deisgn of SE-net. The authors show that self-attention proposed in Google’s transformer paper is a special case (embedded Gaussian) of non-local networks.

Key ideas

\[y_i = \frac{1}{C(x)}\sum_{\forall{j}} f(x_i, \hat{x}_j) g(\hat{x}_j) \\ z_i = \text{BatchNorm}(W_z y_i) + x_i\]

Technical details

\[g(x_j) = W_g x_j \\ f(x_i, x_j) = F(\theta(x_i)^T \phi(x_j)) \\\]