GradNorm: Gradient Normalization for Adaptive Loss Balancing in Deep Multitask Networks

August 2019

tl;dr: Dynamically adjust the weight of the tasks based on training progress.

Overall impression

GradNorm and dynamic task prioritization (DTP) are very similar. However, GradNorm's weight adjustment is exponential in the loss ratio (the focal loss used by DTP is essentially also exponential), and its progress signal is the training loss drop rather than a KPI.

Task imbalances impede proper training because they manifest as imbalances between backpropagated gradients. The balance is struck when tasks train at similar rates, as measured by the loss ratio L_i(t)/L_i(0).
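A minimal numpy sketch of this rate signal (function names and the alpha value are illustrative assumptions, not the paper's code): the loss ratio gives each task a relative inverse training rate r_i(t), and GradNorm steers each task's gradient norm toward a target that grows exponentially in r_i(t).

```python
import numpy as np

def relative_inverse_training_rates(L0, Lt):
    """r_i(t) = L~_i(t) / mean_j L~_j(t), with loss ratio L~_i(t) = L_i(t)/L_i(0)."""
    Ltilde = np.asarray(Lt, dtype=float) / np.asarray(L0, dtype=float)
    return Ltilde / Ltilde.mean()  # r_i > 1 means task i is training more slowly

def gradnorm_loss(grad_norms, r, alpha=1.5):
    """L1 distance between each task's gradient norm and its target norm.

    Targets boost slow tasks exponentially: mean_norm * r_i(t)**alpha.
    alpha is GradNorm's single hyperparameter (the exponential weighting factor);
    its value here is illustrative.
    """
    grad_norms = np.asarray(grad_norms, dtype=float)
    # Target norms are treated as constants when differentiating in practice.
    target = grad_norms.mean() * r ** alpha
    return np.abs(grad_norms - target).sum()

# Example: task 0 has reduced its loss faster (to 0.4x) than task 1 (to 0.8x),
# so task 1 gets r > 1 and its target gradient norm is pushed up.
r = relative_inverse_training_rates([1.0, 1.0], [0.4, 0.8])
```

In the full method, `grad_norms` would come from the gradient of each weighted task loss with respect to a shared layer, and the task weights are updated to minimize `gradnorm_loss`.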

| Method | Learning progress signal | Hyperparameters |
| --- | --- | --- |
| Uncertainty Weighting | Homoscedastic uncertainty | No hyperparameters |
| GradNorm | Training loss ratio | 1 exponential weighting factor |
| Dynamic Task Prioritization | KPI | 1 focal loss scaling factor |

Key ideas

Technical details