Deep Double Descent: Where Bigger Models and More Data Hurt

January 2020

tl;dr: Double descent is a robust phenomenon that occurs for various tasks, architectures and optimizers.

Overall impression

This paper extends prior work on double descent to deep neural networks.

The main contribution is showing that double descent happens not only as model size grows (e.g., increasing the number of channels) but also as training proceeds (epoch-wise double descent). To describe this behavior, the authors propose effective model complexity (EMC), a complexity measure that is specific to the training procedure rather than to the architecture alone. Training for longer increases EMC.
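
To make EMC more concrete: the paper defines it, roughly, as the largest number of training samples on which a training procedure still reaches approximately zero training error on average. The sketch below estimates that quantity for a toy procedure, not for the paper's deep networks. The assumptions here are all illustrative: the "model" is least squares over `width` random Gaussian features, the labels are pure noise, and the names `train_error` and `estimate_emc` are hypothetical helpers.

```python
import numpy as np


def train_error(n, width, rng):
    """Mean squared training error of least squares on n noisy samples."""
    X = rng.standard_normal((n, width))   # random features (toy assumption)
    y = rng.standard_normal(n)            # pure-noise labels (toy assumption)
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.mean((X @ w - y) ** 2))


def estimate_emc(width, eps=1e-6, max_n=60, trials=20, seed=0):
    """Largest n <= max_n whose average training error stays below eps."""
    rng = np.random.default_rng(seed)
    emc = 0
    for n in range(1, max_n + 1):
        mean_err = np.mean([train_error(n, width, rng) for _ in range(trials)])
        if mean_err <= eps:
            emc = n
    return emc


for width in (10, 20):
    print(width, estimate_emc(width))
```

In this toy setting the estimate tracks `width`, since least squares can interpolate at most as many generic samples as it has features. For a neural network the same search would also depend on the optimizer and the number of epochs, which is why EMC is tied to the training procedure.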

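Also, more data may not help in the critical region, where EMC is comparable to the number of training samples. There, adding samples can even hurt test performance, which the authors call sample-wise non-monotonicity.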
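
Sample-wise non-monotonicity can be reproduced in a much simpler setting than the paper's experiments. The sketch below is a toy linear-regression analogue and not the authors' setup: it assumes isotropic Gaussian inputs, a unit-norm ground-truth vector, Gaussian label noise, and a minimum-norm least-squares fit. The function name `excess_risk` and all parameter values are illustrative choices.

```python
import numpy as np


def excess_risk(n, d=50, noise_std=0.5, trials=50, seed=0):
    """Average squared parameter error of the min-norm fit on n samples.

    For isotropic Gaussian inputs this equals the expected test error
    minus the irreducible noise variance.
    """
    rng = np.random.default_rng(seed)
    beta = np.ones(d) / np.sqrt(d)        # unit-norm ground truth (assumption)
    risks = []
    for _ in range(trials):
        X = rng.standard_normal((n, d))
        y = X @ beta + noise_std * rng.standard_normal(n)
        beta_hat = np.linalg.pinv(X) @ y  # minimum-norm least-squares solution
        risks.append(np.sum((beta_hat - beta) ** 2))
    return float(np.mean(risks))


# Fewer samples than features, as many samples as features, and more samples.
for n in (25, 50, 100):
    print(n, excess_risk(n))
```

With the number of features fixed at `d`, the risk should peak when `n` is close to `d`, so going from 25 to 50 samples makes the fit worse before more data starts helping again. The critical region here sits at `n` near `d` because that is where this toy model can just barely interpolate the training set.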

Key ideas

Technical details