January 2026
tl;dr: A curriculum-learning approach that iteratively absorbs CoT into the language model itself.
Stepwise Internalization is a method for achieving implicit chain-of-thought (CoT) reasoning: intermediate reasoning steps are gradually removed during training, with the earliest CoT tokens removed (and so absorbed into the model) first.
This work inspired later, more influential work such as Coconut.
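
To make the "first tokens removed first" curriculum concrete, here is a minimal sketch of how the training targets might shrink over time. It assumes a simple linear schedule that drops a fixed number of leading CoT tokens per stage; the function names and the parameters `steps_per_stage` and `tokens_per_stage` are hypothetical and chosen for illustration, not taken from the original work.

```python
from typing import List


def num_removed(step: int, steps_per_stage: int, tokens_per_stage: int) -> int:
    """Number of leading CoT tokens dropped at a given training step.

    Assumed linear schedule: every `steps_per_stage` steps, another
    `tokens_per_stage` tokens are removed from the front of the CoT.
    """
    return (step // steps_per_stage) * tokens_per_stage


def build_example(
    question: List[str], cot: List[str], answer: List[str], removed: int
) -> List[str]:
    """Concatenate the question, the surviving CoT suffix, and the answer.

    Slicing past the end of `cot` yields an empty list, so once `removed`
    reaches the CoT length the model is trained on question -> answer only.
    """
    return question + cot[removed:] + answer


if __name__ == "__main__":
    question = ["Q"]
    cot = ["c1", "c2", "c3", "c4"]
    answer = ["A"]
    for step in (0, 100, 200, 400):
        removed = num_removed(step, steps_per_stage=100, tokens_per_stage=1)
        print(step, build_example(question, cot, answer, removed))
```

Early in training the model sees the full explicit CoT; by the final stage the CoT is gone entirely and the reasoning has to happen implicitly inside the model.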