January 2026
tl;dr: Reason in unconstrained latent space instead of language space to bypass the “vocabulary bottleneck”.
Overall impression
Coconut enables more efficient reasoning, reducing the number of generated tokens, and also induces new thinking patterns.
Latent thinking requires normal training with explicit CoT first. Training then creates a latent thinking stage marked by <bot> and <eot>, and iteratively replaces each step of explicit thinking (1 thinking step ≈ a short sentence or clause of several language tokens) with c latent thought tokens.
One possible drawback is that the latent thoughts cannot be decoded into human language and are hard to visualize. Maybe DLCM is a better method here.
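To make the staged curriculum concrete, here is a minimal sketch of my own (not the authors' code) of how a stage-k training sequence could be assembled; the `<latent>` placeholder name and `c=2` are assumptions for illustration.

```python
# Minimal sketch: building a stage-k training sequence for the latent-CoT curriculum.
# Assumes the CoT is already split into steps and that <bot>, <eot>, and a <latent>
# placeholder exist in the tokenizer; c latent tokens replace one explicit step.

def build_stage_sequence(question, cot_steps, answer, stage, c=2):
    """At stage `stage`, the first `stage` explicit CoT steps are dropped and
    replaced by `stage * c` latent-thought placeholders inside <bot> ... <eot>.
    The placeholders are later overwritten by continuous hidden states, so they
    carry no language content."""
    latent_part = ["<latent>"] * (stage * c)   # positions reserved for continuous thoughts
    remaining_cot = cot_steps[stage:]          # steps still written out in language
    return [question, "<bot>"] + latent_part + ["<eot>"] + remaining_cot + [answer]

# Stage 0 is ordinary CoT training; later stages push more reasoning into latent space.
example = build_stage_sequence(
    question="Q: ...",
    cot_steps=["step 1 ...", "step 2 ...", "step 3 ..."],
    answer="A: ...",
    stage=2,
)
print(example)
```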
Key ideas
- More efficient reasoning
- Replaces each reasoning step with c continuous tokens (see the sketch after this list)
- How does this reduce inference steps? c is a small, tunable hyperparameter, so a handful of latent tokens stand in for the many language tokens of a full written step.
- Induces BFS
- Standard LLMs do DFS: every time a word is generated, the model’s internal probability distribution collapses into a single token, and the LLM cannot backtrack easily (only explicitly with CoT, at grammatically valid turning points).
- Entropy analysis: in early latent steps, the model assigns high and nearly equal probabilities to multiple different valid next steps.
- Performs well on ProsQA and PrOntoQA, where early assumptions are ambiguous and DFS gets stuck; explicit CoT (DFS) fails even when lengthened.
- Latent Coconut recovers.
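The sketch below is my own toy illustration (a GRU stands in for the transformer stack so the loop actually runs; `n_latent`, `lm_head`, and the sizes are assumptions, not the paper's setup). It shows the core mechanism: at latent positions the last hidden state is fed back as the next input embedding instead of being decoded, and the entropy of the would-be token distribution can be probed at each latent step, as in the entropy analysis above.

```python
# Toy sketch of continuous-thought rollout: latent steps reuse the hidden state
# directly; language steps commit to a single token and re-embed it.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
vocab_size, d_model = 100, 32
embed = nn.Embedding(vocab_size, d_model)
rnn = nn.GRUCell(d_model, d_model)        # toy stand-in for the transformer stack
lm_head = nn.Linear(d_model, vocab_size)  # the W in  h -> softmax(W·h) -> token

@torch.no_grad()
def rollout(prompt_ids, n_latent=4, n_text=5):
    h = torch.zeros(1, d_model)
    # 1) Consume the prompt as ordinary language tokens.
    for tok in prompt_ids:
        h = rnn(embed(torch.tensor([tok])), h)
    # 2) Latent thoughts: feed the hidden state back as the next input embedding;
    #    nothing is decoded, so there is no commitment to a single token.
    for step in range(n_latent):
        logits = lm_head(h)                                   # distribution *if* we decoded now
        probs = F.softmax(logits, dim=-1)
        entropy = -(probs * F.log_softmax(logits, dim=-1)).sum()
        print(f"latent step {step}: entropy {entropy.item():.2f}")  # high entropy ~ options kept open (BFS-like)
        h = rnn(h, h)                                         # reuse h directly as the next input
    # 3) Back to language: each step commits to one token (DFS-like collapse).
    out = []
    for _ in range(n_text):
        tok = int(lm_head(h).argmax())
        out.append(tok)
        h = rnn(embed(torch.tensor([tok])), h)
    return out

print(rollout([1, 2, 3]))
```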
Technical details
Notes
- The iterative nature of latent tokens is very similar to diffusion language models. The connection is also highlighted in the latent CoT survey.
- In both diffusion and latent CoT, “commitment” is delayed until the end. Commitment ≈ decoding into natural language (h → softmax(W·h) → token).
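A tiny numeric illustration of that commitment step, with assumed shapes (not from the paper): decoding squeezes the d-dimensional state into one vocabulary id, while a latent step passes the full state on unchanged.

```python
# Commit vs. latent step: what information survives each update.
import torch

d, vocab = 8, 20
W = torch.randn(vocab, d)   # LM head
E = torch.randn(vocab, d)   # input embedding table
h = torch.randn(d)          # current hidden state

# Commit: h -> softmax(W·h) -> token -> embedding of that token.
probs = torch.softmax(W @ h, dim=-1)
token = int(probs.argmax())
next_input_committed = E[token]   # only the chosen id survives

# Latent thought: skip the softmax/argmax entirely.
next_input_latent = h             # all d dimensions carried forward

print(token, probs.max().item())
```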