Learning-Deep-Learning

Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding

January 2026

tl;dr: Block-wise decoding enables a KV cache for diffusion LLMs, combined with confidence-aware parallel decoding.
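The confidence-aware parallel decoding idea can be sketched as follows: at each denoising step, all masked positions whose top-token probability clears a threshold are committed in parallel, with at least one token committed per step so decoding always makes progress. This is a minimal NumPy sketch; the function name, the `-1` sentinel for still-masked positions, and the threshold value are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def confidence_parallel_decode_step(logits, masked, threshold=0.9):
    """One confidence-aware parallel decoding step (hypothetical sketch).

    logits: (seq_len, vocab) model outputs for the current block.
    masked: boolean array, True where the token is still [MASK].
    Returns predicted token ids (-1 where still masked) and the new mask.
    """
    # Softmax over the vocabulary to get per-position confidences.
    z = logits - logits.max(axis=-1, keepdims=True)
    probs = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)
    conf = probs.max(axis=-1)
    pred = probs.argmax(axis=-1)

    # Unmask every masked position whose confidence clears the threshold...
    accept = masked & (conf > threshold)
    # ...but always commit at least the single most confident masked token
    # so the sampler is guaranteed to make progress.
    if not accept.any():
        idx = np.where(masked)[0]
        accept[idx[conf[idx].argmax()]] = True

    tokens = np.where(accept, pred, -1)
    return tokens, masked & ~accept
```

Low-confidence positions stay masked and are revisited at the next step, so the number of model calls shrinks without forcing a fixed number of tokens per step.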

Overall impression

Approximates the KV cache by noticing that neighboring denoising timesteps produce very similar KV activations and attention maps, so keys and values cached from one step can be reused in the next without retraining.
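A toy NumPy sketch of this block-wise approximate KV cache, assuming a single attention head and random activations for illustration: the prefix's keys/values are computed once and reused unchanged across the denoising steps of the current block, while only the block's own K/V are recomputed each step.

```python
import numpy as np

rng = np.random.default_rng(0)

def attention(q, k, v):
    # Scaled dot-product attention for a single head.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v

# Toy setup: an 8-token prefix and a 4-token block being denoised, dim 16.
d = 16
prefix_k = rng.standard_normal((8, d))
prefix_v = rng.standard_normal((8, d))

# The prefix K/V above stand in for a cache computed once per block; since
# neighboring denoising steps yield nearly identical activations for those
# positions, they are reused verbatim across all steps of the block.
for step in range(3):
    block_q = rng.standard_normal((4, d))        # recomputed every step
    block_k = rng.standard_normal((4, d))
    block_v = rng.standard_normal((4, d))
    k = np.concatenate([prefix_k, block_k])      # cached prefix + fresh block
    v = np.concatenate([prefix_v, block_v])
    out = attention(block_q, k, v)               # (4, d) block outputs
# Once the block is fully decoded, its K/V would be appended to the cache
# and the next block would start with the longer cached prefix.
```

The saving comes from skipping the prefix's K/V projections at every denoising step; only the active block pays full recomputation cost.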

Key ideas

Technical details

Notes