// news · open-source · research-papers · 2026-06-10 · source: google deepmind / nvidia blog / marktechpost

Google DeepMind releases DiffusionGemma — 26B MoE open model uses text diffusion to generate 256-token blocks in parallel at 1,000 tok/s on H100

Google DeepMind released DiffusionGemma on June 9 as an experimental Apache-2.0 open-weights model that breaks the autoregressive token-by-token paradigm. The 26B Mixture-of-Experts architecture (3.8B activated) generates whole 256-token blocks in parallel via text diffusion, hitting 1,000 tok/s on a single H100 — roughly 4x the throughput of an equivalent autoregressive Gemma 4.
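The headline numbers can be unpacked with some back-of-envelope arithmetic. The sketch below uses only the figures quoted above (1,000 tok/s, roughly 4x, 256-token blocks, 26B total and 3.8B activated parameters). The variable names are illustrative, and the derived values are implications of those quoted figures, not separately published measurements.

```python
# Figures quoted in the announcement coverage.
DIFFUSION_TOK_PER_S = 1000   # reported throughput on a single H100
SPEEDUP = 4                  # "roughly 4x" versus an equivalent autoregressive Gemma 4
BLOCK_TOKENS = 256           # tokens generated per parallel block
TOTAL_PARAMS_B = 26.0        # total parameters, in billions
ACTIVE_PARAMS_B = 3.8        # parameters activated per token, in billions

# Derived values (implied by the figures above, not measured).
implied_ar_tok_per_s = DIFFUSION_TOK_PER_S / SPEEDUP
diffusion_block_seconds = BLOCK_TOKENS / DIFFUSION_TOK_PER_S
ar_block_seconds = BLOCK_TOKENS / implied_ar_tok_per_s
active_fraction = ACTIVE_PARAMS_B / TOTAL_PARAMS_B

print(f"Implied autoregressive baseline: {implied_ar_tok_per_s:.0f} tok/s")
print(f"Time per 256-token block (diffusion): {diffusion_block_seconds:.3f} s")
print(f"Time for 256 tokens (autoregressive): {ar_block_seconds:.3f} s")
print(f"Share of parameters active per token: {active_fraction:.1%}")
```

Taken at face value, the quoted speedup puts the autoregressive baseline at about 250 tok/s, and a full 256-token block arrives in roughly a quarter of a second.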

The architectural shift is the substantive contribution. Production LLMs since GPT-2 have, almost without exception, generated one token at a time, each conditioned on every prior token. DiffusionGemma applies the denoising-diffusion approach, first proven at scale in image generation, to text: it produces many tokens at once by iteratively refining a whole block. The throughput win comes from removing the autoregressive bottleneck, not from faster hardware: same H100, four times the tokens per second.
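To make the contrast concrete, here is a toy sketch of the two decoding loops. It is not DiffusionGemma's algorithm, which the coverage does not spell out. It assumes a masked-diffusion style of refinement in which every position starts undecided and the most confident proposals are committed at each step. `toy_denoiser`, the 16-step schedule, and the placeholder tokens are all hypothetical stand-ins.

```python
import random

MASK = None  # marks a position that has not been decided yet


def toy_denoiser(block, rng):
    """Hypothetical stand-in for the model: a single call proposes a
    (token, confidence) pair for every position in the block."""
    return [(f"tok{i}", rng.random()) for i in range(len(block))]


def generate_block_diffusion(block_size=256, num_steps=16, seed=0):
    """Fill a whole block in num_steps model calls (num_steps is an assumption)."""
    rng = random.Random(seed)
    block = [MASK] * block_size
    calls = 0
    for step in range(num_steps):
        masked = [i for i, tok in enumerate(block) if tok is MASK]
        if not masked:
            break
        proposals = toy_denoiser(block, rng)  # one call covers all positions
        calls += 1
        # Commit an even share of the remaining positions, most confident first.
        steps_left = num_steps - step
        k = -(-len(masked) // steps_left)  # ceiling division
        masked.sort(key=lambda i: proposals[i][1], reverse=True)
        for i in masked[:k]:
            block[i] = proposals[i][0]
    return block, calls


def generate_block_autoregressive(block_size=256):
    """Baseline: one (simulated) model call per generated token."""
    block, calls = [], 0
    for i in range(block_size):
        block.append(f"tok{i}")
        calls += 1
    return block, calls


diff_block, diff_calls = generate_block_diffusion()
ar_block, ar_calls = generate_block_autoregressive()
print(f"diffusion: {diff_calls} sequential calls for {len(diff_block)} tokens")
print(f"autoregressive: {ar_calls} sequential calls for {len(ar_block)} tokens")
```

The ratio of sequential calls in this toy (256 versus 16) is not a throughput prediction. Each diffusion call processes the entire block and so costs more than a single autoregressive step, which is one reason a reported speedup can be far smaller than the raw reduction in call count.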

The trade-off Google explicitly acknowledges is quality. DiffusionGemma does not match standard Gemma 4 on benchmarks: the official write-up admits that overall quality is lower because the model prioritizes speed and parallel generation. That makes Google DeepMind's open release a research preview, not a production replacement. But it is the first open-weight text-diffusion model at frontier-adjacent scale, with multimodal input (text, image, and video), a 256K context window, and support for 140+ languages, and NVIDIA is already publishing optimized inference kernels for DGX Spark and DGX Station. If the quality gap closes in the next iteration, the autoregressive paradigm faces its first real competitive test in eight years.


NVIDIA Blog — NVIDIA Accelerates Google DeepMind's DiffusionGemma for Local AI → · MarkTechPost — Google AI Releases DiffusionGemma, a 26B MoE Open Model Using Text Diffusion → · The New Stack — Google's DiffusionGemma is 4x faster than its other Gemma models →