Google drops DiffusionGemma at 1,000 tok/s—free, open, and somehow still loses to its own sibling 🌀
Google released DiffusionGemma, an open-weight text model under Apache 2.0, with weights available on Hugging Face. The model generates text using diffusion rather than the autoregressive approach used by conventional large language models, starting with a "canvas of random placeholder tokens" and iteratively locking in confident tokens until the block resolves. Google reports 1,000 tokens per second on an NVIDIA H100, four times faster than standard Gemma, and "700+ tokens per second on NVIDIA GeForce RTX 5090." Day-zero support is available in vLLM, Hugging Face Transformers, and Unsloth.
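To make the "lock in confident tokens" idea concrete, here is a minimal toy sketch of confidence-based iterative decoding. It is not Google's implementation: the `toy_denoiser` function, the `<mask>` placeholder, and the fixed number of tokens locked per step are all assumptions made for illustration, and the "model" simply looks up a known target sentence with random confidence scores.

```python
import random

MASK = "<mask>"  # hypothetical placeholder token; the real model's canvas differs


def toy_denoiser(canvas, target, rng):
    """Hypothetical stand-in for a diffusion model's forward pass.

    Returns {position: (guessed_token, confidence)} for every position that
    is still masked. A real model would predict these from learned weights;
    here the guess is read from `target` and the confidence is random.
    """
    return {
        i: (target[i], rng.random())
        for i, tok in enumerate(canvas)
        if tok == MASK
    }


def diffusion_decode(target, steps=4, seed=0):
    """Resolve a fully masked block by locking in the most confident guesses."""
    rng = random.Random(seed)
    canvas = [MASK] * len(target)
    per_step = -(-len(target) // steps)  # ceiling division
    history = []
    while MASK in canvas:
        guesses = toy_denoiser(canvas, target, rng)
        # Keep only the highest-confidence positions this round.
        best = sorted(guesses, key=lambda i: guesses[i][1], reverse=True)[:per_step]
        for i in best:
            canvas[i] = guesses[i][0]
        history.append(list(canvas))
    return canvas, history


if __name__ == "__main__":
    sentence = "the quick brown fox jumps over lazy dogs".split()
    final, history = diffusion_decode(sentence, steps=4)
    for step, snapshot in enumerate(history, start=1):
        print(step, " ".join(snapshot))
```

Each printed row shows the block after one refinement step: tokens appear in no particular left-to-right order, which is the visible difference from autoregressive decoding.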
Google says DiffusionGemma trails standard Gemma 4 on output quality and is positioned as a speed model rather than a quality upgrade. Each forward pass produces 256 tokens, and bidirectional attention lets every token see every other token during generation—a property impossible in autoregressive architectures, which can only attend to past tokens. The approach is described in Google's developer guide as refining chunks of garbled text in parallel. In a Google demo, a base version solved roughly 0% of Sudoku puzzles, while a fine-tuned version hit 80%.
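The difference between the two attention patterns can be shown with a pair of boolean masks. This is a generic illustration rather than anything specific to DiffusionGemma; the function names are made up for the example, and `True` at row `i`, column `j` means position `i` may attend to position `j`.

```python
def causal_mask(n):
    """Autoregressive pattern: position i sees only positions 0..i."""
    return [[j <= i for j in range(n)] for i in range(n)]


def bidirectional_mask(n):
    """Diffusion-style pattern: every position sees every other position."""
    return [[True] * n for _ in range(n)]


def visible_pairs(mask):
    """Count how many (query, key) pairs the mask allows."""
    return sum(sum(row) for row in mask)


if __name__ == "__main__":
    n = 4
    for name, mask in (("causal", causal_mask(n)), ("bidirectional", bidirectional_mask(n))):
        print(name, visible_pairs(mask))
        for row in mask:
            print("  " + " ".join("x" if allowed else "." for allowed in row))
```

For a block of `n` positions the causal mask allows `n(n+1)/2` pairs and the bidirectional mask allows `n*n`, so the first token in a bidirectional block can already take the last token into account.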
The release is the first major open-weight diffusion language model from a tier-one lab. Earlier academic efforts including MDLM, SEDD, LLaDA, and Dream demonstrated the method at smaller scales, and Inception Labs shipped Mercury 2 in February 2026 as the first commercial diffusion reasoning model, claiming speeds five times faster than speed-optimized competitors. Both were closed-weight. Running DiffusionGemma efficiently relies on speculative decoding, where a lightweight drafter proposes token blocks that the main model verifies in a single pass. Google's DFlash framework, published in early 2026, uses a small diffusion model as a drafter to enable more than 6x speedup on some tasks. Local execution via MLX on Apple silicon requires a specific drafter configuration.
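The draft-then-verify loop behind speculative decoding can be sketched in a few lines. This is a generic greedy variant with toy stand-ins, not DFlash or any shipping configuration: `target_next` and `draft_next` are hypothetical functions invented for the example, and the drafter is rigged to disagree with the main model at every fourth position. In a real system the verification of a whole block would be one batched forward pass, which the `passes` counter models.

```python
def target_next(prefix):
    """Hypothetical main model: deterministic next token given the prefix."""
    return (len(prefix) * 7 + 3) % 10


def draft_next(prefix):
    """Hypothetical drafter: matches the main model except at every 4th position."""
    token = target_next(prefix)
    return token if len(prefix) % 4 != 3 else (token + 1) % 10


def speculative_decode(n_tokens, block=4):
    """Greedy speculative decoding: accept the longest draft prefix the
    main model agrees with, then take the main model's own token at the
    first disagreement."""
    out = []
    passes = 0  # number of (simulated) main-model verification passes
    while len(out) < n_tokens:
        # 1. The drafter proposes a block of tokens.
        draft = []
        for _ in range(block):
            draft.append(draft_next(out + draft))
        # 2. The main model checks the block (one pass in a real system).
        passes += 1
        accepted = []
        for token in draft:
            expected = target_next(out + accepted)
            if token == expected:
                accepted.append(token)
            else:
                accepted.append(expected)  # correction from the main model
                break
        out.extend(accepted)
    return out[:n_tokens], passes


if __name__ == "__main__":
    tokens, passes = speculative_decode(12)
    print(tokens, "in", passes, "verification passes")
```

The output is identical to what the main model would have produced one token at a time; the saving comes from needing fewer main-model passes than tokens generated.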