// news · compute · inference · cerebras · 2026-05-20 · source: cerebras / openai

Cerebras WSE-3 hosting GPT-5.3-Codex-Spark sustains 1,000+ tokens per second per agent

Cerebras's WSE-3 wafer-scale chip, hosting OpenAI's GPT-5.3-Codex-Spark variant, sustains over 1,000 tokens per second of generation throughput per agent. That is roughly 10× the steady-state throughput of GPU-hosted deployments of comparable models.

At 1,000 tokens per second per agent, the user experience changes qualitatively. Below 100 tok/s, the agent visibly types. At 1,000 tok/s, typical responses appear all at once, and the perceived UX flips from "watch it think" to "see the answer."
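As a back-of-the-envelope check on the 100 vs 1,000 tok/s contrast, the sketch below computes how long it takes to stream a few responses at each rate. The response lengths are hypothetical, and the model ignores time-to-first-token and network latency, so it only captures steady-state generation.

```python
# Hypothetical response lengths in tokens (illustrative assumptions,
# not figures from Cerebras or OpenAI).
RESPONSE_LENGTHS = {"short answer": 200, "code edit": 800, "long file": 3000}

# Throughput tiers from the text: ~100 tok/s (visibly typing) vs 1,000 tok/s.
THROUGHPUTS_TOK_PER_S = (100, 1_000)


def generation_seconds(tokens: int, tok_per_s: float) -> float:
    """Wall-clock time to stream `tokens` at a steady `tok_per_s`.

    Ignores time-to-first-token and network latency (assumption).
    """
    return tokens / tok_per_s


if __name__ == "__main__":
    for label, tokens in RESPONSE_LENGTHS.items():
        times = ", ".join(
            f"{rate:>5} tok/s: {generation_seconds(tokens, rate):5.1f} s"
            for rate in THROUGHPUTS_TOK_PER_S
        )
        print(f"{label:<13} ({tokens:>4} tokens) -> {times}")
```

At these assumed lengths, an 800-token code edit drops from about 8 s to under 1 s, which is roughly the gap between watching text stream and seeing it appear.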

The strategic angle: NVIDIA dominates training, but inference is a wider market with a less defensible technical moat. Cerebras, Groq, SambaNova, and others now compete actively at the inference layer, and NVIDIA's own Groq 3 licensing deal (December 2025) suggests the company recognizes the threat.

Sources: Yahoo Finance, "NVIDIA Groq 3 GTC 2026" · CNBC, "Meta NVIDIA datacenter deal"