// news · agents · compute · 2026-05-24 · source: goldman sachs / fortune / cnbc

Goldman Sachs forecasts 24× token-consumption explosion by 2030 — 120 quadrillion tokens per month if the agent thesis holds

Goldman Sachs published a research note forecasting that agentic AI workloads could drive a 24-fold increase in monthly token consumption by 2030, reaching 120 quadrillion tokens per month. The number lands as Microsoft pulls back on internal Claude Code licensing and Uber reports burning through its 2026 AI-coding budget four months into the year: three data points pointing in the same direction.
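The headline figures imply a starting point that the note, as summarized here, does not spell out. The sketch below backs it out from the two numbers in the article (24× and 120 quadrillion tokens per month). The baseline year is an assumption made for illustration only: the summary gives no start date, so a 2026 start (four years to 2030) is used as a placeholder.

```python
# Figures taken from the article: a 24x increase reaching
# 120 quadrillion tokens per month by 2030.
TARGET_TOKENS_PER_MONTH = 120 * 10**15
GROWTH_MULTIPLE = 24


def implied_baseline(target: int, multiple: int) -> int:
    """Monthly token volume today that a given growth multiple implies."""
    return target // multiple


def annual_multiplier(multiple: float, years: float) -> float:
    """Constant year-over-year factor that compounds to `multiple` over `years`."""
    return multiple ** (1.0 / years)


baseline = implied_baseline(TARGET_TOKENS_PER_MONTH, GROWTH_MULTIPLE)

# ASSUMPTION (not from the article): growth is measured from 2026,
# giving four years of compounding to 2030.
YEARS_ASSUMED = 4
yearly = annual_multiplier(GROWTH_MULTIPLE, YEARS_ASSUMED)

print(f"implied baseline: {baseline / 10**15:.0f} quadrillion tokens/month")
print(f"implied growth under the {YEARS_ASSUMED}-year assumption: {yearly:.2f}x per year")
```

The implied baseline is 5 quadrillion tokens per month. Under the four-year assumption, consumption would need to grow roughly 2.2× per year; a different start year changes that rate but not the baseline.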

The Goldman model treats agents as a step function rather than a continuous-growth input. The argument: assistive coding tools consume tokens linearly with developer prompt volume, while autonomous agents consume tokens superlinearly because trajectory length grows as the agent becomes more independent. A developer typing into Copilot uses a few hundred tokens per query. An agent left running for an hour on a multi-file refactor uses hundreds of thousands. The 24× projection follows from assuming that autonomous-agent mode becomes the default by the end of 2028.
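The linear-versus-superlinear distinction can be illustrated with a toy model. This is not Goldman's model. It assumes one mechanism that produces superlinear growth: an agent that re-processes its whole accumulated context at every step, so total tokens grow roughly with the square of trajectory length. The function names and all default values (`tokens_per_query`, `base_context`, `tokens_per_step`) are hypothetical.

```python
def assistive_tokens(queries: int, tokens_per_query: int = 300) -> int:
    """Assistive tool: each query is independent, so cost is linear in queries."""
    return queries * tokens_per_query


def agent_tokens(steps: int, base_context: int = 2000, tokens_per_step: int = 500) -> int:
    """Toy agent loop: every step appends new tokens to the context and then
    processes the whole context, so cost grows superlinearly with steps.

    ASSUMPTION: full-context reprocessing per step, with illustrative sizes.
    """
    total = 0
    context = base_context
    for _ in range(steps):
        context += tokens_per_step  # tool output and model output are appended
        total += context            # the entire context is processed this step
    return total


for n in (10, 20, 40):
    print(n, assistive_tokens(n), agent_tokens(n))
```

Doubling the number of queries doubles the assistive total, while doubling the number of agent steps roughly triples the agent total in this sketch. That gap is the property the superlinear argument relies on.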

The implication for the compute stack is straightforward: the inference market will overtake the training market for the first time, and by a wide margin. NVIDIA's Q1 data-center revenue of $75.2B already reflects inference share growing faster than training share. If Goldman's 24× holds, the chip companies that win 2026-2030 will be the ones with the better $/token at scale: Cerebras's wafer-scale and Groq's LPU-rack pitches rather than NVIDIA's training-rack story.
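For readers who want to make the $/token comparison concrete, a minimal sketch of the metric follows. The formula is a simplification that assumes cost per token is hourly system cost divided by sustained hourly throughput. The input values are hypothetical placeholders, not vendor figures.

```python
def cost_per_million_tokens(hourly_cost_usd: float, tokens_per_second: float) -> float:
    """Dollars per one million generated tokens at a sustained throughput.

    ASSUMPTION: cost is fully captured by an hourly rate (hardware, power,
    hosting) and the system runs at the stated throughput all hour.
    """
    tokens_per_hour = tokens_per_second * 3600
    return hourly_cost_usd / tokens_per_hour * 1_000_000


# Hypothetical placeholder inputs, chosen only to exercise the formula.
example = cost_per_million_tokens(hourly_cost_usd=10.0, tokens_per_second=5000.0)
print(f"${example:.4f} per million tokens")
```

On this metric a more expensive system can still win if its throughput advantage is larger than its cost premium, which is the basis of the inference-specialist pitch described above.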

See our analysis →

Fortune — Microsoft AI cost problem: tokens, agents → · CNBC — Nvidia data center revenue nearly doubles Q1 2027 → · eWeek — Cerebras targets $33B IPO challenging Nvidia →