// news · frontier-models · industry2026-05-23source: google / theinformation / latent.space

Google cuts Gemini 2.5 Pro pricing by 32% — matches Claude Sonnet 4.6 on input-token tier, undercuts on cached-context

Google cut Gemini 2.5 Pro input-token pricing by 32% this week, moving the flagship tier into direct parity with Claude Sonnet 4.6 on standard input pricing and undercutting on cached-context billing. The cut is the third Google-led frontier-tier price reduction in seven months and confirms the structural shift toward Flash-tier economics across the closed-flagship stack.

The Gemini cached-context rate is now 78% below Claude's cached rate and 84% below GPT-5's. For agentic workloads that depend on multi-turn cached context (Cursor IDE, AgentX, Glean), the math now strongly favors Gemini-routed flows. Cursor reported 41% Gemini share in last week's developer-survey results, up from 22% in February.

The structural read is that Google's TPU advantage is finally flowing through to per-token economics in a way that the merchant accelerator labs cannot match. Anthropic and OpenAI both run on NVIDIA/AMD merchant silicon and pay merchant-tier compute costs. Google runs Gemini training and inference on TPU v6e at internally-amortized cost. Each Google price cut tests how much of that gap Anthropic and OpenAI can absorb on margin before they have to follow.

See our analysis →

Google AI — Gemini 2.5 Pro pricing update → · The Information — Google's third frontier-tier price cut → · Latent.Space — The TPU-economics flywheel finally shows →