Google cuts Gemini 2.5 Pro pricing by 32% — matches Claude Sonnet 4.6 on input-token tier, undercuts on cached-context
Google cut Gemini 2.5 Pro input-token pricing by 32% this week, moving the flagship tier into direct parity with Claude Sonnet 4.6 on standard input pricing and undercutting on cached-context billing. The cut is the third Google-led frontier-tier price reduction in seven months and confirms the structural shift toward Flash-tier economics across the closed-flagship stack.
The Gemini cached-context rate is now 78% below Claude's cached rate and 84% below GPT-5's. For agentic workloads that depend on multi-turn cached context (Cursor IDE, AgentX, Glean), the math now strongly favors Gemini-routed flows. Cursor reported 41% Gemini share in last week's developer-survey results, up from 22% in February.
The structural read is that Google's TPU advantage is finally flowing through to per-token economics in a way that the merchant accelerator labs cannot match. Anthropic and OpenAI both run on NVIDIA/AMD merchant silicon and pay merchant-tier compute costs. Google runs Gemini training and inference on TPU v6e at internally-amortized cost. Each Google price cut tests how much of that gap Anthropic and OpenAI can absorb on margin before they have to follow.
Google AI — Gemini 2.5 Pro pricing update → · The Information — Google's third frontier-tier price cut → · Latent.Space — The TPU-economics flywheel finally shows →