// news · frontier-models · tools2026-05-26source: google / llm-stats / whatllm

Google I/O drops Gemini 3.5 Flash and Gemini Spark on May 19 — 76.2% Terminal-Bench 2.1 at $1.50/$9 per million tokens, strongest agent price-performance

Google I/O 2026 on May 19 shipped two new models: Gemini 3.5 Flash and Gemini Spark. Flash hits 76.2% on Terminal-Bench 2.1 and 83.6% on MCP Atlas at $1.50/$9 per million tokens — making it the strongest price-performance option for agent workflows on the market right now. The gap to Opus 4.7 and GPT-5.5 on benchmarks is narrow, the gap on cost is enormous.

The Terminal-Bench 2.1 result is the headline number. Terminal-Bench measures agent capability on long-horizon CLI and code tasks — the workload most enterprise agent deployments actually run. 76.2% from Flash, at $1.50 input / $9 output per million tokens, puts the agent-workflow cost-per-task roughly 3-4× lower than Opus 4.7 and 2-3× lower than GPT-5.5 at comparable accuracy. For workloads running hundreds of thousands of agent calls per day, that pricing inflection translates to meaningful annual cost differences — measured in tens of millions for the largest enterprise deployments.

Gemini Spark is the more interesting strategic move. Spark is positioned as a developer-pricing tier that bundles Gemini 3.5 Flash with Google's agent infrastructure (Agent Builder, Vertex AI Reasoning Engine) at a single per-seat price, similar to how Cursor and Claude Code package model-plus-workflow. The pricing is competitive with Cursor's $20/month tier, and the integration with Google's enterprise identity (Workspace, GCP IAM) gives it an immediate advantage in shops that are already on Google infrastructure. The agent platform competition just got materially harder for the standalone tools.

See our analysis →

LLM Stats — AI Updates Today May 2026 Latest AI Model Releases → · WhatLLM — New AI Models May 2026 Frontier Took a Breath → · BuildFastWithAI — Best AI Models of May 2026 Leaderboard →