// news · agents · 2026-06-22 · source: morphllm / developersdigest

Terminal-Bench 2.1 coding leaderboard tightens — Codex CLI on GPT-5.5 at 83.4%, Claude Code on Fable 5 at 83.1%, Claude Code on Opus 4.8 at 78.9%

The Terminal-Bench 2.1 coding-agent leaderboard places Codex CLI on GPT-5.5 at 83.4%, Claude Code on Fable 5 at 83.1%, and Claude Code on Opus 4.8 at 78.9%. The top two are 0.3 points apart, while third place trails second by 4.2 points (4.5 points behind first). On this benchmark, that points to rough capability parity across vendors at the frontier, with the choice of model tier within a vendor becoming the meaningful differentiator.

The key comparison is the within-vendor tier gap versus the cross-vendor gap. Codex CLI on GPT-5.5 (#1) and Claude Code on Fable 5 (#2) are effectively tied, 0.3 points apart. Claude Code on Opus 4.8 (#3) trails Fable 5 by 4.2 points, so the gap between two Anthropic tiers (Fable 5 vs Opus 4.8) is larger than the gap between vendors (GPT-5.5 vs Fable 5). The procurement implication: choosing the right tier within a vendor matters more than choosing between vendors at the frontier.
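The gap arithmetic above can be checked directly from the three reported scores. A minimal sketch; the dictionary keys are just labels chosen for illustration, and rounding to one decimal is there only to suppress floating-point noise in the subtraction.

```python
# Terminal-Bench 2.1 scores as reported in this item (percent of tasks solved).
SCORES = {
    "Codex CLI / GPT-5.5": 83.4,
    "Claude Code / Fable 5": 83.1,
    "Claude Code / Opus 4.8": 78.9,
}


def gap(a: str, b: str) -> float:
    """Point difference between two leaderboard entries, rounded to 0.1."""
    return round(SCORES[a] - SCORES[b], 1)


cross_vendor = gap("Codex CLI / GPT-5.5", "Claude Code / Fable 5")      # #1 vs #2
within_vendor = gap("Claude Code / Fable 5", "Claude Code / Opus 4.8")  # #2 vs #3
first_to_third = gap("Codex CLI / GPT-5.5", "Claude Code / Opus 4.8")   # #1 vs #3

print(f"cross-vendor gap (#1 vs #2): {cross_vendor} pts")
print(f"within-vendor gap (#2 vs #3): {within_vendor} pts")
print(f"first-to-third gap (#1 vs #3): {first_to_third} pts")
print("within-vendor gap exceeds cross-vendor gap:", within_vendor > cross_vendor)
```

This prints gaps of 0.3, 4.2, and 4.5 points, confirming that the within-vendor gap exceeds the cross-vendor gap on this benchmark.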

For H2 2026 coding-agent procurement, the competitive read is that tier selection should rest on cost-per-task economics for the actual workload, not on raw leaderboard rank. Fable 5, priced at $10/$50 per 1M input/output tokens, has a different cost-per-task profile from GPT-5.5 or Opus 4.8; the right tier depends on token volume, latency requirements, and budget structure. The leaderboard shows that capability is comparable; pricing determines what to actually deploy.
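One way to make "cost-per-task economics" concrete is to price a single attempt from token counts, then divide by the pass rate to estimate the expected cost of a solved task. This is a rough sketch under stated assumptions, not a procurement model: only Fable 5's $10/$50 pricing comes from this item (GPT-5.5 and Opus 4.8 prices are not given here, so they are left out); the 200k input / 20k output tokens per attempt are hypothetical; and treating the Terminal-Bench score as a per-attempt success probability with independent retries is an assumption that may not hold for a real workload.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TierProfile:
    """Hypothetical pricing/capability profile for one model tier."""
    name: str
    price_in_per_m: float   # USD per 1M input tokens
    price_out_per_m: float  # USD per 1M output tokens
    pass_rate: float        # assumed per-attempt success probability, in (0, 1]

    def __post_init__(self) -> None:
        if not 0.0 < self.pass_rate <= 1.0:
            raise ValueError("pass_rate must be in (0, 1]")


def cost_per_attempt(p: TierProfile, input_tokens: int, output_tokens: int) -> float:
    """Token cost of one attempt in USD."""
    return (input_tokens / 1_000_000) * p.price_in_per_m + (
        output_tokens / 1_000_000
    ) * p.price_out_per_m


def cost_per_solved_task(p: TierProfile, input_tokens: int, output_tokens: int) -> float:
    """Expected USD per solved task, assuming independent retries at a constant
    success probability (geometric distribution: expected attempts = 1 / pass_rate)."""
    return cost_per_attempt(p, input_tokens, output_tokens) / p.pass_rate


# Fable 5 pricing from this item; leaderboard score used as an assumed pass rate.
fable5 = TierProfile("Fable 5", price_in_per_m=10.0, price_out_per_m=50.0, pass_rate=0.831)

# Hypothetical token volume per attempt.
per_attempt = cost_per_attempt(fable5, 200_000, 20_000)
per_solved = cost_per_solved_task(fable5, 200_000, 20_000)
print(f"{fable5.name}: ${per_attempt:.2f} per attempt, ${per_solved:.2f} per solved task")
```

Comparing tiers would mean filling in each tier's own prices and measured token volumes; a cheaper tier with a lower pass rate can still come out ahead or behind depending on how many retries it needs.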


Sources: MorphLLM, "Best AI Coding Agents (June 2026): Scored Leaderboard" · Developers Digest, "Best AI Coding Tools June 2026: Updated After Fable 5 Changes Everything"