// news · frontier-models · agents · benchmark · 2026-05-19 · source: anthropic / leaderboards

Claude Code leads SWE-bench Verified at 78.4%, ahead of Codex, Cursor, Devin, Replit

Updated SWE-bench Verified leaderboards confirm Claude Code at 78.4%, ahead of OpenAI Codex at 71.0%, Cursor agent at 67.2%, Devin at 60.8%, and Replit Agent 3 at 54.1%. The 7.4-point gap to second place is the widest single-agent lead the benchmark has seen.
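For readers who want to check the margins, here is a minimal sketch that recomputes each agent's gap to the leader from the scores quoted above. The dictionary and the `gaps_to_leader` helper are illustrative names, not part of any leaderboard tooling, and the numbers are simply the ones reported in this item.

```python
# Scores (percent of issues resolved) exactly as quoted in this item.
scores = {
    "Claude Code": 78.4,
    "OpenAI Codex": 71.0,
    "Cursor agent": 67.2,
    "Devin": 60.8,
    "Replit Agent 3": 54.1,
}


def gaps_to_leader(scores):
    """Return (leader, {agent: gap in percentage points}) for every non-leader."""
    leader = max(scores, key=scores.get)
    top = scores[leader]
    # Round to one decimal to hide binary floating-point noise.
    gaps = {
        name: round(top - score, 1)
        for name, score in scores.items()
        if name != leader
    }
    return leader, gaps


leader, gaps = gaps_to_leader(scores)
for name, gap in sorted(gaps.items(), key=lambda kv: kv[1]):
    print(f"{leader} leads {name} by {gap:.1f} points")
```

The gaps are differences in percentage points, not relative improvements, so they should be read alongside the absolute scores.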

The lead reflects two compounding advantages: Claude Code runs on Opus 4.7 (still the strongest base model for code synthesis), and the MCP-native tool integration architecture lets the agent compose external tools in ways the others can't yet match cleanly.

The gaps to Cursor (11.2 points) and Devin (17.6 points) are the more strategically interesting fact. Cursor wins on inline UX and seat count; Devin wins on long-horizon autonomous runs. But on the canonical "fix real GitHub issues" benchmark, Claude Code is roughly two generations ahead. Expect the rest of the field to converge on MCP and Opus-class models throughout Q3 2026.

Anthropic — Claude Opus 4.7 → · digitalapplied — agent comparison →