// news · open-source · tools · 2026-05-26 · source: alibaba / huggingface / futureagi

Qwen 3.6 Max-Preview holds #1 on six coding and agent benchmarks simultaneously, including SWE-bench Pro, Terminal-Bench 2.0, and SkillsBench

Qwen 3.6 Max-Preview, released in late April, still holds the #1 position on six major coding and agent benchmarks at once: SWE-bench Pro, Terminal-Bench 2.0, SkillsBench, QwenClawBench, QwenWebBench, and SciCode. No other single model has led this many coding and agent benchmarks simultaneously in 2026.

Holding all six at once is the meaningful claim. Single-benchmark wins are common, because models specialize in one workload type and trade off others. Six concurrent wins, including the gold-standard SWE-bench Pro (real GitHub-issue resolution) and Terminal-Bench 2.0 (long-horizon CLI agent tasks), suggest the model is not overfitting to a specific evaluation harness. SkillsBench measures structured tool use; QwenClawBench and QwenWebBench measure GUI agent tasks; SciCode measures scientific computation. Together they span the workloads that enterprise agent deployments actually care about.

The competitive consequence is a narrowing gap between Western and Chinese frontier models. Through 2025, the SWE-bench Pro leaderboard consisted of Claude Opus and GPT models plus a few specialized fine-tunes. Qwen 3.6 Max-Preview's late-April release displaced Claude Opus 4.7 on SWE-bench Pro by 1.2 percentage points and held the lead through May despite a competitive run from Gemini 3.5 Flash. For enterprise coding workflows that care about benchmark-leading performance and Apache 2.0-compatible licensing, Qwen is now the default choice, not the alternative.

See our analysis →

Hugging Face Blog — Best Open-Source LLM Models 2026 → · Future AGI Substack — Best LLMs in May 2026 What Actually Matters → · BentoML — Best Open-Source LLMs in 2026 →