// news · open-source · frontier-models · 2026-05-30 · source: qwen.ai / artificial analysis / huggingface

Alibaba ships Qwen 3.7 Max: GPQA Diamond 92.4 tops Claude Opus 4.6 and sets the open-weight frontier reasoning record

Alibaba released Qwen 3.7 Max on May 20, 2026, with a GPQA Diamond score of 92.4, overtaking Claude Opus 4.6 on the standard frontier reasoning benchmark. The release continues China's monthly cadence of frontier-class open-weight model drops and narrows the gap between closed and open models to weeks rather than months.

The benchmark detail is what matters. GPQA Diamond at 92.4 is the highest score any open-weight model has posted on record; Claude Opus 4.6 sat at roughly 91 on the same benchmark before Qwen 3.7 Max's release. The capability-gap rhetoric that dominated 2024 and 2025 ("open weights trail proprietary models by 6 to 12 months") no longer matches the data. Between mid-April and late May 2026, Moonshot shipped Kimi K2.6, Z.ai shipped GLM-5.1, DeepSeek shipped V4 Pro and V4 Flash, Xiaomi shipped MiMo-V2.5-Pro, Google released Gemma 4, and Alibaba released Qwen 3.6 and now Qwen 3.7 Max: nine frontier-class open-weight models in six weeks.

The licensing matters as much as the benchmarks. Qwen 3.7 Max ships under Apache-2.0: commercially permissive, with no usage caps and no telemetry. Enterprise teams evaluating frontier reasoning models can now reach Opus-class benchmark numbers on infrastructure they fully control. See our analysis →


Qwen: Qwen 3.7 Max release notes → · Understanding AI: The best Chinese open-weight models vs the strongest US rivals →