// news · open-source · tools · 2026-05-25 · source: zhipu / alibaba / huggingface

Zhipu GLM-5.1 and Alibaba Qwen 3.6 27B both hit 77%+ on SWE-bench Verified — open-weights at the engineering-task frontier

Zhipu AI shipped GLM-5.1 and Alibaba shipped Qwen 3.6 27B in May 2026, both clearing 77% on SWE-bench Verified — the canonical software-engineering capability benchmark. That puts both models in the same SWE-bench tier as Claude Opus 4.7 (77.2%), GPT-5.5 (76.3%), and Mistral Medium 3.5 (77.6%), at a fraction of the deployment cost.

SWE-bench Verified is the most-cited benchmark for whether a model can do real software engineering work: patching real bugs in real codebases, not synthetic toy problems. Crossing 77% means the model resolves more than three-quarters of a curated set of GitHub issues without human intervention. Through 2024 the closed-frontier vendors had sole claim to this tier. As of May 2026, at least five open-weight models match the closed frontier on this specific benchmark: three from China, one from Europe, and one from Microsoft (the Phi-4 reasoning family).
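
To put the percentages in concrete terms, here is a rough sketch that converts scores into task counts. It assumes the Verified subset contains 500 tasks and that each vendor scored against the full set; neither detail is stated in this article, and the names `VERIFIED_TASKS` and `tasks_resolved` are illustrative only.

```python
from fractions import Fraction

# Assumption: SWE-bench Verified has 500 tasks (not stated in the article).
VERIFIED_TASKS = 500


def tasks_resolved(score_pct: str, total: int = VERIFIED_TASKS) -> Fraction:
    """Number of tasks implied by a percentage score, computed exactly."""
    return Fraction(score_pct) * total / 100


# Tasks a model must resolve to reach the 77% line.
threshold = tasks_resolved("77")

# Value of a single task, in percentage points.
per_task = Fraction(100, VERIFIED_TASKS)

# Gap, in tasks, between two of the scores quoted in this article.
gap = tasks_resolved("77.6") - tasks_resolved("77.2")

print(f"77% line: {threshold} tasks")
print(f"one task: {float(per_task)} percentage points")
print(f"77.6% vs 77.2%: {gap} tasks apart")
```

Under that assumption, sub-point differences between models in this tier amount to a handful of tasks, which is worth keeping in mind when reading the rankings.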

The most defensible architectural explanation for the convergence: 2026 frontier-class open-weight models are almost universally sparse Mixture-of-Experts, listed here as total/active parameters (DeepSeek V4 Pro 1.6T/49B, Llama 4 Maverick 400B/17B, Qwen 3.5 397B/17B, Mistral Large 3 675B/41B). The MoE architecture lets these labs train models with closed-frontier total parameter counts while keeping inference compute manageable, because only the active parameters run for each token. The closed-frontier vendors are also MoE-based per public disclosures, so they have no architectural moat, only orchestration, distribution, and compliance moats. Those moats hold for now. The capability moat is gone.
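
The total/active figures quoted above can be turned into a rough sense of how sparse these models are. The sketch below uses only the parameter counts from this article; the figure of about 2 FLOPs per active parameter per token is a common rule of thumb and an assumption here, not a number from any vendor, and the function names are illustrative.

```python
# (total, active) parameter counts in billions, as quoted in the article.
MODELS = {
    "DeepSeek V4 Pro": (1600, 49),
    "Llama 4 Maverick": (400, 17),
    "Qwen 3.5": (397, 17),
    "Mistral Large 3": (675, 41),
}

# Assumption: roughly 2 FLOPs per active parameter per generated token.
FLOPS_PER_ACTIVE_PARAM = 2


def active_fraction(total_b: float, active_b: float) -> float:
    """Share of the model's parameters used for any single token."""
    return active_b / total_b


def forward_flops_per_token(active_b: float) -> float:
    """Rule-of-thumb forward-pass cost per token, driven by active parameters only."""
    return FLOPS_PER_ACTIVE_PARAM * active_b * 1e9


for name, (total_b, active_b) in MODELS.items():
    share = active_fraction(total_b, active_b)
    flops = forward_flops_per_token(active_b)
    print(f"{name}: {share:.1%} of parameters active, ~{flops:.1e} FLOPs/token")
```

On these numbers every listed model activates well under a tenth of its parameters per token, which is the sense in which total size and inference compute are decoupled. Memory is a separate matter: all experts still have to be stored, so the saving is in compute, not in weights held.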


Hugging Face — Best Open-Source LLM Models 2026 → · ComputingForGeeks — Open Source LLM Comparison Table 2026 → · Acecloud — Best Open Source LLMs In 2026 Benchmarks Licenses GPU Guide →