Meta Llama 4 and Mistral Medium 3.5 anchor the European-American open-weight tier
Meta shipped Llama 4 in April 2025 with two mixture-of-experts (MoE) models: Scout (17B active / 109B total, reported as runnable on 10GB of VRAM) and Maverick (17B active / 400B total). Mistral Medium 3.5 launched on April 29, 2026: a 128B dense model scoring 77.6% on SWE-bench Verified, which would make it the strongest coding model from a single vendor outside Anthropic and OpenAI.
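The MoE design with a small active parameter count (17B per token) is what opens up consumer-hardware inference. Only the active parameters run for each token, so at 4-bit quantization the per-token working set is roughly 8.5GB. That is how a Scout deployment can target a single 10GB VRAM card, a class of hardware millions of developers already own. The full 109B weights still have to be stored somewhere (about 55GB at 4-bit), so this setup depends on holding inactive experts in system RAM or on disk. With that caveat, open-weight inference on commodity laptops becomes practical.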
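The 10GB figure is easier to read next to some back-of-the-envelope arithmetic. The sketch below estimates weight storage only, assuming a uniform number of bits per weight and 1 GB = 1e9 bytes, and ignoring KV cache, activations, and runtime overhead. The function name and the quantization levels are illustrative choices, not taken from Meta's documentation.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: int) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes).

    Assumption: every parameter is stored at the same precision.
    KV cache, activations, and framework overhead are not counted.
    """
    return params_billions * bits_per_weight / 8


# Parameter counts quoted in the article for Llama 4 Scout.
SCOUT_TOTAL_B = 109   # all experts
SCOUT_ACTIVE_B = 17   # parameters used per token

for bits in (16, 8, 4):
    total = weight_memory_gb(SCOUT_TOTAL_B, bits)
    active = weight_memory_gb(SCOUT_ACTIVE_B, bits)
    print(f"{bits:>2}-bit: all weights ~{total:.1f} GB, "
          f"active per token ~{active:.1f} GB")
```

Under these assumptions, 4-bit weights come to about 54.5 GB for the full model and 8.5 GB for the active subset, which suggests the 10GB claim refers to the active working set, with the remaining experts held outside VRAM.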
Mistral Medium 3.5's dense-128B SWE-bench result is the contrarian bet. Most other labs are moving to MoE; Mistral's result suggests that a well-trained dense model can still lead on a coding benchmark by competing on per-parameter quality rather than total parameter count. The trade-off matters for memory-constrained inference: a dense 128B model needs far less weight storage than Maverick's 400B, but it uses all 128B parameters on every token, where the Llama 4 models use 17B.
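The dense-versus-MoE trade-off can be put in rough numbers. This sketch compares the three models on two axes: weight storage at 4-bit precision, and forward-pass compute per token using the common rule of thumb of about 2 FLOPs per active parameter. Both are approximations, the `Model` class is a hypothetical helper, and the parameter counts are the ones quoted in the article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    total_b: float   # total parameters, in billions
    active_b: float  # parameters used per token, in billions

    def weights_gb(self, bits: int = 4) -> float:
        # Weight storage only, 1 GB = 1e9 bytes, uniform precision assumed.
        return self.total_b * bits / 8

    def gflops_per_token(self) -> float:
        # Rule of thumb: ~2 FLOPs per active parameter per generated token.
        return 2 * self.active_b


MODELS = [
    Model("Llama 4 Scout", total_b=109, active_b=17),
    Model("Llama 4 Maverick", total_b=400, active_b=17),
    Model("Mistral Medium 3.5", total_b=128, active_b=128),  # dense
]

for m in MODELS:
    print(f"{m.name:<20} weights ~{m.weights_gb():6.1f} GB (4-bit), "
          f"compute ~{m.gflops_per_token():5.0f} GFLOPs/token")
```

By this estimate the dense model sits between Scout and Maverick on memory (about 64 GB against 54.5 GB and 200 GB) but costs roughly 7.5 times more compute per token (256 against 34 GFLOPs).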
Codersera: Best Open-Source LLM May 2026 · HuggingFace: Open-Source LLMs 2026