Weights, fine-tunes, runtimes, and the projects keeping the field auditable.
San Francisco startup founded by ex-Googlers ships four open-source hybrid reasoning models — 70B, 109B, 405B, 671B — using a technique called Iterated Distillation and Amplification (IDA) to distill search-time reasoning back into model weights.
Microsoft's small-language-model bet now includes Phi-4-mini, Phi-4-multimodal (text+audio+vision in one), Phi-4-reasoning, Phi-4-reasoning-plus, Phi-4-mini-reasoning, and Phi-4-reasoning-vision. Reportedly beats DeepSeek-R1-Distill-Llama-70B at most benchmarks despite far smaller size.
DeepSeek's V4-Flash variant (284B total / 13B active parameters, 1M context, MIT license) holds production-tier capability at hyperscaler-routable scale. Combined with V4-Pro (1.6T total / 49B active, 80.6 SWE-Bench Verified, 90.1 GPQA Diamond), DeepSeek now ships the most operationally credible open-weight Pro/Flash split. The 1M context retention in Flash is the structural detail that erases the case for routing to Pro on long-document workloads.
DeepSeek's V4 release (April 24) shipped two SKUs: V4-Pro (1.6T total / 49B active parameters, 80.6 SWE-Bench Verified, 90.1 GPQA Diamond) and V4-Flash (284B total / 13B active, 1M context). Both run under the MIT license, both ship at 1M context, and both clear the bar for production deployment on coding and reasoning workloads. The Pro/Flash bifurcation now mirrors the closed-flagship pricing curve at a fraction of the cost.
Mistral Medium 3.5 (April 29 release) lands at 77.6% on SWE-Bench Verified with EU-friendly licensing terms — the strongest sovereign-jurisdiction coding-model offering in the May 2026 lineup. Combined with Mistral Large 3 (675B / 41B active MoE) and the Voxtral TTS, Forge, and Leanstral releases earlier in the year, Mistral's 2026 H1 cadence is closer to Qwen's monthly tempo than to its prior quarterly pattern.
Alibaba's Qwen 3.6-35B-A3B (Apr 2026) and Qwen 3.6-27B (Apr 2026) continue the team's roughly-monthly drop cadence across 2026 H1. Combined with Qwen 3.5 (Feb 2026, 397B MoE with unified vision-language and 201 languages) and Qwen 3.6 Plus / Max Preview (Apr 2/20), Alibaba now ships the most operationally aggressive open-weights release schedule among Tier 1 labs.
Three labs occupy the open-weight Tier 1 ladder. Each serves a different procurement constraint. The 'open-weight model selection' decision has stopped being a single comparison and become a constraint-mapping exercise. That's a healthier market than the one we had six months ago.
Qwen 3.5 in February. Qwen 3.6 Plus in April. Qwen 3.6 Max Preview also April. Qwen 3.6-35B-A3B and 3.6-27B as open weights. Five major releases in twelve weeks. Mistral and Meta ship slower; Alibaba is teaching the rest of the open-weight community what monthly cadence looks like.
Air Street's State of AI May 2026 report shows Chinese open-weight models — DeepSeek, Qwen, Kimi, GLM — went from roughly 1% of OpenRouter usage in mid-2024 to more than 60% in May 2026. The shift tracks a 5–20× price-per-token gap to closed flagships and a near-elimination of the capability gap on most evaluation suites.
Deep Cogito's v2 release ships four open-weight sizes (70B, 109B, 405B, 671B) wired into an Iterated Distillation & Amplification (IDA) self-improvement loop. The release positions IDA as a deployable architecture rather than a research curiosity — the first open-weight family where the "model improves itself between checkpoints" methodology is shipped as the default training recipe.
DeepSeek extended the 1M context window to its V4 Flash tier this week, pushing the cheaper standard SKU into a capability bracket previously occupied only by V4 Pro and closed flagships. Combined with the unchanged 80.6% SWE-Bench Verified ceiling and the MIT/Apache-2.0 license, the practical effect is to compress the price-quality gradient on long-context production workloads.
Mistral Medium 3.5, released April 29 and now widely available across cloud providers, hit 77.6% SWE-Bench Verified — putting it within striking distance of Qwen 3.5 and DeepSeek V4 on coding while shipping under Apache 2.0 from a Paris-based lab. For EU enterprises navigating data-residency-plus-IP-clarity procurement constraints, the model is the most defensible production-tier coding choice currently available.
Sometime in early 2026, Chinese open-weight models crossed 50% of OpenRouter usage. The exact moment matters less than the realization: production share has already migrated. The policy conversation is debating a battle that's already moved one front forward.
Gemini 3.5 Flash hits 76.2% Terminal-Bench. DeepSeek V4 Flash gets 1M context. Mistral Medium 3.5 hits 77.6% SWE-bench Verified at Apache pricing. The 2026 frontier isn't the highest-capability model — it's the highest-capability-at-Flash-pricing model.
DeepSeek released V4 (Pro at 1.6T total / 49B active, Flash at 284B total / 13B active) on April 24 under MIT licensing. Both variants ship with 1M token context. V4 Flash pricing of $0.14/M input is the floor for the open-weight frontier and is forcing competing labs to reprice or differentiate on capability.
Three open-weight releases in two weeks: Moonshot Kimi K2.6 (top-tier coding, 1T total / 32B active, 256K context), Z.ai GLM-5.1 ($0.18/M input), and Qwen 3.6 27B (77.2% SWE-bench Verified). The open-weight pace has now compressed to roughly one Pro-tier release per week.
Meta shipped Llama 4 in April 2026 with Scout (17B active / 109B total MoE, runnable on 10GB VRAM) and Maverick (17B active / 400B total). Mistral Medium 3.5 launched April 29 — a 128B dense model hitting 77.6% on SWE-bench Verified, the best single-vendor coding stack outside the Anthropic and OpenAI labs.
Mistral Large 3 lands as a 675B-total / 41B-active sparse Mixture-of-Experts model under Apache 2.0 licensing. The architecture choice mirrors DeepSeek V4 and Llama 4 Maverick — the open-weight tier has converged on sparse MoE as the default frontier architecture.
DeepSeek V4 under MIT, GLM-5.1 at $0.18/M, Kimi K2.6 at 256K context, Llama 4 Maverick. The open-weight frontier is now within a few SWE-bench points of closed flagships at one-tenth the input cost. The structural implications run deeper than pricing.
Z.ai (GLM-5.1), MiniMax (M2.7), Moonshot (Kimi K2.6), and DeepSeek (V4) all landed in a 12-day window in early-to-mid May 2026 — all clearing 75%+ on SWE-bench Verified, all priced below $0.30/M input tokens, all permissively licensed for commercial use.
Meta has confirmed it will release open-weights versions of its next two frontier models, codenamed Avocado and Mango, while keeping the largest variants proprietary — a hybrid strategy that splits the difference between Llama's open-source heritage and the closed-model economics of rival labs.
Z.ai released GLM-5.1 with 1 million token context and inference pricing of $0.18 per million input tokens — undercutting DeepSeek V4 Flash ($0.14) only narrowly while matching it on SWE-bench Verified at 76.4%.
DeepSeek shipped V4 Pro (and V4 Flash) on Hugging Face and the official API. Headline numbers: 80.6 SWE-Bench Verified, 90.1 GPQA Diamond, 1M token context. V4 Flash undercuts most frontier pricing at $0.14 per million input tokens.
Alibaba released Qwen 3.6 27B with a 77.2% SWE-bench Verified score — a frontier-competitive number on a model small enough to run on a single H100. The 27B parameter sweet spot has become the most-shipped open-weights size of 2026.
Moonshot AI shipped Kimi K2.6, a coding-specialized open-weights model that posts the strongest SWE-bench Verified score among open releases — narrowly ahead of DeepSeek V4 Pro on multi-file edits.
Granite 4.1 covers 3B / 8B / 30B language models, Granite Vision 4.1 (top score on 7 chart/table/KVP extraction benchmarks), two ASR speech models, embeddings, and a Granite Guardian 4.1 safety classifier — every variant under Apache 2.0. The 8B dense model reportedly matches or beats 32B MoE systems.
Mistral Medium 3.5 (Apr 29) is a frontier multimodal model targeted at agentic and coding workloads. It's the headline at the end of a stretch where Mistral shipped Small 4 (unifying Magistral/Pixtral/Devstral), Voxtral TTS, Leanstral for formal proofs, and the Forge enterprise platform — all between March 16 and end of April.
Mistral's April 29 release ships under a modified MIT license, with 77.6% on SWE-Bench Verified — positioning the model ahead of Devstral 2 and Qwen 3.5 397B A17B at a fraction of the active-parameter budget.
Nemotron 3 Nano Omni (April 28) unifies vision, audio, language, and text into one open multimodal model. The architecture is the interesting bit: a hybrid Mamba-Transformer MoE with 30B parameters and only 3B activated per forward pass.
NVIDIA's open Nemotron 3 Super lands as a 120B-parameter hybrid MoE with 12B active and a 1M-token context window. The explicit design target: local agent deployment with tool-augmented coding workloads.
NVIDIA's open Nemotron 3 Nano Omni unifies vision, audio, and language processing in a single model, claiming up to 9x efficiency improvement for agent workloads versus equivalent stacks of specialist models.
Qwen 3.6 Plus dropped April 2; Qwen 3.6 Max Preview followed April 20. Alibaba's framing: "accelerating agentic AI deployment for enterprises and Alibaba's AI applications." Built on the Qwen 3.5 native-multimodal foundation from February, which supports 201 languages.
Gemma 4 (April 2) arrives in E2B / E4B / 26B MoE / 31B Dense variants with native image+video everywhere and native audio on the smaller models. 256K context, 140+ languages, agentic-workflow-oriented. The 31B Dense reportedly hit #3 on Arena's text leaderboard.
Qwen 3.5 Omni (released March 30) is a native multimodal model handling text, audio, video, and real-time interaction. Real-time audio time-to-first-token comes in below 300ms with 95%+ ASR accuracy — the relevant numbers for actual voice-assistant deployment.