The China-share tipping point — when did the OpenRouter graph cross 50%?
Sometime in early 2026, Chinese open-weight models crossed 50% of OpenRouter usage. The exact moment matters less than what it signals: production share has already migrated. The policy conversation is still fighting over a front the market has already moved past.
The graph nobody wanted to look at
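The Air Street State of AI report charts a steep climb: ~1% in mid-2024, ~12% by the end of 2024, ~35% by mid-2025, ~60% by May 2026. Growth was fastest early (about 12× in the second half of 2024) and has slowed since, so the shape is closer to an S-curve than a clean exponential. The curve was visible to anyone watching OpenRouter analytics throughout 2025. The policy conversation was elsewhere.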
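The title's question can be answered roughly from those four data points. A minimal sketch, assuming linear interpolation between the two points that bracket 50%, and assuming calendar dates for the report's loose labels ("mid-year" as July 1, "end of 2024" as December 31, "May 2026" as May 15). The names `SHARE_POINTS` and `crossing_date` are illustrative, not from the report.

```python
from datetime import date, timedelta

# Approximate shares quoted from the Air Street chart. The exact calendar
# dates are assumptions: "mid-year" -> July 1, "end of 2024" -> December 31,
# "May 2026" -> May 15.
SHARE_POINTS = [
    (date(2024, 7, 1), 1.0),
    (date(2024, 12, 31), 12.0),
    (date(2025, 7, 1), 35.0),
    (date(2026, 5, 15), 60.0),
]


def crossing_date(points, threshold):
    """Linearly interpolate the first date at which the share reaches threshold.

    Returns None if no segment of the series reaches the threshold.
    """
    for (d0, s0), (d1, s1) in zip(points, points[1:]):
        if s0 < threshold <= s1:
            # Fraction of the way through this segment where the threshold is hit.
            frac = (threshold - s0) / (s1 - s0)
            span_days = (d1 - d0).days
            return d0 + timedelta(days=round(frac * span_days))
    return None


print(crossing_date(SHARE_POINTS, 50.0))  # 2026-01-08
```

Straight-line interpolation on a curve that is still bending is crude, and it can be no more precise than the approximate inputs. It lands in early January 2026, which is consistent with "early 2026" but should not be read as a precise date.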
Why developers chose Chinese models
The 5–20× price-per-token gap is the headline reason, but not the only one. Three other forces compounded it:
- Latency. DeepSeek V4 Flash, Qwen 3.6 27B, and GLM-5.1 all ship at sub-100 ms time-to-first-token (TTFT) for typical workloads, comparable to closed flagships and meaningfully better than the second-generation OpenRouter routes.
- License clarity. Apache 2.0 and MIT remove the legal-review tax that Llama's bespoke license still imposes on enterprise buyers. An enterprise IP team typically approves Apache 2.0 in a single review cycle.
- Capability at the frontier. Qwen 3.6's 77.2% on SWE-bench Verified is the load-bearing number. Six months ago, that score would have counted as "close to closed flagships." Now it is the open-weight median.
What the export controls bought
The 2024–2025 GPU export-control regime was designed to slow Chinese frontier capability. Judging by the OpenRouter curve, the policy didn't slow capability; it restricted where GPUs ship. Capability accreted at Chinese frontier labs anyway, partly through algorithmic efficiency gains, partly through procurement of alternative hardware substrates, and partly because frontier capability isn't actually GPU-floor-limited at this stage of the curve.
That's not a critique of the policy's intent. It's a measurement of what the policy bought: possibly twelve to eighteen months of capability delay. Preventing the reversal in production-deployment share: not even close.
The procurement implication
If you're standing up a 2026 inference stack today, the question is not "Chinese model, yes or no"; that ship has sailed. The question is which Chinese models, under what attestation regime, with what failover to closed-flagship tiers for the 5–10% of workloads where the moat actually matters. The dual-tier inference architecture we wrote about above is the rational procurement answer.
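A minimal sketch of the routing decision and of why its economics work. It assumes a hypothetical `needs_frontier` flag set upstream (the text does not say how workloads are classified), placeholder tier names instead of real model IDs, and that per-token price is the only cost difference between tiers. Under those assumptions, the blended cost relative to an all-closed stack is f + (1 - f) / r, where f is the share sent to the closed tier and r is the closed-to-open price ratio.

```python
from dataclasses import dataclass

# Hypothetical tier names; the text does not name concrete model IDs.
OPEN_TIER = "open-weight-tier"
CLOSED_TIER = "closed-flagship-tier"


@dataclass
class Request:
    prompt: str
    # Hypothetical flag: True for the small share of workloads
    # where the closed-model moat actually matters.
    needs_frontier: bool


def route(req: Request, open_tier_healthy: bool = True) -> str:
    """Send traffic to the open-weight tier by default; use the closed flagship
    for frontier-critical requests or when the open tier is unavailable."""
    if req.needs_frontier or not open_tier_healthy:
        return CLOSED_TIER
    return OPEN_TIER


def blended_cost_ratio(closed_fraction: float, price_ratio: float) -> float:
    """Cost of the dual-tier stack relative to sending everything to the closed tier.

    Assumes token volume splits in proportion to workload share and that
    per-token price is the only difference between tiers. price_ratio is how
    many times more the closed tier costs per token.
    """
    return closed_fraction + (1 - closed_fraction) / price_ratio


# Ends of the ranges quoted in the text: 5-10% closed-tier share, 5-20x price gap.
print(round(blended_cost_ratio(0.05, 20), 4))  # 0.0975
print(round(blended_cost_ratio(0.10, 5), 4))   # 0.28
```

Plugging in the ends of the quoted ranges, the dual-tier stack costs roughly 10% to 28% of an all-closed deployment. The estimate ignores routing overhead, differences in tokens per task between models, and the cost of whichever attestation regime is chosen.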
The next analytical question, and the one nobody at the policy table has answered cleanly, is whether 60% becomes 75% or stabilizes. The honest answer is that we don't know yet. The next two quarterly OpenRouter data dumps should tell.
Air Street: State of AI, May 2026 · Codersera: best open-source LLM, May 2026 · HuggingFace: open-source LLMs 2026