MiniMax M3's third deployment week hardens the China-OSS coding-frontier lead — enterprise pilots produce 3-week longitudinal data while Llama 5 stays absent
MiniMax M3's third public week of enterprise pilots produces the first 3-week longitudinal data on a frontier-class open-weight coding model. Performance holds against the 59% SWE-Bench Pro baseline, and deployment cost runs ~40% below comparable closed-API frontier coding models. China-OSS as the procurement default for frontier coding is now a 3-week-old fact rather than a 1-week speculation.
The substantive piece is that the longitudinal evaluation is maturing. Week-1 OSS model launches always look good, because community evaluation hasn't run long enough to surface real-world degradation patterns. Week 3 is the first checkpoint where enterprise pilots have run real workloads across enough variance to put confidence intervals on the lab's launch claims. MiniMax M3 holding its 59% SWE-Bench Pro score through week 3 means the multi-axis-convergence claim (frontier coding + 1M context + native multimodal in one open-weight model) is durable rather than benchmark-gamed.
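To make the confidence-interval point concrete, here is a minimal sketch of how a pilot team might check whether a week-3 pass rate is still consistent with the 59% launch figure. It uses a Wilson score interval, which is one reasonable choice and not a method the pilots are known to use. The function names and the task counts (290 of 500) are hypothetical placeholders, not reported data.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial pass rate (z=1.96 is roughly 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half_width = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return center - half_width, center + half_width


def baseline_holds(successes: int, trials: int, baseline: float) -> bool:
    """True when the claimed baseline lies inside the observed interval."""
    low, high = wilson_interval(successes, trials)
    return low <= baseline <= high


# Hypothetical week-3 pilot tally: 290 of 500 tasks resolved (illustrative only).
low, high = wilson_interval(290, 500)
still_consistent = baseline_holds(290, 500, baseline=0.59)
```

With only a few dozen tasks the interval is too wide to distinguish a held baseline from real degradation, which is why a week-1 result says little and a multi-week tally across varied workloads says more.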
Set against Meta's continued silence on Llama 5, the structural pattern is simple: every week that MiniMax M3 produces production-deployment data is a week in which Meta's OSS-frontier narrative recovery gets harder. Enterprise OSS procurement cycles take 3-6 months from pilot to production commitment, so the pilots landing on M3 right now will have turned into 6-month production commitments before Llama 5 ships. Meta's structural OSS-frontier disadvantage is now durable rather than recoverable through a single launch.
HuggingFace — Best Open-Source LLM Models in 2026: Coding, Local, Agentic AI, Benchmarks, and License → · Featherless — Best Open-Source LLMs in 2026 → · Computing for Geeks — Open Source LLM Comparison Table (2026) →