// news · open-source · model-release · 2026-06-03 · source: minimax.io

MiniMax M3 claims 59% on SWE-Bench Pro for an open-weight model, edging GPT-5.5

Chinese AI lab MiniMax released M3 on June 1, claiming it is the first open-weight model to pair frontier coding scores with a 1-million-token context window and native multimodality in a single checkpoint. By MiniMax's count, the model scored 59.0% on SWE-Bench Pro, ahead of GPT-5.5 and Gemini 3.1 Pro and within striking distance of Claude Opus 4.7 at 64.3%. The catch: the weights are not on Hugging Face yet, and MiniMax has only promised them within ten days.

MiniMax launched M3 on June 1 with a claim that, if it holds up, ends a roughly two-year run in which closed frontier labs kept a clean lead on serious coding benchmarks. According to MiniMax's own technical post, M3 scored 59.0% on SWE-Bench Pro, narrowly beating GPT-5.5 at 58.6% (a 0.4-point margin) and topping Gemini 3.1 Pro, while Claude Opus 4.7 still leads at 64.3%. The model also posted 66.0% on Terminal-Bench 2.1 and 70.06% on OSWorld-Verified for computer use. That last number is the most interesting one: agentic computer use was the last category where open weights weren't really in the conversation at all.

The architectural story is MiniMax Sparse Attention (MSA), which the company says cuts per-token compute at 1M context to one-twentieth of the prior generation's, with more than 9x faster prefill and more than 15x faster decoding than M2. The design partitions the KV cache into blocks and uses what MiniMax calls a "KV outer gather Q" pattern to keep memory access contiguous. If the speedups translate into real inference economics, they matter more than the headline benchmark: long context has been priced as a luxury feature because full attention gets more expensive with every token of context (quadratically over a full prefill), and MSA is a credible attempt to make 1M tokens a routine operating point rather than a demo.
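MiniMax has not yet published MSA's internals beyond the description above, so what follows is a generic block-sparse attention sketch, not MSA itself. It assumes a hypothetical selection rule (each KV block is scored by the query's dot product with the block's mean key, and only the top_k blocks are attended), and the helper names, block_size and top_k are illustrative choices. It does not attempt to reproduce the "KV outer gather Q" access pattern; it only shows the basic trade, where a query touches top_k × block_size keys instead of the whole cache.

```python
import math
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def attend(query: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Standard scaled softmax attention for one query over the given keys/values."""
    scale = 1.0 / math.sqrt(len(query))
    logits = [dot(query, k) * scale for k in keys]
    peak = max(logits)  # subtract the max logit for numerical stability
    weights = [math.exp(x - peak) for x in logits]
    total = sum(weights)
    dim_v = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total for d in range(dim_v)]


def partition_blocks(keys, values, block_size):
    """Split the KV cache into contiguous fixed-size blocks (the last may be shorter)."""
    return [
        (keys[i:i + block_size], values[i:i + block_size])
        for i in range(0, len(keys), block_size)
    ]


def block_score(query: Vector, block_keys: List[Vector]) -> float:
    """Hypothetical selection rule (not MiniMax's): query . mean key of the block."""
    dim = len(query)
    mean_key = [sum(k[d] for k in block_keys) / len(block_keys) for d in range(dim)]
    return dot(query, mean_key)


def block_sparse_attention(query, keys, values, block_size, top_k) -> Tuple[Vector, int]:
    """Attend only to the top_k highest-scoring KV blocks.

    Returns the attention output and the number of keys actually touched.
    """
    blocks = partition_blocks(keys, values, block_size)
    ranked = sorted(range(len(blocks)),
                    key=lambda b: block_score(query, blocks[b][0]),
                    reverse=True)
    chosen = sorted(ranked[:top_k])  # keep the selected blocks in positional order
    # Gather the chosen blocks into one contiguous key list and one value list.
    sel_keys = [k for b in chosen for k in blocks[b][0]]
    sel_vals = [v for b in chosen for v in blocks[b][1]]
    return attend(query, sel_keys, sel_vals), len(sel_keys)


if __name__ == "__main__":
    n, dim = 4096, 8
    keys = [[math.sin(0.01 * i + d) for d in range(dim)] for i in range(n)]
    values = [[math.cos(0.02 * i + d) for d in range(dim)] for i in range(n)]
    query = [0.5] * dim
    out, touched = block_sparse_attention(query, keys, values, block_size=64, top_k=8)
    print(f"attended {touched} of {n} keys ({touched / n:.1%})")
    # prints: attended 512 of 4096 keys (12.5%)
```

With top_k equal to the number of blocks, the sketch reduces to ordinary dense attention, which makes a handy sanity check. Any real speedup would come from doing block selection and the gather on the accelerator with contiguous memory reads, which plain Python cannot show.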

The honest caveat is that none of this is independently verified yet. MiniMax has committed to publishing the technical report and open-sourcing the weights "over the next 10 days," but as of the release the model is accessible only through MiniMax's API and its Code agent product, with no Hugging Face or GitHub drop. SWE-Bench Pro is harder to game than its predecessor, but a self-reported number from the model's developer is still a self-reported number. The real test is the weight drop, due around June 11: that's when third parties get to re-run the evals and check whether 59.0% survives a clean lab environment. The pattern of "announce, then ship weights a week later" is becoming standard for Chinese labs (see our prior coverage of DeepSeek V4's release cadence), and it's worth watching whether the other shoe actually drops.

The strategic picture: closed labs no longer have a defensible moat on coding alone. Opus 4.7 is still ahead, but a 5.3-point lead on SWE-Bench Pro is a sprint, not a chasm. What Anthropic, OpenAI, and Google still own is the integration surface: tool-use reliability, agentic harness quality, enterprise compliance, the parts that don't show up on a benchmark. If MiniMax actually ships the weights on time and the eval numbers replicate, the open-weights ecosystem inherits a credible 1M-context coding model for free, and the question for the closed labs becomes whether their non-benchmark advantages are worth the price delta. We'd bet yes for production buyers and no for the long tail of developers who were already going to self-host whatever's good enough.

MiniMax Research: MiniMax M3: Frontier Coding, 1M Context, Native Multimodality · MarkTechPost: MiniMax Releases MiniMax M3 with MSA Architecture · TechTimes: MiniMax M3 Open-Weight Coding Model: Frontier Claims, Unverified Benchmarks