Mistral ships Medium 3.5: a 128B dense open-weight model under modified MIT that consolidates Magistral reasoning and Devstral coding into one set of weights
Mistral released Mistral-Medium-3.5-128B, a single dense open-weight model with a 256K-token context window. It replaces both Magistral (reasoning) and Devstral 2 (coding) by folding them into one unified set of weights. The release ships on Hugging Face under a modified MIT license, making it Mistral's most permissive open-weight drop since the Mixtral era.
The consolidation strategy is the story. Through 2025 and 2026, Mistral had been running three separate model families: Medium for general instruction-following, Magistral for reasoning, and Devstral for coding agents. Medium 3.5 collapses all three into a single 128B dense model, a deliberate bet that one well-trained set of weights is cheaper to deploy in production than three specialized models. Enterprises now have one model card to evaluate, one fine-tuning recipe to run, and one inference runtime to standardize on.
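To put a rough number on the deployment economics of a 128B dense model, the sketch below estimates weight memory alone at a few common precisions. It is a back-of-envelope calculation, not a sizing guide: the exact parameter count (taken as 128e9 from the model name), the list of weight formats, and the 80 GB accelerator size are all assumptions for illustration, and KV cache, activations, and runtime overhead are ignored.

```python
import math

# Assumption: exactly 128e9 parameters, read off the "128B" in the model name.
PARAMS = 128e9

# Bytes per parameter for common weight formats (illustrative choices,
# not formats confirmed for this release).
BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0, "int4": 0.5}


def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Memory for the weights only, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


def min_accelerators(weight_gb: float, accelerator_gb: float = 80.0) -> int:
    """Lower bound on device count needed just to hold the weights.

    The 80 GB default is a hypothetical accelerator size; real deployments
    need extra headroom for KV cache and activations.
    """
    return math.ceil(weight_gb / accelerator_gb)


if __name__ == "__main__":
    for fmt, nbytes in BYTES_PER_PARAM.items():
        gb = weight_memory_gb(PARAMS, nbytes)
        print(f"{fmt}: {gb:.0f} GB of weights, "
              f"at least {min_accelerators(gb)} x 80 GB devices")
```

Because the model is dense, every parameter is resident and active for every token, so these figures are a floor on serving memory rather than a typical footprint. Long-context workloads near the 256K limit would add substantially to it.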
The licensing matters more than the architecture. Modified MIT on a 128B dense model with 256K context makes this one of the most permissive open-weight releases among current frontier-adjacent models. Combined with Qwen 3.7 Max's Apache-2.0 release (May 20) and the broader Chinese open-weight cadence, it means the gap between proprietary and open-weight models in the upper-mid tier is now a matter of licensing friction rather than capability. For sovereign-AI deployments and regulated-enterprise buyers, Mistral just removed the last unique selling point a proprietary mid-tier model could claim.
Hugging Face: mistralai/Mistral-Medium-3.5-128B model card · Mistral Docs: Mistral Medium 3.5, model card and capabilities · Let's Data Science: "Mistral Medium 3.5: 128B Open-Weight Model Replaces Devstral 2 and Magistral"