Mistral Large 3 ships as 675B / 41B sparse MoE under Apache 2.0
Mistral Large 3 lands as a 675B-total / 41B-active sparse Mixture-of-Experts model under Apache 2.0 licensing. The architecture choice mirrors DeepSeek V4 and Llama 4 Maverick — the open-weight tier has converged on sparse MoE as the default frontier architecture.
The architectural convergence is the story. Three years ago, dense transformers were the open-weight default. Two years ago, MoE was DeepSeek's niche. Today, frontier-class open-weight releases routinely ship as sparse MoE with hundreds of billions of total parameters. The active-parameter count (where Mistral lands at 41B) sets the compute spent per token and is the actual capability driver, but the total-parameter count is what the lab uses to claim the frontier-class designation.
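For enterprises self-hosting, the math matters. 41B active means each token costs roughly the compute of a 41B dense model, but all 675B weights must still sit in GPU memory because any expert can be routed to on any token. At 4-bit quantization that is roughly 340 GB of weights, more than a single 80 GB H100 can hold, so the realistic target is a multi-GPU node rather than a single card. The deployment story is still more practical than the parameter count suggests: throughput and per-token cost track the active count, while the total count sets the memory bill.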
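As a rough check on the memory side of that math, here is a minimal back-of-the-envelope sketch. The parameter counts come from the article; everything else is an assumption for illustration: an 80 GB H100, the bytes-per-weight figures for each precision, and the helper names `weight_memory_gb` and `min_gpus_for_weights`. It counts weights only and ignores KV cache, activations, and quantization overhead, so the GPU counts are lower bounds.

```python
import math

TOTAL_PARAMS = 675e9    # total parameters, from the article
ACTIVE_PARAMS = 41e9    # active parameters per token, from the article
H100_MEMORY_GB = 80     # assumption: the 80 GB H100 variant

# Assumed storage cost per weight for common precisions (illustrative).
BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0, "int4": 0.5}


def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Memory needed for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


def min_gpus_for_weights(
    num_params: float,
    bytes_per_param: float,
    gpu_memory_gb: float = H100_MEMORY_GB,
) -> int:
    """Lower bound on GPU count: weights only, no KV cache or activations."""
    return math.ceil(weight_memory_gb(num_params, bytes_per_param) / gpu_memory_gb)


if __name__ == "__main__":
    # Share of the weights that participate in any single token.
    print(f"active fraction: {ACTIVE_PARAMS / TOTAL_PARAMS:.1%}")
    for name, nbytes in BYTES_PER_PARAM.items():
        total_gb = weight_memory_gb(TOTAL_PARAMS, nbytes)
        gpus = min_gpus_for_weights(TOTAL_PARAMS, nbytes)
        print(f"{name}: {total_gb:.1f} GB of weights, at least {gpus} x 80 GB GPUs")
```

Under these assumptions the full model needs about 337.5 GB at 4 bits per weight, which is at least five 80 GB GPUs before any KV cache is allocated. The active count describes compute per token, not the memory footprint.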