// news · multimodal · enterprise · 2026-06-03 · source: microsoft.ai

Microsoft Ships MAI-Image-2.5, an In-House Image Model With Editing Built In

Microsoft unveiled MAI-Image-2.5 at Build 2026 on June 2, a single model that handles both text-to-image generation and instruction-based photo editing. It debuted at No. 2 for editing and No. 3 for generation on Arena's leaderboards, and is already live in PowerPoint, with a OneDrive rollout underway. The release is the clearest signal yet that Microsoft is building a parallel image stack to OpenAI's.

Microsoft's MAI team used the opening day of Build 2026 to ship MAI-Image-2.5, a multimodal image model that accepts both text prompts and uploaded reference images. The change matters because previous MAI image releases were generation-only: users described what they wanted and got a fresh image back. The 2.5 release folds editing into the same model, letting users hand it an existing photo plus an instruction ("replace the chair, keep the lighting") and get back a modified version with identity and scene context preserved. Microsoft says the model debuted at No. 2 on Arena's image-editing leaderboard and No. 3 on its text-to-image leaderboard, with a score of 1,254 that puts it 75 points ahead of the previous MAI-Image-2 and ahead of Google's Nano Banana Pro on editing.
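
To put the 75-point gap in perspective, here is a minimal Python sketch that converts a score difference into an expected head-to-head preference rate. It assumes Arena scores sit on the standard Elo scale (base 10, 400-point spread), which the announcement does not state, and MAI-Image-2's score of 1,179 is not reported directly; it is only inferred by subtracting 75 from 1,254.

```python
def expected_win_rate(score_a: float, score_b: float) -> float:
    """Probability that A is preferred over B, ASSUMING the standard Elo logistic curve."""
    return 1.0 / (1.0 + 10 ** ((score_b - score_a) / 400.0))


MAI_IMAGE_2_5 = 1254
MAI_IMAGE_2 = MAI_IMAGE_2_5 - 75  # inferred from the reported 75-point gap, not published

if __name__ == "__main__":
    # Predecessor comparison: the 75-point lead reported by Microsoft.
    print(f"75-point gap: {expected_win_rate(MAI_IMAGE_2_5, MAI_IMAGE_2):.3f}")
    # For contrast, a 5-point gap between two otherwise similar models.
    print(f"5-point gap:  {expected_win_rate(1005, 1000):.3f}")
```

Under that assumption, a 75-point lead means MAI-Image-2.5 would be preferred in roughly 61% of head-to-head votes against its predecessor, while a 5-point gap is close to a coin flip.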

The strategic frame is the one TestingCatalog and IBTimes both lead with: this is Mustafa Suleyman's group weaning Microsoft off its dependency on OpenAI following the April 2026 contract renegotiation that ended Microsoft's exclusive OpenAI license. MAI-Image-2.5 ships alongside MAI-Voice-2, MAI-Transcribe-1.5, MAI-Code-1 Flash, and MAI-Thinking-1, part of a slate of seven in-house models, all routed through Azure AI Foundry and OpenRouter. The pricing tells the same story: a flagship tier at $5 to $47 per million tokens and a Flash tier at $1.75 to $19.50, deliberately undercutting the GPT-Image and Gemini Image rates that Microsoft has been paying inside Copilot and Designer.
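
The quoted rate cards translate into per-workload costs with simple arithmetic. The sketch below brackets a hypothetical 250,000-token workload between the low and high end of each tier. The article does not say what separates the two ends of each range (input versus output tokens, resolution, or something else), so the results are bounds, not quotes.

```python
# Per-million-token rates (USD) as quoted in the article. What distinguishes the
# low and high end of each range is not stated, so treat them only as bounds.
RATES_USD_PER_MILLION: dict[str, tuple[float, float]] = {
    "flagship": (5.00, 47.00),
    "flash": (1.75, 19.50),
}


def cost_range(tier: str, tokens: int) -> tuple[float, float]:
    """Return (low, high) cost in USD for a token count on the given tier."""
    low_rate, high_rate = RATES_USD_PER_MILLION[tier]
    millions = tokens / 1_000_000
    return (low_rate * millions, high_rate * millions)


if __name__ == "__main__":
    workload_tokens = 250_000  # hypothetical workload size, for illustration only
    for tier in RATES_USD_PER_MILLION:
        low, high = cost_range(tier, workload_tokens)
        print(f"{tier}: ${low:.2f} to ${high:.2f}")
```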

The interesting technical detail isn't the Arena score; those leaderboards churn weekly, and a 1,254 will be eclipsed by the next Google or OpenAI drop. It's that Microsoft is collapsing generation and editing into one set of weights rather than maintaining separate models. That matters for PowerPoint and OneDrive, where the realistic workflow is "make me a slide image" followed by "now move the logo and recolor the background": two calls that previously required hitting different models with different prompt conventions. A unified model means one API surface, one billing line, and one set of safety filters. For enterprise rollouts, that simplification is worth more than five points on a benchmark. We covered the broader push toward unified multimodal stacks in our analysis of foundation-model consolidation, and this fits the pattern.
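
Here is a minimal sketch of what "one API surface" can look like in practice, assuming a hypothetical request shape: the model identifier `mai-image-2.5`, the `ImageRequest` type, and the payload fields are all illustrative and are not Microsoft's published API. Generation and editing share one model and one payload builder; the only difference is whether a reference image is attached.

```python
from dataclasses import dataclass
from typing import Optional

MODEL_ID = "mai-image-2.5"  # hypothetical identifier, not a confirmed API name


@dataclass
class ImageRequest:
    prompt: str
    reference_image: Optional[bytes] = None  # attached => edit, absent => generate

    @property
    def mode(self) -> str:
        return "edit" if self.reference_image is not None else "generate"


def build_payload(req: ImageRequest) -> dict:
    """Build one request shape for both operations (hypothetical fields)."""
    payload = {"model": MODEL_ID, "prompt": req.prompt, "mode": req.mode}
    if req.reference_image is not None:
        # Stand-in for a real upload: record only the size of the attached image.
        payload["reference_image_size_bytes"] = len(req.reference_image)
    return payload


if __name__ == "__main__":
    # Step one of the slide workflow: plain text-to-image generation.
    first = build_payload(ImageRequest("make me a slide image"))
    # Placeholder bytes standing in for the image returned by step one.
    generated = b"\x89PNG placeholder"
    # Step two: an edit instruction plus the previous output, same model.
    second = build_payload(
        ImageRequest("now move the logo and recolor the background", generated)
    )
    print(first)
    print(second)
```

The design choice being illustrated is that the mode is derived from the inputs rather than chosen by switching to a different model, which is what keeps billing and safety filtering on a single path.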

The caveat is the same one that applies to every Build-week announcement: Arena rankings measure aesthetic preference on a thin slice of prompts, not production reliability. Nano Banana Pro 2K, the Google model MAI-Image-2.5 is being measured against, has weeks of public testing behind it; MAI-Image-2.5 has hours. The honest read is that Microsoft now has a credible house image model with real product distribution (PowerPoint, OneDrive, Copilot), not that it has overtaken Google or OpenAI on image quality. The win for Microsoft is optionality. Every Copilot image request that resolves on MAI-Image-2.5 instead of GPT-Image is one fewer dollar flowing to OpenAI, and after April's renegotiation that math is the entire point.

Sources: Microsoft AI, "Introducing MAI-Image-2.5" · TestingCatalog, "Microsoft readies new MAI voice and image models for Build 2026" · IBTimes, "Microsoft Reveals New AI Models Aimed At Relying Less on OpenAI"