Microsoft Phi-4 family expands: -mini, -multimodal, -reasoning, -reasoning-vision
Microsoft's small-language-model bet now includes Phi-4-mini, Phi-4-multimodal (text, audio, and vision in one model), Phi-4-reasoning, Phi-4-reasoning-plus, Phi-4-mini-reasoning, and Phi-4-reasoning-vision. Microsoft reports that the reasoning variants beat DeepSeek-R1-Distill-Llama-70B on most benchmarks despite being far smaller.
Phi-4-mini ships with a 200K-token vocabulary for broader multilingual support, grouped-query attention, built-in function calling, improved instruction-following, and shared input and output embeddings. Phi-4-multimodal is the first in the family to support text, audio, and vision inputs natively: speech recognition, translation, summarization, OCR, chart/table interpretation, and multi-image comparison, all in one model.
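To give a rough sense of why shared embeddings and grouped-query attention matter at this scale, here is a back-of-envelope sketch. Only the roughly 200K vocabulary comes from the text above; the hidden size, layer count, head counts, head dimension, and sequence length are illustrative assumptions, not published Phi-4-mini figures, and the helper names are hypothetical.

```python
def embedding_params(vocab_size: int, hidden_size: int, tied: bool) -> int:
    """Parameters spent on the input embedding plus the output projection."""
    matrix = vocab_size * hidden_size
    # With shared (tied) embeddings, one matrix serves both roles.
    return matrix if tied else 2 * matrix


def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_value: int = 2) -> int:
    """Bytes needed to cache keys and values for a single sequence."""
    # Factor of 2 covers the separate key and value tensors.
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_value


VOCAB = 200_000      # rounded from the "200K" vocabulary in the text
HIDDEN = 3072        # assumption, for illustration only
LAYERS = 32          # assumption
QUERY_HEADS = 24     # assumption
KV_HEADS = 8         # assumption: 3 query heads share each key/value head
HEAD_DIM = 128       # assumption
SEQ_LEN = 4096       # assumption

saved_params = (embedding_params(VOCAB, HIDDEN, tied=False)
                - embedding_params(VOCAB, HIDDEN, tied=True))

# Standard multi-head attention caches one key/value head per query head;
# grouped-query attention caches only KV_HEADS of them.
mha = kv_cache_bytes(LAYERS, QUERY_HEADS, HEAD_DIM, SEQ_LEN)
gqa = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, SEQ_LEN)

print(f"parameters saved by sharing embeddings: {saved_params / 1e6:.1f}M")
print(f"KV cache per sequence, multi-head: {mha / 2**20:.0f} MiB")
print(f"KV cache per sequence, grouped-query: {gqa / 2**20:.0f} MiB")
```

Under these assumed numbers, a large vocabulary makes the embedding matrix a meaningful share of a small model's parameter budget, which is why sharing it is attractive, and the key/value cache shrinks in proportion to the ratio of query heads to key/value heads.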
The reasoning variants carry the most pointed claim. Microsoft reports Phi-4-reasoning, Phi-4-reasoning-plus, and Phi-4-mini-reasoning achieving better performance than OpenAI o1-mini and DeepSeek-R1-Distill-Llama-70B on most benchmarks, and better than the full DeepSeek-R1 (671B) on AIME 2025. If that survives independent eval, the small-reasoning-model thesis Microsoft has been pushing for two years is more right than the field has been pricing in.
There is no Phi-5 as of May 2026; the Phi-4 line keeps proliferating instead.