// news · open-source · 2026-06-25 · source: pricepertoken / llm-stats

Microsoft announces Phi-4-reasoning-vision-15B: open-weight multimodal model balances high-level reasoning with computational efficiency, using dynamic-resolution vision encoders and a mixed training approach

Microsoft announced Phi-4-reasoning-vision-15B, a 15-billion-parameter open-weight multimodal model designed to balance high-level reasoning in math and science with computational efficiency. The model uses dynamic-resolution vision encoders and a mixed training approach to optimize for both reasoning-heavy and perception-focused tasks. Pairing reasoning with efficiency addresses procurement-economics constraints that larger multimodal models do not.

The substantive piece is the dual optimization for reasoning and efficiency. Earlier multimodal models typically optimized for either reasoning quality (large parameter counts, high computational cost) or efficiency (small parameter counts, limited reasoning depth). This model's design targets the procurement-economics tradeoff that larger reasoning-multimodal models impose: at 15B parameters, it is positioned for substantially lower-cost deployment than 70B+ alternatives while aiming for competitive reasoning capability.
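As a rough illustration of why parameter count drives deployment cost, the sketch below estimates the memory needed just to hold model weights at common precisions. It is back-of-envelope arithmetic, not a measurement of either model: the function name and precision table are made up for this example, the 70B figure is only the comparison class named above, and the estimate ignores activations, KV cache, and any runtime overhead.

```python
# Hypothetical helper: weights-only memory estimate (not a benchmark).
# Assumption: 1 GB = 1e9 bytes, so billions of params x bytes per param = GB.

BYTES_PER_PARAM = {
    "bf16": 2.0,  # 16-bit weights
    "int8": 1.0,  # 8-bit quantized weights
    "int4": 0.5,  # 4-bit quantized weights
}


def weight_memory_gb(params_billions: float, precision: str) -> float:
    """Approximate memory for the weights alone, in GB."""
    return params_billions * BYTES_PER_PARAM[precision]


if __name__ == "__main__":
    for precision in BYTES_PER_PARAM:
        small = weight_memory_gb(15, precision)   # 15B-class model
        large = weight_memory_gb(70, precision)   # 70B-class comparison point
        print(f"{precision}: 15B ~ {small:.1f} GB, 70B ~ {large:.1f} GB "
              f"({large / small:.2f}x)")
```

Under these assumptions a 15B model needs about 30 GB for bf16 weights versus about 140 GB for a 70B model, which is the difference between fitting on a single high-memory accelerator and needing several.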

The competitive read for the H2 2026 open multimodal procurement landscape is that three releases represent three distinct open-multimodal capability shapes: NVIDIA Nemotron 3 Nano Omni with its 30B-MoE, 9x-throughput architecture; Phi-4-reasoning-vision-15B with its balance of reasoning and efficiency; and Molmo 2 with open video understanding. Procurement teams therefore have multi-vendor options that match different deployment-economics requirements.


Price Per Token — New Models Today — AI & LLM Releases Last 24 Hours → · LLM Stats — AI Updates Today (June 2026) →