// news · multimodal · open-source · 2026-06-25 · source: nvidia / pricepertoken

NVIDIA releases Nemotron 3 Nano Omni — open omni-modal 30B-parameter MoE unifies vision, audio, language, delivers 9x higher throughput than comparable open multimodal models, leads 6 accuracy leaderboards

NVIDIA released Nemotron 3 Nano Omni, an open omni-modal reasoning model that unifies vision, audio, and language in a single 30B-parameter mixture-of-experts (MoE) architecture. The model delivers up to 9x higher throughput than comparable open multimodal models while topping six accuracy leaderboards covering document intelligence, video, and audio understanding. That combination of accuracy and throughput sets a new reference point for open multimodal models.

The substantive point is the 9x throughput advantage at competitive accuracy. Earlier open multimodal models sat on a tradeoff frontier: higher accuracy came at lower throughput, and higher throughput came at reduced accuracy. The Nemotron 3 architecture demonstrates that both can improve together when a single MoE backbone handles all three modalities, rather than separate vision, audio, and language modules.
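Part of why MoE helps throughput is that each token activates only a few experts, so per-token compute tracks the routed experts rather than the full parameter count. The sketch below shows generic top-k MoE routing in plain Python to make that mechanism concrete. It is not NVIDIA's implementation: the source gives neither the expert count, the value of k, nor the router design, so the linear router, the four toy experts, and k=2 are all hypothetical choices for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(router_logits, k):
    """Pick the k experts with the highest router scores.

    Returns (expert_index, gate_weight) pairs; the gate weights are a softmax
    taken over the k selected scores only, so they sum to 1.
    """
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:k]
    gates = softmax([router_logits[i] for i in chosen])
    return list(zip(chosen, gates))


def moe_layer(x, experts, router_weights, k):
    """Sparse MoE layer for a single token vector x.

    router_weights holds one weight vector per expert (a hypothetical linear
    router). Only the k routed experts are evaluated; the others are skipped,
    which is where the compute saving comes from.
    Returns (output_vector, indices_of_experts_that_ran).
    """
    logits = [sum(w * v for w, v in zip(row, x)) for row in router_weights]
    out = [0.0] * len(x)
    ran = []
    for idx, gate in top_k_route(logits, k):
        y = experts[idx](x)  # only the routed experts do any work
        out = [o + gate * yi for o, yi in zip(out, y)]
        ran.append(idx)
    return out, ran


def make_expert(scale):
    """Toy expert that scales its input; real experts are feed-forward blocks."""
    return lambda x: [scale * v for v in x]


if __name__ == "__main__":
    experts = [make_expert(s) for s in (1.0, 2.0, 3.0, 4.0)]
    router = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
    y, ran = moe_layer([0.5, 2.0], experts, router, k=2)
    print("experts run:", ran, "of", len(experts))
    print("output:", y)
```

With k=2 of 4 experts, half of the expert parameters sit idle for this token. The sketch only illustrates sparse activation; it does not show how NVIDIA combines modalities in one backbone or reproduce the reported 9x figure.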

For the H2 2026 multimodal landscape, the competitive read is that NVIDIA's open-weight release, alongside Molmo 2's open video understanding, adds up to a substantial accumulation of open-source multimodal capability. Microsoft's Phi-4-reasoning-vision-15B adds an efficiency-focused option for enterprise use. Three major open multimodal releases in two weeks strengthen the open-multimodal category against closed-source vendor offerings.


Sources: NVIDIA Newsroom (News Archive) · Price Per Token (New Models Today: AI & LLM Releases Last 24 Hours)