NVIDIA ships Nemotron 3 Nano Omni — 30B hybrid Mamba-Transformer MoE (3B active), multimodal for agents
Nemotron 3 Nano Omni (April 28) unifies vision, audio, and language in one open multimodal model. The architecture is the interesting bit: a hybrid Mamba-Transformer MoE with 30B total parameters and only 3B activated per token.
NVIDIA's positioning for the Nemotron 3 family is explicitly agentic — these are open models built for AI agent workloads rather than chat. Nano Omni is the small variant; Super and Ultra sizes are in the works for H1 2026.
The architectural detail worth tracking: hybrid Mamba-Transformer is no longer a research-paper curiosity. NVIDIA has now shipped a 30B production model on the architecture, with Mixture-of-Experts gating keeping the active parameter count at 3B per token. For agentic workloads (long context, sustained inference, lots of tool calls), the scaling difference matters: the cost of Mamba's state-space layers grows linearly with sequence length, while full transformer self-attention grows quadratically, so the gap widens as context gets longer.
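To make the linear-versus-quadratic point concrete, here is a minimal Python sketch that counts the work done by two toy scalar models: a fixed-coefficient recurrence standing in for a state-space layer, and a causal self-attention loop. Everything in it is an illustrative assumption (the function names, the `decay` and `gain` constants, the scalar "tokens"). It is not Mamba's selective scan, which uses input-dependent parameters, and it says nothing about how Nemotron's layers are actually implemented.

```python
import math


def linear_scan(xs, decay=0.9, gain=0.1):
    """Toy state-space recurrence: h_t = decay * h_{t-1} + gain * x_t.

    Returns the outputs and the number of state updates (one per token).
    """
    h = 0.0
    outputs = []
    steps = 0
    for x in xs:
        h = decay * h + gain * x  # fixed-size state, updated once per token
        outputs.append(h)
        steps += 1
    return outputs, steps


def causal_attention(xs):
    """Toy scalar causal self-attention: token t attends to tokens 0..t.

    Returns the outputs and the number of query-key scores computed.
    """
    outputs = []
    scores_computed = 0
    for t, q in enumerate(xs):
        context = xs[: t + 1]
        scores = [q * k for k in context]  # one score per earlier token
        scores_computed += len(scores)
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        outputs.append(sum(w * v for w, v in zip(weights, context)) / total)
    return outputs, scores_computed


if __name__ == "__main__":
    for n in (100, 1_000, 2_000):
        xs = [math.sin(i) for i in range(n)]
        _, scan_steps = linear_scan(xs)
        _, attn_scores = causal_attention(xs)
        print(f"n={n}: scan updates={scan_steps}, attention scores={attn_scores}")
```

Doubling the sequence length doubles the scan's state updates but roughly quadruples the attention scores, since causal attention over n tokens computes n(n+1)/2 of them. The same asymmetry shows up at decode time: the recurrence carries a fixed-size state, while attention has to look back over a context that keeps growing with every tool call.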
The model is available via Hugging Face, OpenRouter, build.nvidia.com, and 25+ partner platforms. NVIDIA also announced the Nemotron Coalition — a group of global AI labs that will collaborate on the upcoming Nemotron 4 family.