NVIDIA Nemotron 3 Super — 120B hybrid MoE (12B active) tuned for local agent deployment
NVIDIA's open Nemotron 3 Super lands as a 120B-parameter hybrid mixture-of-experts (MoE) model with 12B parameters active per token and a 1M-token context window. The explicit design target is local agent deployment with tool-augmented coding workloads.
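The "Super" tier sits above Nano in the Nemotron 3 family and below the closed Nemotron 4 Coalition models. The activation ratio (12B of 120B parameters per token) is the deployability story: with 12B active, per-token compute is close to that of a dense 12B model, which is what makes serving from a single workstation-class GPU plausible, while the total parameter count buys capacity that dense 12B models can't match. All 120B weights still have to be resident in memory or offloaded, so the memory footprint, usually addressed with quantization, is the constraint to size for.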
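To make the activation-ratio argument concrete, here is a rough back-of-envelope sketch in Python. The 120B and 12B figures come from the text; the storage precisions and the "about 2 FLOPs per active parameter per token" rule of thumb are assumptions added for illustration, and the numbers cover weights only (no KV cache or activations).

```python
# Back-of-envelope sizing for a sparse MoE model.
# Parameter counts are taken from the article.
TOTAL_PARAMS = 120e9   # all experts, must be stored somewhere
ACTIVE_PARAMS = 12e9   # parameters actually used per token

# ASSUMPTION: candidate storage precisions (bytes per parameter).
# The article does not say which precision the weights ship in.
BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0, "int4": 0.5}


def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Memory needed to hold the weights alone, in decimal gigabytes."""
    return num_params * bytes_per_param / 1e9


def forward_flops_per_token(active_params: float) -> float:
    """ASSUMPTION: common estimate of ~2 FLOPs per active parameter per token."""
    return 2 * active_params


activation_ratio = ACTIVE_PARAMS / TOTAL_PARAMS

if __name__ == "__main__":
    print(f"activation ratio: {activation_ratio:.0%}")
    for name, nbytes in BYTES_PER_PARAM.items():
        gb = weight_memory_gb(TOTAL_PARAMS, nbytes)
        print(f"{name}: {gb:.0f} GB of weights")
    gflops = forward_flops_per_token(ACTIVE_PARAMS) / 1e9
    print(f"forward compute: ~{gflops:.0f} GFLOPs per token")
```

Under these assumptions the weights alone take 240 GB at bf16, 120 GB at fp8, and 60 GB at 4-bit, while compute per token tracks the 12B active parameters. Compute scales with the active count, memory with the total.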
Why this shape: agentic coding workloads aren't a single forward pass; they're long tool-augmented sequences in which frequent retrieval of routine knowledge is interspersed with rare specialist calls. An MoE router picks a small subset of experts for each token, so specialist capacity is only paid for when a token needs it. NVIDIA is positioning Nemotron 3 Super as the open-weights answer to that workload, and as a way to keep developers on NVIDIA silicon for local inference.
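As an illustration of per-token expert selection, the following is a minimal sketch of generic top-k gating in plain Python. It is not NVIDIA's router: the article does not describe Nemotron 3 Super's routing scheme, and the function names, toy scalar "experts", and logits here are all hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_logits, k):
    """Pick the k highest-scoring experts for one token.

    Returns (expert_index, weight) pairs with weights renormalized to sum to 1.
    """
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


def moe_forward(x, experts, router_logits, k=2):
    """Run only the selected experts and mix their outputs by router weight."""
    return sum(w * experts[i](x) for i, w in route_top_k(router_logits, k))


# HYPOTHETICAL toy experts: scalar functions standing in for feed-forward blocks.
experts = [lambda x: x + 1.0, lambda x: 2.0 * x, lambda x: -x, lambda x: x * x]

# HYPOTHETICAL router scores for one token; experts 1 and 3 score highest.
logits = [0.1, 2.0, -1.0, 2.0]

if __name__ == "__main__":
    print(route_top_k(logits, k=2))
    print(moe_forward(3.0, experts, logits, k=2))
```

Only two of the four toy experts are evaluated for this token, which is the property that keeps per-token compute tied to the active parameter count rather than the total.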