// news · open-source · frontier-models · 2026-06-17 · source: nvidia / llm-stats / promptquorum

NVIDIA Nemotron Cascade 2 ships at 30B parameters with 54 tokens/sec on consumer GPUs: the open frontier moves onto prosumer hardware

NVIDIA's Nemotron Cascade 2 is a 30B-parameter open-weights model that runs at roughly 54 tokens/sec on consumer-tier GPUs, bringing capability comparable to the closed-API tier to hardware an individual can deploy. Following Nemotron 3 Ultra (550B, released yesterday afternoon), DeepSeek V4-Pro, and MiniMax M3, it is the fourth credible open-frontier model of June and the most consumer-deployable.

The substantive change is consumer-tier deployability. Through Q1 2026, open-weight frontier-class models typically required 8x H100 or larger inference clusters; a 30B model running at 54 tok/s on consumer GPUs (5090, MI300) makes individual-developer deployment of a frontier-capability model viable. That changes the H2 2026 deployment pattern: not every production inference workload needs data-center-class hardware, and on-prem or single-developer hosting becomes a defensible procurement option for the first time.
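
As a rough back-of-the-envelope check on the deployability claim, the sketch below estimates weight memory for a 30B-parameter model at a few common precisions and converts 54 tok/s into tokens per day. Only the 30B and 54 tok/s figures come from the release coverage; the precision table, the 80% headroom factor, and the 32 GB card size in the usage note are illustrative assumptions, and the estimate ignores KV cache and activation memory.

```python
# Sizing sketch under stated assumptions: weights only (no KV cache or
# activations), 1 GB = 1e9 bytes. Not a measurement of the actual model.

PARAMS = 30e9          # 30B parameters, as reported for Cascade 2
TOKENS_PER_SEC = 54.0  # reported consumer-GPU throughput

# Bytes per parameter at common precisions (assumed quantization options).
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(params: float, bytes_per_param: float) -> float:
    """Approximate weight storage in GB for a given precision."""
    return params * bytes_per_param / 1e9


def fits(params: float, precision: str, vram_gb: float, headroom: float = 0.8) -> bool:
    """True if the weights fit within `headroom` (assumed 80%) of the card's VRAM."""
    return weight_memory_gb(params, BYTES_PER_PARAM[precision]) <= vram_gb * headroom


def tokens_per_day(tokens_per_sec: float) -> float:
    """Tokens generated in 24 hours of continuous decoding."""
    return tokens_per_sec * 86_400


if __name__ == "__main__":
    vram_gb = 32  # hypothetical single-card VRAM budget
    for precision in BYTES_PER_PARAM:
        gb = weight_memory_gb(PARAMS, BYTES_PER_PARAM[precision])
        print(f"{precision}: {gb:.0f} GB weights, fits in {vram_gb} GB: "
              f"{fits(PARAMS, precision, vram_gb)}")
    print(f"tokens/day at {TOKENS_PER_SEC:.0f} tok/s: {tokens_per_day(TOKENS_PER_SEC):,.0f}")
```

Under these assumptions, full 16-bit weights come to about 60 GB, so single-card deployment would likely depend on 4-bit or similar quantization. Real headroom requirements vary with context length and batch size.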

Set against June's cumulative open-frontier release wave (Nemotron 3 Ultra, DeepSeek V4-Pro, MiniMax M3, and Cascade 2), the competitive picture is a four-model, three-vendor open-frontier landscape that has stabilized within a single month. Procurement teams evaluating H2 2026 deployment options now face a structural choice: closed-API frontier tier (premium pricing, fast iteration) or open-weight frontier tier (capital cost, deployment control, no API dependency). The deployment economics of the two options are now competitive at every capability tier.
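
One way to frame the closed-API versus open-weight choice is a simple break-even estimate: how many days of local inference it takes for avoided API spend to cover the hardware purchase. The sketch below is illustrative only. The function name, its parameters, and every number in the usage example are hypothetical placeholders, not vendor pricing or measured costs, and it leaves out staffing, depreciation, and quality differences between models.

```python
# Break-even sketch for "buy hardware" vs "pay per token".
# All inputs are caller-supplied assumptions; nothing here is real pricing.


def breakeven_days(
    hardware_cost: float,
    api_price_per_mtok: float,
    tokens_per_sec: float,
    utilization: float = 1.0,
    power_cost_per_day: float = 0.0,
) -> float:
    """Days until avoided API spend covers the hardware cost.

    Returns infinity when local running costs meet or exceed the API
    spend they replace, i.e. the purchase never pays for itself.
    """
    tokens_per_day = tokens_per_sec * 86_400 * utilization
    api_cost_per_day = tokens_per_day / 1e6 * api_price_per_mtok
    daily_saving = api_cost_per_day - power_cost_per_day
    if daily_saving <= 0:
        return float("inf")
    return hardware_cost / daily_saving


if __name__ == "__main__":
    # Hypothetical inputs: a 3,000-unit workstation, an API priced at
    # 2 units per million tokens, 54 tok/s at 25% utilization, and
    # 1 unit per day for power.
    days = breakeven_days(3000, 2.0, 54, utilization=0.25, power_cost_per_day=1.0)
    print(f"break-even after roughly {days:.0f} days")
```

The result is highly sensitive to utilization: a card that sits idle most of the day replaces very little API spend, so the same hardware can pay back quickly for a steady workload and never for an occasional one.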

LLM Stats — AI Updates Today (June 2026) → · Prompt Quorum — Best Ollama Models 2026: Top 10 Open Source LLMs → · AI Magicx — Local AI in 2026: The Best Models to Run on Your Own Hardware →