// news · compute · frontier-models · 2026-05-29 · source: nvidia / tom's hardware / servethehome

NVIDIA Rubin NVL72 promises 10x reduction in inference token cost versus Blackwell — 4x fewer GPUs to train MoE models, volume production H2 2026

NVIDIA unveiled the Rubin platform with six new chips and one AI supercomputer, promising up to 10x reduction in inference token cost and 4x reduction in the number of GPUs to train MoE models compared with the Blackwell platform. Volume production of Vera Rubin NVL72 systems ramps in the second half of 2026, with AWS, Google Cloud, Microsoft Azure, and Oracle Cloud Infrastructure among the first cloud providers to deploy Vera Rubin-based instances.

The shift in performance economics is the substantive piece. The headline metric is the claimed reduction of up to 10x in inference token cost: if it holds, applications whose spend is driven by token volume (consumer AI assistants, agent runtimes, batch document processing) would cost up to 10x less to operate per token on Rubin than on Blackwell. The claimed 4x reduction in GPUs needed to train MoE models moves training budgets in the same direction, since the same training run would occupy roughly a quarter of the GPUs. Over the Vera Rubin ramp window, the cumulative effect is that some compute-bound product categories that could not previously be built profitably may become economically viable, widening the range of applications that can be deployed.
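To make the arithmetic concrete, here is a minimal Python sketch of what the two headline multipliers would mean for a single workload. The function names, the monthly token volume, the per-million-token price, and the baseline fleet size are all hypothetical placeholders chosen for illustration; none of them come from NVIDIA's announcement. The 10x and 4x factors are the vendor's best-case claims.

```python
import math


def inference_spend(tokens: float, price_per_million: float, reduction: float = 1.0) -> float:
    """Dollar cost of serving `tokens` tokens when the per-million-token
    price is divided by `reduction` (1.0 means no change)."""
    if reduction <= 0:
        raise ValueError("reduction must be positive")
    return tokens / 1_000_000 * price_per_million / reduction


def gpus_needed(baseline_gpus: int, reduction: float = 1.0) -> int:
    """GPU count after dividing a baseline fleet by `reduction`, rounded up."""
    if reduction <= 0:
        raise ValueError("reduction must be positive")
    return math.ceil(baseline_gpus / reduction)


if __name__ == "__main__":
    # Hypothetical workload, not taken from the announcement.
    monthly_tokens = 50e9   # tokens served per month
    baseline_price = 2.00   # assumed dollars per million tokens on the older platform
    baseline_fleet = 1024   # assumed GPUs used for an MoE training run

    before = inference_spend(monthly_tokens, baseline_price)
    after = inference_spend(monthly_tokens, baseline_price, reduction=10)
    print(f"inference spend: ${before:,.0f} -> ${after:,.0f} per month")
    print(f"training fleet: {baseline_fleet} -> {gpus_needed(baseline_fleet, 4)} GPUs")
```

Under these assumed inputs, the best-case claim takes the monthly inference bill from $100,000 to $10,000 and the training fleet from 1,024 GPUs to 256. A smaller realized factor scales the result proportionally.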

The competitive context is the contest between custom ASICs and merchant GPUs. DeepSeek V4 Pro, an open-weight frontier model with 1.6T total parameters (49B active), an MIT license, and a 1M-token context, is the kind of workload that justifies Rubin-scale inference infrastructure. Google TPU, AWS Trainium 2 and Inferentia 3, Microsoft Maia, and Meta's internal silicon are the custom-ASIC alternatives competing for hyperscaler workloads. NVIDIA's Rubin economics threaten to compress the custom-ASIC value proposition by making merchant GPUs cost-competitive again. The question through H2 2026 is whether the announced 10x token-cost reduction holds at deployment scale, or whether real-world utilization gaps preserve the custom-ASIC procurement rationale.
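The utilization question can be framed as a simple comparison. The sketch below is a deliberately crude model and an assumption of ours, not a method from NVIDIA or any of the cited sources: it treats the realized cost reduction as the claimed figure scaled by the fraction of the vendor's assumed utilization that a deployment actually achieves, then compares it with a hypothetical `asic_advantage`, the factor by which a custom ASIC is assumed to undercut the Blackwell-generation cost per token.

```python
def realized_reduction(claimed: float, achieved_utilization: float) -> float:
    """Scale a claimed cost reduction by the achieved fraction (0 < f <= 1)
    of the utilization the claim assumes. Linear scaling is an assumption."""
    if claimed <= 0:
        raise ValueError("claimed reduction must be positive")
    if not 0 < achieved_utilization <= 1:
        raise ValueError("achieved_utilization must be in (0, 1]")
    return claimed * achieved_utilization


def merchant_gpu_wins(claimed: float, achieved_utilization: float, asic_advantage: float) -> bool:
    """True when the realized GPU reduction exceeds the (hypothetical)
    cost-per-token advantage of a custom ASIC over the same baseline."""
    return realized_reduction(claimed, achieved_utilization) > asic_advantage


if __name__ == "__main__":
    # All inputs except the 10x claim are illustrative placeholders.
    for utilization in (1.0, 0.4, 0.25):
        factor = realized_reduction(10, utilization)
        wins = merchant_gpu_wins(10, utilization, asic_advantage=3)
        print(f"utilization {utilization:.2f}: realized {factor:.1f}x, GPU cheaper: {wins}")
```

With an assumed 3x ASIC advantage, the break-even point in this model sits at 30% of the utilization behind the headline claim. Above that, the merchant GPU stays cheaper per token. Below it, the custom-ASIC rationale survives.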


NVIDIA Investor Relations — Rubin Platform Six New Chips Press Release → · Tom's Hardware — Vera Rubin NVL72 5x inference 10x lower cost per token → · NVIDIA Developer Blog — Inside the Rubin Platform technical detail →