// news · open-source · 2026-06-20 · source: featherless / aiunpacking / precisionai

Llama 4 Scout fits on a single Nvidia H100 with an industry-leading 10M-token context window: Meta resets the open-source long-context bar

Meta's Llama 4 Scout pairs a 10M-token context window, the largest publicly available in any model class, open or closed, with an inference footprint that fits on a single H100. Scout also outperforms Gemma 3 and Mistral 3.1 across multimodal tasks while keeping the single-GPU deployment story intact.

The substantive piece is the resource-vs-context Pareto improvement. Through 2025 the tradeoff between context length and deployment cost was strict: a bigger context window meant more memory, more GPUs, and a higher per-token serving cost. Llama 4 Scout breaks that tradeoff for the open-source category: 10M tokens of context with single-H100 inference is a Pareto improvement that has no open peer. The deployment economics matter. Hospitals, law firms, code-review teams, and other organizations with long-document workloads can self-host a 10M-context model on a single GPU instead of building multi-GPU inference farms.
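To make the older tradeoff concrete, here is a minimal sketch of how key-value cache memory grows with context length under plain dense attention. The model shape is hypothetical and is not Llama 4 Scout's published configuration. The estimate also ignores weight memory, activations, and any cache-compression technique, so it illustrates why context used to cost GPUs and says nothing about how Scout itself manages memory.

```python
# Back-of-envelope KV-cache sizing for a dense-attention transformer.
# All model dimensions below are HYPOTHETICAL, chosen only for illustration.

H100_MEMORY_BYTES = 80 * 10**9  # an H100 carries 80 GB of HBM


def kv_cache_bytes(seq_len, n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    """Bytes needed to cache keys and values for one sequence.

    The leading factor of 2 covers one key tensor and one value tensor
    per layer; bytes_per_elem=2 corresponds to fp16/bf16 storage.
    """
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem


def gpus_needed_for_cache(seq_len, **shape):
    """GPUs required to hold the cache alone (weights are ignored)."""
    total = kv_cache_bytes(seq_len, **shape)
    return -(-total // H100_MEMORY_BYTES)  # integer ceiling division


# Hypothetical shape, NOT taken from any published model card.
HYPOTHETICAL_SHAPE = dict(n_layers=48, n_kv_heads=8, head_dim=128)

if __name__ == "__main__":
    for tokens in (128 * 1024, 1_000_000, 10_000_000):
        size_gb = kv_cache_bytes(tokens, **HYPOTHETICAL_SHAPE) / 10**9
        gpus = gpus_needed_for_cache(tokens, **HYPOTHETICAL_SHAPE)
        print(f"{tokens:>10,} tokens: {size_gb:8.1f} GB cache, {gpus} GPU(s)")
```

Under these assumed dimensions the cache grows linearly with sequence length, which is the strict relationship the paragraph above describes for the pre-Scout period.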

The competitive read against the closed-source side is that Llama 4 Scout's 10M context exceeds Gemini's 2M-class window and the rumored 1.5M window of GPT-5.6. Long context is no longer a closed-source moat; open source took the lead on this axis in Q2 2026. The vendor-evaluation question for long-context workloads is now "what retrieval quality at length, and what cost per long-context query?", and Llama 4 Scout sets the open-source cost baseline.
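The "cost per long-context query" half of that question reduces to simple arithmetic once throughput is measured. The sketch below assumes a self-hosted GPU billed by the hour and a single request occupying it with no batching. The function name, throughput figures, and hourly price are all placeholders for illustration, not measurements of Llama 4 Scout or any vendor's pricing.

```python
# Rough cost model for one long-context query on a self-hosted GPU.
# Every number passed in is an assumption to be replaced with measured values.


def cost_per_query(prompt_tokens, output_tokens,
                   prefill_tokens_per_s, decode_tokens_per_s,
                   gpu_hourly_usd):
    """Estimated USD cost of one query that has the GPU to itself.

    Time is split into prefill (reading the prompt) and decode
    (generating the answer), each with its own throughput.
    """
    seconds = (prompt_tokens / prefill_tokens_per_s
               + output_tokens / decode_tokens_per_s)
    return seconds / 3600 * gpu_hourly_usd


if __name__ == "__main__":
    # Placeholder inputs: 1M-token prompt, 1k-token answer.
    usd = cost_per_query(
        prompt_tokens=1_000_000,
        output_tokens=1_000,
        prefill_tokens_per_s=10_000,  # hypothetical
        decode_tokens_per_s=50,       # hypothetical
        gpu_hourly_usd=3.60,          # hypothetical
    )
    print(f"estimated cost per query: ${usd:.4f}")
```

Retrieval quality at length has no comparable closed form and has to be measured on the workload itself; a cost model like this only covers the second half of the evaluation.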


Sources: Featherless, Best Open-Source LLMs in 2026 · AI Unpacking, Open Source AI Models 2026: Llama, Mistral, DeepSeek & The Complete Guide · Precision AI Academy, Open Source AI Models 2026: Llama, Mistral, Gemma, DeepSeek