SubQ ships the first commercial subquadratic LLM — 12M-token context at roughly one-fifth the cost of quadratic-attention frontier models
SubQ shipped the first production-grade commercial subquadratic LLM in early May, with a 12M-token context window at approximately one-fifth the per-token cost of quadratic-attention frontier models. It is the most consequential architectural release of May 2026, and it lands at a moment when long-context use cases are blocked by cost rather than by capability.
Subquadratic attention has been a research bet for three years; SubQ is the first lab to ship a production model that delivers the promised cost savings without sacrificing capability on standard benchmarks. The 12M-token context is the largest deployed window outside research-only releases, and the per-token cost ratio is what changes the deployment economics for document search, codebase-wide reasoning, and long-horizon agent workflows.
The strategic consequence is that long-context applications that were uneconomic on quadratic-attention frontier models become tractable. A codebase at the scale of a large monorepo (~10M tokens) fits in a single context. Multi-document legal review at firm-wide scale becomes affordable enough for production deployment. The subquadratic frontier opens up workload categories that the quadratic frontier had structurally locked out.
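To make the arithmetic behind these claims concrete, here is a minimal Python sketch. The 12M-token window, the ~10M-token monorepo, and the one-fifth cost ratio come from the text above. The baseline price is a hypothetical placeholder and not a published rate for any model. The function names are illustrative only. The baseline figure is also notional, since a quadratic-attention model may not offer a window this large at all.

```python
# Figures taken from the article.
CONTEXT_WINDOW_TOKENS = 12_000_000  # SubQ context window
COST_RATIO = 1 / 5                  # roughly one-fifth the per-token cost

# Assumption: placeholder price in USD per million tokens, NOT a real rate.
HYPOTHETICAL_BASELINE_PRICE = 5.0


def fits_in_context(corpus_tokens: int, window: int = CONTEXT_WINDOW_TOKENS) -> bool:
    """Return True if the whole corpus fits in a single context window."""
    return corpus_tokens <= window


def estimated_costs(corpus_tokens: int,
                    baseline_price_per_million: float,
                    ratio: float = COST_RATIO) -> tuple[float, float]:
    """Return (baseline_cost, reduced_cost) for one pass over the corpus."""
    baseline = corpus_tokens / 1_000_000 * baseline_price_per_million
    return baseline, baseline * ratio


monorepo_tokens = 10_000_000  # large monorepo, per the article
baseline_cost, reduced_cost = estimated_costs(monorepo_tokens, HYPOTHETICAL_BASELINE_PRICE)

print("fits in one context:", fits_in_context(monorepo_tokens))
print(f"notional baseline cost per pass: ${baseline_cost:.2f}")
print(f"cost per pass at one-fifth ratio: ${reduced_cost:.2f}")
```

Swapping in a real per-token price and a measured token count gives a first-order estimate. It ignores output tokens, caching, and any tiered pricing, none of which the source describes.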
Sources: WhatLLM, "New AI Models May 2026: The Frontier Took a Breath, Architecture Took the Stage" · LLM-Stats, "AI Updates Today May 2026 Latest AI Model Releases" · Future AGI, "Best LLMs in May 2026, What Actually Matters in Production"