Frontier models

Releases, benchmarks, and the unspoken capability gaps between flagship systems.

ANTHROPIC / YOURSTORY / AWS·2026-05-22

Anthropic's run-rate revenue climbs from $14B in February to $30B annualized by April, inside one quarter

Anthropic's CFO Krishna Rao disclosed in February that the company's run-rate revenue was $14B, growing 10x annually across three years. By April, that number had climbed to $30B annualized. Combined with the $380B Series G valuation and the $1.25B/month SpaceX compute commitment running through May 2029, the company's capital structure has shifted from 'frontier lab' to 'frontier lab with hyperscaler-scale infrastructure obligations'.

frontier-models · industry
ANTHROPIC / RED.ANTHROPIC.COM / INFOQ·2026-05-22

Anthropic holds Claude Mythos in lab and stands up Project Glasswing — $100M credits to AWS, Apple, Google, Microsoft, NVIDIA, JPMorgan, Linux Foundation

Anthropic confirmed Claude Mythos Preview will not be publicly released. Instead, the model is deployed through Project Glasswing — a consortium of AWS, Apple, Cisco, CrowdStrike, Google, JPMorgan Chase, Linux Foundation, Microsoft, NVIDIA, and Palo Alto Networks. Anthropic is committing $100M in usage credits. Glasswing partners will use Mythos to identify and patch vulnerabilities in critical software before the model's capabilities reach adversarial hands.

frontier-models · safety · industry
DEEPSEEK / HUGGINGFACE·2026-05-22

DeepSeek V4-Flash holds 1M context under MIT — 284B/13B-active MoE proves the Flash-tier-open-frontier convergence

DeepSeek's V4-Flash variant (284B total / 13B active parameters, 1M context, MIT license) holds production-tier capability at hyperscaler-routable scale. Combined with V4-Pro (1.6T total / 49B active, 80.6 SWE-Bench Verified, 90.1 GPQA Diamond), DeepSeek now ships the most operationally credible open-weight Pro/Flash split. The 1M context retention in Flash is the structural detail that erases the case for routing to Pro on long-document workloads.

open-source · frontier-models
DEEPSEEK / HUGGINGFACE·2026-05-22

DeepSeek V4 Pro vs Flash — the procurement decision tree clarifies at MIT-licensed weights

DeepSeek's V4 release (April 24) shipped two SKUs: V4-Pro (1.6T total / 49B active parameters, 80.6 SWE-Bench Verified, 90.1 GPQA Diamond) and V4-Flash (284B total / 13B active, 1M context). Both run under the MIT license, both ship at 1M context, and both clear the bar for production deployment on coding and reasoning workloads. The Pro/Flash bifurcation now mirrors the closed-flagship pricing curve at a fraction of the cost.

open-source · frontier-models
GOOGLE / TECHCRUNCH·2026-05-22

Gemini 3.5 Flash becomes default in the Gemini app and AI Mode in Search — Google bets the next wave on agents, not chatbots

Google flipped Gemini 3.5 Flash to default across both the Gemini app and AI Mode in Search globally this week. The model outperforms 3.1 Pro on coding and agentic benchmarks while running 4× faster on output tokens per second. The default-tier flip is the operational signal Google has been telegraphing since I/O — the new product surface is agentic, and Flash is the price point Google wants users to inhabit.

frontier-models · agents
GOOGLE / BLOG.GOOGLE·2026-05-22

Gemini Spark runs on dedicated cloud VMs — the persistent personal agent moves from local extension to always-on cloud service

Google's Gemini Spark, the personal AI agent introduced at I/O, runs on dedicated virtual machines in Google Cloud and stays available 24/7 — even when the user's device is off. Spark is powered by Gemini 3.5 Flash via the full Antigravity pipeline, has cross-app access to the user's Gmail, Calendar, Drive, Photos, and YouTube history, and autonomously runs multi-step tasks on the user's behalf.

agents · frontier-models
ANTHROPIC / VENTUREBEAT / GITHUB·2026-05-22

Claude Opus 4.7 is now generally available across Bedrock, Vertex, and Copilot — Anthropic narrowly retakes the most-powerful-deployed-LLM crown

Anthropic's Claude Opus 4.7 (April 16 GA) is now broadly deployed across Amazon Bedrock, Google Vertex AI, and GitHub Copilot. Independent benchmarks place Opus 4.7 narrowly ahead of GPT-5.5 and Gemini 3 Pro on the hardest software-engineering tasks. The win is at the margin and the lead is reversible — but the procurement signal is that the closed-flagship tier has not yet flattened.

frontier-models
ALIBABA / QWEN·2026-05-22

Qwen 3.6-35B-A3B and Qwen 3.6-27B ship as open weights — Alibaba presses the cadence advantage with monthly drops

Alibaba's Qwen 3.6-35B-A3B and Qwen 3.6-27B (both Apr 2026) continue the team's roughly monthly drop cadence across 2026 H1. Combined with Qwen 3.5 (Feb 2026, 397B MoE with unified vision-language and 201 languages) and Qwen 3.6 Plus and Max Preview (Apr 2 and Apr 20), Alibaba now ships the most aggressive open-weights release schedule among Tier 1 labs.

open-source · models
ANTHROPIC / SPACEX / TECHCRUNCH·2026-05-21

Anthropic signs $1.25B/month compute deal with SpaceX — full Colossus capacity, $40B+ total contract through 2029

Anthropic and SpaceX announced a $1.25 billion per month compute partnership giving Anthropic full access to xAI's Colossus 1 data center in Memphis. The cluster delivers 300+ megawatts and houses 220,000+ NVIDIA H100/H200/GB200 GPUs. Anthropic is set to ramp to 100% utilization during May 2026, with discounted pricing through June 2026 before full rates apply. SpaceX disclosed the contract in its IPO filing on Wednesday.

frontier-models · compute · industry
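
The "$40B+" total in the headline can be sanity-checked from the figures quoted in this feed. A rough sketch, assuming the contract bills monthly from May 2026 through May 2029 inclusive (the end date is taken from the run-rate item above; the start month is an assumption) and ignoring the discounted period through June 2026, whose size is not disclosed:

```python
# Back-of-envelope check of the contract total; not a disclosed figure.
MONTHLY_RATE_BILLION = 1.25  # $1.25B per month, as reported


def months_inclusive(start_year, start_month, end_year, end_month):
    """Count calendar months from start to end, including both endpoints."""
    return (end_year - start_year) * 12 + (end_month - start_month) + 1


# Assumed billing window: May 2026 through May 2029.
months = months_inclusive(2026, 5, 2029, 5)
undiscounted_total = months * MONTHLY_RATE_BILLION

print(f"{months} months x ${MONTHLY_RATE_BILLION}B = ${undiscounted_total}B before discounts")
```

Under those assumptions the undiscounted ceiling is about $46B, which sits comfortably above the "$40B+" floor in the headline once any early-month discount is subtracted.
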
CNBC / INDUSTRY ANALYSIS·2026-05-21

Chinese open-weight pricing pressure threatens the OpenAI and Anthropic IPO windows

CNBC's read of the Q2 prep work: Chinese models went from roughly 1% of OpenRouter usage in mid-2024 to more than 60% in May 2026, driven by a 5–20× price-per-token gap to closed flagships. That pressure is materially complicating the OpenAI and Anthropic IPO timelines because public-market investors are starting to discount the "closed lab moat" thesis that justified the private-round multiples.

frontier-models · industry
DEEP COGITO·2026-05-21

Deep Cogito v2 ships 70B/109B/405B/671B open-weight family with Iterated Distillation & Amplification self-improvement loop

Deep Cogito's v2 release ships four open-weight sizes (70B, 109B, 405B, 671B) wired into an Iterated Distillation & Amplification (IDA) self-improvement loop. The release positions IDA as a deployable architecture rather than a research curiosity — the first open-weight family where the "model improves itself between checkpoints" methodology is shipped as the default training recipe.

open-source · frontier-models · research
DEEPSEEK·2026-05-21

DeepSeek V4 Flash quietly extends 1M context to standard tier: MIT-licensed weights match closed-flagship reasoning on Pass@1

DeepSeek extended the 1M context window to its V4 Flash tier this week, pushing the cheaper standard SKU into a capability bracket previously occupied only by V4 Pro and closed flagships. Combined with the unchanged 80.6% SWE-Bench Verified ceiling and the MIT license, the practical effect is to compress the price-quality gradient on long-context production workloads.

open-source · frontier-models
GOOGLE / ANTIGRAVITY·2026-05-21

Gemini 3.5 Flash hits 76.2% Terminal-Bench 2.1 and 1656 GDPval Elo — frontier-class capability at Flash-tier price

Google's Gemini 3.5 Flash hit 76.2% on Terminal-Bench 2.1, 1656 Elo on GDPval-AA, and 83.6% on MCP Atlas at launch this week. The numbers put Flash within striking distance of full-Pro frontier models on coding and agentic benchmarks while shipping at Flash-tier pricing. It's the first explicit demonstration that 'Flash' no longer means 'small/cheap/limited' — it means 'frontier capability with latency-and-cost optimizations.'

multimodal · frontier-models
GOOGLE / DEEPMIND·2026-05-21

Gemini Omni Flash begins rolling out to AI Plus/Pro/Ultra subscribers — unified multimodal becomes generally consumed

Google began rolling out Gemini Omni Flash, the Flash tier of its unified multimodal model, to AI Plus, Pro, and Ultra subscribers on May 19 via the Gemini app and Flow creative studio. This is the first time a single model that natively accepts text, image, audio, and video in one prompt has been delivered as a consumer subscription product rather than a research preview.

frontier-models · multimodal
GOOGLE / CNBC·2026-05-21

Gemini Spark personal agent enters beta — Google launches 24/7 task-running agent across connected apps

Google launched Gemini Spark, a 24/7 personal AI agent that can reason across connected Google apps, into beta this week alongside Gemini 3.5 Flash. Initial availability is restricted to Google AI Ultra subscribers and a small trusted-tester cohort. Spark joins OpenAI's Operator and Anthropic's Claude Cowork in the same-week launch cadence — the personal-agent tier is now a saturated market.

agents · frontier-models
OPENAI·2026-05-21

OpenAI's general-purpose reasoning model autonomously disproves an 80-year-old Erdős conjecture in discrete geometry

OpenAI announced that one of its general-purpose reasoning models autonomously disproved a central conjecture in discrete geometry — the planar unit-distance problem posed by Paul Erdős in 1946. The model found a new family of point configurations beating the square-grid arrangement and produced a mathematical proof. A subsequent refinement by Princeton's Will Sawin showed δ ≥ 0.014 is achievable from the construction.

frontier-models · research · math
SPACEX / XAI / INDUSTRY·2026-05-21

SpaceX completes $250B acquisition of xAI — largest AI-related M&A in history, by a 4× margin

SpaceX has completed its $250B acquisition of xAI, eclipsing the combined value of all AI-related M&A activity over the previous three years. The deal consolidates Musk's AI, satellite, and launch infrastructure under one corporate roof and creates the only fully-vertically-integrated frontier-AI-plus-compute-plus-energy stack at hyperscale.

industry · frontier-models · m&a
SOURCE·2026-05-21

The two hours that changed AI — Erdős, Anthropic-SpaceX, and the day the frontier got bigger

On the morning of May 21, OpenAI announced that one of its general-purpose reasoning models had autonomously disproved an 80-year-old Erdős conjecture. Two hours later, Anthropic and SpaceX announced a $1.25B/month compute deal. Axios called the stretch the 'two hours that changed AI.' Both stories matter, for different structural reasons.

analysis · frontier-models
ANTHROPIC / INDUSTRY·2026-05-20

Anthropic closes $30B Series G at $380B post-money — second-largest private VC deal in history

Anthropic reportedly raised $30 billion at a $380B post-money valuation in Series G — the second-largest private venture deal on record. The company is reporting $14B annualized revenue and is on track for the fastest revenue ramp from zero of any enterprise software company in history. The capital underwrites the disclose-hold-evaluate-ship posture on Mythos and the next compute build.

industry · frontier-models · funding
ANTHROPIC / AISI·2026-05-20

Claude Mythos Preview becomes first model to clear UK AISI 32-step capability range

Anthropic's Claude Mythos Preview is the first model on record to clear the UK AI Security Institute's 32-step "The Last Ones" (TLO) evaluation range, hitting 3 of 10 successful clears with a 73% success rate on expert-level subtasks. Mythos Preview also tops SWE-bench Verified at 93.9% — meaningfully ahead of GPT-5.5 (88.7%) and Opus 4.7 (87.6%).

frontier-models · safety · anthropic
DEEPSEEK / LLM-STATS·2026-05-20

DeepSeek V4 ships under MIT license — 1.6T Pro and 284B Flash, both at 1M context

DeepSeek released V4 (Pro at 1.6T total / 49B active, Flash at 284B total / 13B active) on April 24 under MIT licensing. Both variants ship with 1M token context. V4 Flash pricing of $0.14/M input is the floor for the open-weight frontier and is forcing competing labs to reprice or differentiate on capability.

open-source · model · china
GOOGLE / DEEPMIND·2026-05-20

Gemini 3.5 Flash goes GA — $1.50/$9 per 1M, 76.2% Terminal-Bench, beats Gemini 3.1 Pro on coding

Google made Gemini 3.5 Flash generally available — frontier-level intelligence at roughly 4× the speed of comparable models. Pricing lands at $1.50 input / $9 output per million tokens with a 1M context window. The Terminal-Bench 2.1 score of 76.2% has the Flash variant beating Gemini 3.1 Pro on coding and agentic workflows.

frontier-models · industry
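
To make the quoted Gemini 3.5 Flash list prices concrete, here is a minimal cost sketch. It assumes flat per-token billing at $1.50 input / $9 output per million tokens and ignores anything the item does not mention (caching, batch discounts, long-context tiers); the workload sizes are hypothetical.

```python
# Prices as quoted in the item, in USD per 1M tokens.
INPUT_PRICE_PER_M = 1.50
OUTPUT_PRICE_PER_M = 9.00


def request_cost(input_tokens, output_tokens):
    """Return the USD cost of one request at the quoted list prices."""
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


# Hypothetical workload: a 200K-token document in, a 2K-token summary out.
cost = request_cost(200_000, 2_000)
print(f"${cost:.3f} per request")
```

At these rates the input side dominates long-document workloads: the 200K-token prompt accounts for $0.30 of the roughly $0.32 total.
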
GOOGLE / CNBC·2026-05-20

Google ships Gemini 3.5 Flash and Spark agent — finally a credible answer to ChatGPT and Claude

Google used the May 19-20 I/O keynote to ship Gemini 3.5 Flash (half-to-one-third the price of frontier peers, now default in the Gemini app and AI Mode search globally) plus Gemini Spark — a general-purpose agent that reasons across connected apps and takes action on the user's behalf. Spark is in beta for Google AI Ultra subscribers and trusted testers starting next week.

frontier-models · agents · google
META / MISTRAL / CODERSERA·2026-05-20

Meta Llama 4 and Mistral Medium 3.5 anchor the European-American open-weight tier

Meta shipped Llama 4 in April 2026 with Scout (17B active / 109B total MoE, runnable on 10GB VRAM) and Maverick (17B active / 400B total). Mistral Medium 3.5 launched April 29 — a 128B dense model hitting 77.6% on SWE-bench Verified, the best single-vendor coding stack outside the Anthropic and OpenAI labs.

open-source · model
SOURCE·2026-05-20

Why Anthropic is holding Mythos in preview after the TLO clear

Anthropic cleared the AISI's hardest benchmark and the first thing they did was not ship. That's the story. The TLO partial-clear is a capability disclosure event without a deployment event — and the gap between the two is now part of frontier-lab strategy.

analysis · frontier-models · safety
PRESS / LABS·2026-05-19

Four Chinese open-weights labs shipped frontier-class models in a 12-day window

Z.ai (GLM-5.1), MiniMax (M2.7), Moonshot (Kimi K2.6), and DeepSeek (V4) all landed in a 12-day window in early-to-mid May 2026 — all clearing 75%+ on SWE-bench Verified, all priced below $0.30/M input tokens, all permissively licensed for commercial use.

open-source · model · china
ANTHROPIC / LEADERBOARDS·2026-05-19

Claude Code leads SWE-bench Verified at 78.4%, ahead of Codex, Cursor, Devin, Replit

Updated SWE-bench Verified leaderboards put Claude Code at 78.4%, ahead of OpenAI Codex at 71.0%, Cursor agent at 67.2%, Devin at 60.8%, and Replit Agent 3 at 54.1%. The 7.4-point gap to second place is the widest single-agent lead the benchmark has seen.

frontier-models · agents · benchmark
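
The size of the lead follows directly from the scores listed in the item. A small sketch that ranks the agents and computes each one's gap to the leader, in percentage points; the scores are copied from the item and nothing else is assumed:

```python
# SWE-bench Verified scores (percent) as listed in the item above.
scores = {
    "Claude Code": 78.4,
    "OpenAI Codex": 71.0,
    "Cursor agent": 67.2,
    "Devin": 60.8,
    "Replit Agent 3": 54.1,
}

# Rank from highest to lowest score.
ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
leader_name, leader_score = ranked[0]

# Gap to the leader in percentage points, rounded to one decimal place.
gaps = {name: round(leader_score - score, 1) for name, score in ranked[1:]}

for name, gap in gaps.items():
    print(f"{leader_name} leads {name} by {gap} points")
```
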
AXIOS / SILICONANGLE·2026-05-19

Meta confirms open-source Avocado and Mango variants alongside closed flagships

Meta has confirmed it will release open-weights versions of its next two frontier models, codenamed Avocado and Mango, while keeping the largest variants proprietary — a hybrid strategy that splits the difference between Llama's open-source heritage and the closed-model economics of rival labs.

frontier-models · open-source · meta
ANTHROPIC / REUTERS·2026-05-15

Anthropic holds Mythos in lab over biosecurity risks

Anthropic disclosed that its most capable upcoming model — internally code-named Mythos — has been held back from any external API release after the company's safety evals flagged uplift potential in cyber and biosecurity domains.

frontier-models · safety · model
OPENAI / LLM-STATS·2026-05-05

OpenAI swaps the ChatGPT default to GPT-5.5 Instant

As of May 5, GPT-5.5 Instant is the model behind plain "GPT" in ChatGPT for free users, with GPT-5.5 (non-Instant) becoming the default for Plus and Pro tiers. The non-event quality of the rollout is itself the story.

models · frontier-models
SUBQUADRATIC / LLM-STATS·2026-05-05

SubQ 1M-Preview — first commercial subquadratic LLM, 12M token native context

Subquadratic's May 5 launch is the first generally-available large language model that drops standard transformer attention entirely. Claimed: ~5x lower cost than frontier transformers, up to 52x faster attention at scale, and a native 12 million token context window — not a sliding-window trick.

models · frontier-models · research
MISTRAL.AI·2026-04-29

Mistral Medium 3.5 lands — capstone on a six-week release blitz

Mistral Medium 3.5 (Apr 29) is a frontier multimodal model targeted at agentic and coding workloads. It's the headline at the end of a stretch where Mistral shipped Small 4 (unifying Magistral/Pixtral/Devstral), Voxtral TTS, Leanstral for formal proofs, and the Forge enterprise platform — all between March 16 and end of April.

open-source · models
BLOG.GOOGLE / DEEPMIND.GOOGLE·2026-04-02

Google Gemma 4 ships under Apache 2.0 — four sizes, MoE, multimodal, 256K context

Gemma 4 (April 2) arrives in E2B / E4B / 26B MoE / 31B Dense variants with native image+video everywhere and native audio on the smaller models. 256K context, 140+ languages, agentic-workflow-oriented. The 31B Dense reportedly hit #3 on Arena's text leaderboard.

open-source · models
ALIBABA QWEN / MARKTECHPOST·2026-03-30

Alibaba Qwen 3.5 Omni — native multimodal text/audio/video with sub-300ms TTFT

Qwen 3.5 Omni (released March 30) is a native multimodal model handling text, audio, video, and real-time interaction. Real-time audio time-to-first-token comes in below 300ms with 95%+ ASR accuracy — the relevant numbers for actual voice-assistant deployment.

multimodal · open-source · models