// news · frontier-models · alignment · 2026-05-26 · source: anthropic / uk aisi / llm-stats

Anthropic Claude Mythos Preview is the first model to clear UK AISI's 32-step Last Ones range — 3 of 10 runs successful at 73% expert-task accuracy

Anthropic's Claude Mythos Preview is the first model that the UK AI Security Institute (AISI) has scored as clearing its 32-step Last Ones evaluation range. It succeeded on 3 of 10 attempted runs, with 73% accuracy across the constituent expert tasks. The Last Ones range is AISI's hardest single benchmark, designed to require sustained long-horizon reasoning across multi-domain expert workloads.

The Last Ones range matters because AISI built it specifically to resist saturation. Each task is a 32-step expert-domain problem that requires the model to maintain coherent reasoning across the full sequence, and no single step is trivial. Mythos Preview's 30% success rate (3 of 10 runs) is below what frontier models typically achieve on shorter-horizon evals, but it is the first non-zero result on Last Ones. The 73% expert-task accuracy suggests the model works at expert level on most individual tasks, so the likely failure mode is sustained-reasoning coherence rather than domain knowledge.
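The arithmetic behind those two figures can be checked with a short sketch. It is illustrative only and rests on assumptions the source does not state: that the 10 runs are independent trials, and, for the second calculation, that all 32 steps succeed or fail independently at a single fixed rate. The function names are hypothetical and are not part of any AISI tooling.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    p = successes / trials
    denom = 1 + z**2 / trials
    center = (p + z**2 / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2)) / denom
    return center - half, center + half


def chain_success(step_rate: float, steps: int) -> float:
    """Probability that every step succeeds, ASSUMING independent steps."""
    return step_rate ** steps


def required_step_rate(run_rate: float, steps: int) -> float:
    """Per-step rate needed to reach run_rate end to end, ASSUMING independence."""
    return run_rate ** (1 / steps)


if __name__ == "__main__":
    low, high = wilson_interval(3, 10)
    print(f"3/10 runs: plausible success rate between {low:.1%} and {high:.1%}")
    print(f"32 independent steps at 73% each: {chain_success(0.73, 32):.4%}")
    print(f"Per-step rate needed for 30% over 32 steps: {required_step_rate(0.30, 32):.1%}")
```

Under these assumptions, three successes in ten runs is compatible with a true success rate anywhere from roughly 11% to 60%, so the 30% figure is a coarse estimate. The second calculation shows that a 73% independent per-step rate would almost never complete 32 steps in a row, while a 30% end-to-end rate implies per-step reliability of about 96%. The 73% figure is therefore better read as accuracy on the expert tasks than as a per-step survival probability.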

The result arrives in the same week as Anthropic's $30B second raise of 2026. The funding round priced in successful execution on the Claude 5 series, and Mythos Preview is the first publicly disclosed evidence that the next-generation capability story is on track. Whether Mythos becomes the production Claude 5 release or a research milestone toward it remains to be confirmed, but the AISI score is the kind of third-party validation that supports the $900B valuation more concretely than internal benchmarks would. Expect competitor labs to be tested against the same Last Ones range in the coming weeks.


LLM Stats — AI Updates Today May 2026 → · WhatLLM — New AI Models May 2026 Frontier Took a Breath → · Future AGI Substack — Best LLMs in May 2026 →