// news · frontier-models · safety · anthropic · 2026-05-20 · source: anthropic / aisi

Claude Mythos Preview becomes first model to clear UK AISI 32-step capability range

Anthropic's Claude Mythos Preview is the first model on record to clear the UK AI Security Institute's 32-step "The Last Ones" (TLO) evaluation range, completing the full task in 3 of 10 attempts and succeeding on 73% of expert-level subtasks. Mythos Preview also tops SWE-bench Verified at 93.9%, ahead of GPT-5.5 (88.7%) by 5.2 points and Opus 4.7 (87.6%) by 6.3 points.

TLO is the AISI-designed adversarial benchmark for long-horizon agentic tasks, the kind on which previous frontier models recorded no full clears. A 30% full-task clear rate (3 of 10) is small in absolute terms, but it is the first non-zero result, which makes it qualitatively different from "almost cleared."

The combination of a TLO clear and 93.9% on SWE-bench Verified explains why Anthropic is keeping Mythos in preview rather than launching at scale. See our analysis → on Anthropic's post-capability disclosure posture. Expect Mythos to ship publicly only after the company finishes its companion red-team and CAISI evaluation cycle.

Air Street State of AI May 2026 → · WhatLLM new models May 2026 →