// news · research-papers · alignment · 2026-05-28 · source: anthropic / alignment.anthropic.com / arxiv

Anthropic publishes formal cybersecurity capability evaluation for Mythos — methodology document accompanies restricted-release decision

Anthropic has published the formal methodology document behind the Mythos cybersecurity capability assessment, the evaluation that produced the restricted-release decision. The publication covers the threat models the evaluation tested against, the capability-elicitation methodology, the quantitative findings on specific cyber-capability sub-tasks, and the rationale for restricting access. It is the most detailed cybersecurity capability evaluation methodology any frontier lab has published.

The scope of the published methodology is the substantive piece. The document has four major sections: the threat-model framework, which defines the cyber-capability axes evaluated (vulnerability discovery, exploit development, lateral movement and privilege escalation, and defense evasion, plus the meta-level capability of integrating these into multi-step operations); the elicitation methodology used to assess the model's capabilities (red-team probing, structured task-based evaluations, and capability-amplification probes with tool access and uplift conditions); the quantitative findings on each capability sub-task; and the decision rationale connecting the findings to the restricted-release outcome. The combined methodology is reusable: other labs evaluating cybersecurity capability in their own models can reference the framework directly.

The regulatory and precedent-setting consequence is what makes the publication broadly significant. The Mythos restricted-release decision is the first time a frontier lab has publicly held back a model for security reasons, and the methodology document is the procedural artifact that makes the restriction defensible under external review. The Sonnet 4.5 safety case publication, on the interpretability side, establishes the deployment-artifact precedent at the other end of the deployment-control spectrum. Together, the two publications define the frontier of auditable artifacts for both intervention-driven deployment and capability-driven restriction, which is meaningful procedural infrastructure for the pre-deployment evaluation regimes that the transatlantic-coordination talks are working to harmonize.


Anthropic Alignment — Mythos cybersecurity capability evaluation methodology → · Anthropic — Frontier capability assessment and restricted release → · ArXiv — Cybersecurity capability evaluation methodology 2026 →