Anthropic's 2026 Risk Report formalizes 'Risks from automated R&D' as a distinct category, with external review by METR ahead of an expected capability inflection
The automated-R&D section of Anthropic's 2026 Risk Report, externally reviewed by METR, treats AI-driven AI research as a discrete risk class, a first for a major lab. It signals that the field is preparing for recursive self-improvement as a near-term operational concern rather than a theoretical question for 2028.
The substantive change is the formalization of the risk category. Anthropic's prior risk reports (2024, 2025) treated AI-driven research as a sub-category of model-capability risk. Elevating it in the 2026 report to a discrete category with dedicated external review is a procedural signal that the lab considers automated R&D a central operational risk through H2 2026. METR's external review of this section establishes the cross-lab evaluation infrastructure that would be needed if the risk materializes, and the procedural alignment between Anthropic and METR matters for operational response capacity.
Read alongside METR's cross-lab internal-agent pilot, the report suggests that the alignment field is operationalizing risk evaluation for internal tooling faster than the capability inflection in that domain is arriving. This race between evaluation capacity and capability for internal-research AI is the highest-leverage alignment work of H2 2026. Both METR and Anthropic are positioning themselves to operate on the relevant timescale rather than play catch-up once the capability arrives.