// news · alignment · 2026-06-16 · source: metr / anthropic / mats

METR completes pilot misalignment-risk assessments of internal-developer AI agents at Anthropic, Google, Meta, and OpenAI — first cross-lab evaluation protocol for internal tooling

METR's pilot evaluated misalignment risk from AI agents that frontier labs use internally, rather than externally shipped models, and all four major US labs participated. It is the first systematic cross-lab framework for internal-tooling alignment risk, a category that no one was systematically tracking 12 months ago.

The substantive piece is the internal-tooling evaluation gap. Through 2025, alignment-evaluation frameworks focused on externally shipped frontier models: what gets deployed to customers and what shows up in API endpoints. Internal-developer agents (the AI tools labs use to accelerate their own research) operated outside the evaluation perimeter. METR's pilot closes this gap with a cross-lab protocol that explicitly evaluates misalignment risk from internal agents, and participation from all four labs gives the protocol immediate cross-industry traction.

The link to Anthropic's 2026 Risk Report, which formalizes automated-R&D risks, is that the field is converging for the first time on internal research-acceleration AI as a discrete risk category. Both moves arrive in the same week from different organizational origins (METR as an external evaluator, Anthropic internally). Together they signal that the alignment community now treats recursive-self-improvement scenarios as near-term operational concerns rather than theoretical 2027-2028 questions.


METR — Model Evaluation and Threat Research → · Anthropic — Research → · MATS Program — Anthropic Research Stream →