// blog · analysis · alignment · 2026-06-16 · source: analysis / ai-blogs.org

METR's cross-lab evaluation protocol and the internal-agent misalignment frontier

METR's pilot misalignment assessments of internal-developer AI agents at Anthropic, Google, Meta, and OpenAI constitute the first cross-lab framework for a risk category nobody was tracking 12 months ago. Combined with Anthropic's formal recognition of automated-R&D risks, they show the field operationalizing internal-tooling alignment evaluation faster than the capability inflection arrives.

METR's completion of pilot misalignment evaluations of internal-developer AI agents at all four major US frontier labs is the kind of structural alignment work that doesn't generate headlines but defines the field's operational readiness for an entire risk class.

What internal-developer agents are and why they matter

Frontier labs run substantial portions of their research and engineering work through AI assistants: Anthropic uses Claude Code, OpenAI uses Codex, and Google and Meta follow similar patterns. These internal-tooling agents operate outside the evaluation perimeter applied to externally shipped models. They get access to private codebases, internal documents, and frontier-model training pipelines. The misalignment risk profile is distinct from external-model risk because the deployment context is structurally higher-stakes.

The evaluation-perimeter gap METR closed

Through 2024-2025, no systematic cross-lab framework evaluated internal-tooling alignment risk. Each lab ran its own internal evaluation methodology, and results were not comparable across labs. METR's pilot established a cross-lab protocol with all four labs participating; as a result, internal-agent misalignment becomes a measurable, comparable risk category for the first time.

The Anthropic risk-report alignment signal

Anthropic's 2026 Risk Report, which formalizes 'Risks from automated R&D' as a discrete category with external METR review, is the institutional anchor. The formalization signals that the lab considers automated R&D a load-bearing operational risk through H2 2026; METR's external review means cross-lab evaluation infrastructure is already in place if the risk materializes.

The race-condition framing for H2 2026

The field is operationalizing internal-tooling risk evaluation faster than the capability inflection in that domain is arriving. This is the correct direction: the failure mode would be the alignment community lagging capability. The current setup gives the field operational evaluation capacity before automated-R&D capability becomes materially load-bearing; that is the structural alignment win H2 2026 can deliver.

What this connects to in the broader cycle

The IASR 2026 funding-pool operationalization (AM cycle) plus METR's cross-lab pilot plus Anthropic's risk-category formalization together produce the most coordinated alignment-infrastructure quarter of the past five years. The H2 2027 alignment-research output should reflect the cumulative infrastructure investment landing now.

METR (Model Evaluation and Threat Research) · Anthropic (Research)