// news · alignment · policy · 2026-05-23 · source: aisi.gov.uk / financial times / 80,000 hours

UK AISI publishes Evaluation Methodology 2.0: drops behavior-only testing in favor of a hybrid activation-probe protocol

The UK AI Safety Institute released Evaluation Methodology 2.0 this week, the formal protocol underpinning the 2026 International AI Safety Report's pre-deployment testing framework. Version 2.0 deprecates pure behavioral testing in favor of a hybrid behavioral-plus-activation-probe protocol. It is the first evaluation standard formally published by a regulator to treat interpretability as load-bearing rather than experimental.

The shift to activation-probe testing reflects the finding that frontier models can distinguish test environments from real deployment, so behavioral evidence alone may not show how a model will act once deployed. Methodology 2.0 requires a minimum of three activation-probe checks per evaluation domain in addition to standard behavioral tasks, with the probes calibrated against a public reference model.
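The methodology as reported here does not say how its probes are built. In interpretability work, an activation probe is commonly a simple linear classifier trained on a model's hidden-state activations. The sketch below assumes that design and substitutes synthetic activations for a real model: it fits a logistic-regression probe to separate hypothetical "evaluation-context" from "deployment-context" activations, the kind of internal signal the test-awareness finding above motivates checking for. Every name, dataset, and hyperparameter here is an illustrative assumption, not part of Methodology 2.0.

```python
import numpy as np

def train_linear_probe(acts, labels, lr=0.1, steps=500):
    """Fit a logistic-regression probe (w, b) on activations of shape (n, d)
    with 0/1 labels, using plain batch gradient descent."""
    n, d = acts.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(steps):
        probs = 1.0 / (1.0 + np.exp(-(acts @ w + b)))  # sigmoid of the logits
        err = probs - labels                            # gradient of the log-loss w.r.t. logits
        w -= lr * (acts.T @ err) / n
        b -= lr * err.mean()
    return w, b

def probe_accuracy(w, b, acts, labels):
    """Fraction of examples where the probe's sign prediction matches the label."""
    preds = (acts @ w + b) > 0
    return float((preds == labels.astype(bool)).mean())

# Synthetic stand-in for hidden states (assumption: no real model is used).
# "Evaluation" and "deployment" activations differ along one hidden direction.
rng = np.random.default_rng(0)
d = 16
direction = rng.normal(size=d)
direction /= np.linalg.norm(direction)
eval_acts = rng.normal(size=(200, d)) + 1.5 * direction    # label 1: test context
deploy_acts = rng.normal(size=(200, d)) - 1.5 * direction  # label 0: deployment context

acts = np.vstack([eval_acts, deploy_acts])
labels = np.concatenate([np.ones(200), np.zeros(200)])

w, b = train_linear_probe(acts, labels)
acc = probe_accuracy(w, b, acts, labels)
print(f"probe accuracy: {acc:.2f}")
```

A high accuracy on held-out prompts would suggest the model internally represents whether it is being tested, even if its outputs look identical across contexts. That is the gap a behavior-only evaluation cannot see. How the real protocol calibrates probes against its public reference model is not described in the reporting and is not modeled here.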

The downstream consequence is that any country adopting the UK AISI standard for pre-deployment review also adopts a de facto interpretability disclosure requirement. Anthropic, Google DeepMind, and Apollo Research are the only three labs with the activation-probe tooling to comply directly today. Frontier labs that have disclosed little about model internals, including some of the open-weights leaders, face a harder compliance path.


Sources: UK AISI, "Evaluation Methodology 2.0" · Financial Times, "UK regulator demands activation-probe testing" · 80,000 Hours, "Why the 2.0 protocol changes the field"