// news · alignment · policy2026-06-03source: uk aisi / microsoft on the issues / resultsense

UK AISI flags eroding oversight — chain-of-thought monitoring breaks under latent reasoning, frontier cyber-offence now doubling every four months

The UK AI Security Institute published findings on May 22, 2026 that the technical premises of pre-deployment safety review are degrading. Chain-of-thought monitoring — the workhorse of current frontier-model safety evaluation — works only while reasoning happens in human-readable text, and AISI estimates that frontier cyber-offence capability is now doubling every four months, accelerated from a seven-month doubling rate at the end of 2025.

The latent-reasoning warning is the part safety teams have not yet absorbed. Current frontier-model red-teaming relies on the fact that models write their reasoning in English (or another natural language) before producing actions, and evaluators score the trace. Latent reasoning architectures — already in research at multiple frontier labs — move the computation entirely inside the model's internal state, with no human-readable intermediate. AISI's framing is direct: deploying such systems at scale would eliminate one of the strongest monitoring signals current safety methodology depends on, and there is no published evaluation replacement that scales to frontier capability.

The cyber-offence doubling number is the second load-bearing finding. AISI's estimate of a four-month doubling rate (down from seven months at close of 2025) means a model evaluated at T-30-days under any review regime is meaningfully behind the model that ships at T, and dramatically behind the model the same lab will ship a single review-cycle later. The compounding implication for any fixed-window pre-deployment review is structural — the methodology measures a moving target on the slower side of the curve.

The findings land in the same week as the June 2 Trump EO formalizing a 30-day voluntary review and shortly after CAISI signed pre-deployment MOUs with Google DeepMind, Microsoft, and xAI. The procedural regime is consolidating around a methodology AISI has just told the field is losing its evaluative power.

See our analysis →

UK AI Security Institute — AISI research and publications → · Microsoft On the Issues — Advancing AI evaluation with CAISI (US) and AISI (UK) → · ResultSense — AISI: AI oversight will erode as models advance →