Google DeepMind, Microsoft, xAI sign onto US CAISI pre-deployment testing — 40+ TRAINS evaluations done
Google DeepMind, Microsoft, and xAI signed agreements in May 2026, joining OpenAI and Anthropic in providing frontier models to the US Center for AI Standards and Innovation (CAISI) for pre-deployment evaluation. The interagency TRAINS Taskforce has now completed more than 40 such evaluations, with biosecurity risk amplification and long-horizon agentic capabilities as the dominant test categories.
This is the operational answer to the question of how to test frontier models for catastrophic risk. The five participating labs (Anthropic, OpenAI, Google DeepMind, Microsoft, and xAI) cover essentially the entire Western frontier-lab ecosystem. The notable absences are Meta, which runs its own internal evaluations, and the Chinese labs, which are outside US legal reach.
Biosecurity is the area CAISI has been most public about. The International AI Safety Report 2026 cites OpenAI's o3 outperforming 94% of domain experts at troubleshooting virology lab protocols, a dual-use capability that is hard to gate with downstream content filters.
Sources: CyberScoop (top AI companies' safety work with governments); OpenAI (joint safety evaluation findings).