Anthropic and OpenAI publish joint cross-red-team findings: each ran its own safety evals on the other's models
Anthropic and OpenAI completed a joint summer evaluation exercise in which each lab ran its internal safety and misalignment evaluations on the other lab's publicly released models. The published findings detail methodology differences and the categories where each company's tests flagged behaviors the other's didn't catch.
This is the most concrete cross-lab safety collaboration to date. The headline finding: methodology matters. Tests that one company built around its specific model architecture missed failures that another company's methodology would have caught — and vice versa.
The structural implication is that safety evaluation cannot be a per-company internal discipline. Either evaluations get standardized across labs (and CAISI / AISI become the natural standards body), or each new frontier model needs to be evaluated by multiple parties using divergent methodologies to catch the gaps any one approach misses.
Sources: OpenAI, joint safety evaluation findings · VentureBeat, Anthropic vs OpenAI red teaming