// blog · analysis · alignment · 2026-05-22 · 5 min read

Private-funded safety research overtakes federal — Anthropic Fellows, Glasswing data, and the postponed EO's collateral effect on AISI's authority

The pulled EO would have routed federal procurement-conditional funding into AISI methodology development. Without it, AISI's expansion stays voluntary. Anthropic's Fellows program is filling the gap — by Q3 2026, private-funded safety research will be meaningfully larger than government-funded safety research. That has implications nobody is fully reckoning with.

What's happening with safety research funding

Anthropic opened applications for the May and July 2026 Fellows program cohorts. The six-month residencies cover scalable oversight, adversarial robustness, AI control, model organisms, mechanistic interpretability, AI security, and model welfare. The expansion lands the same week that the EO's postponement left federal AISI funding ambiguous.

The asymmetry is structural

AISI's expansion through 2026 depended on the EO formalizing federal procurement-conditional funding. Without the EO, AISI's expansion stays voluntary, the same operating model that constrained it through 2024-2025. Anthropic's Fellows pipeline, meanwhile, is funded out of Anthropic's $30B run-rate and scales at industry-economic rates rather than government-budget rates.

By Q3 2026, the cumulative private safety research bench (Anthropic Fellows + Google DeepMind alignment + OpenAI safety + Meta INSPECT) will be substantially larger than the public bench (UK AISI + US AISI + EU AI Office combined). That's a structural shift the AI safety community has not fully reckoned with.

When private safety research is bigger than public safety research, the methodology that governs the frontier is set by labs, not by governments. The EO's postponement made this transition more likely, not less.

The methodology consolidation in private hands

The field-defining mechanistic interpretability review consolidates 2024-2026 methodology (circuits, features, sparse autoencoders, behavioral attribution) into a single reference text. The institutional consolidation around this methodology runs through the labs (Anthropic's Transformer Circuits work, OpenAI Microscope, Meta INSPECT). AISI's role is increasingly to validate and standardize what the labs publish, not to lead methodology development.

The Glasswing data feedback loop

Glasswing's interpretability data pool is starting to flow. Anthropic gets the largest controlled-deployment behavioral dataset ever assembled, sourced from AWS, JPMorgan, and the rest of the consortium. That dataset compounds the lab-side methodology lead — and AISI cannot replicate it, because AISI doesn't have access to the consortium-bound deployment data.

The DPO transition consolidates the lab advantage

DPO has supplanted RLHF as the default frontier alignment method. DPO-trained models are meaningfully easier to interpret than RLHF-trained predecessors. The methodology benefits accumulate in private labs first; by the time AISI publishes guidance, the labs are already deploying DPO-aligned models with mechanistic-interpretability tooling built around them.

What this means for the policy debate

If the softer EO is eventually signed (probable, per today's analysis), the OSTP review board will face a structural problem: the methodology AISI is supposed to validate is being developed at industry-economic rates by the very labs OSTP is supposed to review. The gap between the methodology pipeline and the validation pipeline keeps widening through H2 2026.

The forward read

By Q4 2026, the implicit safety regime looks like this: labs develop methodology under industry funding, lab-led consortia (Glasswing-shaped) accumulate the deployment data, and lab-published papers consolidate the field. AISI publishes validation reports that lag the methodology by 6-12 months. The EO that is eventually signed codifies this asymmetry rather than reversing it. The Mythos camp inside the administration may have lost more than it realized when the order was pulled.
