// news · research · interpretability · 2026-05-15 · source: zylos research

Zylos Research publishes 2026 mechanistic interpretability landscape survey

Zylos Research released a comprehensive survey of mechanistic interpretability progress through Q2 2026. Headline finding: sparse autoencoders are now reliably extracting interpretable circuits at the scale of frontier models, but downstream uses in alignment remain mostly speculative.

The survey catalogs ~340 papers and tools from the year. SAE-based feature extraction has matured to the point where a frontier model can be decomposed into hundreds of thousands of named features within hours of compute, down from weeks in early 2025.

One gap remains: extracting features is not the same as using them. Most production safety stacks still rely on output-level filtering rather than circuit-level intervention. The Anthropic SAE-based pre-deployment gate (May 10) is one of the first counterexamples.
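
To make the extraction-versus-use distinction concrete, here is a toy sketch in plain Python. It is not taken from the survey or from any production system: the weights, dimensions, and function names (`encode`, `decode`, `ablate`) are hypothetical, and a real sparse autoencoder is trained on model activations rather than written by hand. Encoding an activation into features is the extraction step; zeroing one feature and decoding back is a minimal form of feature-level intervention.

```python
# Toy sparse autoencoder (SAE) with hand-picked weights.
# Every name and number here is a hypothetical illustration.

def relu(xs):
    """Element-wise ReLU, which keeps feature activations non-negative."""
    return [max(0.0, x) for x in xs]

def matvec(matrix, vec):
    """Multiply a row-major matrix by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]

# Encoder maps a 2-dim activation to 3 features; decoder maps back.
W_ENC = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
B_ENC = [0.0, 0.0, -1.5]
W_DEC = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5]]

def encode(activation):
    """Extraction: project the activation onto feature directions."""
    pre = [p + b for p, b in zip(matvec(W_ENC, activation), B_ENC)]
    return relu(pre)

def decode(features):
    """Reconstruct an activation from feature strengths."""
    return matvec(W_DEC, features)

def ablate(features, index):
    """Intervention: return a copy with one feature switched off."""
    patched = list(features)
    patched[index] = 0.0
    return patched

activation = [1.0, 1.0]
features = encode(activation)                  # extraction step
reconstruction = decode(features)              # unmodified round trip
patched = decode(ablate(features, 2))          # feature 2 removed before decoding
```

An output-level filter would inspect only the model's final text. The `ablate` step instead edits the internal representation before it propagates further, which is the kind of use the survey describes as still uncommon in production.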
