// news · interpretability · open-source · 2026-05-26 · source: deepmind / anthropic / mdpi

DeepMind ships Gemma Scope 2 and Anthropic open-sources its circuit tracer — mechanistic interpretability infrastructure becomes accessible to independent researchers

DeepMind released Gemma Scope 2 this cycle, the next-generation suite of sparse autoencoders (SAEs) trained against the Gemma model family, and Anthropic open-sourced the circuit tracer it has used internally for cross-model alignment-property transfer. Together, the two releases move mech-interp from a lab-internal capability to a publicly available methodology: independent academic and industrial researchers can now do production-grade interpretability work without lab access.

Sparse autoencoders are the workhorse of modern mech-interp. They decompose a model's hidden-state activations into a sparse basis of human-interpretable features ("this feature fires for code", "this feature fires for European cities", and so on), at a granularity that tracks the model's internal concepts more closely than individual neurons do. Gemma Scope 2 ships SAEs at multiple layers and widths across the Gemma 2 family, with documentation and example notebooks that meaningfully lower the barrier to entry. The first Gemma Scope release in 2024 was the proof of concept; Gemma Scope 2 is the production tooling.
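The decomposition described above can be sketched in a few lines. This is a minimal illustration, assuming a plain ReLU encoder with an L1 sparsity penalty, which is one common SAE formulation and not necessarily the architecture shipped in Gemma Scope 2. The weights, the 2-dimensional hidden state, and every name below (`sae_encode`, `sae_decode`, `sae_loss`, `W_ENC`, and so on) are hypothetical and hand-picked for readability, not taken from any released checkpoint.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def sae_encode(x: Vector, w_enc: Matrix, b_enc: Vector) -> Vector:
    """Feature activations f = ReLU(W_enc x + b_enc)."""
    return [max(0.0, p + b) for p, b in zip(matvec(w_enc, x), b_enc)]


def sae_decode(f: Vector, w_dec: Matrix, b_dec: Vector) -> Vector:
    """Reconstruction x_hat = W_dec f + b_dec; each column of W_dec is a feature direction."""
    return [r + b for r, b in zip(matvec(w_dec, f), b_dec)]


def sae_loss(x: Vector, f: Vector, x_hat: Vector, l1_coeff: float) -> float:
    """Squared reconstruction error plus an L1 penalty that rewards sparse features."""
    recon = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    sparsity = l1_coeff * sum(abs(a) for a in f)
    return recon + sparsity


# Hypothetical toy dictionary: 4 features over a 2-dimensional hidden state.
# Real SAEs are overcomplete in the same way, only with far larger dimensions.
W_ENC = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
B_ENC = [0.0, 0.0, 0.0, 0.0]
W_DEC = [[1.0, 0.0, -1.0, 0.0], [0.0, 1.0, 0.0, -1.0]]
B_DEC = [0.0, 0.0]

x = [2.0, -3.0]                                  # stand-in for one hidden-state vector
features = sae_encode(x, W_ENC, B_ENC)           # sparse feature activations
x_hat = sae_decode(features, W_DEC, B_DEC)       # reconstruction of x
active = [i for i, a in enumerate(features) if a > 0.0]
```

In this toy case only two of the four features are active for the input, and the decoder reconstructs the hidden state from them. In a trained SAE the encoder and decoder weights are learned by minimizing a loss of the kind `sae_loss` computes, and interpretation work then consists of inspecting which inputs activate each feature.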

The Anthropic circuit tracer is the higher-level capability. Where SAEs identify what the model represents, circuit tracing identifies how it computes: which features influence which other features, and which transformations compose into the model's higher-level behaviors. Anthropic has demonstrated the tool internally in alignment-property transfer (taking a circuit responsible for refusal behavior from one model and patching it into a different model). The open-source release means any researcher with access to model weights can now do equivalent work. Combined with Gemma Scope 2's SAE infrastructure, the public methodology stack now spans feature identification, circuit identification, and circuit modification, which together form a complete experimental loop.
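The identify-then-modify loop can be illustrated with two generic interventions, ablation and activation patching, on a toy model. This sketch does not use or reproduce the circuit tracer's API or its tracing method. The one-layer ReLU model, its weights, and the function names (`forward`, `ablation_effects`, `patch`) are all assumptions made for illustration.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def forward(w: Matrix, upstream: Vector) -> Vector:
    """Downstream feature activations d = ReLU(W u) for a toy one-layer model."""
    return [max(0.0, sum(a * u for a, u in zip(row, upstream))) for row in w]


def ablation_effects(w: Matrix, upstream: Vector) -> List[Vector]:
    """effects[i][j] = drop in downstream feature j when upstream feature i is zeroed."""
    baseline = forward(w, upstream)
    effects = []
    for i in range(len(upstream)):
        ablated = list(upstream)
        ablated[i] = 0.0                      # knock out one upstream feature
        after = forward(w, ablated)
        effects.append([b - a for b, a in zip(baseline, after)])
    return effects


def patch(target: Vector, source: Vector, index: int) -> Vector:
    """Copy one feature activation from a source run into a target run."""
    patched = list(target)
    patched[index] = source[index]
    return patched


# Hypothetical weights: 3 upstream features feeding 2 downstream features.
W = [[1.0, 2.0, 0.0],
     [0.0, -1.0, 1.0]]
u_target = [1.0, 1.0, 3.0]    # feature activations on the run being studied
u_source = [0.0, 4.0, 0.0]    # feature activations on a different run

# Circuit identification: which upstream features drive which downstream ones.
effects = ablation_effects(W, u_target)
edges = [(i, j) for i, row in enumerate(effects)
         for j, e in enumerate(row) if abs(e) > 0.5]

# Circuit modification: patch upstream feature 1 from the source run and rerun.
patched_output = forward(W, patch(u_target, u_source, 1))
```

Here `edges` lists the (upstream, downstream) pairs whose ablation effect exceeds an arbitrary threshold of 0.5, which is a crude stand-in for a traced circuit. Patching across two different models, as in the refusal-circuit example, would additionally require a mapping between the two models' feature spaces, which this sketch does not attempt.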
