// news · research-papers · interpretability · 2026-05-28 · source: anthropic / github / arxiv

Anthropic open-sources mechanistic interpretability circuit tracer — production tooling joins Gemma Scope 2 as part of the open methodology stack

Anthropic has open-sourced its circuit-tracer tooling for mechanistic interpretability, which joins DeepMind's Gemma Scope 2 release in the open methodology stack. The circuit-tracer lets researchers identify the specific computational paths through model layers that connect input features to output behaviors, building on the sparse-autoencoder (SAE) methodology that the broader field has converged on. Together, the Anthropic and DeepMind open-source releases anchor the mech-interp infrastructure that the field will build on through 2026-2027.

The scope of the release is the substantive piece. The Anthropic circuit-tracer covers the methodology for identifying and visualizing computational paths through transformer layers, with specific support for tracing how SAE features identified at one layer combine and propagate to features at downstream layers. The tooling integrates with standard sparse-autoencoder-based feature identification, so it works downstream of the feature-identification step that Gemma Scope 2's open-source toolkit supports. The combined research stack (Gemma Scope 2 for feature identification, Anthropic's circuit-tracer for path tracing) is the most complete open-source mech-interp methodology suite to date.
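To make the path-tracing idea concrete, here is a minimal toy sketch of how an upstream SAE feature's contribution to a downstream feature could be scored. It is not the circuit-tracer's API or its actual algorithm: the function names, the tiny hand-written matrices, and the assumption that the computation between the two layers is a single linear map (`layer_map`) are all hypothetical, chosen only to illustrate "feature at one layer propagates to feature at a later layer."

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[Vector]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def matvec(m: Matrix, v: Vector) -> Vector:
    return [dot(row, v) for row in m]


def edge_attributions(
    upstream_acts: Vector,       # activation of each upstream SAE feature
    upstream_decoder: Matrix,    # write direction (d_model) per upstream feature
    layer_map: Matrix,           # ASSUMPTION: linear stand-in for the layers in between
    downstream_encoder: Matrix,  # read direction (d_model) per downstream feature
) -> Matrix:
    """attr[i][j]: contribution of upstream feature i to the
    pre-activation of downstream feature j under the linear assumption."""
    attr = []
    for act, write_dir in zip(upstream_acts, upstream_decoder):
        carried = matvec(layer_map, write_dir)  # what feature i writes, seen downstream
        attr.append([act * dot(read_dir, carried) for read_dir in downstream_encoder])
    return attr


def top_edges(attr: Matrix, k: int) -> List[Tuple[int, int, float]]:
    """Return the k strongest (upstream, downstream, weight) edges by magnitude."""
    edges = [(i, j, w) for i, row in enumerate(attr) for j, w in enumerate(row)]
    return sorted(edges, key=lambda e: abs(e[2]), reverse=True)[:k]


# Hypothetical toy values: 3 upstream features, 2 downstream features, d_model = 2.
upstream_acts = [2.0, 0.0, 1.0]
upstream_decoder = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
layer_map = [[1.0, 0.0], [0.0, 0.5]]
downstream_encoder = [[1.0, 0.0], [0.0, 2.0]]

attr = edge_attributions(upstream_acts, upstream_decoder, layer_map, downstream_encoder)
strongest = top_edges(attr, 1)
```

In this toy setup, the edges with the largest magnitudes are the candidate "paths" a researcher would inspect first. Real transformer layers are nonlinear, so any production tool has to handle attention and MLP nonlinearities in ways this sketch deliberately ignores.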

The implication for the field's trajectory is what makes the dual release consequential. Through 2024-2025, mech-interp methodology was developed largely inside frontier labs with limited open-source replication, which constrained academic participation. The 2026 pattern of dual open-source releases (Anthropic's circuit-tracer plus DeepMind's Gemma Scope 2) gives the academic research community frontier-grade tooling, the input that academic research needs in order to advance. The Sonnet 4.5 safety case publication shows the deployment-side application of the same methodology family. The full stack (open-source toolkit, production-deployment artifact, methodology-publication path) is now legible enough that researchers entering the field have a clear trajectory.


Anthropic — Open-source circuit tracer mech-interp tooling release → · GitHub — Anthropic interpretability tooling repository → · ArXiv — Circuit tracing in transformer interpretability 2026 →