// news · interpretability · open-source · 2026-05-28 · source: eleutherai / arxiv / github

EleutherAI releases SAE-Bench 2: public benchmarking suite for sparse-autoencoder interpretability adds evaluation tasks for Claude Opus 4.7, GPT-5.2, and Gemini 3.1 Ultra

EleutherAI released SAE-Bench 2 on May 28. It is a public benchmarking suite for sparse-autoencoder (SAE) interpretability methods, extended with evaluation tasks for Claude Opus 4.7, GPT-5.2, and Gemini 3.1 Ultra. The suite gives the open interpretability-research community more shared ground for reproducing results, and because it arrived on the same day as the major frontier-lab releases, SAE-Bench 2 is positioned to become the standard point of comparison for the new model generation.

The benchmark extension is the substantive piece. SAE-Bench 1, released in late 2024, established a reproducible evaluation setup for sparse-autoencoder methods on the Pythia model series and the early Gemma generation. SAE-Bench 2 extends the suite with evaluation tasks calibrated for larger and more capable frontier models, including the Claude Opus 4.7, GPT-5.2, and Gemini 3.1 Ultra releases from the same day. The benchmark covers four major axes: feature-discovery completeness (how many of the discoverable features an SAE method recovers), feature-interpretation accuracy (how reliably the discovered features map to human-legible concepts), feature-intervention specificity (how cleanly feature-steering interventions affect target behaviors without spillover), and methodology-comparison robustness (how consistent findings are across SAE training configurations).
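
To make the four axes concrete, here is a minimal Python sketch of how scores of this kind could be computed on toy data. It is an illustration only: the function names, the formulas (a recovered fraction, a judged-correct fraction, a share of absolute effect, and a mean pairwise Jaccard overlap), and the example feature names are all assumptions made for this sketch, not SAE-Bench 2's actual metrics or API.

```python
from itertools import combinations

# Hypothetical toy metrics, one per axis. These are NOT SAE-Bench 2's
# definitions; they only illustrate what each axis is trying to measure.


def discovery_completeness(recovered: set[str], known: set[str]) -> float:
    """Fraction of known (e.g. planted) features that the SAE recovered."""
    if not known:
        raise ValueError("known must be non-empty")
    return len(recovered & known) / len(known)


def interpretation_accuracy(judgements: list[bool]) -> float:
    """Fraction of feature labels that a reviewer judged to be correct."""
    if not judgements:
        raise ValueError("judgements must be non-empty")
    return sum(judgements) / len(judgements)


def intervention_specificity(target_effect: float,
                             off_target_effects: list[float]) -> float:
    """Share of the total absolute behavior change that hits the target."""
    total = abs(target_effect) + sum(abs(e) for e in off_target_effects)
    return abs(target_effect) / total if total else 0.0


def comparison_robustness(feature_sets: list[set[str]]) -> float:
    """Mean pairwise Jaccard overlap of feature sets across SAE configs."""
    pairs = list(combinations(feature_sets, 2))
    if not pairs:
        raise ValueError("need at least two feature sets")

    def jaccard(a: set[str], b: set[str]) -> float:
        union = a | b
        return len(a & b) / len(union) if union else 1.0

    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Toy inputs (invented for illustration).
known = {"negation", "plural", "python_code", "french"}
recovered = {"negation", "plural", "french", "unlabeled_17"}

scores = {
    # 3 of the 4 known features were recovered -> 0.75
    "discovery_completeness": discovery_completeness(recovered, known),
    # 3 of 4 labels judged correct -> 0.75
    "interpretation_accuracy": interpretation_accuracy([True, True, False, True]),
    # 3.0 on target vs 1.0 total spillover -> 0.75
    "intervention_specificity": intervention_specificity(3.0, [0.5, 0.5]),
    # overlap of feature sets from three hypothetical training configs
    "comparison_robustness": comparison_robustness(
        [{"a", "b", "c"}, {"a", "b", "d"}, {"a", "b", "c"}]
    ),
}
```

Each score in this sketch lands in the range 0 to 1, with higher meaning better, which is one simple way to put results from different toolkits on a common scale. A real suite would likely need more careful definitions, for example deciding what counts as a "discoverable" feature when no ground truth exists.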

The broader context is the buildout of open interpretability infrastructure. Anthropic's day-zero circuit-tracer release means external researchers can use SAE-Bench 2 against Opus 4.7 immediately; DeepMind's Gemma Scope 2, released earlier in the month, is the parallel toolkit on the open-weight side. EleutherAI's release adds the benchmarking layer that connects these toolkits to a shared basis for comparison. For the interpretability research community, the combined infrastructure means that methodological convergence across labs can now be measured in a consistent way, which is a prerequisite for the field to evaluate progress against shared baselines.

Sources: EleutherAI, SAE-Bench 2 public release (May 28, 2026) · arXiv, SAE-Bench 2 sparse autoencoder benchmarking methodology · GitHub, SAE-Bench 2 open-source release, frontier model evaluations