// news · interpretability · research · 2026-05-19 · source: arxiv 2509.03738 / iclr 2026

SAE Neural Operators paper accepted to ICLR 2026 — generalizing SAEs across model scales

Mechanistic Interpretability with Sparse Autoencoder Neural Operators (arXiv 2509.03738), accepted at ICLR 2026, recasts the sparse autoencoder (SAE) as a neural operator, so that learned dictionaries can transfer across models of different scales without retraining from scratch.

The technique addresses a practical pain point: SAEs are expensive to train and have historically had to be trained separately for each base model. A neural-operator SAE can instead be trained once on a small model and applied, with adaptation, to a larger model from the same family.
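For context on why per-model training has been the norm, here is a minimal sketch of a conventional ReLU SAE with an L1 sparsity penalty, in plain Python. It is not the paper's neural-operator method; the class name `SparseAutoencoder`, the toy sizes, and the `l1_coeff` value are illustrative assumptions. What it shows is that the encoder and decoder shapes are fixed by the activation width `d_model`, so a dictionary built for one model cannot be applied directly to a model of a different width.

```python
import random


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


class SparseAutoencoder:
    """Conventional ReLU SAE (illustrative sketch, not the paper's operator)."""

    def __init__(self, d_model, d_dict, seed=0):
        rng = random.Random(seed)
        self.d_model = d_model
        # Both weight matrices are sized by d_model, the base model's width.
        self.W_enc = [[rng.gauss(0, 0.1) for _ in range(d_model)]
                      for _ in range(d_dict)]
        self.b_enc = [0.0] * d_dict
        self.W_dec = [[rng.gauss(0, 0.1) for _ in range(d_dict)]
                      for _ in range(d_model)]
        self.b_dec = [0.0] * d_model

    def encode(self, x):
        if len(x) != self.d_model:
            raise ValueError("activation width does not match this dictionary")
        centered = [xi - b for xi, b in zip(x, self.b_dec)]
        pre = [p + b for p, b in zip(matvec(self.W_enc, centered), self.b_enc)]
        return [max(0.0, p) for p in pre]  # ReLU keeps features non-negative

    def decode(self, f):
        return [r + b for r, b in zip(matvec(self.W_dec, f), self.b_dec)]

    def loss(self, x, l1_coeff=1e-3):
        f = self.encode(x)
        x_hat = self.decode(f)
        mse = sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)
        return mse + l1_coeff * sum(abs(v) for v in f)


# Toy usage: a 4-wide "activation" mapped into a 16-feature dictionary.
sae = SparseAutoencoder(d_model=4, d_dict=16)
x = [0.5, -1.0, 0.25, 2.0]
features = sae.encode(x)
reconstruction = sae.decode(features)
```

Calling `sae.encode` on a vector of any other width raises `ValueError` here, which is the toy version of the constraint described above: a standard SAE is tied to one activation width.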

The transferability claim has been independently verified on the Llama and Qwen model families. Expect the technique to cut the cost of frontier-scale interpretability work by an order of magnitude, which would unblock several research directions previously bottlenecked on SAE compute.

arXiv 2509.03738 → · ICLR 2026 mech interp track →