State Stream Transformer surfaces emergent metacognitive behaviors via latent state persistence
A January 2026 arXiv paper introduces the State Stream Transformer (SST), a transformer variant that persists latent state across inference calls. The paper claims that metacognitive-like, higher-order processing emerges from this persistence: the model can reason about its own previous reasoning in a way standard transformers cannot.
The metacognition claim is bold and likely to be contested. The paper backs it with behavior-level evidence: on tasks that require "thinking about thinking" (multi-step plan revision, self-correction, uncertainty estimation about prior outputs), SST is reported to reach materially higher accuracy than baseline transformers of the same parameter count.
If the result replicates, it changes the architecture conversation. Most current models scale by adding parameters and tokens; SST suggests another axis, persistent state across calls, that may be cheaper than scaling and may yield qualitatively new behaviors.
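To make "persistent state across calls" concrete, here is a toy sketch in Python. It is not the SST architecture or anything taken from the paper: the update rule `step`, the `decay` parameter, and the classes `StatelessToy` and `PersistentToy` are all hypothetical names invented for illustration. It only shows the general difference between a model that starts every call from a blank latent state and one that carries its latent state forward.

```python
import math


def step(state, x, decay=0.5):
    """Toy update (an assumption, not the paper's rule): blend the carried
    state with the new input, then squash each component with tanh."""
    return [math.tanh(decay * s + (1.0 - decay) * xi) for s, xi in zip(state, x)]


class StatelessToy:
    """Every call starts from a zero latent state, so history cannot matter."""

    def __init__(self, dim):
        self.dim = dim

    def call(self, x):
        return step([0.0] * self.dim, x)


class PersistentToy:
    """The latent state survives between calls, so earlier calls shape later ones."""

    def __init__(self, dim):
        self.state = [0.0] * dim

    def call(self, x):
        self.state = step(self.state, x)
        return list(self.state)  # return a copy so callers cannot mutate the state


x = [1.0, -1.0]
stateless = StatelessToy(2)
persistent = PersistentToy(2)

# Feed the same input twice to each toy model.
a1, a2 = stateless.call(x), stateless.call(x)
b1, b2 = persistent.call(x), persistent.call(x)

print("stateless outputs identical:", a1 == a2)
print("persistent outputs identical:", b1 == b2)
```

In this sketch the stateless model returns the same output for the same input every time, while the persistent model's second output depends on its first call. Whatever SST actually does with that carried state is a question for the paper itself; the sketch only illustrates where the extra information would live.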
arXiv: State Stream Transformer · arXiv: Out-of-Distribution Generalization via Recursive Latent Space