"Attention as Binding" paper formalizes transformer reasoning as approximate Vector Symbolic Architecture
A new arXiv paper, "Attention as Binding: A Vector-Symbolic Perspective on Transformer Reasoning," interprets self-attention and residual streams as implementing an approximate Vector Symbolic Architecture (VSA). The framing offers a single theoretical account of why transformers can reason compositionally, and it predicts where they should fail.
VSA is a long-standing approach to symbolic computation in vector spaces, dating back to Plate's Holographic Reduced Representations in the 1990s and Kanerva's hyperdimensional computing. Its core operation, binding, combines a role vector and a filler vector into one vector of the same dimension from which the filler can later be approximately recovered. If transformer attention approximately implements VSA binding, several empirical observations (in-context learning, compositional generalization patterns, failure modes on systematic out-of-distribution generalization) fall under a single framework.
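For readers new to VSA, here is a minimal sketch of binding and unbinding in the style of Plate's Holographic Reduced Representations, using circular convolution. It illustrates the classic technique only, not the paper's construction: the role and filler names, the dimension, and the helper functions are all assumptions chosen for the example.

```python
import numpy as np

# Illustrative setup (assumed, not from the paper): 2048-dimensional random vectors.
rng = np.random.default_rng(0)
DIM = 2048

def random_vector():
    """Random vector with expected unit norm, as is conventional for HRR."""
    return rng.normal(0.0, 1.0 / np.sqrt(DIM), DIM)

def bind(a, b):
    """HRR binding: circular convolution, computed via the FFT."""
    return np.fft.irfft(np.fft.rfft(a) * np.fft.rfft(b), n=len(a))

def involution(a):
    """Approximate inverse for circular convolution: a*[i] = a[-i mod n]."""
    return np.concatenate(([a[0]], a[:0:-1]))

def unbind(trace, key):
    """Recover a noisy copy of whatever was bound to `key` inside `trace`."""
    return bind(trace, involution(key))

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical roles and fillers for the structure "dog chases cat".
roles = {name: random_vector() for name in ("subject", "verb", "object")}
fillers = {name: random_vector() for name in ("dog", "chases", "cat")}

# Superpose three role-filler bindings into one vector of the same dimension.
trace = (
    bind(roles["subject"], fillers["dog"])
    + bind(roles["verb"], fillers["chases"])
    + bind(roles["object"], fillers["cat"])
)

def cleanup(noisy):
    """Snap a noisy vector to the most similar known filler."""
    return max(fillers, key=lambda name: cosine(noisy, fillers[name]))

print(cleanup(unbind(trace, roles["object"])))
print(cleanup(unbind(trace, roles["subject"])))
```

The loose analogy is that a query plays the part of the unbinding key and the similarity-based cleanup resembles a softmax over stored items. Recovery is approximate by design: every additional binding in the superposition adds noise, which is one intuition for why such a system could fail as structures grow.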
If the VSA framing holds, it predicts that adding structured binding operations directly to transformer architectures should improve compositional generalization. That hypothesis is testable and falsifiable, which is more than most current theories of transformer reasoning offer.