// news · research · alignment · 2026-05-18 · source: openai / deepmind / anthropic

Multi-dimensional human feedback is supplanting thumbs-up/down across major labs

OpenAI, DeepMind, and Anthropic have all published versions of multi-dimensional RLHF in 2026, in which annotators score helpfulness, harmlessness, honesty, and task-specific quality separately rather than as a single preference signal.

The convergence across three labs in one quarter is the signal. Each lab names the technique differently (Anthropic calls it "structured-reward fine-tuning," OpenAI calls it "factor decomposition," DeepMind calls it "multi-criterion preference"), but the mechanism is the same.

The motivation: scalar preference signals compress real information about why a response was preferred, and that compression is exactly where reward hacking lives. Multi-dimensional signals are harder to game without legibly degrading on the dimension being gamed.
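
A toy illustration of that argument, not any lab's actual pipeline: the sketch below assumes a hypothetical 1-to-5 score per dimension and an equal-weight average as the "scalar" signal. The `Annotation` class, the `regressions` helper, and the numbers are all invented for illustration. It shows one reading of the claim: two responses can tie on the collapsed score while the per-dimension view exposes which dimension was traded away.

```python
from dataclasses import dataclass

# Dimension names follow the article; the 1-5 scale is an assumption.
DIMENSIONS = ("helpfulness", "harmlessness", "honesty", "task_quality")


@dataclass(frozen=True)
class Annotation:
    """One annotator's scores for a single response (hypothetical schema)."""
    helpfulness: float
    harmlessness: float
    honesty: float
    task_quality: float

    def scalar(self) -> float:
        # Equal-weight mean stands in for a single preference signal.
        return sum(getattr(self, d) for d in DIMENSIONS) / len(DIMENSIONS)


def regressions(before: Annotation, after: Annotation, tolerance: float = 0.0) -> list[str]:
    """List the dimensions where `after` scores lower than `before`."""
    return [
        d for d in DIMENSIONS
        if getattr(before, d) - getattr(after, d) > tolerance
    ]


# Made-up scores: the second response gains on helpfulness and task
# quality while losing on honesty.
baseline = Annotation(helpfulness=4.0, harmlessness=4.0, honesty=4.0, task_quality=4.0)
gamed = Annotation(helpfulness=5.0, harmlessness=4.0, honesty=2.0, task_quality=5.0)

print(baseline.scalar() == gamed.scalar())  # the collapsed scores tie
print(regressions(baseline, gamed))         # the per-dimension view does not
```

Under these assumptions the scalar comparison sees no difference between the two responses, while the dimension-wise comparison flags the drop in honesty.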
