// news · alignment · 2026-06-22 · source: ncbi / oxford

'Helpful, harmless, honest? Sociotechnical limits of AI alignment through RLHF' paper provides multidisciplinary critique of RLHF — significant limitations in capturing the complexity of human ethics

The sociotechnical-limits paper from Umeå University, Vrije Universiteit Amsterdam, and Delft offers a multidisciplinary critique of RLHF's theoretical underpinnings and practical implementations. Its conclusion: RLHF has significant limitations in capturing the complexity of human ethics, which compounds the concern that alignment techniques share failure modes.

What matters here is that the sociotechnical critique arrives alongside the turn toward formal methods. The formal-methods alignment paper argues that bounded-halting models support formal verification; the sociotechnical-limits paper argues that empirical methods like RLHF face foundational limitations regardless of capability scaling. Together, the two papers suggest a bifurcation: formal methods for narrow, bounded domains, and sociotechnically aware approaches for general-purpose alignment.

The procurement implication is that alignment-stack design should match the shape of the deployment domain. Taken together, the shared-failure-modes analysis and this sociotechnical critique help identify which alignment-stack combinations have correlated failure surfaces and which have independent ones. H2 2026 alignment-architecture decisions should favor formal methods for safety-critical narrow domains while investing in better empirical methods for general-capability frontier models.
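To make the correlated-versus-independent distinction concrete, here is a minimal sketch of how the chance that two alignment layers fail together changes with correlation. It treats each layer's failure as a Bernoulli event and uses the standard covariance identity for two correlated Bernoulli variables. The function name, the failure rates, and the correlation values are all illustrative assumptions; none of them come from either paper.

```python
import math


def joint_failure_probability(p_a: float, p_b: float, rho: float = 0.0) -> float:
    """Probability that two layers fail together.

    p_a, p_b: marginal failure probabilities of each layer (hypothetical values).
    rho: Pearson correlation between the two failure events (0.0 = independent).
    """
    for p in (p_a, p_b):
        if not 0.0 <= p <= 1.0:
            raise ValueError("failure probabilities must lie in [0, 1]")

    # For Bernoulli events: P(A and B) = p_a * p_b + Cov(A, B),
    # where Cov(A, B) = rho * sd(A) * sd(B).
    covariance = rho * math.sqrt(p_a * (1 - p_a) * p_b * (1 - p_b))
    joint = p_a * p_b + covariance

    # Not every rho is achievable for given marginals (Fréchet bounds).
    lower = max(0.0, p_a + p_b - 1.0)
    upper = min(p_a, p_b)
    if not (lower - 1e-12 <= joint <= upper + 1e-12):
        raise ValueError("rho is not feasible for these marginals")
    return min(max(joint, lower), upper)


if __name__ == "__main__":
    # Illustrative numbers only: each layer misses 10% of problem cases.
    for rho in (0.0, 0.5, 1.0):
        joint = joint_failure_probability(0.1, 0.1, rho)
        print(f"rho={rho:.1f}  P(both fail)={joint:.3f}")
```

Under these assumed numbers, stacking two independent layers cuts the joint miss rate well below either layer's own rate, while two fully correlated layers miss exactly the same cases, so the second layer adds no protection. That is the reason to check whether a combination's failure surfaces are correlated before treating it as defense in depth.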


NCBI/PubMed — Helpful, harmless, honest? Sociotechnical limits of AI alignment → · Oxford AIGI — Legal Alignment for Safe and Ethical AI →