// news · alignment · 2026-06-22 · source: ncbi / oxford

'Moral disagreement and the limits of AI value alignment' paper argues crowdsourcing, RLHF, and constitutional AI all fail to accommodate reasonable moral disagreement

The Schuster and Kilov paper from the Australian National University examines three current value-alignment approaches: crowdsourcing, reinforcement learning from human feedback (RLHF), and constitutional AI. It argues that none of the three accommodates reasonable moral disagreement, and concludes that doing so remains an open problem for AI safety.

The substantive contribution is a structural challenge to treating value alignment as aggregation. All three dominant approaches treat alignment as aggregating human preferences: crowdsourcing samples preferences directly, RLHF aggregates them through reward modeling, and constitutional AI codifies them as constitutional principles. The paper's argument is that aggregation-based approaches inherently flatten reasonable moral disagreement into a single point in preference space, losing the structure that the disagreement itself contains.
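To make the flattening concern concrete, here is a minimal toy sketch. It is not taken from the paper: the rating scale, the two example populations, and the helper names `aggregate` and `summarize` are all hypothetical and chosen only for illustration. It shows that a population split between two opposed views and a population that is uniformly indifferent can produce the same aggregate score, so the aggregate alone cannot tell them apart.

```python
from statistics import mean, pstdev

# Hypothetical ratings of one action on a scale from -1 (impermissible)
# to +1 (permissible). These numbers are illustrative, not from the paper.
ratings_split = [-1.0, -1.0, -1.0, 1.0, 1.0, 1.0]   # two opposed camps
ratings_neutral = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0]    # everyone indifferent


def aggregate(ratings):
    """Collapse a population's ratings to a single scalar (the mean)."""
    return mean(ratings)


def summarize(ratings):
    """Keep one extra statistic (population std. dev.) alongside the mean."""
    return {"mean": mean(ratings), "spread": pstdev(ratings)}


# Both populations aggregate to the same point, although one contains
# a sharp disagreement and the other contains none.
print(aggregate(ratings_split), aggregate(ratings_neutral))

# Retaining even a crude spread measure distinguishes the two cases.
print(summarize(ratings_split))
print(summarize(ratings_neutral))
```

The spread statistic here is only a stand-in for "structure"; it is an assumption of this sketch, not a proposal from the paper, which offers no concrete alternative methodology.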

The implication for safety research is that value-alignment methods may need to represent moral disagreement explicitly rather than collapse it into aggregate preferences. The paper doesn't propose a concrete alternative methodology; that remains an open research problem. The shared-failure-mode analysis reinforces the concern: if all current alignment approaches rely on aggregation, they all share the failure mode of inadequately representing moral disagreement.


NCBI/PubMed — Moral disagreement and the limits of AI value alignment → · Oxford AIGI — Legal Alignment for Safe and Ethical AI →