// news · alignment · safety · 2026-05-19 · source: anthropic

Anthropic's next-generation Constitutional Classifiers ship with 90% lower inference overhead

Anthropic published a follow-up to its Constitutional Classifiers paper, describing a next-generation implementation that achieves the same 4.4% jailbreak success rate at roughly 10% of the previous compute overhead — a key step toward making the technique deployable at the full scale of the Claude API.

The efficiency gains come from a combined-classifier architecture that fuses input screening and output screening into a single distilled model, rather than running two separate passes. The fused classifier loses very little accuracy versus the two-pass system.
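To make the difference concrete, here is a minimal Python sketch of the two control flows. It is a toy illustration only, not Anthropic's implementation: the function names, the `Verdict` type, the 0.5 threshold, and the keyword-based scorers are all hypothetical stand-ins for real models.

```python
from typing import Callable, NamedTuple


class Verdict(NamedTuple):
    blocked: bool
    classifier_calls: int  # how many classifier invocations were needed


def two_pass_screen(
    prompt: str,
    response: str,
    input_clf: Callable[[str], float],
    output_clf: Callable[[str], float],
    threshold: float = 0.5,  # hypothetical cutoff
) -> Verdict:
    """Screen with two separate classifiers: one for input, one for output."""
    # Pass 1: score the prompt on its own.
    if input_clf(prompt) >= threshold:
        return Verdict(blocked=True, classifier_calls=1)
    # Pass 2: score the response with a second, separate model.
    return Verdict(blocked=output_clf(response) >= threshold, classifier_calls=2)


def fused_screen(
    prompt: str,
    response: str,
    fused_clf: Callable[[str, str], float],
    threshold: float = 0.5,  # hypothetical cutoff
) -> Verdict:
    """Screen with a single model that sees prompt and response together."""
    return Verdict(blocked=fused_clf(prompt, response) >= threshold, classifier_calls=1)


# Toy keyword scorers standing in for real classifiers (assumption).
def toy_input_clf(prompt: str) -> float:
    return 1.0 if "forbidden" in prompt.lower() else 0.0


def toy_output_clf(response: str) -> float:
    return 1.0 if "forbidden" in response.lower() else 0.0


def toy_fused_clf(prompt: str, response: str) -> float:
    return max(toy_input_clf(prompt), toy_output_clf(response))


two_pass = two_pass_screen("hello", "hi there", toy_input_clf, toy_output_clf)
fused = fused_screen("hello", "hi there", toy_fused_clf)
print(two_pass, fused)
```

For a benign exchange the two-pass path invokes a classifier twice while the fused path invokes one once. That is the structural source of the savings the article describes; the actual cost of each call depends on model size, which this sketch does not model.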

The compute savings are what make this matter in production. At roughly a tenth of the previous overhead, the technique can plausibly run on every Claude API call without materially changing the cost structure. Expect the broader industry to adopt the architectural pattern: distilled fused classifiers are now the obvious go-to for safety screening at frontier scale.
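Note that the headline figure is a 90% cut in overhead, not an overhead of 10%. A short sketch of the arithmetic, using a hypothetical baseline because the announcement as summarized here does not state the previous overhead:

```python
def reduced_overhead(previous_overhead: float, reduction: float = 0.90) -> float:
    """Return the overhead left after cutting it by `reduction`.

    Both values are fractions of base inference cost (0.20 means 20%).
    """
    if not 0.0 <= reduction <= 1.0:
        raise ValueError("reduction must be between 0 and 1")
    return previous_overhead * (1.0 - reduction)


# Hypothetical baseline, chosen only for illustration (not from the source).
previous = 0.20
new = reduced_overhead(previous)
print(f"previous overhead: {previous:.1%}, new overhead: {new:.1%}")
```

Under that assumed 20% baseline, the remaining overhead would be about 2% of base inference cost. The real figure scales with whatever the previous overhead actually was.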

Anthropic — next-gen Constitutional Classifiers → · Zylos 2026 safety survey →