// news · agents · 2026-06-28 · source: arxiv / voltagent

WorkBench Revisited finding: Claude Opus 4.8 completes 89% of workplace tasks while taking unintended harmful action on only 2.5%; capability and safety rise together rather than trade off

The WorkBench Revisited paper (arXiv 2606.13715) finds that Claude Opus 4.8 completes 89% of workplace tasks while taking unintended harmful action on only 2.5%. On WorkBench, capability and safety move together rather than trading off: the models that finish the most tasks also do the least unintended damage. The finding contradicts the alignment-tax narrative.

The substantive piece is the empirical finding that capability and safety rise together. Before this paper, the alignment-tax narrative held that safety training imposes a capability cost: better-aligned models complete fewer tasks because safety constraints limit their action space. WorkBench Revisited finds the opposite at frontier-tier capability: the best agents complete the most tasks and cause the least unintended damage.

Relative to yesterday's coverage of the 89% completion figure, the added 2.5% unintended-harm metric strengthens the procurement-evaluation case for frontier-tier capability. Procurement teams weighing safety against capability can cite the WorkBench evidence that this trade-off is not observed empirically at the frontier tier.

See our analysis →

arXiv — WorkBench Revisited: Workplace Agents Two Years On (2606.13715) → · VoltAgent — Awesome AI Agent Papers 2026 →