'The 2025 AI Agent Index' (arXiv 2602.17753), presented at FAccT '26: a multi-dimension evaluation framework covering capability, safety properties, and security incidents
The 2025 AI Agent Index (arXiv 2602.17753), presented at FAccT '26 (June 25-28), introduces a multi-dimension evaluation framework that assesses AI agents on capability, safety properties, and security-incident history. Evaluating all three together gives procurement teams a basis for comparison that single-dimension benchmarks do not provide.
The notable addition is the security-incident tracking dimension. Before the Index, agent evaluation methodology focused on capability, while security-incident information was scattered across vendor-specific disclosures and academic analyses. The Index folds security-incident history into the same systematic evaluation, so a buyer can weigh an agent's incident record alongside its capability results, which capability-only frameworks do not support.
The competitive read against the broader H2 2026 direction of agent-evaluation infrastructure is that systematic multi-dimension evaluation becomes the procurement standard. The 37% lab-to-production gap for enterprise agents indicates that single-dimension benchmark evaluation produces misleading procurement signals. Index-style multi-dimension evaluation addresses that methodology gap.
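To make the contrast concrete, here is a minimal Python sketch of how a capability-only ranking and a multi-dimension shortlist can disagree. Everything in it is an assumption for illustration: the `AgentRecord` fields, the `shortlist` helper, the agent names, and all numbers are hypothetical and are not taken from the Index's actual schema or data.

```python
from dataclasses import dataclass


# Hypothetical record shape; NOT the Index's real schema.
@dataclass(frozen=True)
class AgentRecord:
    name: str
    capability_score: float        # e.g. a benchmark pass rate in [0, 1]
    safety_properties: frozenset   # documented safety properties
    security_incidents: int        # count of known security incidents


def rank_by_capability(agents):
    """Single-dimension view: highest capability score first."""
    return sorted(agents, key=lambda a: a.capability_score, reverse=True)


def shortlist(agents, required_safety, max_incidents):
    """Multi-dimension view: filter on safety and incidents, then rank."""
    eligible = [
        a for a in agents
        if required_safety <= a.safety_properties   # subset check
        and a.security_incidents <= max_incidents
    ]
    return rank_by_capability(eligible)


# Made-up example data, for illustration only.
agents = [
    AgentRecord("agent-a", 0.91, frozenset({"sandboxing"}), 3),
    AgentRecord("agent-b", 0.84, frozenset({"sandboxing", "audit-log"}), 0),
    AgentRecord("agent-c", 0.78, frozenset({"audit-log"}), 1),
]

capability_only = [a.name for a in rank_by_capability(agents)]
multi_dimension = [
    a.name for a in shortlist(agents, frozenset({"audit-log"}), max_incidents=1)
]

print(capability_only)   # ['agent-a', 'agent-b', 'agent-c']
print(multi_dimension)   # ['agent-b', 'agent-c']
```

In this toy data the top capability scorer drops out once a required safety property and an incident ceiling are applied, which is the kind of signal a capability-only benchmark would not surface. A real procurement process would likely weight dimensions rather than hard-filter; the filter is used here only because it is the simplest form to read.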
arXiv — The 2025 AI Agent Index (2602.17753) → · VoltAgent — Awesome AI Agent Papers 2026 →