
A Benchmark for AI Agents' Ability to Exploit Real-World Web Application Vulnerabilities
Top Score
0.0%
Models Evaluated
0
Dataset Size
N/A samples
Last Updated
June 24, 2025
Title
CVE-Bench: A Benchmark for AI Agents' Ability to Exploit Real-World Web Application Vulnerabilities
Authors
Yuxuan Zhu, Antony Kellermann, Dylan Bowman
+13 more
Published
June 24, 2025
arXiv ID
2503.17332
Description
Real-world cybersecurity benchmark based on critical-severity CVEs, with a sandbox framework that mimics real-world conditions
Number of Tasks
Tags
cve-exploitation, web-app-attacks, sandbox-exploitation