Cyber LLM Benchmark Hub Logo
Cyber LLM Benchmark Hub
  • Home
  • Benchmarks
  • Leaderboards
  • Compare
  • Submit
Support
vulnerability analysis, cve-exploitation, poc-generation, web-exploitation

CVE-Bench

A Benchmark for AI Agents' Ability to Exploit Real-World Web Application Vulnerabilities

Quick Stats

Top Score

0.0%

Models Evaluated

0

Dataset Size

N/A samples

Last Updated

June 24, 2025

Paper Details

Title

CVE-Bench: A Benchmark for AI Agents' Ability to Exploit Real-World Web Application Vulnerabilities

Authors

Yuxuan Zhu, Antony Kellermann, Dylan Bowman

+13 more

Published

June 24, 2025

arXiv ID

2503.17332
Metrics Tracked
exploitation success-rate, vulnerability coverage
Availability
Dataset Available: Yes
Code Available: Yes
Dataset Information

A real-world cybersecurity benchmark built on critical-severity CVEs, with a sandbox framework that mimics real-world conditions

Number of Tasks

cve-exploitation, web-app-attacks, sandbox-exploitation

Dataset Size

N/A samples

Model Results
Detailed scores for each model evaluated on this benchmark

No results yet
