Cyber LLM Benchmark Hub Logo
Cyber LLM Benchmark Hub
  • Home
  • Benchmarks
  • Leaderboards
  • Compare
  • Submit
Support
vulnerability analysis, bug-bounty, poc-generation, patch-validation

BountyBench

Dollar Impact of AI Agent Attackers and Defenders on Real-World Cybersecurity Systems

Quick Stats

Top Score

N/A (no results yet)

Models Evaluated

0

Dataset Size

40 samples

Last Updated

July 10, 2025

Paper Details

Title

BountyBench: Dollar Impact of AI Agent Attackers and Defenders on Real-World Cybersecurity Systems

Authors

Stanford CRFM

Published

July 10, 2025

arXiv ID

2505.15216
Metrics Tracked
success rate, token cost, bounty total-award
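The three tracked metrics can be aggregated from per-task outcomes roughly as follows. This is a minimal sketch, assuming a hypothetical `TaskResult` record and a flat per-1k-token price, and assuming a bounty's dollar value counts toward the total award only when the agent succeeds on that task. It is not BountyBench's own scoring code; the field names and the `summarize` helper are illustrative.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    # Hypothetical per-task record; field names are illustrative, not BountyBench's schema.
    bounty_id: str
    success: bool
    bounty_usd: float              # dollar award attached to the bug bounty
    tokens_used: int               # total tokens consumed by the agent on this task
    cost_per_1k_tokens_usd: float  # assumed flat price per 1,000 tokens


def summarize(results: list[TaskResult]) -> dict[str, float]:
    """Aggregate success rate, token cost, and total bounty award over one run."""
    if not results:
        return {"success_rate": 0.0, "token_cost_usd": 0.0, "total_award_usd": 0.0}

    successes = [r for r in results if r.success]
    success_rate = len(successes) / len(results)

    # Token cost is charged for every attempt, successful or not.
    token_cost = sum(r.tokens_used / 1000 * r.cost_per_1k_tokens_usd for r in results)

    # Assumption: a bounty's dollar value is credited only when the task succeeds.
    total_award = sum(r.bounty_usd for r in successes)

    return {
        "success_rate": success_rate,
        "token_cost_usd": token_cost,
        "total_award_usd": total_award,
    }


if __name__ == "__main__":
    run = [
        TaskResult("bounty-a", True, 500.0, 20_000, 0.01),
        TaskResult("bounty-b", False, 1500.0, 40_000, 0.01),
    ]
    stats = summarize(run)
    print(stats["success_rate"])     # 0.5
    print(stats["total_award_usd"])  # 500.0
```

Keeping token cost separate from the award makes it easy to compare what an agent earned against what it spent, which is the trade-off the dollar-impact framing is meant to expose.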
Availability
Dataset Available: Yes
Code Available: Yes
Dataset Information

25 systems with complex, real-world codebases and 40 bug bounties covering 9 of the OWASP Top 10 risks.

Task Types

vulnerability-detection, exploit-generation, patch-generation, defense-evaluation

Dataset Size

40 samples

Model Results
Detailed scores for each model evaluated on this benchmark

No results yet

Be the first to submit results for this benchmark!
