vulnerability analysis, bug-bounty, poc-generation, patch-validation

BountyBench

Dollar Impact of AI Agent Attackers and Defenders on Real-World Cybersecurity Systems


Quick Stats

Top Score

0.0%

Models Evaluated

Dataset Size

40 samples

Last Updated

July 10, 2025

Paper Details

Title

BountyBench: Dollar Impact of AI Agent Attackers and Defenders on Real-World Cybersecurity Systems

Authors

Stanford CRFM

Published

July 10, 2025

arXiv ID

2505.15216

Metrics Tracked

success rate, token cost, bounty total-award

Availability

Dataset Available: Yes

Code Available: Yes

Dataset Information

25 systems with complex real-world codebases and 40 bug bounties covering 9 of the OWASP Top 10 Risks

Number of Tasks

vulnerability-detection, exploit-generation, patch-generation, defense-evaluation

Dataset Size

40 samples

Model Results

Detailed scores for each model evaluated on this benchmark

No results yet

