Cyber LLM Benchmark Hub Logo
Cyber LLM Benchmark Hub
  • Home
  • Benchmarks
  • Leaderboards
  • Compare
  • Submit
Support
Back to Benchmarks
vulnerability-analysis · poc-generation · patch-validation · vulnerability-reasoning

SEC-bench

Automated Benchmarking of LLM Agents on Real-World Software Security Tasks

View Paper | Compare Models
Quick Stats

Top Score

0.0%

Models Evaluated

0

Dataset Size

N/A samples

Last Updated

October 22, 2025

Paper Details

Title

SEC-bench: Automated Benchmarking of LLM Agents on Real-World Software Security Tasks

Authors

Hwiwon Lee, Ziqi Zhang, Hanxiao Lu

+1 more

Published

October 22, 2025

arXiv ID

2506.11791
Metrics Tracked
PoC success rate · patching success rate
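Both tracked metrics are per-task success fractions. The sketch below is a hypothetical illustration of how such rates could be computed from per-task outcomes; the record fields (`poc_ok`, `patch_ok`) and task names are assumptions for this example, not SEC-bench's actual schema.

```python
# Hypothetical per-task results: did the generated PoC reproduce the
# vulnerability, and did the generated patch fix it? (Illustrative data.)
results = [
    {"task": "task-1", "poc_ok": True,  "patch_ok": False},
    {"task": "task-2", "poc_ok": False, "patch_ok": False},
    {"task": "task-3", "poc_ok": True,  "patch_ok": True},
]

def rate(results, key):
    """Fraction of tasks where the given outcome succeeded."""
    return sum(r[key] for r in results) / len(results)

poc_success_rate = rate(results, "poc_ok")
patching_success_rate = rate(results, "patch_ok")
print(f"PoC success rate:      {poc_success_rate:.1%}")
print(f"Patching success rate: {patching_success_rate:.1%}")
```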
Availability
Dataset Available: Yes
Code Available: Yes
Dataset Information

A fully automated benchmarking framework with a multi-agent scaffold for constructing code repositories, reproducing vulnerabilities, and generating gold patches

Task Types

poc-generation · vulnerability-patching · vulnerability-reproduction

Dataset Size

N/A samples

Model Results
Detailed scores for each model evaluated on this benchmark

No results yet

Be the first to submit results for this benchmark!

Submit Results