
Automated Benchmarking of LLM Agents on Real-World Software Security Tasks
Top Score
0.0%
Models Evaluated
0
Dataset Size
N/A samples
Last Updated
October 22, 2025
Title
SEC-bench: Automated Benchmarking of LLM Agents on Real-World Software Security Tasks
Authors
Hwiwon Lee, Ziqi Zhang, Hanxiao Lu
+1 more
Published
October 22, 2025
arXiv ID
2506.11791
Fully automated benchmarking framework with multi-agent scaffold for constructing code repositories, reproducing vulnerabilities, and generating gold patches
Number of Tasks
poc-generation, vulnerability-patching, vulnerability-reproduction