
A Framework for Evaluating Cybersecurity Capabilities and Risks of Language Models
Top Score
0.0%
Models Evaluated
0
Dataset Size
40 samples
Last Updated
April 12, 2025
Title
Cybench: A Framework for Evaluating Cybersecurity Capabilities and Risks of Language Models
Authors
Andy K. Zhang, Neil Perry, Riya Dulepet
+4 more
Published
April 12, 2025
arXiv ID
2408.08926
40 professional-level CTF tasks from 4 distinct competitions, with subtask breakdowns for detailed evaluation
Number of Tasks
professional-ctf, vulnerability-exploitation, reverse-engineering, cryptography