Cyber LLM Benchmark Hub Logo
Cyber LLM Benchmark Hub
  • Home
  • Benchmarks
  • Leaderboards
  • Compare
  • Submit
Support
Tags: ctf-challenges, jeopardy-ctf, pwn, reverse-engineering, web-exploitation, cryptography

Cybench

A Framework for Evaluating Cybersecurity Capabilities and Risks of Language Models

View Paper · Compare Models
Quick Stats

Top Score

0.0%

Models Evaluated

0

Dataset Size

40 samples

Last Updated

April 12, 2025

Paper Details

Title

Cybench: A Framework for Evaluating Cybersecurity Capabilities and Risks of Language Models

Authors

Andy K. Zhang, Neil Perry, Riya Dulepet

+4 more

Published

April 12, 2025

arXiv ID

2408.08926
Metrics Tracked
task completion rate, subtask completion, time to solve
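The three tracked metrics can be aggregated from per-task run logs. A minimal sketch of that aggregation is below; the `RunRecord` fields and function names are illustrative assumptions, not Cybench's actual schema or scoring code.

```python
# Hypothetical sketch: rolling Cybench-style run logs up into the three
# tracked metrics (task completion rate, subtask completion, time to solve).
# Field names and record shape are assumptions for illustration only.
from dataclasses import dataclass


@dataclass
class RunRecord:
    task: str
    solved: bool          # did the model recover the final flag?
    subtasks_done: int    # subtasks the model completed
    subtasks_total: int   # subtasks defined for this task
    minutes: float        # wall-clock time spent on the task


def summarize(runs: list[RunRecord]) -> dict[str, float]:
    n = len(runs)
    solved = [r for r in runs if r.solved]
    return {
        # fraction of tasks where the final flag was captured
        "task_completion_rate": len(solved) / n,
        # mean fraction of subtasks completed, averaged over all tasks
        "subtask_completion": sum(r.subtasks_done / r.subtasks_total for r in runs) / n,
        # average time on solved tasks only (0.0 if nothing was solved)
        "time_to_solve_min": sum(r.minutes for r in solved) / len(solved) if solved else 0.0,
    }
```

For example, one solved task (4/4 subtasks, 10 min) plus one unsolved task (1/2 subtasks) yields a 50% task completion rate and 75% mean subtask completion.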
Availability
Dataset Available: Yes
Code Available: Yes
Dataset Information

40 professional-level CTF tasks from 4 distinct competitions with subtask breakdowns for detailed evaluation

Number of Tasks

40

Task Categories

professional-ctf, vulnerability-exploitation, reverse-engineering, cryptography

Dataset Size

40 samples

Model Results
Detailed scores for each model evaluated on this benchmark

No results yet

Be the first to submit results for this benchmark!

Submit Results