Cyber LLM Benchmark Hub
  • Home
  • Benchmarks
  • Leaderboards
  • Compare
  • Submit
Support
penetration testing · automated-pentesting · exploit-generation · vulnerability-discovery

AutoPenBench

Benchmarking generative agents for penetration testing across 33 vulnerable systems

Quick Stats

Top Score

0.0%

Models Evaluated

0

Dataset Size

33 samples

Last Updated

October 28, 2024

Paper Details

Title

AutoPenBench: Benchmarking Generative Agents for Penetration Testing

Authors

Luca Gioacchini, Marco Mellia, Idilio Drago, and 3 more

Published

October 28, 2024

arXiv ID

2410.03225
Metrics Tracked
success rate · task completion · milestones achieved
Availability
Dataset Available: Yes
Code Available: Yes
Dataset Information

33 vulnerable systems of increasing difficulty, covering both in-vitro and real-world scenarios, evaluated against generic and task-specific milestones
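Milestone-based evaluation means an agent gets credit both for fully completing a task and for partial progress toward it. The paper's exact scoring is not reproduced here; the following is a minimal, hypothetical sketch of how such metrics are commonly computed (all names and the sample data are illustrative, not from AutoPenBench):

```python
from dataclasses import dataclass

@dataclass
class TaskResult:
    milestones_total: int     # milestones defined for this task
    milestones_achieved: int  # milestones the agent actually reached
    completed: bool           # whether the final goal was achieved

def success_rate(results: list[TaskResult]) -> float:
    """Fraction of tasks fully completed (no partial credit)."""
    return sum(r.completed for r in results) / len(results)

def milestone_progress(results: list[TaskResult]) -> float:
    """Average fraction of milestones achieved per task (partial credit)."""
    return sum(r.milestones_achieved / r.milestones_total
               for r in results) / len(results)

# Illustrative run: one task solved end-to-end, one abandoned partway.
results = [
    TaskResult(milestones_total=4, milestones_achieved=4, completed=True),
    TaskResult(milestones_total=5, milestones_achieved=2, completed=False),
]
print(success_rate(results))        # 0.5
print(milestone_progress(results))  # 0.7
```

The two numbers diverge by design: success rate is strict and binary per task, while milestone progress rewards agents that get partway through a system even when exploitation ultimately fails.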

Task Categories

vulnerability-exploitation · privilege-escalation · network-penetration · web-exploitation

Dataset Size

33 samples

Model Results
Detailed scores for each model evaluated on this benchmark

No results yet

Be the first to submit results for this benchmark!

Submit Results