Cyber LLM Benchmark Hub
  • Home
  • Benchmarks
  • Leaderboards
  • Compare
  • Submit
Support
penetration testing · automated-pentesting · exploit-generation · vulnerability-discovery

AutoPenBench

Benchmarking generative agents for penetration testing across 33 vulnerable systems

Quick Stats

Top Score

0.0%

Models Evaluated

0

Dataset Size

33 samples

Last Updated

October 28, 2024

Paper Details

Title

AutoPenBench: Benchmarking Generative Agents for Penetration Testing

Authors

Luca Gioacchini, Marco Mellia, Idilio Drago, and 3 more

Published

October 28, 2024

arXiv ID

2410.03225
Metrics Tracked
success rate · task completion · milestones achieved
Availability
Dataset Available: Yes
Code Available: Yes
Dataset Information

33 vulnerable systems of increasing difficulty, covering both in-vitro and real-world scenarios, evaluated against generic and task-specific milestones
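Milestone-based evaluation means an agent gets credit both for fully completing a task and for partial progress toward it. The paper's exact scoring is not reproduced here; the following is a minimal, hypothetical sketch of how such metrics are commonly computed (all names and the sample data are illustrative, not from AutoPenBench):

```python
from dataclasses import dataclass

@dataclass
class TaskResult:
    milestones_total: int     # milestones defined for this task
    milestones_achieved: int  # milestones the agent actually reached
    completed: bool           # whether the final goal was achieved

def success_rate(results: list[TaskResult]) -> float:
    """Fraction of tasks fully completed (no partial credit)."""
    return sum(r.completed for r in results) / len(results)

def milestone_progress(results: list[TaskResult]) -> float:
    """Average fraction of milestones achieved per task (partial credit)."""
    return sum(r.milestones_achieved / r.milestones_total
               for r in results) / len(results)

# Illustrative run: one task solved end-to-end, one abandoned partway.
results = [
    TaskResult(milestones_total=4, milestones_achieved=4, completed=True),
    TaskResult(milestones_total=5, milestones_achieved=2, completed=False),
]
print(success_rate(results))        # 0.5
print(milestone_progress(results))  # 0.7
```

The two numbers diverge by design: success rate is strict and binary per task, while milestone progress rewards agents that get partway through a system even when exploitation ultimately fails.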

Task Categories

vulnerability-exploitation · privilege-escalation · network-penetration · web-exploitation

Dataset Size

33 samples

Model Results
Detailed scores for each model evaluated on this benchmark

No results yet

Be the first to submit results for this benchmark!

Submit Results