
Benchmarking Generative Agents for Penetration Testing with 33 Vulnerable Systems
Top Score
64.0%
Models Evaluated
2
Dataset Size
33 samples
Last Updated
October 4, 2024
Availability
33 vulnerable systems of increasing difficulty, covering both in-vitro and real-world scenarios, evaluated against generic and specific milestones
Number of Tasks
4