Cyber LLM Benchmark Hub
Tags: penetration-testing, exploit-generation, vulnerability-discovery, automated-pentesting

OCCULT

Evaluating Large Language Models for Offensive Cyber Operation Capabilities

Quick Stats

Top Score: 0.0%
Models Evaluated: 0
Dataset Size: N/A samples
Last Updated: February 18, 2025

Paper Details

Title: OCCULT: Evaluating Large Language Models for Offensive Cyber Operation Capabilities
Authors: Michael Kouremetis, Marissa Dotter, Alex Byrne, and 5 more
Published: February 18, 2025
arXiv ID: 2502.15797
Metrics Tracked: accuracy, TACTL score, CyberLayer performance

Availability
Dataset Available: Yes
Code Available: Yes
Dataset Information

Lightweight operational evaluation framework with TACTL (Threat Actor Competency Test for LLMs) MCQ benchmarks and the MITRE CyberLayer simulation environment; a minimal sketch of the MCQ scoring step follows below.

Task Categories: threat-actor-competency-test, offensive-simulation, mitre-cyberlayer-operations
Dataset Size: N/A samples
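Since the TACTL portion is multiple-choice, its headline metric is plain accuracy over answered items. The Python sketch below shows how such a run might be scored; the file name, record fields, and format are assumptions for illustration, not taken from the OCCULT codebase.

import json
from pathlib import Path

def score_mcq_run(results_path):
    # Each record is assumed (hypothetically) to look like:
    # {"question_id": "...", "model_answer": "B", "correct_answer": "B"}
    records = json.loads(Path(results_path).read_text())
    if not records:
        return 0.0
    correct = sum(
        1 for r in records
        if r["model_answer"].strip().upper() == r["correct_answer"].strip().upper()
    )
    return correct / len(records)

if __name__ == "__main__":
    # "tactl_run.json" is a hypothetical output file from one evaluation run
    print(f"TACTL accuracy: {score_mcq_run('tactl_run.json'):.1%}")

Accuracy computed this way is what would populate the Top Score field above once results are submitted.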

Model Results
Detailed scores for each model evaluated on this benchmark

No results yet. Be the first to submit results for this benchmark!