Cyber LLM Benchmark Hub
  • Home
  • Benchmarks
  • Leaderboards
  • Compare
  • Submit
  • Support
27 Benchmarks • 10 Categories

Cyber LLM Benchmark Hub

The definitive source for cybersecurity LLM performance.
Compare models across 10 cybersecurity categories.

27 Benchmarks • 16 Models • 10 Categories • 23 Results

Benchmark Categories

Comprehensive evaluation across 10 cybersecurity domains

Malware Analysis

1 benchmark

Penetration Testing

2 benchmarks

Incident Response

1 benchmark

Comprehensive Security

3 benchmarks

CTF Challenges

2 benchmarks

Vulnerability Analysis

5 benchmarks

Security Knowledge

6 benchmarks

Threat Intelligence

4 benchmarks

Threat Modeling

1 benchmark

LLM Safety & Jailbreaking

2 benchmarks

Featured Benchmarks

Latest cybersecurity LLM evaluation datasets

CyberSOCEval
Malware Analysis
7 models

Benchmarking LLMs Capabilities for Malware Analysis and Threat Intelligence Reasoning

Top Performer: GPT-4 (84.7%)
ExCyTIn-Bench
Comprehensive Security
7 models

Microsoft's benchmark for measuring AI capabilities in cybersecurity contexts

Top Performer: GPT-4 (89.2%)
SANDBOXESCAPEBENCH
LLM Safety & Jailbreaking
9 models

Quantifying frontier LLM capabilities for container sandbox escape using an Inspect-based sandbox-in-a-sandbox evaluation.

Top Performer: GPT-5 (49.7%)

Recently Added

DFIR-Metric

A Benchmark Dataset for Evaluating Large Language Models in Digital Forensics and Incident Response

AutoPenBench

Benchmarking Generative Agents for Penetration Testing with 33 vulnerable systems

CVE-Bench

A Benchmark for AI Agents' Ability to Exploit Real-World Web Application Vulnerabilities

OCCULT

Evaluating Large Language Models for Offensive Cyber Operation Capabilities

BountyBench

Dollar Impact of AI Agent Attackers and Defenders on Real-World Cybersecurity Systems

SEC-bench

Automated Benchmarking of LLM Agents on Real-World Software Security Tasks

SecLLMHolmes

A Comprehensive Evaluation Framework and Benchmarks for LLMs in Security Vulnerability Identification and Reasoning

Cybench

A Framework for Evaluating Cybersecurity Capabilities and Risks of Language Models

Contribute to the Community

Help build the most comprehensive cybersecurity LLM benchmark database. Submit your evaluation results or support the project.

Cyber LLM Hub

© 2026 Cyber LLM Benchmark Hub. Built with ❤️ for the cybersecurity community.