Cyber LLM Benchmark Hub Logo
Cyber LLM Benchmark Hub
  • Home
  • Benchmarks
  • Contact
Support
Comprehensive Security, Threat Detection, Vulnerability Assessment, Security Operations

CAIBench

Cybersecurity AI Benchmark - A Meta-Benchmark for Evaluating Cybersecurity AI Agents

View Paper
Quick Stats

Top Score

N/A

Models Evaluated

0

Dataset Size

10,000+ samples

Last Updated

October 28, 2025

Availability

Dataset ✓, Code ✓
Metrics Tracked
success rate, knowledge score, adversarial performance
Dataset Information

Modular meta-benchmark with 10,000+ instances across 5 evaluation categories, including RCTF2 robotics challenges and the CyberPII-Bench privacy assessment.

Number of Tasks

5

  • Jeopardy CTF
  • Attack Defense CTF
  • Cyber Range Exercises
  • Knowledge Benchmarks
  • Privacy Assessments
Model Results
Detailed scores for each model evaluated on this benchmark

Verified metadata only

No verified public primary numeric leaderboard/result table has been extracted into the catalog yet; metadata and source links were refreshed during the 2026-05-12 audit.

Review Source
Cyber LLM Benchmark Hub

Benchmarking frontier models across cybersecurity tasks.

© 2026 Cyber LLM Benchmark Hub