Cyber LLM Benchmark Hub Logo
Cyber LLM Benchmark Hub
  • Home
  • Benchmarks
  • Contact
Support
Security Knowledge, Multiple Choice, Knowledge Retention

SecEval

A Comprehensive Benchmark for Evaluating Cybersecurity Knowledge of Foundation Models

Quick Stats

Top Score

N/A

Models Evaluated

0

Dataset Size

2,126 samples

Last Updated

December 1, 2023

Availability

Dataset ✓, Code ✓
Metrics Tracked
accuracy, domain score
Sources
Code, Dataset
Dataset Information

2,126 multiple-choice questions across 9 security domains (Software/Application/System/Web/Network Security, Cryptography, Memory Safety, PenTest, Vulnerability), generated from open-licensed textbooks, documentation, and industry guidelines.

Number of Tasks

9

Software Security, Application Security, System Security, Web Security, Cryptography, Memory Safety, Network Security, Pentest, Vulnerability
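
The two tracked metrics, accuracy and domain score, can be illustrated with a minimal sketch. This is not the benchmark's official scoring code. It assumes each graded record is a dict with hypothetical keys `domain`, `answer`, and `prediction`, that answers are strings of choice letters, and that a prediction counts as correct only when its set of letters matches the answer exactly. It also assumes "domain score" means per-domain accuracy.

```python
from collections import defaultdict


def normalize(choice):
    """Reduce a choice string such as 'b, C' to a set of uppercase letters."""
    return frozenset(c for c in choice.upper() if c.isalpha())


def score(records):
    """Return (overall_accuracy, per_domain_accuracy) for graded records.

    Assumption: every record has the hypothetical keys 'domain', 'answer',
    and 'prediction'. Exact match on the set of letters is also an assumption.
    """
    counts = defaultdict(lambda: [0, 0])  # domain -> [correct, total]
    for record in records:
        hit = normalize(record["prediction"]) == normalize(record["answer"])
        counts[record["domain"]][0] += int(hit)
        counts[record["domain"]][1] += 1

    total = sum(n for _, n in counts.values())
    correct = sum(c for c, _ in counts.values())
    overall = correct / total if total else 0.0
    per_domain = {domain: c / n for domain, (c, n) in counts.items()}
    return overall, per_domain


# Illustrative records only; these are not taken from the dataset.
sample = [
    {"domain": "Cryptography", "answer": "A", "prediction": "a"},
    {"domain": "Cryptography", "answer": "BC", "prediction": "CB"},
    {"domain": "Web Security", "answer": "D", "prediction": "A"},
    {"domain": "Web Security", "answer": "AB", "prediction": "A"},
]
overall, per_domain = score(sample)
```

Reporting per-domain accuracy alongside the overall figure keeps a strong result in one domain from masking a weak one in another, which matters when the nine domains are not equally represented.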
Model Results
Detailed scores for each model evaluated on this benchmark

Verified metadata only

No verified numeric leaderboard or result table from a primary public source has been extracted into the catalog yet. Metadata and source links were refreshed during the 2026-05-12 audit.

Cyber LLM Benchmark Hub

Benchmarking frontier models across cybersecurity tasks.

© 2026 Cyber LLM Benchmark Hub