Cyber LLM Benchmark Hub
Security Knowledge • Multiple Choice • Short Answer • Knowledge Retention • Logical Reasoning

SecBench

A Comprehensive Multi-Dimensional Benchmarking Dataset for LLMs in Cybersecurity

Quick Stats

Top Score: 94.3%
Models Evaluated: 5
Dataset Size: 47,910 samples
Last Updated: December 30, 2024
Availability: Dataset ✓, Code ✓
Metrics Tracked: reported score
Sources: Code, Leaderboard, Dataset
Dataset Information

Largest cybersecurity benchmark, with 44,823 multiple-choice questions (MCQs) and 3,087 short-answer questions (SAQs), 47,910 samples in total, in multiple languages and covering various security sub-domains.
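
The sample counts above can be cross-checked with a few lines of arithmetic. This is a minimal sketch, assuming the MCQ and SAQ sets do not overlap and together make up the whole dataset. The variable names are illustrative and do not come from the SecBench code.

```python
# Counts as reported on this page.
MCQ_COUNT = 44_823  # multiple-choice questions
SAQ_COUNT = 3_087   # short-answer questions

# Assumption: the two question types are disjoint, so the total is their sum.
total = MCQ_COUNT + SAQ_COUNT
mcq_share = MCQ_COUNT / total

print(f"total samples: {total:,}")
print(f"MCQ share: {mcq_share:.1%}")
```

Under that assumption the sum equals the 47,910 samples listed as the dataset size, and MCQs account for the large majority of the benchmark.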

Number of Tasks: 4

  • Multiple Choice Questions
  • Short Answer Questions
  • Knowledge Retention
  • Logical Reasoning
Model Results
Detailed scores for each model evaluated on this benchmark
Rank  Model              Model ID       Provider  Reported score  Evaluated by                            Date
1     Hunyuan Turbo      hunyuan-turbo  Tencent   94.3%           SecBench authors via Argmin AI summary  December 30, 2024
2     GPT-4o             gpt-4o         OpenAI    91.0%           SecBench authors via Argmin AI summary  December 30, 2024
3     OpenAI o1-preview  o1-preview     OpenAI    89.2%           SecBench authors via Argmin AI summary  December 30, 2024
4     OpenAI o1-mini     o1-mini        OpenAI    87.5%           SecBench authors via Argmin AI summary  December 30, 2024
5     GPT-4o Mini        gpt-4o-mini    OpenAI    82.5%           SecBench authors via Argmin AI summary  December 30, 2024
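
For readers who want to work with these results programmatically, the rows above can be held in a small data structure and ranked. This is a minimal sketch: the `Result` class and its field names are hypothetical, and the scores are copied from this page as single reported values. The page does not say which task or question type each score covers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    """One leaderboard row (hypothetical structure, not the hub's schema)."""
    model: str
    model_id: str
    provider: str
    score: float  # reported score, in percent


RESULTS = [
    Result("GPT-4o", "gpt-4o", "OpenAI", 91.0),
    Result("Hunyuan Turbo", "hunyuan-turbo", "Tencent", 94.3),
    Result("GPT-4o Mini", "gpt-4o-mini", "OpenAI", 82.5),
    Result("OpenAI o1-preview", "o1-preview", "OpenAI", 89.2),
    Result("OpenAI o1-mini", "o1-mini", "OpenAI", 87.5),
]

# Rank by reported score, highest first.
ranked = sorted(RESULTS, key=lambda r: r.score, reverse=True)
top = ranked[0]

for rank, r in enumerate(ranked, start=1):
    gap = top.score - r.score  # percentage points behind the top score
    print(f"{rank}. {r.model:<18} {r.score:5.1f}%  ({gap:.1f} pts behind top)")
```

Sorting by score reproduces the order shown in the table, and the gap column makes the spread between the first and last entries explicit.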
Cyber LLM Benchmark Hub

Benchmarking frontier models across cybersecurity tasks.

© 2026 Cyber LLM Benchmark Hub