Cyber LLM Benchmark Hub
  • Home
  • Benchmarks
  • Contact
Support
Security Knowledge • Multiple Choice • Knowledge Retention

SecQA

A Concise Question-Answering Dataset for Evaluating Large Language Models in Computer Security

Quick Stats

Top Score

99.1%

Models Evaluated

8

Dataset Size

242 samples

Last Updated

December 26, 2023

Availability

Dataset ✓ • Code ✓
Metrics Tracked
  • secqa v1-0shot-accuracy
  • secqa v1-5shot-accuracy
  • secqa v2-0shot-accuracy
  • secqa v2-5shot-accuracy
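
The page does not say how the accuracy metrics are scored. As a minimal sketch, assuming each question has one gold answer letter and the model's reply has already been reduced to a single letter, accuracy is the share of exact matches. The function name and the toy data below are hypothetical and only illustrate the calculation.

```python
def accuracy(predictions, answers):
    """Fraction of questions whose predicted letter equals the gold letter."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("cannot score an empty split")
    # Assumption: answers are compared as case-insensitive letters (A, B, C, D).
    correct = sum(
        p.strip().upper() == a.strip().upper()
        for p, a in zip(predictions, answers)
    )
    return correct / len(answers)


# Hypothetical toy data, not taken from SecQA.
gold = ["A", "C", "B", "D"]
pred = ["a", "C", "D", "D"]
score = accuracy(pred, gold)
print(f"{score:.1%}")  # 75.0%
```

The same function would apply to the 0-shot and 5-shot settings alike; only the prompt that produces the predictions changes.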
Sources
Dataset • Leaderboard
Dataset Information

Multiple-choice Q&A dataset generated from the textbook 'Computer Systems Security: Planning for Success', released in two versions of increasing complexity.

  • SecQA v1: 127 questions (dev 5 / val 12 / test 110)
  • SecQA v2: 115 questions (dev 5 / val 10 / test 100)

Total: 242 questions, of which 210 are in the combined test split.
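
The split arithmetic above can be checked with a short sketch. The counts are copied from the dataset description; the dictionary layout and variable names are illustrative assumptions, not the dataset's actual schema.

```python
# Split sizes as stated in the dataset description.
splits = {
    "v1": {"dev": 5, "val": 12, "test": 110},
    "v2": {"dev": 5, "val": 10, "test": 100},
}

# Questions per version, overall total, and combined test size.
per_version = {version: sum(counts.values()) for version, counts in splits.items()}
total_questions = sum(per_version.values())
total_test = sum(counts["test"] for counts in splits.values())

print(per_version)      # {'v1': 127, 'v2': 115}
print(total_questions)  # 242
print(total_test)       # 210
```

With test splits of 110 and 100 questions, a single question moves accuracy by roughly 0.9 points on v1 and exactly 1 point on v2, which is worth keeping in mind when comparing close scores.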

Number of Tasks

2

Security QA • Knowledge Assessment
Performance Comparison
Visual comparison of model performance on this benchmark
Model Results
Detailed scores for each model evaluated on this benchmark
Accuracy columns: secqa v1-0shot, v1-5shot, v2-0shot, v2-5shot.

Rank  Model                     Model ID • Provider                     v1-0shot  v1-5shot  v2-0shot  v2-5shot
1     GPT-3.5                   gpt-3.5-turbo-0613 • OpenAI             99.1%     99.1%     98.0%     98.0%
2     GPT-4                     gpt-4-0613 • OpenAI                     99.1%     100.0%    98.0%     98.0%
3     Mistral 7B Instruct v0.2  mistral-7b-instruct-v0.2 • Mistral AI   90.9%     90.9%     89.0%     87.0%
4     Zephyr 7B Beta            zephyr-7b-beta • Hugging Face           84.6%     92.7%     81.0%     86.0%
5     Vicuna 13B v1.5           vicuna-13b-v1.5 • LMSYS                 76.4%     40.0%     74.0%     42.0%
6     Llama 2                   llama-2-7b-chat • Meta                  72.7%     61.8%     79.0%     50.0%
7     Vicuna 7B v1.5            vicuna-7b-v1.5 • LMSYS                  65.5%     30.9%     66.0%     22.0%
8     Llama 2                   llama-2-13b-chat • Meta                 49.1%     89.1%     51.0%     89.0%

All rows: evaluated by the SecQA authors, December 26, 2023.
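
The scores above show large swings between the 0-shot and 5-shot settings for some models. The following sketch computes the 5-shot minus 0-shot difference per model, using the accuracy values copied from the results table. The helper name `few_shot_delta` and the data layout are illustrative assumptions, not part of the benchmark's code.

```python
# Accuracy (%) from the results table:
# (v1 0-shot, v1 5-shot, v2 0-shot, v2 5-shot)
scores = {
    "gpt-3.5-turbo-0613": (99.1, 99.1, 98.0, 98.0),
    "gpt-4-0613": (99.1, 100.0, 98.0, 98.0),
    "mistral-7b-instruct-v0.2": (90.9, 90.9, 89.0, 87.0),
    "zephyr-7b-beta": (84.6, 92.7, 81.0, 86.0),
    "vicuna-13b-v1.5": (76.4, 40.0, 74.0, 42.0),
    "llama-2-7b-chat": (72.7, 61.8, 79.0, 50.0),
    "vicuna-7b-v1.5": (65.5, 30.9, 66.0, 22.0),
    "llama-2-13b-chat": (49.1, 89.1, 51.0, 89.0),
}


def few_shot_delta(row):
    """Return (v1 delta, v2 delta) in percentage points, 5-shot minus 0-shot."""
    v1_0, v1_5, v2_0, v2_5 = row
    return (round(v1_5 - v1_0, 1), round(v2_5 - v2_0, 1))


deltas = {model: few_shot_delta(row) for model, row in scores.items()}

# Print models from largest v1 gain to largest v1 drop.
for model, (d1, d2) in sorted(deltas.items(), key=lambda kv: kv[1][0], reverse=True):
    print(f"{model:26s} v1 {d1:+6.1f}  v2 {d2:+6.1f}")
```

On these numbers, llama-2-13b-chat gains about 40 points with five examples on v1, while both Vicuna models lose more than 30 points, so a ranking based on a single metric can look quite different from one based on another.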
Cyber LLM Benchmark Hub

Benchmarking frontier models across cybersecurity tasks.

© 2026 Cyber LLM Benchmark Hub