Security Knowledge, Multiple Choice, Knowledge Retention

SecQA

A Concise Question-Answering Dataset for Evaluating Large Language Models in Computer Security

Quick Stats

Top Score

99.1% (secqa v1-0shot-accuracy)

Models Evaluated

8

Dataset Size

242 samples

Last Updated

December 26, 2023

Availability

Dataset ✓, Code ✓

Metrics Tracked

secqa v1-0shot-accuracy, secqa v1-5shot-accuracy, secqa v2-0shot-accuracy, secqa v2-5shot-accuracy

Sources

Dataset, Leaderboard

Dataset Information

Multiple-choice Q&A dataset generated from the textbook 'Computer Systems Security: Planning for Success', released in two versions of increasing complexity. SecQA v1 has 127 questions (dev 5 / val 12 / test 110). SecQA v2 has 115 questions (dev 5 / val 10 / test 100). Together the two versions contain 242 questions, 210 of which are in the combined test split.
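
The split counts in the description can be cross-checked with a short Python sketch. Only the numbers come from the description; the dictionary layout and the names `SPLITS`, `version_total`, and `combined_split` are illustrative assumptions, not part of the SecQA release.

```python
# Split sizes copied from the dataset description.
# The structure and names here are hypothetical, chosen for illustration.
SPLITS = {
    "secqa_v1": {"dev": 5, "val": 12, "test": 110},
    "secqa_v2": {"dev": 5, "val": 10, "test": 100},
}


def version_total(version: str) -> int:
    """Total number of questions in one dataset version."""
    return sum(SPLITS[version].values())


def combined_split(split: str) -> int:
    """Number of questions in one split, summed over both versions."""
    return sum(sizes[split] for sizes in SPLITS.values())


if __name__ == "__main__":
    for version in SPLITS:
        print(version, version_total(version))
    print("all questions:", sum(version_total(v) for v in SPLITS))
    print("combined test:", combined_split("test"))
```

Run as a script, this prints the per-version totals, the overall total, and the combined test size, which should match the 127, 115, 242, and 210 stated in the description.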

Number of Tasks

Security QA, Knowledge Assessment

Model Results

Detailed scores for each model evaluated on this benchmark

Rank	Model	Model ID	Organization	secqa v1-0shot-accuracy	secqa v1-5shot-accuracy	secqa v2-0shot-accuracy	secqa v2-5shot-accuracy	Evaluated By	Date	Source
1	GPT-3.5	gpt-3.5-turbo-0613	OpenAI	99.1%	99.1%	98.0%	98.0%	SecQA authors	December 26, 2023	Link
2	GPT-4	gpt-4-0613	OpenAI	99.1%	100.0%	98.0%	98.0%	SecQA authors	December 26, 2023	Link
3	Mistral 7B Instruct v0.2	mistral-7b-instruct-v0.2	Mistral AI	90.9%	90.9%	89.0%	87.0%	SecQA authors	December 26, 2023	Link
4	Zephyr 7B Beta	zephyr-7b-beta	Hugging Face	84.6%	92.7%	81.0%	86.0%	SecQA authors	December 26, 2023	Link
5	Vicuna 13B v1.5	vicuna-13b-v1.5	LMSYS	76.4%	40.0%	74.0%	42.0%	SecQA authors	December 26, 2023	Link
6	Llama 2	llama-2-7b-chat	Meta	72.7%	61.8%	79.0%	50.0%	SecQA authors	December 26, 2023	Link
7	Vicuna 7B v1.5	vicuna-7b-v1.5	LMSYS	65.5%	30.9%	66.0%	22.0%	SecQA authors	December 26, 2023	Link
8	Llama 2	llama-2-13b-chat	Meta	49.1%	89.1%	51.0%	89.0%	SecQA authors	December 26, 2023	Link
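
The gap between 0-shot and 5-shot accuracy varies widely across models in the table. A minimal Python sketch for computing that gap, and for estimating how many questions a given accuracy corresponds to, is shown below. The scores are copied from the table. The names `SCORES`, `few_shot_delta`, and `estimated_correct` are hypothetical, and the conversion to question counts assumes the scores were computed on the test splits (110 questions for v1, 100 for v2), which this page does not state explicitly.

```python
# Accuracy (%) per model, copied from the results table:
# (v1 0-shot, v1 5-shot, v2 0-shot, v2 5-shot)
SCORES = {
    "gpt-3.5-turbo-0613": (99.1, 99.1, 98.0, 98.0),
    "gpt-4-0613": (99.1, 100.0, 98.0, 98.0),
    "mistral-7b-instruct-v0.2": (90.9, 90.9, 89.0, 87.0),
    "zephyr-7b-beta": (84.6, 92.7, 81.0, 86.0),
    "vicuna-13b-v1.5": (76.4, 40.0, 74.0, 42.0),
    "llama-2-7b-chat": (72.7, 61.8, 79.0, 50.0),
    "vicuna-7b-v1.5": (65.5, 30.9, 66.0, 22.0),
    "llama-2-13b-chat": (49.1, 89.1, 51.0, 89.0),
}

# Assumed evaluation set sizes (the test splits); not confirmed by the page.
TEST_SIZE = {"v1": 110, "v2": 100}


def few_shot_delta(model: str) -> dict:
    """5-shot minus 0-shot accuracy, in percentage points, per version."""
    v1_0, v1_5, v2_0, v2_5 = SCORES[model]
    return {"v1": round(v1_5 - v1_0, 1), "v2": round(v2_5 - v2_0, 1)}


def estimated_correct(accuracy_pct: float, n_questions: int) -> int:
    """Approximate number of correct answers implied by a rounded accuracy."""
    return round(accuracy_pct / 100 * n_questions)


if __name__ == "__main__":
    for model in SCORES:
        print(model, few_shot_delta(model))
```

Under the test-split assumption, one question is worth about 0.9 percentage points on v1 and exactly 1 point on v2, so small differences near the top of the table amount to a single question.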