Security Knowledge, Knowledge Retention, Logical Reasoning

SECURE

Security Extraction, Understanding & Reasoning Evaluation - Benchmarking LLMs for Cybersecurity

Quick Stats

Top Score

88.6%

Models Evaluated

Dataset Size

6 datasets

Last Updated

May 30, 2024

Availability

Dataset ✓, Code ✓

Metrics Tracked

maet accuracy, cwet accuracy, kcv accuracy, vood ood-accuracy

Dataset Information

Six datasets focused on the Industrial Control System (ICS) sector, evaluating knowledge extraction, understanding, and reasoning over industry-standard sources.

Number of Tasks

Knowledge Extraction, Understanding, Reasoning

Model Results

Detailed scores for each model evaluated on this benchmark

Rank	Model	maet accuracy	cwet accuracy	kcv accuracy	vood ood-accuracy	Evaluated By	Date
1	GPT-4 (gpt-4-turbo, OpenAI)	88.6%	89.6%	87.6%	87.9%	SECURE authors	May 30, 2024
2	Llama 3 70B Chat (llama-3-70b-instruct, Meta)	86.3%	90.4%	85.2%	27.1%	SECURE authors	May 30, 2024
3	Gemini Pro (gemini-pro-1.0, Google)	86.2%	87.8%	83.5%	86.7%	SECURE authors	May 30, 2024
4	GPT-3.5 (gpt-3.5-turbo-0613, OpenAI)	82.8%	84.2%	78.3%	8.4%	SECURE authors	May 30, 2024
5	Llama 3 8B Instruct (llama-3-8b-instruct, Meta)	82.1%	83.9%	82.8%	56.4%	SECURE authors	May 30, 2024
6	Mistral 7B Instruct v0.2 (mistral-7b-instruct-v0.2, Mistral AI)	77.9%	80.1%	64.2%	57.1%	SECURE authors	May 30, 2024
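
The ranks in the table follow the maet accuracy column, and the order can change considerably under another metric, most visibly vood ood-accuracy. The sketch below re-ranks the models by any one column. It is illustrative only: the scores are transcribed by hand from the table, and the short keys (`maet`, `cwet`, `kcv`, `vood`) and the helper names `rank_by` and `spread` are assumptions made for this example, not part of the SECURE code release.

```python
# Scores (percentages) transcribed from the Model Results table.
# The short metric keys are an assumption made for this sketch.
SCORES = {
    "gpt-4-turbo":              {"maet": 88.6, "cwet": 89.6, "kcv": 87.6, "vood": 87.9},
    "llama-3-70b-instruct":     {"maet": 86.3, "cwet": 90.4, "kcv": 85.2, "vood": 27.1},
    "gemini-pro-1.0":           {"maet": 86.2, "cwet": 87.8, "kcv": 83.5, "vood": 86.7},
    "gpt-3.5-turbo-0613":       {"maet": 82.8, "cwet": 84.2, "kcv": 78.3, "vood": 8.4},
    "llama-3-8b-instruct":      {"maet": 82.1, "cwet": 83.9, "kcv": 82.8, "vood": 56.4},
    "mistral-7b-instruct-v0.2": {"maet": 77.9, "cwet": 80.1, "kcv": 64.2, "vood": 57.1},
}


def rank_by(metric, scores=SCORES):
    """Return model IDs sorted from highest to lowest score on `metric`."""
    return sorted(scores, key=lambda model: scores[model][metric], reverse=True)


def spread(model, scores=SCORES):
    """Gap between a model's best and worst metric, in percentage points."""
    values = scores[model].values()
    return round(max(values) - min(values), 1)


if __name__ == "__main__":
    for metric in ("maet", "vood"):
        print(metric, rank_by(metric))
    for model in SCORES:
        print(model, spread(model))
```

Ranking by `vood` instead of `maet` moves llama-3-70b-instruct and gpt-3.5-turbo-0613 to the bottom two places, so a single "rank" column is best read alongside the per-metric scores.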