
SECURE (Security Extraction, Understanding & Reasoning Evaluation) - Benchmarking LLMs for Cybersecurity
Top Score: 88.6%
Models Evaluated: 6
Dataset Size: 6 samples
Last Updated: May 30, 2024
Availability: Six datasets focused on the Industrial Control System (ICS) sector, evaluating knowledge extraction, understanding, and reasoning from industry-standard sources
Number of Tasks: 3
| Rank | Model | MAET accuracy | CWET accuracy | KCV accuracy | VOOD OOD accuracy | Evaluated By | Date | Source |
|---|---|---|---|---|---|---|---|---|
| 1st | GPT-4 (gpt-4-turbo, OpenAI) | 88.6% | 89.6% | 87.6% | 87.9% | SECURE authors | May 30, 2024 | Link |
| 2nd | Llama 3 70B Chat (llama-3-70b-instruct, Meta) | 86.3% | 90.4% | 85.2% | 27.1% | SECURE authors | May 30, 2024 | Link |
| 3rd | Gemini Pro (gemini-pro-1.0, Google) | 86.2% | 87.8% | 83.5% | 86.7% | SECURE authors | May 30, 2024 | Link |
| 4th | GPT-3.5 (gpt-3.5-turbo-0613, OpenAI) | 82.8% | 84.2% | 78.3% | 8.4% | SECURE authors | May 30, 2024 | Link |
| 5th | Llama 3 8B Instruct (llama-3-8b-instruct, Meta) | 82.1% | 83.9% | 82.8% | 56.4% | SECURE authors | May 30, 2024 | Link |
| 6th | Mistral 7B Instruct v0.2 (mistral-7b-instruct-v0.2, Mistral AI) | 77.9% | 80.1% | 64.2% | 57.1% | SECURE authors | May 30, 2024 | Link |
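
The rank order in the table appears to follow the first score column (maet accuracy), and the Top Score of 88.6% matches the highest value in that column. The sketch below is one way to check this and to re-rank by any other column. The scores are copied from the table. The names `SCORES`, `rank_by`, and `best` are illustrative assumptions, not part of the leaderboard or of any SECURE tooling.

```python
# Scores (percent) copied from the leaderboard table, keyed by model identifier.
# The dict layout and the helper names are illustrative, not from the source.
SCORES = {
    "gpt-4-turbo": {"maet": 88.6, "cwet": 89.6, "kcv": 87.6, "vood": 87.9},
    "llama-3-70b-instruct": {"maet": 86.3, "cwet": 90.4, "kcv": 85.2, "vood": 27.1},
    "gemini-pro-1.0": {"maet": 86.2, "cwet": 87.8, "kcv": 83.5, "vood": 86.7},
    "gpt-3.5-turbo-0613": {"maet": 82.8, "cwet": 84.2, "kcv": 78.3, "vood": 8.4},
    "llama-3-8b-instruct": {"maet": 82.1, "cwet": 83.9, "kcv": 82.8, "vood": 56.4},
    "mistral-7b-instruct-v0.2": {"maet": 77.9, "cwet": 80.1, "kcv": 64.2, "vood": 57.1},
}


def rank_by(metric):
    """Return model identifiers sorted by the given metric, highest first."""
    return sorted(SCORES, key=lambda model: SCORES[model][metric], reverse=True)


def best(metric):
    """Return (model identifier, score) for the top model on the given metric."""
    model = rank_by(metric)[0]
    return model, SCORES[model][metric]


if __name__ == "__main__":
    for metric in ("maet", "cwet", "kcv", "vood"):
        model, score = best(metric)
        print(f"{metric}: {model} ({score}%)")
```

Ranking by a different column changes the order: Llama 3 70B leads on cwet accuracy, while GPT-3.5 falls to last place on vood ood-accuracy, so the displayed rank should be read as a ranking on one column rather than an overall ordering.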