
A Comprehensive Multi-Dimensional Benchmarking Dataset for LLMs in Cybersecurity
Top Score: 94.3%
Models Evaluated: 5
Dataset Size: 47,910 samples
Last Updated: December 30, 2024
Availability
The largest cybersecurity benchmark, with 44,823 multiple-choice questions (MCQs) and 3,087 short-answer questions (SAQs) in multiple languages, covering a range of security sub-domains.
Number of Tasks: 4
| Rank | Model | Reported Score | Evaluated By | Date | Source |
|---|---|---|---|---|---|
| 1 | Hunyuan Turbo (`hunyuan-turbo`, Tencent) | 94.3% | SecBench authors via Argmin AI summary | December 30, 2024 | Link |
| 2 | GPT-4o (`gpt-4o`, OpenAI) | 91.0% | SecBench authors via Argmin AI summary | December 30, 2024 | Link |
| 3 | OpenAI o1-preview (`o1-preview`, OpenAI) | 89.2% | SecBench authors via Argmin AI summary | December 30, 2024 | Link |
| 4 | OpenAI o1-mini (`o1-mini`, OpenAI) | 87.5% | SecBench authors via Argmin AI summary | December 30, 2024 | Link |
| 5 | GPT-4o Mini (`gpt-4o-mini`, OpenAI) | 82.5% | SecBench authors via Argmin AI summary | December 30, 2024 | Link |
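
For readers who want to work with these figures programmatically, the following is a minimal sketch that re-derives the ranking, each model's gap to the top score, and the total sample count from the numbers listed above. The variable and function names (`reported_scores`, `rank_models`, `gap_to_leader`) are illustrative assumptions, not part of any SecBench tooling, and the scores are simply the reported values copied from the table.

```python
# Reported scores copied from the table above, keyed by model identifier.
# These names and this structure are illustrative, not an official API.
reported_scores = {
    "hunyuan-turbo": 94.3,
    "gpt-4o": 91.0,
    "o1-preview": 89.2,
    "o1-mini": 87.5,
    "gpt-4o-mini": 82.5,
}

# Dataset composition as stated in the summary.
mcq_count = 44_823  # multiple-choice questions
saq_count = 3_087   # short-answer questions


def rank_models(scores):
    """Return (rank, model, score) tuples sorted by score, highest first."""
    ordered = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [(rank, model, score)
            for rank, (model, score) in enumerate(ordered, start=1)]


def gap_to_leader(scores):
    """Return each model's deficit to the top score, in percentage points."""
    top = max(scores.values())
    return {model: round(top - score, 1) for model, score in scores.items()}


ranking = rank_models(reported_scores)
gaps = gap_to_leader(reported_scores)
total_samples = mcq_count + saq_count

if __name__ == "__main__":
    for rank, model, score in ranking:
        print(f"{rank}. {model}: {score:.1f}% ({gaps[model]:.1f} pts behind leader)")
    print(f"Total samples: {total_samples:,}")
```

The sum of the two question counts matches the stated dataset size of 47,910 samples, and sorting by reported score reproduces the rank order in the table.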