
Benchmarking, Eliciting, and Enhancing Abilities of Large Language Models in Cyber Threat Intelligence
Top Score
N/A
Models Evaluated
0
Dataset Size
1,300 samples
Last Updated
May 6, 2024
Availability
The SEvenLLM-Bench evaluation set contains 1,300 test samples (100 multiple-choice questions and 1,200 question-answering items, split evenly between English and Chinese) covering 28 cyber threat intelligence (CTI) tasks (13 understanding and 15 generation). It was built from a bilingual corpus of 6,706 English and 1,779 Chinese cybersecurity reports.
Number of Tasks
3
No verified public leaderboard or numeric results table from a primary source has been extracted into the catalog yet. Metadata and source links were refreshed during the 2026-05-12 audit.