Threat Intelligence, CTI Extraction, Threat Actor Analysis, Attack Pattern Recognition

SEvenLLM

Benchmarking, Eliciting, and Enhancing Abilities of Large Language Models in Cyber Threat Intelligence

Quick Stats

Top Score

N/A

Models Evaluated

Dataset Size

1,300 samples

Last Updated

May 6, 2024

Availability

Dataset ✓, Code ✓

Metrics Tracked

accuracy, F1 score, task performance

Dataset Information

The SEvenLLM-Bench evaluation set contains 1,300 test samples (100 multiple-choice (MCQ) and 1,200 question-answering (QA) items, split evenly between English and Chinese) covering 28 CTI tasks (13 understanding and 15 generation). The benchmark is built from a bilingual corpus of 6,706 English and 1,779 Chinese cybersecurity reports.
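
The counts in the description above can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the dictionary names and the `per_language` helper are hypothetical, and it assumes that "split evenly" applies to each question type separately, which the description does not state explicitly.

```python
# Composition of the SEvenLLM-Bench test set, using only the figures
# quoted in the description. All names here are hypothetical.
SAMPLES = {"MCQ": 100, "QA": 1200}
LANGUAGES = ("en", "zh")
TASKS = {"understanding": 13, "generation": 15}


def per_language(samples, languages):
    """Divide each question type's count equally among the languages.

    Assumption: the even English/Chinese split holds per question type.
    """
    return {
        (qtype, lang): count // len(languages)
        for qtype, count in samples.items()
        for lang in languages
    }


total_samples = sum(SAMPLES.values())
total_tasks = sum(TASKS.values())
split = per_language(SAMPLES, LANGUAGES)

print(total_samples)  # 1300
print(total_tasks)    # 28
print(split[("MCQ", "en")], split[("QA", "zh")])  # 50 600
```

Under that assumption, each language contributes 50 MCQ and 600 QA items, which sums back to the stated 1,300 samples.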

Number of Tasks

Security Event Analysis, Threat Intelligence Extraction, Incident Analysis

Model Results

Detailed scores for each model evaluated on this benchmark

Verified metadata only

No verified numeric leaderboard or result table from a public primary source has been extracted into the catalog yet. Metadata and source links were refreshed during the 2026-05-12 audit.
