Cyber LLM Benchmark Hub Logo
Cyber LLM Benchmark Hub
  • Home
  • Benchmarks
  • Contact
Support
Threat Intelligence, CTI Extraction, Threat Actor Analysis, Attack Pattern Recognition

SEvenLLM

Benchmarking, Eliciting, and Enhancing Abilities of Large Language Models in Cyber Threat Intelligence

Quick Stats

Top Score

N/A

Models Evaluated

0

Dataset Size

1,300 samples

Last Updated

May 6, 2024

Availability

Dataset ✓, Code ✓
Metrics Tracked
accuracy, F1 score, task performance
Dataset Information

The SEvenLLM-Bench evaluation set contains 1,300 test samples (100 multiple-choice questions and 1,200 question-answering items, split evenly between English and Chinese) covering 28 cyber threat intelligence (CTI) tasks (13 understanding and 15 generation). The benchmark is built from a bilingual corpus of 6,706 English and 1,779 Chinese cybersecurity reports.
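The composition figures above can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the variable names are hypothetical, and the per-language breakdown assumes that "split evenly" applies to each question format separately, which this entry does not state explicitly.

```python
# Counts as stated in this catalog entry.
MCQ_SAMPLES = 100
QA_SAMPLES = 1_200
UNDERSTANDING_TASKS = 13
GENERATION_TASKS = 15
LANGUAGES = ("English", "Chinese")

total_samples = MCQ_SAMPLES + QA_SAMPLES
total_tasks = UNDERSTANDING_TASKS + GENERATION_TASKS

# Assumption: the even language split holds within each question format.
per_language = {
    lang: {
        "mcq": MCQ_SAMPLES // len(LANGUAGES),
        "qa": QA_SAMPLES // len(LANGUAGES),
    }
    for lang in LANGUAGES
}

print(f"total samples: {total_samples}")
print(f"total tasks: {total_tasks}")
for lang, counts in per_language.items():
    print(f"{lang}: {counts['mcq']} MCQ + {counts['qa']} QA")
```

Under that assumption, each language contributes half of the evaluation set, and the totals agree with the 1,300 samples and 28 tasks listed in this entry.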

Number of Tasks

3

Security Event Analysis, Threat Intelligence Extraction, Incident Analysis
Model Results
Detailed scores for each model evaluated on this benchmark

Verified metadata only

No verified numeric leaderboard or results table from a public primary source has been extracted into the catalog yet. Metadata and source links were refreshed during the 2026-05-12 audit.

Cyber LLM Benchmark Hub

Benchmarking frontier models across cybersecurity tasks.

© 2026 Cyber LLM Benchmark Hub