Cyber LLM Benchmark Hub Logo
Cyber LLM Benchmark Hub
  • Home
  • Benchmarks
  • Contact
Support
Vulnerability Analysis, Vulnerability Reasoning, PoC Generation

SecLLMHolmes

A Comprehensive Evaluation Framework and Benchmarks for LLMs in Security Vulnerability Identification and Reasoning

Quick Stats
  • Top Score: N/A
  • Models Evaluated: 0
  • Dataset Size: 228 samples
  • Last Updated: December 19, 2023
  • Availability: Dataset ✓, Code ✓
Metrics Tracked: accuracy, reasoning faithfulness, robustness
Sources: Project
Dataset Information

228 code scenarios, analyzed across 8 investigative dimensions that include determinism, reasoning faithfulness, and robustness to code changes.

Number of Tasks: 3
  • Vulnerability Identification
  • Security Reasoning
  • Code Analysis
Model Results
Detailed scores for each model evaluated on this benchmark.

Verified metadata only

No verified, public, primary numeric leaderboard or results table has been extracted into the catalog yet; the metadata and source links were refreshed during the 2026-05-12 audit.

Cyber LLM Benchmark Hub

Benchmarking frontier models across cybersecurity tasks.


© 2026 Cyber LLM Benchmark Hub