Cyber LLM Benchmark Hub
  • Home
  • Benchmarks
  • Leaderboards
  • Compare
  • Submit
Support
Back to Benchmarks
Tags: vulnerability analysis · vulnerability-reasoning · poc-generation

SecLLMHolmes

A Comprehensive Evaluation Framework and Benchmarks for LLMs in Security Vulnerability Identification and Reasoning

View Paper | Compare Models
Quick Stats

Top Score

0.0%

Models Evaluated

0

Dataset Size

228 samples

Last Updated

July 24, 2024

Paper Details

Title

LLMs Cannot Reliably Identify and Reason About Security Vulnerabilities (Yet?): A Comprehensive Evaluation, Framework, and Benchmarks

Authors

Saad Ullah, Mingji Han, Saurabh Pujar

et al. (3 additional authors)

Published

July 24, 2024

arXiv ID

2312.12575
Metrics Tracked
accuracy · reasoning faithfulness · robustness
Availability
Dataset Available: Yes
Code Available: Yes
Dataset Information

228 code scenarios analyzed across 8 investigative dimensions, including determinism, reasoning faithfulness, and robustness to code changes

Task Types

vulnerability-identification · security-reasoning · code-analysis

Dataset Size

228 samples

Model Results
Detailed scores for each model evaluated on this benchmark

No results yet

Be the first to submit results for this benchmark!

Submit Results