
A Comprehensive Evaluation Framework and Benchmarks for LLMs in Security Vulnerability Identification and Reasoning
Dataset Size
228 samples
Last Updated
July 24, 2024
Title
LLMs Cannot Reliably Identify and Reason About Security Vulnerabilities (Yet?): A Comprehensive Evaluation, Framework, and Benchmarks
Authors
Saad Ullah, Mingji Han, Saurabh Pujar, and 3 more
Published
July 24, 2024
arXiv ID
2312.12575
Description
228 code scenarios analyzed across 8 investigative dimensions, including determinism, reasoning faithfulness, and robustness to code changes
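The determinism dimension asks whether a model returns the same verdict when the identical prompt is repeated. The sketch below is a minimal illustration, assuming each run's answer has already been reduced to a short verdict string. It scores consistency as the share of runs that agree with the most common verdict. The function names and verdict strings are hypothetical and do not come from the benchmark's own code.

```python
from collections import Counter
from typing import Sequence


def majority_agreement(responses: Sequence[str]) -> float:
    """Fraction of repeated runs whose verdict matches the most common verdict.

    1.0 means every run gave the same verdict for this prompt; lower values
    mean the model changed its answer across identical queries.
    """
    if not responses:
        raise ValueError("need at least one response")
    # Normalize trivial formatting differences (case, surrounding whitespace).
    normalized = [r.strip().lower() for r in responses]
    _, top_count = Counter(normalized).most_common(1)[0]
    return top_count / len(normalized)


def is_deterministic(responses: Sequence[str]) -> bool:
    """True only if every repeated run produced the same normalized verdict."""
    return majority_agreement(responses) == 1.0


# Hypothetical verdicts from sending the same prompt about one code sample 5 times.
runs = ["Vulnerable", "vulnerable", "Not vulnerable", "vulnerable", "Vulnerable "]
print(majority_agreement(runs))  # 0.8
print(is_deterministic(runs))    # False
```

Treating any disagreement as non-deterministic is deliberately strict. A real evaluation might also vary sampling temperature or compare the stated reasoning rather than the final verdict only.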
Tasks
vulnerability-identification, security-reasoning, code-analysis