
A Comprehensive Evaluation Framework and Benchmarks for LLMs in Security Vulnerability Identification and Reasoning
Top Score
N/A
Models Evaluated
0
Dataset Size
228 samples
Last Updated
December 19, 2023
Availability
228 code scenarios analyzed across 8 investigative dimensions, including determinism, reasoning faithfulness, and robustness to code changes
Number of Tasks
3
No verified public leaderboard or numeric results table has been extracted into the catalog yet. Metadata and source links were refreshed during the 2026-05-12 audit.