
A Benchmark Dataset for Evaluating Large Language Models in Digital Forensics and Incident Response
Top Score
0.0%
Models Evaluated
0
Dataset Size
1,350 samples
Last Updated
May 26, 2025
Title
DFIR-Metric: A Benchmark Dataset for Evaluating Large Language Models in Digital Forensics and Incident Response
Authors
Bilel Cherif, Tamas Bisztray, Richard A. Dubniczky
+3 more
Published
May 26, 2025
arXiv ID
2505.19973
Three components: 700 expert-reviewed MCQs from industry certifications, 150 CTF-style forensic tasks, and 500 NIST CFTT disk/memory forensic cases
Number of Tasks
knowledge-assessment, ctf-forensic-challenges, disk-forensics, memory-forensics