TY - GEN
T1 - A C/C++ Code Vulnerability Dataset with Code Changes and CVE Summaries
AU - Fan, Jiahao
AU - Li, Yi
AU - Wang, Shaohua
AU - Nguyen, Tien N.
N1 - Publisher Copyright: © 2020 ACM.
PY - 2020/6/29
Y1 - 2020/6/29
N2 - We collected a large C/C++ code vulnerability dataset, named Big-Vul, from open-source GitHub projects. We crawled the public Common Vulnerabilities and Exposures (CVE) database and CVE-related source code repositories. Specifically, we collected the descriptive information of the vulnerabilities from the CVE database, e.g., CVE IDs, CVE severity scores, and CVE summaries. Using the CVE information and the associated published GitHub repository links, we downloaded all of the code repositories and extracted the vulnerability-related code changes. In total, Big-Vul contains 3,754 code vulnerabilities spanning 91 different vulnerability types, extracted from 348 GitHub projects. All information is stored in CSV format, and the code changes are linked with the CVE descriptive information. Thus, Big-Vul can be used for various research topics, e.g., detecting and fixing vulnerabilities and analyzing vulnerability-related code changes. Big-Vul is publicly available on GitHub.
AB - We collected a large C/C++ code vulnerability dataset, named Big-Vul, from open-source GitHub projects. We crawled the public Common Vulnerabilities and Exposures (CVE) database and CVE-related source code repositories. Specifically, we collected the descriptive information of the vulnerabilities from the CVE database, e.g., CVE IDs, CVE severity scores, and CVE summaries. Using the CVE information and the associated published GitHub repository links, we downloaded all of the code repositories and extracted the vulnerability-related code changes. In total, Big-Vul contains 3,754 code vulnerabilities spanning 91 different vulnerability types, extracted from 348 GitHub projects. All information is stored in CSV format, and the code changes are linked with the CVE descriptive information. Thus, Big-Vul can be used for various research topics, e.g., detecting and fixing vulnerabilities and analyzing vulnerability-related code changes. Big-Vul is publicly available on GitHub.
KW - C/C++ Code
KW - Code Changes
KW - Common Vulnerabilities and Exposures
UR - http://www.scopus.com/inward/record.url?scp=85093651575&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85093651575&partnerID=8YFLogxK
U2 - 10.1145/3379597.3387501
DO - 10.1145/3379597.3387501
M3 - Conference contribution
AN - SCOPUS:85093651575
T3 - Proceedings - 2020 IEEE/ACM 17th International Conference on Mining Software Repositories, MSR 2020
SP - 508
EP - 512
BT - Proceedings - 2020 IEEE/ACM 17th International Conference on Mining Software Repositories, MSR 2020
PB - Association for Computing Machinery, Inc
T2 - 17th IEEE/ACM International Conference on Mining Software Repositories, MSR 2020, co-located with the 42nd International Conference on Software Engineering, ICSE 2020
Y2 - 29 June 2020 through 30 June 2020
ER -