
Software Vulnerability Detection Datasets - Function/method Level

Verified Source
EU Open Research Repository

@zenodo.oai_zenodo_org_13870382


Zenodo

Dataset Description

This dataset is for software vulnerability detection and includes source code in eight programming languages (C, C++, Java, JavaScript, Go, PHP, Ruby, Python). All data is collected from GitHub.

Files:

data{programming language}_vul.json: the vulnerable code samples for a given programming language.
data{programming language}_patch.json: the patch code samples for a given programming language.

Each source code sample includes the following 16 properties:

index: index of the code sample. If is_vulnerable==False, the index identifies the vulnerable sample that this code patches.
code: raw source code (may include comments).
is_vulnerable: whether the code is vulnerable (True) or a patch (False).
programming_language: programming language of the code.
method_name: name of the method.
file_name: name of the file from which the source code was extracted.
repo_url: URL of the project repository.
repo_owner: owner of the repository.
committer: developer who pushed the commit.
committer_date: date when the commit was pushed.
commit_msg: the commit message.
cwe_id: the CWE id if is_vulnerable==True; otherwise None.
cwe_name: the name of the corresponding CWE if is_vulnerable==True; otherwise None.
cwe_description: the description of the corresponding CWE if is_vulnerable==True; otherwise None.
cwe_url: the URL with more details on the corresponding CWE if is_vulnerable==True; otherwise None.
cve_id: the CVE id if is_vulnerable==True; otherwise None.
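
Because a patch sample shares its index with the vulnerable sample it fixes, the two files for one language can be joined on that property. The following is a minimal sketch of such a join, not code shipped with the dataset. It assumes each file holds a JSON array of objects keyed by the property names above and that is_vulnerable is stored as a JSON boolean; neither detail is confirmed by the description. The helper names load_records and pair_by_index are hypothetical, and the demo records are invented for illustration.

```python
import json
from pathlib import Path


def load_records(path):
    """Load one dataset file.

    Assumption: the file is a JSON array of sample objects. If the real
    files use a different layout, this function needs adjusting.
    """
    with Path(path).open(encoding="utf-8") as f:
        return json.load(f)


def pair_by_index(vul_records, patch_records):
    """Return (vulnerable, patch) tuples for samples that share an index."""
    # Look up vulnerable samples by their index property.
    vul_by_index = {r["index"]: r for r in vul_records if r["is_vulnerable"]}
    pairs = []
    for patch in patch_records:
        vul = vul_by_index.get(patch["index"])
        # Keep only true patches that have a matching vulnerable sample.
        if vul is not None and not patch["is_vulnerable"]:
            pairs.append((vul, patch))
    return pairs


# Invented stand-ins for real records; only a few of the 16 properties are shown.
vul_demo = [
    {"index": 0, "code": "strcpy(buf, src);", "is_vulnerable": True, "cwe_id": "CWE-120"},
]
patch_demo = [
    {"index": 0, "code": "strncpy(buf, src, sizeof(buf) - 1);", "is_vulnerable": False, "cwe_id": None},
]

demo_pairs = pair_by_index(vul_demo, patch_demo)
for vul, patch in demo_pairs:
    print(vul["cwe_id"], "|", vul["code"], "->", patch["code"])
```

With real data, the two lists would come from load_records called on the _vul.json and _patch.json files for the same language. Patches carry None for the CWE and CVE properties, so the pairing is what lets each patch inherit the label of the vulnerable sample it fixes.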
Publisher name: Zenodo
Last updated: 2026-02-20T15:02:06Z

