OpenAI HumanEval (Coding Challenges & Unit-tests)
164 programming problems, each with a function signature, docstring, body, and unit tests
@kaggle.thedevastator_handcrafted_dataset_for_code_generation_models
Huggingface Hub: link
The OpenAI HumanEval dataset is a handcrafted set of 164 programming problems for evaluating code generation models. Each problem includes a function signature, docstring, body, and several unit tests, all handwritten so that they do not appear in the training sets of code generation models. The prompt is the starting point for each problem, which makes the dataset well suited to testing how well natural language processing and machine learning models generate Python programs from scratch.
To use this dataset, simply download the zip file and extract it. The resulting directory will contain the following files:
canonical_solution.py: The solution to the problem. (String)
entry_point.py: The entry point for the problem. (String)
prompt.txt: The prompt for the problem. (String)
test.py: The unit tests for the problem. (String)
- The dataset could be used to develop a model that generates programs from natural language.
- The dataset could be used to develop a model that completes or debugs programs.
- The dataset could be used to develop a model that writes unit tests for programs.
License
> License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication
> No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.
File: test.csv
| Column name | Description |
|---|---|
| task_id | A unique identifier for the problem. (String) |
| prompt | The function signature and docstring that describe the programming problem in natural language. (String) |
| canonical_solution | The reference Python solution to the problem. (String) |
| test | The unit tests that generated code must pass to be considered correct. (String) |
| entry_point | The name of the function that the unit tests call. (String) |
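The columns above can be combined to check a generated solution. The following is a minimal sketch, not an official evaluation harness. It assumes that `prompt` ends where the function body should begin, that `test` defines a function named `check(candidate)`, and that `entry_point` names the function to pass to it. The `problem` record and the `passes_tests` helper are hypothetical and written for illustration; the record is not taken from the dataset.

```python
# Hypothetical toy record shaped like one row of test.csv (not real dataset content).
problem = {
    "task_id": "Example/0",
    "prompt": 'def add(a, b):\n    """Return the sum of a and b."""\n',
    "canonical_solution": "    return a + b\n",
    "test": (
        "def check(candidate):\n"
        "    assert candidate(1, 2) == 3\n"
        "    assert candidate(-1, 1) == 0\n"
    ),
    "entry_point": "add",
}


def passes_tests(problem, completion):
    """Return True if prompt + completion passes the problem's unit tests.

    Assumes the test string defines check(candidate).
    """
    program = problem["prompt"] + completion + "\n" + problem["test"]
    namespace = {}
    try:
        # exec runs arbitrary code: run untrusted model output only in a sandbox.
        exec(program, namespace)
        namespace["check"](namespace[problem["entry_point"]])
    except Exception:
        return False
    return True


print(passes_tests(problem, problem["canonical_solution"]))  # True
print(passes_tests(problem, "    return a - b\n"))           # False
```

Any exception, including a failed assertion or a syntax error in the completion, is treated as a failed attempt. A real evaluation would also need timeouts and process isolation, which this sketch leaves out.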
CREATE TABLE test (
"task_id" VARCHAR,
"prompt" VARCHAR,
"canonical_solution" VARCHAR,
"test" VARCHAR,
"entry_point" VARCHAR
);
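The schema can be read into Python with the standard `csv` module. The sketch below assumes the CSV header matches the five column names in the schema. The `read_problems` helper is a hypothetical name, and the in-memory sample row is a toy stand-in for `test.csv`, not real dataset content.

```python
import csv
import io

EXPECTED_COLUMNS = ["task_id", "prompt", "canonical_solution", "test", "entry_point"]


def read_problems(fileobj):
    """Read CSV rows from a file object into a dict keyed by task_id."""
    reader = csv.DictReader(fileobj)
    missing = set(EXPECTED_COLUMNS) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    return {row["task_id"]: row for row in reader}


# Build a one-row, in-memory stand-in for test.csv (toy data).
sample = io.StringIO()
writer = csv.DictWriter(sample, fieldnames=EXPECTED_COLUMNS)
writer.writeheader()
writer.writerow({
    "task_id": "Example/0",
    "prompt": 'def add(a, b):\n    """Return the sum of a and b."""\n',
    "canonical_solution": "    return a + b\n",
    "test": "def check(candidate):\n    assert candidate(1, 2) == 3\n",
    "entry_point": "add",
})
sample.seek(0)

problems = read_problems(sample)
print(len(problems))                         # 1
print(problems["Example/0"]["entry_point"])  # add
```

For the real file, passing `open("test.csv", newline="", encoding="utf-8")` to `read_problems` should work, since the `csv` module handles quoted fields that contain newlines, as the multi-line code columns do.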