OpenAI HumanEval (Coding Challenges & Unit-tests)
164 programming problems, each with a function signature, docstring, body, and unit tests
Source
Huggingface Hub: link
About this dataset
The OpenAI HumanEval dataset is a handcrafted set of 164 programming problems designed to challenge code generation models. Each problem includes a function signature, docstring, body, and several unit tests, all handwritten to ensure they are not included in the training sets of code generation models. A model is given the prompt (the function signature and docstring) and must complete the function body, making this an ideal benchmark for testing a model's ability to generate correct Python programs from natural language.
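Performance on HumanEval is typically reported with the pass@k metric. A minimal sketch of the unbiased pass@k estimator from the HumanEval paper, where n completions are sampled per problem and c of them pass the unit tests:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k samples,
    drawn without replacement from n generations (c correct), passes."""
    if n - c < k:
        # Fewer than k incorrect samples exist, so any draw of k
        # must include at least one correct sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# e.g. 10 samples per problem, 3 of which pass the tests, reported as pass@1
print(pass_at_k(10, 3, 1))  # -> 0.3 (within floating-point rounding)
```

The estimator is averaged over all 164 problems to produce the benchmark score.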
How to use the dataset
To use this dataset, simply download the zip file and extract it. The resulting directory will contain the following files:
- canonical_solution.py: The reference solution to the problem. (String)
- entry_point.py: The name of the function the unit tests call. (String)
- prompt.txt: The prompt for the problem. (String)
- test.py: The unit tests for the problem. (String)
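Evaluation concatenates the prompt with a model's completion and runs the unit tests against the entry point. A self-contained sketch with a toy problem that mimics the shape of a HumanEval record (the example problem below is illustrative, not taken from the dataset):

```python
# Toy record mimicking a HumanEval problem (illustrative, not from the dataset).
problem = {
    "prompt": 'def add(a, b):\n    """Return the sum of a and b."""\n',
    "canonical_solution": "    return a + b\n",
    "entry_point": "add",
    "test": (
        "def check(candidate):\n"
        "    assert candidate(2, 3) == 5\n"
        "    assert candidate(-1, 1) == 0\n"
    ),
}

def run_problem(problem: dict, completion: str) -> bool:
    """Execute prompt + completion, then run the unit tests on the entry point.

    NOTE: exec runs arbitrary generated code; real evaluations should
    sandbox this step.
    """
    program = problem["prompt"] + completion + "\n" + problem["test"]
    namespace: dict = {}
    try:
        exec(program, namespace)
        namespace["check"](namespace[problem["entry_point"]])
        return True
    except Exception:
        return False

print(run_problem(problem, problem["canonical_solution"]))  # canonical solution passes
print(run_problem(problem, "    return a - b\n"))           # wrong completion fails
```

A problem counts as solved only if every assertion in its test file passes.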
Research Ideas
- The dataset could be used to develop a model that generates programs from natural language.
- The dataset could be used to develop a model that completes or debugs programs.
- The dataset could be used to develop a model that writes unit tests for programs.
License
> License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication
> No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission.
Columns
File: test.csv
| Column name | Description |
| --- | --- |
| prompt | The function signature and docstring describing the programming problem. (String) |
| canonical_solution | The correct Python code solution to the problem. (String) |
| test | A set of unit tests that the generated code must pass in order to be considered correct. (String) |
| entry_point | The name of the function the unit tests call. (String) |
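Assuming test.csv follows the four-column layout above, the rows can be read with Python's standard csv module, which handles the multi-line quoted fields that code columns require. The record below is a made-up stand-in, not a real dataset entry:

```python
import csv
import io

# In-memory stand-in for test.csv with the four columns described above
# (the row content is illustrative, not taken from the real dataset).
csv_text = (
    "prompt,canonical_solution,test,entry_point\n"
    '"def inc(x):\n","    return x + 1\n",'
    '"def check(candidate):\n    assert candidate(1) == 2\n",inc\n'
)

with io.StringIO(csv_text) as f:
    rows = list(csv.DictReader(f))

for row in rows:
    print(row["entry_point"])  # -> inc
```

For a real run, replace the StringIO object with `open("test.csv", newline="")`.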