LLM: 7 Prompt Training Dataset
(for use in the LLM - Detect AI Generated Text competition)
@kaggle.carlmcbrideellis_llm_7_prompt_training_dataset
Version 4: Adding the data from "LLM-generated essay using PaLM from Google Gen-AI" kindly generated by Kingki19 / Muhammad Rizqi.
File: train_essays_RDizzl3_seven_v2.csv
Human texts: 14247
LLM texts: 3004
See also: a new dataset of an additional 4900 LLM generated texts: LLM: Mistral-7B Instruct texts
Version 3: "The RDizzl3 Seven"
File: train_essays_RDizzl3_seven_v1.csv
"Car-free cities"
"Does the electoral college work?"
"Exploring Venus"
"The Face on Mars"
"Facial action coding system"
"A Cowboy Who Rode the Waves"
"Driverless cars"
How this dataset was made: see the notebook "LLM: Make 7 prompt train dataset"
File: train_essays_7_prompts_v2.csv
This dataset is composed of 13,712 human texts and 1,638 LLM-generated texts originating from 7 of the PERSUADE 2.0 corpus prompts, namely:
"Car-free cities"
"Does the electoral college work?"
"Exploring Venus"
"The Face on Mars"
"Facial action coding system"
"Seeking multiple opinions"
"Phones and driving"
This dataset is a derivative of the datasets
as well as the original competition training dataset
CREATE TABLE train_essays_7_prompts (
"text" VARCHAR,
"label" BIGINT
);

CREATE TABLE train_essays_7_prompts_v2 (
"text" VARCHAR,
"label" BIGINT
);

CREATE TABLE train_essays_rdizzl3_seven_v1 (
"text" VARCHAR,
"label" BIGINT
);

CREATE TABLE train_essays_rdizzl3_seven_v2 (
"text" VARCHAR,
"label" BIGINT
);
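Each file shares the same two-column layout, `text` and `label`, so a quick sanity check is to count rows per label. The following is a minimal sketch using only the Python standard library. The helper name `count_labels` and the sample rows are hypothetical, and it assumes that `label` 0 marks a human text and 1 an LLM text, which the schema itself does not state.

```python
import csv
import io
from collections import Counter


def count_labels(csv_file):
    """Count rows per label in an open CSV file with `text` and `label` columns."""
    reader = csv.DictReader(csv_file)
    return Counter(int(row["label"]) for row in reader)


# Tiny in-memory stand-in for a real file (hypothetical rows, not taken from the dataset).
sample = io.StringIO(
    "text,label\n"
    '"An essay written by a student.",0\n'
    '"An essay produced by a model.",1\n'
    '"Another student essay, with a comma.",0\n'
)

counts = count_labels(sample)
# Assumption: 0 = human, 1 = LLM.
print("human:", counts[0], "LLM:", counts[1])
```

To run it against a downloaded file, open it with `open("train_essays_RDizzl3_seven_v2.csv", newline="", encoding="utf-8")` and pass the file object to `count_labels`. If the label assumption holds, the totals should be comparable with the human and LLM counts quoted for that version.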