
LLM: 7 Prompt Training Dataset

(for use in the LLM - Detect AI Generated Text competition)

@kaggle.carlmcbrideellis_llm_7_prompt_training_dataset

About this Dataset


  • Version 4: Adds the data from "LLM-generated essay using PaLM from Google Gen-AI", kindly generated by Kingki19 / Muhammad Rizqi.
    File: train_essays_RDizzl3_seven_v2.csv
    Human texts: 14,247; LLM texts: 3,004
    See also: "LLM: Mistral-7B Instruct texts", a new dataset of an additional 4,900 LLM-generated texts.

  • Version 3: "The RDizzl3 Seven"
    File: train_essays_RDizzl3_seven_v1.csv
      • "Car-free cities"
      • "Does the electoral college work?"
      • "Exploring Venus"
      • "The Face on Mars"
      • "Facial action coding system"
      • "A Cowboy Who Rode the Waves"
      • "Driverless cars"

How this dataset was made: see the notebook "LLM: Make 7 prompt train dataset"

  • Version 2: This dataset is composed of 13,712 human texts and 1,638 LLM-generated texts originating from 7 of the PERSUADE 2.0 corpus prompts.
    File: train_essays_7_prompts_v2.csv

    Namely:
      • "Car-free cities"
      • "Does the electoral college work?"
      • "Exploring Venus"
      • "The Face on Mars"
      • "Facial action coding system"
      • "Seeking multiple opinions"
      • "Phones and driving"

This dataset is a derivative of the datasets

as well as the original competition training dataset

  • Version 1: This dataset is composed of 13,712 human texts and 1,165 LLM-generated texts originating from 7 of the PERSUADE 2.0 corpus prompts.
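The version notes give human/LLM counts for Versions 1, 2 and 4 (Version 3 lists no counts). The short sketch below only reuses those stated numbers to show how imbalanced each release is; it does not read the CSV files. The names `VERSION_COUNTS` and `llm_fraction` are hypothetical, introduced here purely for illustration.

```python
# Hypothetical helper: class counts (human, LLM) copied from the version notes of this dataset card.
VERSION_COUNTS = {
    "v1": (13_712, 1_165),
    "v2": (13_712, 1_638),
    "v4": (14_247, 3_004),
}


def llm_fraction(human: int, llm: int) -> float:
    """Return the share of LLM-generated texts among all texts."""
    total = human + llm
    return llm / total


if __name__ == "__main__":
    # One line per version: total number of texts and the LLM share (3 decimals).
    for version, (human, llm) in VERSION_COUNTS.items():
        print(f"{version}: total={human + llm:,} llm_fraction={llm_fraction(human, llm):.3f}")
```

Running it prints one line per version with the total text count and the LLM share, which rises from under 8% in Version 1 to about 17% in Version 4.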
