(4900 LLM texts for the "Detect AI Generated Text" competition)

This dataset (specifically the file Mistral7B_CME_v7.csv) consists of 4900 LLM-generated texts.
(Note: versions 1 to 6 are redundant and are kept only so as not to break any notebooks that use them.)

Update: The new file Mistral7B_CME_v7_15_percent_corruption.csv has also been added as per the discussion "Alternative approach - Simulating hidden dataset".

v1: 700 LLM texts for prompt 6 "Exploring Venus" for use in the LLM - Detect AI Generated Text competition.

v2: + 700 LLM texts for prompt 8 "The Face on Mars"

v3: + 700 LLM texts for prompt 4 "A Cowboy Who Rode the Waves"

v4: + 700 LLM texts for prompt 11 "Driverless cars"

v5: + 700 LLM texts for prompt 7 "Facial action coding system"

v6: + 700 LLM texts for prompt 2 "Car-free cities"

v7: + 700 LLM texts for prompt 12 "Does the electoral college work?"
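
The version history above adds up to 7 prompts x 700 texts = 4900 texts. A minimal sketch for checking a downloaded copy against that total, using only the Python standard library. It assumes the CSV has a single header row; the helper name count_rows is hypothetical, and no column names are assumed because the description does not list them.

```python
import csv

# Prompt IDs and titles as listed in the version history (v1 to v7).
PROMPTS = {
    6: "Exploring Venus",
    8: "The Face on Mars",
    4: "A Cowboy Who Rode the Waves",
    11: "Driverless cars",
    7: "Facial action coding system",
    2: "Car-free cities",
    12: "Does the electoral college work?",
}
TEXTS_PER_PROMPT = 700
EXPECTED_TOTAL = TEXTS_PER_PROMPT * len(PROMPTS)  # 7 * 700 = 4900


def count_rows(path):
    """Count data rows in a CSV file, assuming the first row is a header."""
    # newline="" lets the csv module handle line breaks inside quoted essay text.
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        next(reader, None)  # skip the assumed header row
        return sum(1 for _ in reader)
```

With the file in the working directory, count_rows("Mistral7B_CME_v7.csv") can be compared against EXPECTED_TOTAL.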

Photo credit: Image of Venus by NASA.
