Important note: the text column is NOT AI generated. However, the source_text column is, so it can still be used as AI-generated text. Consequently, this dataset provides 2421 student-written texts (text column) and 2421 AI-generated texts (source_text column). I will update the dataset accordingly as soon as possible.
In the LLM - Detect AI Generated Text competition you are required to distinguish between student-written and AI-generated texts. However, the competition's data only provides student-written texts.
Luckily, I made a dataset of AI-generated texts for CommonLit's competition. Surprisingly, it is very similar to the data we need in this competition!
My dataset not only has 2421 ChatGPT-generated texts but also their prompts and source texts! That's double the data we are given in this competition!
Also, it's very diverse since the texts are generated from unique prompts.
The best of luck to all of you in this competition! 🍀
Dataset Description
id: unique identifier for each text.
text: text extracted from the Feedback Prize 3 competition. Can be used as student text.
instructions: the instruction given to ChatGPT to generate the text.
source_text: AI-generated text.
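
One way to use the columns described above for the detection task is to turn every row into two labeled examples: the text column as a student example (label 0) and the source_text column as an AI example (label 1). The snippet below is only a minimal sketch under that assumption. The function name build_labeled_rows, the 0/1 label convention, the "_student"/"_ai" id suffixes, and the two sample rows are all made up for illustration; only the column names come from the description above.

```python
import csv
import io


def build_labeled_rows(rows):
    """Turn each dataset row into two labeled examples.

    label 0 = student-written (the `text` column)
    label 1 = AI-generated   (the `source_text` column)
    The label convention is an assumption, not part of the dataset.
    """
    labeled = []
    for row in rows:
        labeled.append(
            {"id": f"{row['id']}_student", "text": row["text"], "label": 0}
        )
        labeled.append(
            {"id": f"{row['id']}_ai", "text": row["source_text"], "label": 1}
        )
    return labeled


# Tiny in-memory stand-in for the real CSV (invented rows, illustration only).
sample_csv = io.StringIO(
    "id,text,instructions,source_text\n"
    "a1,Student essay one.,Write an essay.,Generated essay one.\n"
    "b2,Student essay two.,Write another essay.,Generated essay two.\n"
)

rows = list(csv.DictReader(sample_csv))
labeled = build_labeled_rows(rows)

for example in labeled:
    print(example["label"], example["text"])
```

To run this on the real data, replace sample_csv with an open file handle for the dataset's CSV file. With 2421 rows this should yield 4842 labeled examples, split evenly between the two classes.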