Baselight

DAIGT | External Dataset

External Dataset for the LLM - Detect AI Generated Text competition

@kaggle.alejopaullier_daigt_external_dataset

About this Dataset

Important note: the text column is NOT AI generated. The source_text column is, however, so it can still be used as AI-generated text. Consequently, this dataset provides 2421 student-written texts (text column) and 2421 AI-generated texts (source_text column). I will update the dataset accordingly as soon as possible.

In the LLM - Detect AI Generated Text competition, you are required to distinguish between student-made and AI-generated texts. However, the competition's data only provides student-made texts.

Luckily, I had already made a dataset of AI-generated texts for CommonLit's competition. Surprisingly, it is very similar to the data we need for this competition!

My dataset not only has 2421 ChatGPT-generated texts but also their prompts and source texts! That's double the data we are given in this competition!

Also, it's very diverse since the texts are generated from unique prompts.

The best of luck to all of you in this competition! 🍀

Dataset Description

  • id: unique identifier for each text.
  • text: text extracted from the Feedback Prize 3 competition. Can be used as student text.
  • instructions: the instruction for ChatGPT to generate the text.
  • source_text: AI generated text.
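
The column roles above suggest a simple way to build a labeled training set: each row contributes one student example (from text) and one AI example (from source_text). The following is a minimal sketch under stated assumptions, not the author's method. The function name to_labeled_examples, the 0/1 label convention, the "_student"/"_ai" id suffixes, and the sample rows are all hypothetical; the rows are assumed to be dicts keyed by the four column names.

```python
from typing import Iterable

# Label convention assumed for this sketch; the dataset itself defines no labels.
HUMAN, AI = 0, 1


def to_labeled_examples(rows: Iterable[dict]) -> list[dict]:
    """Turn each dataset row into one student example and one AI example."""
    examples = []
    for row in rows:
        # The `text` column holds the student-written essay.
        examples.append(
            {"id": f"{row['id']}_student", "text": row["text"], "label": HUMAN}
        )
        # The `source_text` column holds the ChatGPT-generated essay.
        examples.append(
            {"id": f"{row['id']}_ai", "text": row["source_text"], "label": AI}
        )
    return examples


# Placeholder rows for illustration only; they are NOT taken from the dataset.
sample_rows = [
    {
        "id": "a1",
        "text": "student essay one",
        "instructions": "prompt one",
        "source_text": "generated essay one",
    },
    {
        "id": "b2",
        "text": "student essay two",
        "instructions": "prompt two",
        "source_text": "generated essay two",
    },
]

examples = to_labeled_examples(sample_rows)
print(len(examples))
```

Applied to all 2421 rows, this would give 4842 examples split evenly between the two classes, provided no row has an empty text or source_text cell. The instructions column is left out here, but it could be kept alongside the AI examples if the prompt is useful as a feature.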

Tables

Daigt External Dataset

@kaggle.alejopaullier_daigt_external_dataset.daigt_external_dataset
  • 4.63 MB
  • 2421 rows
  • 4 columns

CREATE TABLE daigt_external_dataset (
  "id" VARCHAR,
  "text" VARCHAR,
  "instructions" VARCHAR,
  "source_text" VARCHAR
);
