Baselight

DAIGT Proper Train Dataset

A dataset you can actually train on for the LLM - Detect AI Generated Text competition.

@kaggle.thedrcat_daigt_proper_train_dataset

About this Dataset

Version 2 updated on 11/2/2023:

Since there is no proper training dataset for the LLM - Detect AI Generated Text competition, I decided to create one.

Ingredients (please upvote the included datasets!):

New version includes:

  • EssayID if available
  • Generation prompt if available
  • Random 10-fold split, stratified by source dataset
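
For readers who want to reproduce a similar split on their own data, here is a minimal sketch of a random 10-fold assignment stratified by source. It is not the code used to build this dataset: the function name assign_stratified_folds, the seed, and the toy source names are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def assign_stratified_folds(sources, n_folds=10, seed=42):
    """Return one fold index per row, spreading each source evenly across folds.

    Hypothetical helper for illustration; not the dataset author's code.
    """
    rng = random.Random(seed)

    # Group row indices by the source dataset they came from.
    by_source = defaultdict(list)
    for idx, src in enumerate(sources):
        by_source[src].append(idx)

    folds = [None] * len(sources)
    for src in sorted(by_source):
        indices = by_source[src]
        rng.shuffle(indices)  # random order within each source
        # Random starting fold so leftover rows do not always land in fold 0.
        offset = rng.randrange(n_folds)
        for pos, idx in enumerate(indices):
            folds[idx] = (pos + offset) % n_folds
    return folds


# Toy example with made-up source names.
sources = ["source_a"] * 25 + ["source_b"] * 13
folds = assign_stratified_folds(sources)
```

Because rows are dealt out round-robin within each source, the number of rows a given source contributes to any two folds differs by at most one. With pandas and scikit-learn available, StratifiedKFold with the source column passed as the stratification target is a common alternative.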

Version 3 updated on 11/3/2023:

  • An additional 2,400+ AI examples generated with Mistral 7B Instruct and a new prompt (let's see how it works!)

Version 4 updated on 11/5/2023:
