DAIGT Proper Train Dataset
A dataset you can actually train on for the LLM - Detect AI Generated Text competition.
@kaggle.thedrcat_daigt_proper_train_dataset
Version 2 updated on 11/2/2023:
Since there is no proper training dataset for the LLM - Detect AI Generated Text competition, I decided to create one.
Ingredients (please upvote the included datasets!):
New version includes:
Version 3 updated on 11/3/2023:
Version 4 updated on 11/5/2023:
CREATE TABLE train_drcat_01 (
  "text" VARCHAR,
  "label" BIGINT,
  "source" VARCHAR,
  "fold" BIGINT
);

CREATE TABLE train_drcat_02 (
  "essay_id" VARCHAR,
  "text" VARCHAR,
  "label" BIGINT,
  "source" VARCHAR,
  "prompt" VARCHAR,
  "fold" BIGINT
);

CREATE TABLE train_drcat_03 (
  "essay_id" VARCHAR,
  "text" VARCHAR,
  "label" BIGINT,
  "source" VARCHAR,
  "prompt" VARCHAR,
  "fold" BIGINT
);

CREATE TABLE train_drcat_04 (
  "essay_id" VARCHAR,
  "text" VARCHAR,
  "label" BIGINT,
  "source" VARCHAR,
  "prompt" VARCHAR,
  "fold" BIGINT
);
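
The schema above can be put to use with a simple fold-based split. The sketch below is only an illustration under stated assumptions: it treats `fold` as a cross-validation fold index (the page does not say so explicitly), it loads the `train_drcat_04` columns into an in-memory SQLite database rather than the original storage, and the rows, source names, and prompt names are made-up placeholders, not real data. The helper names `build_demo_db` and `split_by_fold` are hypothetical.

```python
import sqlite3

# Hypothetical placeholder rows that mirror the train_drcat_04 columns.
# These are NOT taken from the real dataset.
ROWS = [
    ("e1", "first placeholder essay", 0, "source_a", "prompt_a", 0),
    ("e2", "second placeholder essay", 1, "source_b", "prompt_a", 1),
    ("e3", "third placeholder essay", 0, "source_a", "prompt_b", 2),
    ("e4", "fourth placeholder essay", 1, "source_b", "prompt_b", 0),
]


def build_demo_db():
    """Create an in-memory table with the same columns as train_drcat_04."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE train_drcat_04 ("
        '"essay_id" VARCHAR, "text" VARCHAR, "label" BIGINT, '
        '"source" VARCHAR, "prompt" VARCHAR, "fold" BIGINT)'
    )
    conn.executemany(
        "INSERT INTO train_drcat_04 VALUES (?, ?, ?, ?, ?, ?)", ROWS
    )
    return conn


def split_by_fold(conn, holdout_fold):
    """Return (train_rows, valid_rows), holding out one fold for validation.

    Assumes `fold` is a cross-validation fold index.
    """
    columns = '"essay_id", "text", "label"'
    train = conn.execute(
        f"SELECT {columns} FROM train_drcat_04 "
        'WHERE "fold" != ? ORDER BY "essay_id"',
        (holdout_fold,),
    ).fetchall()
    valid = conn.execute(
        f"SELECT {columns} FROM train_drcat_04 "
        'WHERE "fold" = ? ORDER BY "essay_id"',
        (holdout_fold,),
    ).fetchall()
    return train, valid


conn = build_demo_db()
train_rows, valid_rows = split_by_fold(conn, holdout_fold=0)
print(len(train_rows), "training rows,", len(valid_rows), "validation rows")
```

Looping `holdout_fold` over the distinct values of `fold` would give one train/validation pair per fold, if the column is indeed meant for that purpose.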