PII | External Dataset
LLM Generated External Dataset for PII Data Detection Competition
@kaggle.alejopaullier_pii_external_dataset
This is an LLM-generated external dataset for the PII Data Detection competition.
It contains 4434 (previously 3382) generated texts with their corresponding annotated labels in the required competition format.
Description:
document (str): ID of the essay
full_text (string): AI generated text.
tokens (string): a list with the tokens (comes from text.split())
trailing_whitespace (list): a list with boolean values indicating whether each token is followed by whitespace.
labels (list): list with token labels in BIO format

CREATE TABLE pii_dataset (
"document" VARCHAR,
"text" VARCHAR,
"tokens" VARCHAR,
"trailing_whitespace" VARCHAR,
"labels" VARCHAR,
"prompt" VARCHAR,
"prompt_id" BIGINT,
"name" VARCHAR,
"email" VARCHAR,
"phone" VARCHAR,
"job" VARCHAR,
"address" VARCHAR,
"username" VARCHAR,
"url" VARCHAR,
"hobby" VARCHAR,
"len" BIGINT
);
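
Because the schema stores tokens, trailing_whitespace and labels as VARCHAR, a row likely needs parsing before use. The following is a minimal sketch, assuming each of these columns holds a Python-style list literal (for example "['a', 'b']"); that encoding is an assumption, not something the schema confirms. The helper names (parse_list, rebuild_text, bio_spans), the sample row, and the label name NAME_STUDENT are all hypothetical and for illustration only.

```python
import ast


def parse_list(cell):
    """Parse a list stored as a string (assumed Python-literal encoding)."""
    value = ast.literal_eval(cell)
    if not isinstance(value, list):
        raise ValueError(f"expected a list, got {type(value).__name__}")
    return value


def rebuild_text(tokens, trailing_whitespace):
    """Join tokens, appending one space after each token flagged True."""
    return "".join(
        tok + (" " if ws else "")
        for tok, ws in zip(tokens, trailing_whitespace)
    )


def bio_spans(tokens, labels):
    """Group BIO-tagged tokens into (entity_type, text) pairs."""
    spans = []
    current_type, current_tokens = None, []

    def flush():
        if current_tokens:
            spans.append((current_type, " ".join(current_tokens)))

    for tok, label in zip(tokens, labels):
        starts_new = label.startswith("B-") or (
            label.startswith("I-") and label[2:] != current_type
        )
        if starts_new:
            # A "B-" tag (or a stray "I-" tag of another type) opens a span.
            flush()
            current_type, current_tokens = label[2:], [tok]
        elif label.startswith("I-"):
            current_tokens.append(tok)
        else:
            # "O" (outside) closes any open span.
            flush()
            current_type, current_tokens = None, []
    flush()
    return spans


# Hypothetical row, shaped like the VARCHAR columns in the schema above.
row = {
    "tokens": "['My', 'name', 'is', 'Jane', 'Doe.']",
    "trailing_whitespace": "[True, True, True, True, False]",
    "labels": "['O', 'O', 'O', 'B-NAME_STUDENT', 'I-NAME_STUDENT']",
}

tokens = parse_list(row["tokens"])
trailing_whitespace = parse_list(row["trailing_whitespace"])
labels = parse_list(row["labels"])

print(rebuild_text(tokens, trailing_whitespace))
print(bio_spans(tokens, labels))
```

ast.literal_eval is used rather than eval so that only literals are parsed. If the lists turn out to be JSON-encoded instead, json.loads would be the natural substitute.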