
ProsocialDialog - Problematic Content Dialogue

Teach conversation agents to respond to problematic topics

@kaggle.thedevastator_prosocialdialog_dialogue_dataset


About this Dataset

By Huggingface Hub [source]

ProsocialDialog is the first large-scale multi-turn English dialogue dataset to teach conversational agents to respond to problematic content following social norms. Covering diverse unethical, problematic, biased, and toxic situations, ProsocialDialog contains responses that encourage prosocial behavior, grounded in commonsense social rules (i.e., rules-of-thumb, RoTs). Created via a human-AI collaborative framework, ProsocialDialog consists of 58K dialogues, with 331K utterances, 160K unique RoTs, and 497K dialogue safety labels accompanied by free-form rationales.



How to use the dataset

This guide explains how to use this dataset to teach conversational agents normative responses to problematic content.

  • Understand the columns: Familiarize yourself with the columns so you know what information is available for your analysis. The dataset includes 'context', 'response', 'rots', 'safety_label', 'safety_annotations', 'safety_annotation_reasons', 'source', 'etc', 'dialogue_id', 'response_id', and 'episode_done'. Together these capture the dialogue context and response, the rules of thumb (RoTs), the safety label with its annotations and rationales, and the source of each conversation (see the first sketch after this list).

  • Explore safety labels: Working through each value in the 'safety_label' column shows what kind of conversation is deemed appropriate or inappropriate. It also helps to examine the corresponding 'safety_annotations' and their free-form rationales in 'safety_annotation_reasons', which reveal where and why annotators made particular decisions when rating these conversations.

  • Learn from rules of thumb (RoTs): Examining both the individually listed RoTs and the actual dialogues deemed acceptable or unacceptable helps you understand what actions ought to be taken when crafting a normative response to problematic content in your own conversational settings (see the second sketch after this list).

  • Analyze sources: The 'source' column records where each piece of data was obtained, whether from first-party interviews or third-party websites. Sources help explain why one item was labeled a certain way while others received higher or lower ratings, and factors such as trustworthiness should be kept in mind when using this data to train models.

  • Take action: After familiarizing yourself with all of these components, try mapping out scenarios between two people engaging in conversation and, guided by each applicable RoT, write responses that demonstrate socially acceptable behavior when confronted with non-normative behavior.
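
A minimal sketch for the first two steps above, assuming the CSV splits have been downloaded locally (the file path is a placeholder):

import pandas as pd

# Load the training split (adjust the path to your local copy).
train = pd.read_csv("train.csv")

# Understand the columns: list every column and its dtype.
print(train.dtypes)

# Explore safety labels: distribution of the 'safety_label' column.
print(train["safety_label"].value_counts())

# Peek at the free-form rationales behind a few annotations.
print(train[["context", "safety_label", "safety_annotation_reasons"]].head())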
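
And a sketch for the RoT and source steps. How the 'rots' column serializes multiple rules per row (e.g. as a list literal) is an assumption here; inspect your copy of the data and adjust the parser accordingly:

import ast
import pandas as pd

train = pd.read_csv("train.csv")

def parse_rots(cell):
    # 'rots' may hold a Python-style list literal; fall back to the raw
    # string if it does not (this serialization is an assumption).
    try:
        value = ast.literal_eval(cell)
        return value if isinstance(value, list) else [cell]
    except (ValueError, SyntaxError):
        return [cell]

# Most frequent rules of thumb across the training split.
rots = train["rots"].dropna().map(parse_rots).explode()
print(rots.value_counts().head(10))

# How many utterances come from each source.
print(train["source"].value_counts())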

Research Ideas

  • Designing Conversational Agents: This dataset can be used to build natural language processing (NLP) models that can recognize and classify problematic content. The safety labels, rationales, and RoTs can be leveraged to teach conversational agents how to respond to such content in a socially acceptable manner.
  • Benchmark Systems: ProsocialDialog could be used as a benchmark system for assessing the performance of existing conversation datasets in terms of recognizing, responding to, and helping prevent problematic content interactions.
  • Automated Moderation: The dialogue safety labels and associated free-form rationales found in the dataset can be leveraged by technology platforms for automated moderation tasks such as flagging or banning offensive messages or the users involved, when needed (a baseline sketch follows this list).
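
As an illustration of the first and third ideas, here is a minimal baseline sketch: a TF-IDF plus logistic-regression classifier that predicts 'safety_label' from the dialogue context. This is an illustrative baseline only, not the method used by the dataset authors, and it assumes train.csv and validation.csv are available locally:

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.pipeline import make_pipeline

train = pd.read_csv("train.csv")
valid = pd.read_csv("validation.csv")

# Baseline: predict the safety label from the dialogue context alone.
model = make_pipeline(
    TfidfVectorizer(max_features=50_000, ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
model.fit(train["context"].fillna(""), train["safety_label"])

preds = model.predict(valid["context"].fillna(""))
print(classification_report(valid["safety_label"], preds))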

Acknowledgements

If you use this dataset in your research, please credit the original authors and Huggingface Hub.
Data Source

License

License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication
No Copyright - You can copy, modify, distribute, and perform the work, even for commercial purposes, all without asking permission.

Columns

Files: train.csv, validation.csv, test.csv (all three splits share the same schema)

Column name Description
context The context of the conversation. (String)
response The response to the conversation. (String)
rots Rules of thumb associated with the conversation. (String)
safety_label The safety label associated with the conversation. (String)
safety_annotations Annotations associated with the conversation. (String)
safety_annotation_reasons Reasons for the safety annotations. (String)
source The source of the conversation. (String)
etc Any additional information associated with the conversation. (String)
dialogue_id Identifier of the dialogue. (Integer)
response_id Identifier of the response within the dialogue. (Integer)
episode_done Whether the conversation is complete or not. (Boolean)
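
Since all three files share one schema, a quick sanity check after download (paths are placeholders) might look like:

import pandas as pd

# Load all three splits and confirm they expose identical columns.
splits = {name: pd.read_csv(f"{name}.csv") for name in ("train", "validation", "test")}

for name, df in splits.items():
    print(name, df.shape)
    assert list(df.columns) == list(splits["train"].columns)

# 'episode_done' is a boolean flag marking completed conversations.
print(splits["train"]["episode_done"].head())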


Tables

Test

@kaggle.thedevastator_prosocialdialog_dialogue_dataset.test
  • 5.94 MB
  • 25029 rows
  • 11 columns

CREATE TABLE test (
  "context" VARCHAR,
  "response" VARCHAR,
  "rots" VARCHAR,
  "safety_label" VARCHAR,
  "safety_annotations" VARCHAR,
  "safety_annotation_reasons" VARCHAR,
  "source" VARCHAR,
  "etc" VARCHAR,
  "dialogue_id" BIGINT,
  "response_id" BIGINT,
  "episode_done" BOOLEAN
);
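
The hosted tables are queried through the platform itself, but the same schema can be explored locally. As one possibility, a sketch using DuckDB against a downloaded test.csv (the local path and the duckdb dependency are assumptions):

import duckdb

# Count rows per safety label in the test split, most frequent first.
result = duckdb.sql("""
    SELECT safety_label, COUNT(*) AS n
    FROM 'test.csv'
    GROUP BY safety_label
    ORDER BY n DESC
""").df()
print(result)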

Train

@kaggle.thedevastator_prosocialdialog_dialogue_dataset.train
  • 28.04 MB
  • 120236 rows
  • 11 columns

CREATE TABLE train (
  "context" VARCHAR,
  "response" VARCHAR,
  "rots" VARCHAR,
  "safety_label" VARCHAR,
  "safety_annotations" VARCHAR,
  "safety_annotation_reasons" VARCHAR,
  "source" VARCHAR,
  "etc" VARCHAR,
  "dialogue_id" BIGINT,
  "response_id" BIGINT,
  "episode_done" BOOLEAN
);

Validation

@kaggle.thedevastator_prosocialdialog_dialogue_dataset.validation
  • 4.85 MB
  • 20416 rows
  • 11 columns

CREATE TABLE validation (
  "context" VARCHAR,
  "response" VARCHAR,
  "rots" VARCHAR,
  "safety_label" VARCHAR,
  "safety_annotations" VARCHAR,
  "safety_annotation_reasons" VARCHAR,
  "source" VARCHAR,
  "etc" VARCHAR,
  "dialogue_id" BIGINT,
  "response_id" BIGINT,
  "episode_done" BOOLEAN
);
