Baselight

Empathetic Conversational Model Benchmark

Conversation, Prompts, and Tags

@kaggle.thedevastator_empathetic_conversational_model_benchmark


About this Dataset

By Huggingface Hub [source]

This dataset is a comprehensive collection of empathetic conversations for research on dialogue systems. It is split into three sets - training, validation, and test - and each conversation is identified by a conversation ID, with speaker IDs indicating who said what. Each row also records the utterance index, the prompt/topic of the conversation, a self-evaluation of the utterance, and assigned tags, making it possible to study conversation topics, empathy, and dialogue structure in one place.


How to use the dataset

Getting Started

  • Begin by downloading the dataset from Kaggle at https://www.kaggle.com/rakshitshah/empathicconversationalmodelbenchmark
  • The downloaded folder should contain three CSV files - train.csv, validation.csv, and test.csv. These contain conversations with corresponding speaker IDs, topics, self-evaluations, and tags that can be used to train conversation models or evaluate their performance.
  • Each row in each of the three CSV files has 8 columns: conversation ID (conv_id), index of the utterance (utterance_idx), context (context), prompt (prompt), speaker index (speaker_idx), utterance (utterance), self-evaluation of the utterance (selfeval), and assigned tags (tags).
  • Utterances are the individual statements made by each speaker in a conversation; speakers are identified by the speaker_idx column, and turns within a conversation are ordered by utterance_idx.
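As a quick start, a split can be loaded with pandas and conversations reconstructed by grouping on conv_id and ordering by utterance_idx. The frame below is a tiny synthetic stand-in with the same 8 columns as the schemas listed further down; in practice you would call pd.read_csv("train.csv") instead.

```python
# Minimal sketch: reconstruct conversations from the 8-column layout.
# The rows here are synthetic examples, not real dataset content.
import pandas as pd

df = pd.DataFrame({
    "conv_id":       ["conv:1", "conv:1", "conv:1"],
    "utterance_idx": [2, 1, 3],                      # deliberately unordered
    "context":       ["proud"] * 3,
    "prompt":        ["I finally finished my degree."] * 3,
    "speaker_idx":   [2, 1, 1],
    "utterance":     ["Congratulations!",
                      "I finally finished my degree.",
                      "Thanks, it took five years."],
    "selfeval":      ["5|5|5"] * 3,
    "tags":          [""] * 3,
})

# Group by conversation and order turns by utterance_idx.
for conv_id, conv in df.groupby("conv_id"):
    turns = conv.sort_values("utterance_idx")
    for _, row in turns.iterrows():
        print(f"{conv_id} [{row['speaker_idx']}] {row['utterance']}")
```

Grouping before sorting keeps turns from different conversations from interleaving when a file contains many conversations.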

Making Use Of The Dataset

  • Use the train set to create machine learning models that can generate natural conversation responses based on context, or assign empathy scores to generated responses using sentiment analysis.
  • Use the validation set to run tests and make sure the model is functioning correctly.
  • Evaluate models using the test set.
  • Use the tags column to label conversations with appropriate tags such as 'casual chat' or 'career advice', and to compare a standard baseline against the ML model.
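For the model-training use case above, ordered conversations are commonly flattened into (history, response) pairs. The one-turn-of-history pairing below is an illustrative choice, not something prescribed by the dataset, and the rows are synthetic.

```python
# Sketch: turn ordered conversations into (history, response) training pairs.
# Pairing each turn with only the preceding turn is one common, simple scheme.
import pandas as pd

df = pd.DataFrame({
    "conv_id":       ["c1"] * 4,
    "utterance_idx": [1, 2, 3, 4],
    "utterance":     ["Hi there.", "Hello!", "How was your day?", "Pretty good."],
})

pairs = []
for _, conv in df.groupby("conv_id"):
    turns = conv.sort_values("utterance_idx")["utterance"].tolist()
    # Each turn after the first becomes a response to the preceding turn.
    pairs.extend(zip(turns[:-1], turns[1:]))

print(pairs)  # [('Hi there.', 'Hello!'), ('Hello!', 'How was your day?'), ...]
```

A longer history window (concatenating several prior turns) is a natural extension if the model supports it.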

Research Ideas

  • Developing empathetic open-domain conversation models for use in virtual assistants or chatbots, for example by sorting conversations by topic and training models to reply accordingly.
  • Using the self-evaluation of each utterance as a metric to observe changes in the atmosphere of a conversation, such as mood shifts and tonality variations.
  • Benchmarking architectures such as convolutional and attention-based models, LSTMs, seq2seq models, Gated Recurrent Units (GRUs), and Transformer networks to further improve conversation model performance and accuracy.
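For the mood-tracking idea above, the selfeval column must first be made numeric. The schemas below store it as a string, and its exact encoding is not documented here; the sketch assumes pipe-separated integer scores (e.g. "5|4|5") and should be checked against the real column before use.

```python
# Sketch: parse a selfeval string into numeric scores for mood tracking.
# The pipe-separated "a|b|c" encoding is an assumption, not documented fact.
def parse_selfeval(raw: str) -> list[int]:
    """Split a raw selfeval string into a list of integer scores."""
    return [int(part) for part in raw.split("|") if part.strip().isdigit()]

def mean_score(raw: str) -> float:
    """Average the scores of one utterance; NaN if nothing parses."""
    scores = parse_selfeval(raw)
    return sum(scores) / len(scores) if scores else float("nan")

print(parse_selfeval("5|4|5"))  # [5, 4, 5]
print(mean_score("5|4|5"))      # average self-evaluation for the utterance
```

Plotting mean_score per utterance_idx across a conversation would then surface mood shifts over its course.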

Acknowledgements

If you use this dataset in your research, please credit the original authors and Huggingface Hub.

License

License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication
No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission.

Columns

File: validation.csv

Column name    Description
conv_id        Unique identifier for the conversation. (String)
utterance_idx  The index of the utterance within the conversation. (Integer)
context        The context of the conversation. (String)
prompt         The prompt or topic for the conversation. (String)
speaker_idx    The index of the speaker within the conversation. (Integer)
utterance      The utterance or response from a speaker. (String)
selfeval       The self-evaluation score assigned to each utterance. (String)
tags           The associated tags that can be used to categorize or label dialogues. (String)

File: train.csv

Column name    Description
conv_id        Unique identifier for the conversation. (String)
utterance_idx  The index of the utterance within the conversation. (Integer)
context        The context of the conversation. (String)
prompt         The prompt or topic for the conversation. (String)
speaker_idx    The index of the speaker within the conversation. (Integer)
utterance      The utterance or response from a speaker. (String)
selfeval       The self-evaluation score assigned to each utterance. (String)
tags           The associated tags that can be used to categorize or label dialogues. (String)

File: test.csv

Column name    Description
conv_id        Unique identifier for the conversation. (String)
utterance_idx  The index of the utterance within the conversation. (Integer)
context        The context of the conversation. (String)
prompt         The prompt or topic for the conversation. (String)
speaker_idx    The index of the speaker within the conversation. (Integer)
utterance      The utterance or response from a speaker. (String)
selfeval       The self-evaluation score assigned to each utterance. (String)
tags           The associated tags that can be used to categorize or label dialogues. (String)


Tables

Test

@kaggle.thedevastator_empathetic_conversational_model_benchmark.test
  • 886.76 KB
  • 10943 rows
  • 8 columns

CREATE TABLE test (
  "conv_id" VARCHAR,
  "utterance_idx" BIGINT,
  "context" VARCHAR,
  "prompt" VARCHAR,
  "speaker_idx" BIGINT,
  "utterance" VARCHAR,
  "selfeval" VARCHAR,
  "tags" VARCHAR
);
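The schema above can be exercised locally, for example in an in-memory SQLite database, which accepts the VARCHAR and BIGINT type names as-is (mapping them to TEXT and INTEGER affinity). The inserted row is a synthetic example.

```python
# Sketch: mirror the CREATE TABLE statement above in in-memory SQLite.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE test (
        conv_id       VARCHAR,
        utterance_idx BIGINT,
        context       VARCHAR,
        prompt        VARCHAR,
        speaker_idx   BIGINT,
        utterance     VARCHAR,
        selfeval      VARCHAR,
        tags          VARCHAR
    )
""")

# Insert one synthetic row matching the 8-column schema.
conn.execute(
    "INSERT INTO test VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    ("c1", 1, "proud", "I passed my exam.", 1, "I passed my exam.", "5|5|5", ""),
)
row = conn.execute("SELECT utterance FROM test WHERE conv_id = 'c1'").fetchone()
print(row[0])  # I passed my exam.
```

The same pattern works for the train and validation tables, which share this schema.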

Train

@kaggle.thedevastator_empathetic_conversational_model_benchmark.train
  • 5.36 MB
  • 76673 rows
  • 8 columns

CREATE TABLE train (
  "conv_id" VARCHAR,
  "utterance_idx" BIGINT,
  "context" VARCHAR,
  "prompt" VARCHAR,
  "speaker_idx" BIGINT,
  "utterance" VARCHAR,
  "selfeval" VARCHAR,
  "tags" VARCHAR
);

Validation

@kaggle.thedevastator_empathetic_conversational_model_benchmark.validation
  • 928.43 KB
  • 12030 rows
  • 8 columns

CREATE TABLE validation (
  "conv_id" VARCHAR,
  "utterance_idx" BIGINT,
  "context" VARCHAR,
  "prompt" VARCHAR,
  "speaker_idx" BIGINT,
  "utterance" VARCHAR,
  "selfeval" VARCHAR,
  "tags" VARCHAR
);
