Empathetic Conversational Model Benchmark
Conversation, Prompts, and Tags
By Huggingface Hub [source]
About this dataset
This dataset is a comprehensive collection of empathetic conversations for research on dialogue systems. Split into three sets - training, validation, and test - every set contains conversations with corresponding speaker IDs forming a context, along with columns for the utterance index, the prompt/topic of the conversation, a self-evaluation of each utterance, and assigned tags. With this information compiled in one place, it is possible to train and evaluate conversation models and to explore conversation topics in depth.
More Datasets
For more datasets, click here.
How to use the dataset
Getting Started
- Begin by downloading the dataset from Kaggle at https://www.kaggle.com/rakshitshah/empathicconversationalmodelbenchmark
- The downloaded folder should contain three CSV files: train.csv, validation.csv, and test.csv. These contain conversations with corresponding speaker IDs, topics, self-evaluations, and tags that can be used to train conversation models or evaluate their performance
- Each row in each of the three CSV files has columns for the utterance index (utterance_index), conversation context (context), prompt (prompt), utterance (utterance), self-evaluation score of the utterance (selfeval), and assigned tags for the utterance (tags)
- Utterances are individual statements made by each speaker in the conversation; speakers are identified by IDs or names included in the respective rows
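The steps above can be sketched in pandas. The two sample rows below are hypothetical stand-ins mirroring the column schema described here; with the real download, you would call `pd.read_csv("train.csv")` and so on directly.

```python
import io
import pandas as pd

# Hypothetical sample rows mimicking the dataset's schema
# (context, prompt, utterance, selfeval, tags); not real data.
sample_csv = io.StringIO(
    "context,prompt,utterance,selfeval,tags\n"
    "sentimental,I found my old photo album.,That must bring back memories.,5,casual chat\n"
    "proud,I finished my first marathon.,Congratulations! How do you feel?,4,casual chat\n"
)

# With the downloaded files: df = pd.read_csv("train.csv"), etc.
df = pd.read_csv(sample_csv)

print(df.shape)          # (rows, columns)
print(list(df.columns))  # column names as described above
```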
Making Use Of The Dataset
- Use the train set to create machine learning models that can generate natural conversations based on context, assign empathy scores to generated responses via sentiment analysis, and so on
- Use the validation set to run tests and make sure the model is functioning correctly
- Evaluate models using the test set
- Use the 'tags' column to label conversations with appropriate tags such as 'casual chat' or 'career advice', and to compare a baseline model against an ML model
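The tag-based labeling step above can be sketched as a simple pandas filter. The rows here are hypothetical placeholders; with the real data you would load one of the CSV files first.

```python
import pandas as pd

# Hypothetical rows; with the real data, use pd.read_csv("validation.csv").
df = pd.DataFrame({
    "context": ["sentimental", "proud", "anxious"],
    "prompt": ["old photos", "first marathon", "job interview"],
    "utterance": ["...", "...", "..."],
    "selfeval": [5, 4, 3],
    "tags": ["casual chat", "casual chat", "career advice"],
})

# Count conversations per tag, then slice out one tag for comparison.
per_tag_counts = df["tags"].value_counts()
career = df[df["tags"] == "career advice"]

print(per_tag_counts.to_dict())  # {'casual chat': 2, 'career advice': 1}
print(len(career))               # 1
```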
Research Ideas
- Develop empathetic open-domain conversation models for use in virtual assistants or chatbots, for example by sorting conversations by topic and training models to reply accordingly.
- Use the self-evaluation score of each utterance as a metric to observe changes in emotional tone within conversations, such as mood shifts and tonality variations.
- Use the dataset for research on convolutional and attention-based models, LSTMs, seq2seq architectures, Gated Recurrent Units (GRUs), and Transformer networks to further improve conversation model performance and accuracy.
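The second idea, tracking mood shifts via the self-evaluation scores, can be sketched as the change in `selfeval` between consecutive utterances. The conversation below is a hypothetical example; with the real data you would first group utterances by conversation (e.g. by the prompt and context columns).

```python
import pandas as pd

# Hypothetical conversation: selfeval scores in utterance order.
df = pd.DataFrame({
    "prompt": ["p1", "p1", "p1", "p1"],
    "selfeval": [5, 4, 2, 3],
})

# Mood shift = change in self-evaluation between consecutive utterances
# within each conversation; the first utterance has no prior score (NaN).
df["shift"] = df.groupby("prompt")["selfeval"].diff()

print(df["shift"].tolist())  # [nan, -1.0, -2.0, 1.0]
```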
Acknowledgements
If you use this dataset in your research, please credit the original authors and Huggingface Hub.
Data Source
License
License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication
No Copyright - You can copy, modify, distribute, and perform the work, even for commercial purposes, all without asking permission.
Columns
File: validation.csv
| Column name | Description |
|---|---|
| context | The context of the conversation. (String) |
| prompt | The prompt or topic for the conversation. (String) |
| utterance | The utterance or response from a speaker. (String) |
| selfeval | The self-evaluation score assigned to each utterance. (Integer) |
| tags | The associated tags that can be used to categorize or label dialogues. (String) |
File: train.csv
| Column name | Description |
|---|---|
| context | The context of the conversation. (String) |
| prompt | The prompt or topic for the conversation. (String) |
| utterance | The utterance or response from a speaker. (String) |
| selfeval | The self-evaluation score assigned to each utterance. (Integer) |
| tags | The associated tags that can be used to categorize or label dialogues. (String) |
File: test.csv
| Column name | Description |
|---|---|
| context | The context of the conversation. (String) |
| prompt | The prompt or topic for the conversation. (String) |
| utterance | The utterance or response from a speaker. (String) |
| selfeval | The self-evaluation score assigned to each utterance. (Integer) |
| tags | The associated tags that can be used to categorize or label dialogues. (String) |