Baselight

Empathetic Conversational Model Benchmark

Conversation, Prompts, and Tags

@kaggle.thedevastator_empathetic_conversational_model_benchmark


About this Dataset

By Huggingface Hub [source]

This dataset is a comprehensive collection of empathetic conversations for research on dialogue systems. It is split into three sets - training, validation, and test - and each conversation is identified by a conversation ID, with speaker IDs indicating who said what. Each row also records the utterance index, the prompt/topic of the conversation, a self-evaluation of the utterance, and assigned tags, making it possible to study conversation topics, empathy, and dialogue structure in one place.


How to use the dataset

Getting Started

  • Begin by downloading the dataset from Kaggle at https://www.kaggle.com/rakshitshah/empathicconversationalmodelbenchmark
  • The downloaded folder should contain three CSV files - train.csv, validation.csv, and test.csv. These contain conversations with corresponding speaker IDs, topics, self-evaluations, and tags that can be used to train conversation models or evaluate their performance.
  • Each row in each of the three CSV files has 8 columns: conversation ID (conv_id), index of the utterance (utterance_idx), context (context), prompt (prompt), speaker index (speaker_idx), utterance (utterance), self-evaluation of the utterance (selfeval), and assigned tags (tags).
  • Utterances are the individual statements made by each speaker in a conversation; speakers are identified by the speaker_idx column, and turns within a conversation are ordered by utterance_idx.
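As a quick start, a split can be loaded with pandas and conversations reconstructed by grouping on conv_id and ordering by utterance_idx. The frame below is a tiny synthetic stand-in with the same 8 columns as the schemas listed further down; in practice you would call pd.read_csv("train.csv") instead.

```python
# Minimal sketch: reconstruct conversations from the 8-column layout.
# The rows here are synthetic examples, not real dataset content.
import pandas as pd

df = pd.DataFrame({
    "conv_id":       ["conv:1", "conv:1", "conv:1"],
    "utterance_idx": [2, 1, 3],                      # deliberately unordered
    "context":       ["proud"] * 3,
    "prompt":        ["I finally finished my degree."] * 3,
    "speaker_idx":   [2, 1, 1],
    "utterance":     ["Congratulations!",
                      "I finally finished my degree.",
                      "Thanks, it took five years."],
    "selfeval":      ["5|5|5"] * 3,
    "tags":          [""] * 3,
})

# Group by conversation and order turns by utterance_idx.
for conv_id, conv in df.groupby("conv_id"):
    turns = conv.sort_values("utterance_idx")
    for _, row in turns.iterrows():
        print(f"{conv_id} [{row['speaker_idx']}] {row['utterance']}")
```

Grouping before sorting keeps turns from different conversations from interleaving when a file contains many conversations.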

Making Use Of The Dataset

  • Use the train set to create machine learning models that can generate natural conversation responses based on context, or assign empathy scores to generated responses using sentiment analysis.
  • Use the validation set to run tests and make sure the model is functioning correctly.
  • Evaluate models using the test set.
  • Use the tags column to label conversations with appropriate tags such as 'casual chat' or 'career advice', and to compare a standard baseline against the ML model.
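For the model-training use case above, ordered conversations are commonly flattened into (history, response) pairs. The one-turn-of-history pairing below is an illustrative choice, not something prescribed by the dataset, and the rows are synthetic.

```python
# Sketch: turn ordered conversations into (history, response) training pairs.
# Pairing each turn with only the preceding turn is one common, simple scheme.
import pandas as pd

df = pd.DataFrame({
    "conv_id":       ["c1"] * 4,
    "utterance_idx": [1, 2, 3, 4],
    "utterance":     ["Hi there.", "Hello!", "How was your day?", "Pretty good."],
})

pairs = []
for _, conv in df.groupby("conv_id"):
    turns = conv.sort_values("utterance_idx")["utterance"].tolist()
    # Each turn after the first becomes a response to the preceding turn.
    pairs.extend(zip(turns[:-1], turns[1:]))

print(pairs)  # [('Hi there.', 'Hello!'), ('Hello!', 'How was your day?'), ...]
```

A longer history window (concatenating several prior turns) is a natural extension if the model supports it.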

Research Ideas

  • Developing empathetic open-domain conversation models for use in virtual assistants or chatbots, for example by sorting conversations by topic and training models to reply accordingly.
  • Using the self-evaluation of each utterance as a metric to observe changes in the atmosphere of a conversation, such as mood shifts and tonality variations.
  • Benchmarking architectures such as convolutional and attention-based models, LSTMs, seq2seq models, Gated Recurrent Units (GRUs), and Transformer networks to further improve conversation model performance and accuracy.
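For the mood-tracking idea above, the selfeval column must first be made numeric. The schemas below store it as a string, and its exact encoding is not documented here; the sketch assumes pipe-separated integer scores (e.g. "5|4|5") and should be checked against the real column before use.

```python
# Sketch: parse a selfeval string into numeric scores for mood tracking.
# The pipe-separated "a|b|c" encoding is an assumption, not documented fact.
def parse_selfeval(raw: str) -> list[int]:
    """Split a raw selfeval string into a list of integer scores."""
    return [int(part) for part in raw.split("|") if part.strip().isdigit()]

def mean_score(raw: str) -> float:
    """Average the scores of one utterance; NaN if nothing parses."""
    scores = parse_selfeval(raw)
    return sum(scores) / len(scores) if scores else float("nan")

print(parse_selfeval("5|4|5"))  # [5, 4, 5]
print(mean_score("5|4|5"))      # average self-evaluation for the utterance
```

Plotting mean_score per utterance_idx across a conversation would then surface mood shifts over its course.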

Acknowledgements

If you use this dataset in your research, please credit the original authors and Huggingface Hub.

License

License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication
No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission.

Columns

File: validation.csv

Column name    Description
conv_id        Unique identifier for the conversation. (String)
utterance_idx  The index of the utterance within the conversation. (Integer)
context        The context of the conversation. (String)
prompt         The prompt or topic for the conversation. (String)
speaker_idx    The index of the speaker within the conversation. (Integer)
utterance      The utterance or response from a speaker. (String)
selfeval       The self-evaluation score assigned to each utterance. (String)
tags           The associated tags that can be used to categorize or label dialogues. (String)

File: train.csv

Column name    Description
conv_id        Unique identifier for the conversation. (String)
utterance_idx  The index of the utterance within the conversation. (Integer)
context        The context of the conversation. (String)
prompt         The prompt or topic for the conversation. (String)
speaker_idx    The index of the speaker within the conversation. (Integer)
utterance      The utterance or response from a speaker. (String)
selfeval       The self-evaluation score assigned to each utterance. (String)
tags           The associated tags that can be used to categorize or label dialogues. (String)

File: test.csv

Column name    Description
conv_id        Unique identifier for the conversation. (String)
utterance_idx  The index of the utterance within the conversation. (Integer)
context        The context of the conversation. (String)
prompt         The prompt or topic for the conversation. (String)
speaker_idx    The index of the speaker within the conversation. (Integer)
utterance      The utterance or response from a speaker. (String)
selfeval       The self-evaluation score assigned to each utterance. (String)
tags           The associated tags that can be used to categorize or label dialogues. (String)


Tables

Test

@kaggle.thedevastator_empathetic_conversational_model_benchmark.test
  • 886.76 KB
  • 10943 rows
  • 8 columns

CREATE TABLE test (
  "conv_id" VARCHAR,
  "utterance_idx" BIGINT,
  "context" VARCHAR,
  "prompt" VARCHAR,
  "speaker_idx" BIGINT,
  "utterance" VARCHAR,
  "selfeval" VARCHAR,
  "tags" VARCHAR
);
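The schema above can be exercised locally, for example in an in-memory SQLite database, which accepts the VARCHAR and BIGINT type names as-is (mapping them to TEXT and INTEGER affinity). The inserted row is a synthetic example.

```python
# Sketch: mirror the CREATE TABLE statement above in in-memory SQLite.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE test (
        conv_id       VARCHAR,
        utterance_idx BIGINT,
        context       VARCHAR,
        prompt        VARCHAR,
        speaker_idx   BIGINT,
        utterance     VARCHAR,
        selfeval      VARCHAR,
        tags          VARCHAR
    )
""")

# Insert one synthetic row matching the 8-column schema.
conn.execute(
    "INSERT INTO test VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    ("c1", 1, "proud", "I passed my exam.", 1, "I passed my exam.", "5|5|5", ""),
)
row = conn.execute("SELECT utterance FROM test WHERE conv_id = 'c1'").fetchone()
print(row[0])  # I passed my exam.
```

The same pattern works for the train and validation tables, which share this schema.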

Train

@kaggle.thedevastator_empathetic_conversational_model_benchmark.train
  • 5.36 MB
  • 76673 rows
  • 8 columns

CREATE TABLE train (
  "conv_id" VARCHAR,
  "utterance_idx" BIGINT,
  "context" VARCHAR,
  "prompt" VARCHAR,
  "speaker_idx" BIGINT,
  "utterance" VARCHAR,
  "selfeval" VARCHAR,
  "tags" VARCHAR
);

Validation

@kaggle.thedevastator_empathetic_conversational_model_benchmark.validation
  • 928.43 KB
  • 12030 rows
  • 8 columns

CREATE TABLE validation (
  "conv_id" VARCHAR,
  "utterance_idx" BIGINT,
  "context" VARCHAR,
  "prompt" VARCHAR,
  "speaker_idx" BIGINT,
  "utterance" VARCHAR,
  "selfeval" VARCHAR,
  "tags" VARCHAR
);
