ViGGO: Video Game Chatbot Dataset
Conversational data-to-text for video game chatbots
By GEM (From Huggingface) [source]
About this dataset
ViGGO is an English data-to-text generation dataset created for the video game domain. It aims to facilitate the development of open-domain chatbots that generate conversational responses about various aspects of video games. The dataset consists of structured meaning representations (MRs), each describing some facet of a video game, paired with target responses. The MRs cover a wide range of topics and let a chatbot give opinions, provide descriptions, or ask about a user's game preferences. At around 5,000 examples, ViGGO is relatively small, but it stands out for its cleanliness, which makes it suitable for evaluating transfer learning, low-resource scenarios, and few-shot applications with neural models. By training on this dataset, developers can improve their chatbot's ability to hold meaningful conversations about video games with users.
How to use the dataset
This guide walks through how to use the ViGGO dataset to build conversational data-to-text generation models for the video game domain.
Overview
ViGGO is an English data-to-text generation dataset specially curated for open-domain chatbots in the video game domain. Unlike task-oriented dialogue systems, the focus here is on generating natural and engaging responses rather than completing specific tasks. The dataset consists of structured meaning representations (MRs) that represent different aspects of video games and corresponding conversational target responses.
Structure
The dataset is organized in a tabular format with two main columns, meaning_representation and target, along with additional repetitions of these columns. Each row represents a distinct example or conversation interaction.
- meaning_representation: This column contains structured meaning representations (MRs). MRs capture various aspects related to video games such as characters, items, locations, quests, and more. These provide context and information that can be used by chatbot models when generating their responses.
- target: This column contains the target responses that your chatbot model should aim to generate during conversations. The target responses are designed to be conversational in nature and may include opinions, descriptions, requests for preferences, or inquiries about game preferences.
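To make the meaning_representation column easier to work with, here is a minimal sketch of splitting one MR string into a dialogue act and its slot-value pairs. It assumes MRs are written as `act(slot[value], slot[value], ...)`; this page does not spell out the format, so check a few rows of your copy before relying on it. The function name `parse_mr` and the sample string are illustrative only.

```python
import re

# Assumed MR shape: act(slot[value], slot[value], ...).
_MR_PATTERN = re.compile(r"^\s*(\w+)\s*\((.*)\)\s*$", re.DOTALL)
_SLOT_PATTERN = re.compile(r"(\w+)\[([^\]]*)\]")


def parse_mr(mr: str) -> tuple[str, dict[str, str]]:
    """Split an MR string into (dialogue_act, {slot: value})."""
    match = _MR_PATTERN.match(mr)
    if match is None:
        raise ValueError(f"Unrecognized MR format: {mr!r}")
    act, body = match.groups()
    # Each slot looks like name[value]; values may contain commas.
    slots = {name: value.strip() for name, value in _SLOT_PATTERN.findall(body)}
    return act, slots


# Made-up example string, not copied from the dataset.
act, slots = parse_mr(
    "give_opinion(name[Example Game], rating[good], genres[puzzle, strategy])"
)
print(act)    # give_opinion
print(slots)  # {'name': 'Example Game', 'rating': 'good', 'genres': 'puzzle, strategy'}
```

Keeping slot values as raw strings is a deliberate simplification; a list-valued slot such as the hypothetical `genres` above could be split on commas in a later step if your model needs it.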
Please note that there are multiple duplicates of each column pair (meaning_representation-target). These additional repetitions can be used for tasks like training models with more data or evaluating model performance on held-out examples.
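One way to handle repeated pairs is to group the target responses by MR right after loading, which gives several reference texts per input. The sketch below uses only the standard library and the two column names listed on this page. It assumes repetitions show up as several rows sharing one MR, which you should verify against your files. The helper names and the in-memory rows are made up so the example runs without the CSV.

```python
import csv
import io
from collections import defaultdict


def load_pairs(handle):
    """Read (meaning_representation, target) pairs from an open CSV file."""
    reader = csv.DictReader(handle)
    return [(row["meaning_representation"], row["target"]) for row in reader]


def group_references(pairs):
    """Map each distinct MR to the list of target responses written for it."""
    grouped = defaultdict(list)
    for mr, target in pairs:
        grouped[mr].append(target)
    return dict(grouped)


# With the real data you would open a file instead, for example:
#   with open("train.csv", newline="", encoding="utf-8") as f:
#       pairs = load_pairs(f)
# The rows below are invented for illustration.
sample = io.StringIO(
    "meaning_representation,target\n"
    '"request(genres[puzzle])","Do you like puzzle games?"\n'
    '"request(genres[puzzle])","Are puzzle games your thing?"\n'
    '"confirm(name[Example Game])","You mean Example Game, right?"\n'
)
pairs = load_pairs(sample)
references = group_references(pairs)
print(len(pairs), len(references))  # 3 2
```

Grouping this way also makes it easy to score a generated response against every reference for the same MR rather than a single one.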
Utilizing the Dataset
To make use of this dataset effectively:
- Data Understanding: Familiarize yourself with the provided columns (meaning_representation and target) by exploring their contents thoroughly. Understand how different types of game-related concepts are represented in MRs.
- Preprocessing: Depending on your specific task, you may need to apply preprocessing steps such as tokenization, lowercasing, or removing special characters. Ensure that the data is in a format suitable for training or evaluating your data-to-text generation chatbot.
- Train-Validation Split: Split the dataset into separate train and validation sets that suit your experimental needs. This will allow you to train and fine-tune your model using the training set and monitor its performance on the validation set.
- Model Training: Utilize the training set to train your data-to-text generation chatbot model. You can use various techniques
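The preprocessing and splitting steps above can be sketched as follows. The files listed on this page already include train.csv and validation.csv, so a custom split is only needed if you want different proportions. This sketch splits by distinct MR rather than by row, so that an MR seen in training never reappears in validation; that choice, the function names, the default fraction, and the sample pairs are all assumptions for illustration.

```python
import random


def normalize(text: str, lowercase: bool = False) -> str:
    """Light preprocessing: trim, collapse whitespace, optionally lowercase."""
    cleaned = " ".join(text.split())
    return cleaned.lower() if lowercase else cleaned


def split_by_mr(pairs, validation_fraction=0.1, seed=13):
    """Split (mr, target) pairs so rows sharing an MR stay on the same side."""
    unique_mrs = sorted({mr for mr, _ in pairs})
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    rng.shuffle(unique_mrs)
    n_val = max(1, int(len(unique_mrs) * validation_fraction))
    val_mrs = set(unique_mrs[:n_val])
    train = [(mr, t) for mr, t in pairs if mr not in val_mrs]
    val = [(mr, t) for mr, t in pairs if mr in val_mrs]
    return train, val


# Invented pairs; the first MR appears twice to show that it is kept together.
pairs = [(f"inform(name[Game {i}])", f"  Game {i}   is a game. ") for i in range(10)]
pairs.append(("inform(name[Game 0])", "Game 0 exists."))
cleaned = [(normalize(mr), normalize(t)) for mr, t in pairs]
train, val = split_by_mr(cleaned, validation_fraction=0.2)
print(len(train) + len(val))  # 11
```

Lowercasing is off by default here because game titles are case-sensitive in practice; turn it on only if your model or metric expects lowercased text.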
Research Ideas
- Developing chatbots for video game enthusiasts: This dataset can be used to train and develop chatbots that can engage in conversations about video games. The structured meaning representations can help the chatbot understand various aspects of video games, and the target responses can guide the bot in providing conversational and relevant replies.
- Enhancing personalized recommendation systems: The dataset's target responses, which include opinions, descriptions, and requests for preferences about video games, can be utilized to improve personalized recommendation systems for gamers. By analyzing users' preferences expressed in conversations with the chatbot, recommendation algorithms can provide more accurate game suggestions tailored to individual interests.
- Improving natural language understanding models: The dataset's structured meaning representations are valuable resources for training and evaluating natural language understanding (NLU) models. These models aim to accurately comprehend complex user utterances related to video games. Using this dataset can help researchers develop more robust NLU models that better understand user inputs in the context of gaming conversations.
Acknowledgements
If you use this dataset in your research, please credit the original authors.
License
License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication
No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.
Columns
File: validation.csv
Column name | Description
meaning_representation | This column contains structured meaning representations (MRs) that represent different aspects related to video games. (Text)
target | This column contains conversational text that serves as the target response or output related to each MR. The target responses are more conversational in nature, offering opinions, descriptions, requests for preferences, and inquiries about game preferences. (Text)
File: challenge_train_1_percent.csv
Column name | Description
meaning_representation | This column contains structured meaning representations (MRs) that represent different aspects related to video games. (Text)
target | This column contains conversational text that serves as the target response or output related to each MR. The target responses are more conversational in nature, offering opinions, descriptions, requests for preferences, and inquiries about game preferences. (Text)
File: train.csv
Column name | Description
meaning_representation | This column contains structured meaning representations (MRs) that represent different aspects related to video games. (Text)
target | This column contains conversational text that serves as the target response or output related to each MR. The target responses are more conversational in nature, offering opinions, descriptions, requests for preferences, and inquiries about game preferences. (Text)
Acknowledgements
If you use this dataset in your research, please credit GEM (From Huggingface).