ViGGO: Video Game Chatbot Dataset by Kaggle | Technology and IT

About this Dataset

ViGGO: Video Game Chatbot Dataset

Conversational data-to-text for video game chatbots

By GEM (from Hugging Face)

ViGGO is an English data-to-text generation dataset for the video game domain. It is intended to support open-domain chatbots that produce conversational responses about video games. Each example pairs a structured meaning representation (MR), which encodes facts about a game, with a natural-language target response. The MRs cover a range of topics and let a chatbot give opinions, describe games, or ask about the user's preferences. The dataset is relatively small (5,103 training examples, about 6,900 including the validation and test sets) but clean, which makes it well suited to evaluating transfer learning, low-resource, and few-shot settings with neural models. Training on it can help a chatbot hold meaningful conversations about video games.

How to use the dataset

This guide walks through how to use ViGGO to build conversational data-to-text generation models for the video game domain.

Overview

ViGGO is an English data-to-text generation dataset specially curated for open-domain chatbots in the video game domain. Unlike task-oriented dialogue systems, the focus here is on generating natural and engaging responses rather than completing specific tasks. The dataset consists of structured meaning representations (MRs) that represent different aspects of video games and corresponding conversational target responses.

Structure

The dataset is organized in a tabular format with four columns: gem_id, meaning_representation, target, and references. Each row is one example that pairs an MR with a conversational response.

meaning_representation: This column contains the structured meaning representations (MRs). An MR encodes attributes of a video game, such as its name, genre, platform, or release year, together with the kind of response expected. It is the input that a chatbot model conditions on when generating a response.

target: This column contains the target responses that your chatbot model should aim to generate during conversations. The target responses are designed to be conversational in nature and may include opinions, descriptions, requests for preferences, or inquiries about game preferences.

In addition to these two columns, each file contains gem_id, an identifier for the example, and references, which appears to hold the reference text for the same MR and is typically used as ground truth when scoring generated responses. Because references repeats information already present in target, it should not be counted as extra training data.
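As a minimal sketch of reading the columns listed in the table schemas (gem_id, meaning_representation, target, references), the snippet below parses a small in-memory CSV with Python's standard csv module. The two sample rows, the game name, and the `load_rows` helper are made up for illustration and are not taken from the dataset; how references is actually encoded in the CSV files should be checked against the real data.

```python
import csv
import io

# Made-up rows that mimic the four-column schema; NOT real dataset content.
SAMPLE_CSV = """\
gem_id,meaning_representation,target,references
sample-0,"inform(name[Example Game], release_year[2015])","Example Game came out in 2015.","Example Game came out in 2015."
sample-1,"request(genres[puzzle])","Do you have a favorite puzzle game?","Do you have a favorite puzzle game?"
"""


def load_rows(handle):
    """Read a ViGGO-style CSV into a list of dicts keyed by column name."""
    return list(csv.DictReader(handle))


rows = load_rows(io.StringIO(SAMPLE_CSV))

# For a downloaded file, the same helper would be used like this:
#   with open("train.csv", newline="", encoding="utf-8") as f:
#       rows = load_rows(f)

for row in rows:
    print(row["gem_id"], "|", row["meaning_representation"], "->", row["target"])
```

Reading with `csv.DictReader` rather than splitting on commas matters here, because MRs and responses contain commas inside quoted fields.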

Utilizing the Dataset

To make use of this dataset effectively:

Data Understanding: Familiarize yourself with the provided columns (meaning_representation and target) by exploring their contents thoroughly. Understand how different types of game-related concepts are represented in MRs.

Preprocessing: Depending on your specific task, you may need to apply preprocessing steps such as tokenization, lowercasing, or removing special characters. Ensure that the data is in a format suitable for training or evaluating your data-to-text generation chatbot.

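Train-Validation Split: The dataset already ships with separate train, validation, and test files, so use these rather than creating your own split unless your experiment requires it. Train and fine-tune on the training set, monitor performance on the validation set, and keep the test set for final evaluation. The challenge_train_N_percent files provide smaller training sets for low-resource experiments.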

Model Training: Use the training set to train your data-to-text generation chatbot model.
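The Data Understanding and Preprocessing steps can be sketched as follows. This assumes that an MR is written as `act(slot[value], slot[value], ...)`, which should be verified against the actual files. The helper names `parse_mr` and `linearize`, the separator characters, and the sample MR are all hypothetical choices made for this illustration.

```python
import re

# Assumed MR shape: act(slot[value], slot[value], ...)
MR_PATTERN = re.compile(r"^\s*(\w+)\((.*)\)\s*$", re.DOTALL)
SLOT_PATTERN = re.compile(r"(\w+)\[([^\]]*)\]")


def parse_mr(mr):
    """Split an MR string into (act, [(slot, value), ...]), keeping slot order."""
    match = MR_PATTERN.match(mr)
    if match is None:
        raise ValueError(f"unrecognized MR: {mr!r}")
    act, body = match.groups()
    slots = [(name, value.strip()) for name, value in SLOT_PATTERN.findall(body)]
    return act, slots


def linearize(mr, lowercase=False):
    """Turn an MR into a flat string that a text-to-text model can take as input."""
    act, slots = parse_mr(mr)
    # Slots with an empty value are emitted as the bare slot name.
    parts = [f"{name} = {value}" if value else name for name, value in slots]
    text = act if not parts else f"{act} | " + " ; ".join(parts)
    return text.lower() if lowercase else text


example = "inform(name[Example Game], release_year[2015])"  # made-up MR
print(parse_mr(example))
print(linearize(example, lowercase=True))
```

Lowercasing is left optional because game titles and platform names are case-sensitive in the target responses. Slot values that themselves contain a closing bracket would not be handled by this simple pattern.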

Research Ideas

Developing chatbots for video game enthusiasts: This dataset can be used to train and develop chatbots that can engage in conversations about video games. The structured meaning representations can help the chatbot understand various aspects of video games, and the target responses can guide the bot in providing conversational and relevant replies.

Enhancing personalized recommendation systems: The dataset's target responses, which include opinions, descriptions, and requests for preferences about video games, can be utilized to improve personalized recommendation systems for gamers. By analyzing users' preferences expressed in conversations with the chatbot, recommendation algorithms can provide more accurate game suggestions tailored to individual interests.

Improving natural language understanding models: The dataset's structured meaning representations can also be used to train and evaluate natural language understanding (NLU) models, which map user utterances about video games back to a structured form. Using the dataset in this direction can help researchers develop more robust NLU models for gaming conversations.
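For the NLU idea, one possible setup is to swap the direction of each example so that the response becomes the input and the MR becomes the label. The sketch below assumes rows are dicts keyed by column name. The helpers `to_nlu_pairs` and `exact_match` are hypothetical, the sample rows are made up, and exact string match is only one of several reasonable metrics.

```python
def to_nlu_pairs(rows):
    """Reverse the generation direction: utterance as input, MR as label."""
    return [(row["target"], row["meaning_representation"]) for row in rows]


def exact_match(predicted_mrs, gold_mrs):
    """Fraction of predicted MRs that equal the gold MR after trimming whitespace."""
    if not gold_mrs or len(predicted_mrs) != len(gold_mrs):
        raise ValueError("predictions and gold labels must be non-empty and aligned")
    hits = sum(p.strip() == g.strip() for p, g in zip(predicted_mrs, gold_mrs))
    return hits / len(gold_mrs)


# Made-up rows for illustration; NOT real dataset content.
sample_rows = [
    {"meaning_representation": "request(genres[puzzle])",
     "target": "Do you have a favorite puzzle game?"},
    {"meaning_representation": "inform(name[Example Game], release_year[2015])",
     "target": "Example Game came out in 2015."},
]

pairs = to_nlu_pairs(sample_rows)
gold = [mr for _, mr in pairs]
print(exact_match(["request(genres[puzzle])", "inform(name[Example Game])"], gold))
```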

Acknowledgements

If you use this dataset in your research, please credit the original authors.

License

License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication
No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission.

Columns

Files: validation.csv, challenge_train_1_percent.csv, train.csv

Column name	Description
meaning_representation	This column contains structured meaning representations (MRs) that represent different aspects related to video games. (Text)
target	This column contains conversational text that serves as the target response or output related to each MR. The target responses are more conversational in nature, offering opinions, descriptions, requests for preferences, and inquiries about game preferences. (Text)

Acknowledgements

If you use this dataset in your research, please credit the original authors and GEM (from Hugging Face).

Tables

Challenge Train 10 Percent

@kaggle.thedevastator_viggo_video_game_chatbot_dataset.challenge_train_10_percent

75.47 KB
510 rows
4 columns


CREATE TABLE challenge_train_10_percent (
  "gem_id" VARCHAR,
  "meaning_representation" VARCHAR,
  "target" VARCHAR,
  "references" VARCHAR
);

Challenge Train 1 Percent

@kaggle.thedevastator_viggo_video_game_chatbot_dataset.challenge_train_1_percent

14.86 KB
50 rows
4 columns


CREATE TABLE challenge_train_1_percent (
  "gem_id" VARCHAR,
  "meaning_representation" VARCHAR,
  "target" VARCHAR,
  "references" VARCHAR
);

Challenge Train 20 Percent

@kaggle.thedevastator_viggo_video_game_chatbot_dataset.challenge_train_20_percent

146.47 KB
1021 rows
4 columns


CREATE TABLE challenge_train_20_percent (
  "gem_id" VARCHAR,
  "meaning_representation" VARCHAR,
  "target" VARCHAR,
  "references" VARCHAR
);

Challenge Train 2 Percent

@kaggle.thedevastator_viggo_video_game_chatbot_dataset.challenge_train_2_percent

23.41 KB
103 rows
4 columns


CREATE TABLE challenge_train_2_percent (
  "gem_id" VARCHAR,
  "meaning_representation" VARCHAR,
  "target" VARCHAR,
  "references" VARCHAR
);

Challenge Train 5 Percent

@kaggle.thedevastator_viggo_video_game_chatbot_dataset.challenge_train_5_percent

46.06 KB
256 rows
4 columns


CREATE TABLE challenge_train_5_percent (
  "gem_id" VARCHAR,
  "meaning_representation" VARCHAR,
  "target" VARCHAR,
  "references" VARCHAR
);

Test

@kaggle.thedevastator_viggo_video_game_chatbot_dataset.test

133.47 KB
1083 rows
4 columns


CREATE TABLE test (
  "gem_id" VARCHAR,
  "meaning_representation" VARCHAR,
  "target" VARCHAR,
  "references" VARCHAR
);

Train

@kaggle.thedevastator_viggo_video_game_chatbot_dataset.train

633.97 KB
5103 rows
4 columns


CREATE TABLE train (
  "gem_id" VARCHAR,
  "meaning_representation" VARCHAR,
  "target" VARCHAR,
  "references" VARCHAR
);

Validation

@kaggle.thedevastator_viggo_video_game_chatbot_dataset.validation

93.9 KB
714 rows
4 columns


CREATE TABLE validation (
  "gem_id" VARCHAR,
  "meaning_representation" VARCHAR,
  "target" VARCHAR,
  "references" VARCHAR
);
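The row counts listed above can be checked against the file names with a little arithmetic. The sketch below assumes, based only on the naming, that each challenge_train_N_percent table is a subset of the train table. The counts are copied from the tables on this page.

```python
# Row counts as listed in the table summaries on this page.
TRAIN_ROWS = 5103
VALIDATION_ROWS = 714
TEST_ROWS = 1083

# Nominal percentage (from the table name) -> listed row count.
CHALLENGE_ROWS = {1: 50, 2: 103, 5: 256, 10: 510, 20: 1021}

total_rows = TRAIN_ROWS + VALIDATION_ROWS + TEST_ROWS
print(f"train + validation + test = {total_rows} rows")

shares = {}
for nominal, count in CHALLENGE_ROWS.items():
    # Share of the full training table, in percent.
    shares[nominal] = 100 * count / TRAIN_ROWS
    print(f"challenge_train_{nominal}_percent: {count} rows, "
          f"{shares[nominal]:.2f}% of train")
```

Each computed share rounds to the percentage in the table name, which is consistent with the subsets having been sampled from the training table.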