Baselight

All GPT-4 Conversations

All chat datasets generated by GPT-4 from Huggingface in the same format

@kaggle.thedevastator_all_gpt_4_synthetic_chat_datasets


About this Dataset

All GPT-4 Generated Datasets

Every GPT-4-generated chat dataset from Hugging Face, in the same format

From [Huggingface datasets]


How to use the dataset

The dataset includes all GPT-4-generated chat conversations hosted in open Hugging Face datasets.
Everything is converted to the same format, so the datasets can be merged easily and used for large-scale LLM training.
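
Because every table shares the same four columns, merging amounts to concatenating rows. The following is a minimal sketch in Python, not an official loader: the sample rows and the `message_type` values ("human", "gpt") are made up for illustration, and the added `source` field assumes that `conversation_id` values may repeat across tables, which this page does not confirm.

```python
from typing import Iterable

# Column order follows the table schemas on this page:
# (message, message_type, message_id, conversation_id)
Row = tuple[str, str, int, int]


def merge_tables(tables: dict[str, Iterable[Row]]) -> list[dict]:
    """Concatenate rows from several tables, tagging each row with its source table."""
    merged = []
    for source, rows in tables.items():
        for message, message_type, message_id, conversation_id in rows:
            merged.append({
                "source": source,  # hypothetical extra field, not part of the dataset
                "conversation_id": conversation_id,
                "message_id": message_id,
                "message_type": message_type,
                "message": message,
            })
    return merged


# Made-up sample rows; the real tables hold hundreds of thousands of rows.
sample = {
    "share_gpt_vicuna_unfiltered": [
        ("Hello!", "human", 0, 0),
        ("Hi, how can I help?", "gpt", 1, 0),
    ],
    "wizard_vicuna_dataset_unfiltered": [
        ("What is 2 + 2?", "human", 0, 0),
        ("4", "gpt", 1, 0),
    ],
}

merged = merge_tables(sample)

# Keying on (source, conversation_id) keeps conversations from different
# tables apart even if their ids overlap.
conversation_keys = {(row["source"], row["conversation_id"]) for row in merged}
```

Keying conversations on the pair `(source, conversation_id)` is the conservative choice here; if the ids turn out to be globally unique, the `source` field can simply be ignored.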

Acknowledgements

This dataset is a collection of several individual chat datasets.
If you use it in your research, please credit the original authors of the constituent datasets.
Data Source

License

License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication
No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

Tables

Share Gpt Vicuna Unfiltered

@kaggle.thedevastator_all_gpt_4_synthetic_chat_datasets.share_gpt_vicuna_unfiltered
  • 224.77 MB
  • 702151 rows
  • 4 columns

CREATE TABLE share_gpt_vicuna_unfiltered (
  "message" VARCHAR,
  "message_type" VARCHAR,
  "message_id" BIGINT,
  "conversation_id" BIGINT
);

Wizard Vicuna Dataset Unfiltered

@kaggle.thedevastator_all_gpt_4_synthetic_chat_datasets.wizard_vicuna_dataset_unfiltered
  • 63.78 MB
  • 245212 rows
  • 4 columns

CREATE TABLE wizard_vicuna_dataset_unfiltered (
  "message" VARCHAR,
  "message_type" VARCHAR,
  "message_id" BIGINT,
  "conversation_id" BIGINT
);
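
Both tables store one message per row, so a conversation has to be reassembled from its rows. The sketch below uses an in-memory SQLite database with the schema shown above and a few made-up rows. It assumes that `message_id` gives the turn order within a conversation and that `message_type` names the speaker; the values "human" and "gpt" are placeholders, not confirmed by this page.

```python
import sqlite3
from itertools import groupby

conn = sqlite3.connect(":memory:")

# Same column definitions as the table listing above.
conn.execute("""
    CREATE TABLE wizard_vicuna_dataset_unfiltered (
      "message" VARCHAR,
      "message_type" VARCHAR,
      "message_id" BIGINT,
      "conversation_id" BIGINT
    )
""")

# Made-up rows, deliberately inserted out of order.
sample_rows = [
    ("Hi there.", "gpt", 1, 0),
    ("Hello!", "human", 0, 0),
    ("What is 2 + 2?", "human", 0, 1),
    ("4", "gpt", 1, 1),
]
conn.executemany(
    "INSERT INTO wizard_vicuna_dataset_unfiltered VALUES (?, ?, ?, ?)",
    sample_rows,
)

# Sort by conversation, then by turn, so groupby sees each conversation
# as one contiguous run of rows.
cursor = conn.execute("""
    SELECT conversation_id, message_type, message
    FROM wizard_vicuna_dataset_unfiltered
    ORDER BY conversation_id, message_id
""")

conversations = {
    conv_id: [(message_type, message) for _, message_type, message in group]
    for conv_id, group in groupby(cursor, key=lambda row: row[0])
}
conn.close()
```

With the sample rows, `conversations` maps each `conversation_id` to an ordered list of `(message_type, message)` pairs, which is a convenient shape for building training examples.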
