Baselight

All GPT-4 Conversations

All GPT-4-generated chat datasets from Huggingface, converted to the same format

@kaggle.thedevastator_all_gpt_4_synthetic_chat_datasets


About this Dataset

All GPT-4 Generated Datasets

Every chat dataset generated by GPT-4 from Huggingface, in the same format

From Huggingface datasets


How to use the dataset

The dataset includes all GPT-4-generated chat conversations that are hosted as open datasets on Huggingface.
Everything is converted to the same format, so the datasets can be easily merged and used for large-scale training of LLMs.
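
Because every table shares the same four columns, merging comes down to a UNION ALL over the tables you want. The following is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. The toy rows, the message_type values 'input' and 'output', and the helper name merge_tables are assumptions made for illustration; they are not taken from the dataset. The sketch also assumes that conversation_id is unique only within a table, so it keeps the source table name next to each row.

```python
import sqlite3

# Shared four-column schema, as listed for every table in this dataset.
SCHEMA = """
CREATE TABLE {name} (
  "message" VARCHAR,
  "message_type" VARCHAR,
  "message_id" BIGINT,
  "conversation_id" BIGINT
);
"""

def merge_tables(conn, table_names):
    """Return all rows from the given tables, tagged with their source table.

    Hypothetical helper: table names are interpolated directly into the
    query, so they must come from a trusted list.
    """
    parts = [
        f"SELECT '{name}' AS source, message, message_type, "
        f"message_id, conversation_id FROM {name}"
        for name in table_names
    ]
    query = (
        " UNION ALL ".join(parts)
        + " ORDER BY source, conversation_id, message_id"
    )
    return conn.execute(query).fetchall()

# Toy rows, invented for illustration only.
conn = sqlite3.connect(":memory:")
for name in ("lima_train", "puffin"):
    conn.execute(SCHEMA.format(name=name))
conn.execute(
    "INSERT INTO lima_train VALUES ('Hi', 'input', 0, 0), ('Hello!', 'output', 1, 0)"
)
conn.execute(
    "INSERT INTO puffin VALUES ('2+2?', 'input', 0, 0), ('4', 'output', 1, 0)"
)

merged = merge_tables(conn, ["lima_train", "puffin"])
for row in merged:
    print(row)
```

Keeping the source column means a (source, conversation_id) pair can serve as a conversation key after the merge, which avoids collisions if two tables both start their ids at 0.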

Acknowledgements

This dataset is a collection of several individual chat datasets.
If you use this dataset in your research, please credit the original authors of the constituent datasets.

License

License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication
No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

Tables

Alpaca Data Cleaned

@kaggle.thedevastator_all_gpt_4_synthetic_chat_datasets.alpaca_data_cleaned
  • 22.66 MB
  • 122650 rows
  • 4 columns

CREATE TABLE alpaca_data_cleaned (
  "message" VARCHAR,
  "message_type" VARCHAR,
  "message_id" BIGINT,
  "conversation_id" BIGINT
);
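
With this layout, each row holds one message, and a conversation is the set of rows that share a conversation_id. A minimal Python sketch of rebuilding conversations from such rows follows. It assumes that message_id reflects turn order within a conversation, and the message_type values 'input' and 'output' are placeholders, since the actual values are not listed here. The function name group_conversations and the sample rows are hypothetical.

```python
from collections import defaultdict

def group_conversations(rows):
    """Group (message, message_type, message_id, conversation_id) rows
    into ordered lists of (message_type, message) turns."""
    conversations = defaultdict(list)
    for message, message_type, message_id, conversation_id in rows:
        conversations[conversation_id].append((message_id, message_type, message))
    # Sorting by message_id assumes it reflects turn order (an assumption).
    return {
        cid: [(mtype, text) for _, mtype, text in sorted(turns, key=lambda t: t[0])]
        for cid, turns in conversations.items()
    }

# Toy rows in column order, invented for illustration only.
rows = [
    ("Paris.", "output", 1, 0),
    ("What is the capital of France?", "input", 0, 0),
    ("Name a prime number.", "input", 2, 1),
    ("7", "output", 3, 1),
]

conversations = group_conversations(rows)
for cid, turns in conversations.items():
    print(cid, turns)
```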

Code Alpaca Data

@kaggle.thedevastator_all_gpt_4_synthetic_chat_datasets.code_alpaca_data
  • 3.23 MB
  • 39972 rows
  • 4 columns

CREATE TABLE code_alpaca_data (
  "message" VARCHAR,
  "message_type" VARCHAR,
  "message_id" BIGINT,
  "conversation_id" BIGINT
);

Conala Mined

@kaggle.thedevastator_all_gpt_4_synthetic_chat_datasets.conala_mined
  • 52.4 MB
  • 1187782 rows
  • 4 columns

CREATE TABLE conala_mined (
  "message" VARCHAR,
  "message_type" VARCHAR,
  "message_id" BIGINT,
  "conversation_id" BIGINT
);

Conala Paired Test

@kaggle.thedevastator_all_gpt_4_synthetic_chat_datasets.conala_paired_test
  • 37.67 KB
  • 1000 rows
  • 4 columns

CREATE TABLE conala_paired_test (
  "message" VARCHAR,
  "message_type" VARCHAR,
  "message_id" BIGINT,
  "conversation_id" BIGINT
);

Conala Paired Train

@kaggle.thedevastator_all_gpt_4_synthetic_chat_datasets.conala_paired_train
  • 166.27 KB
  • 4758 rows
  • 4 columns

CREATE TABLE conala_paired_train (
  "message" VARCHAR,
  "message_type" VARCHAR,
  "message_id" BIGINT,
  "conversation_id" BIGINT
);

Glaive Function Calling

@kaggle.thedevastator_all_gpt_4_synthetic_chat_datasets.glaive_function_calling
  • 35.99 MB
  • 379782 rows
  • 4 columns

CREATE TABLE glaive_function_calling (
  "message" VARCHAR,
  "message_type" VARCHAR,
  "message_id" BIGINT,
  "conversation_id" BIGINT
);

Goat

@kaggle.thedevastator_all_gpt_4_synthetic_chat_datasets.goat
  • 157.09 MB
  • 5238900 rows
  • 4 columns

CREATE TABLE goat (
  "message" VARCHAR,
  "message_type" VARCHAR,
  "message_id" BIGINT,
  "conversation_id" BIGINT
);

Gorilla 16k

@kaggle.thedevastator_all_gpt_4_synthetic_chat_datasets.gorilla_16k
  • 5.31 MB
  • 32502 rows
  • 4 columns

CREATE TABLE gorilla_16k (
  "message" VARCHAR,
  "message_type" VARCHAR,
  "message_id" BIGINT,
  "conversation_id" BIGINT
);

Gsm8k Main Test

@kaggle.thedevastator_all_gpt_4_synthetic_chat_datasets.gsm8k_main_test
  • 393.17 KB
  • 2638 rows
  • 4 columns

CREATE TABLE gsm8k_main_test (
  "message" VARCHAR,
  "message_type" VARCHAR,
  "message_id" BIGINT,
  "conversation_id" BIGINT
);

Gsm8k Main Train

@kaggle.thedevastator_all_gpt_4_synthetic_chat_datasets.gsm8k_main_train
  • 2.1 MB
  • 14946 rows
  • 4 columns

CREATE TABLE gsm8k_main_train (
  "message" VARCHAR,
  "message_type" VARCHAR,
  "message_id" BIGINT,
  "conversation_id" BIGINT
);

Gsm8k Socratic Test

@kaggle.thedevastator_all_gpt_4_synthetic_chat_datasets.gsm8k_socratic_test
  • 450.04 KB
  • 2638 rows
  • 4 columns

CREATE TABLE gsm8k_socratic_test (
  "message" VARCHAR,
  "message_type" VARCHAR,
  "message_id" BIGINT,
  "conversation_id" BIGINT
);

Gsm8k Socratic Train

@kaggle.thedevastator_all_gpt_4_synthetic_chat_datasets.gsm8k_socratic_train
  • 2.4 MB
  • 14946 rows
  • 4 columns

CREATE TABLE gsm8k_socratic_train (
  "message" VARCHAR,
  "message_type" VARCHAR,
  "message_id" BIGINT,
  "conversation_id" BIGINT
);

Lima Test

@kaggle.thedevastator_all_gpt_4_synthetic_chat_datasets.lima_test
  • 30.31 KB
  • 300 rows
  • 4 columns

CREATE TABLE lima_test (
  "message" VARCHAR,
  "message_type" VARCHAR,
  "message_id" BIGINT,
  "conversation_id" BIGINT
);

Lima Train

@kaggle.thedevastator_all_gpt_4_synthetic_chat_datasets.lima_train
  • 1.61 MB
  • 2169 rows
  • 4 columns

CREATE TABLE lima_train (
  "message" VARCHAR,
  "message_type" VARCHAR,
  "message_id" BIGINT,
  "conversation_id" BIGINT
);

Med Alpaca Data

@kaggle.thedevastator_all_gpt_4_synthetic_chat_datasets.med_alpaca_data
  • 736.09 MB
  • 2694112 rows
  • 4 columns

CREATE TABLE med_alpaca_data (
  "message" VARCHAR,
  "message_type" VARCHAR,
  "message_id" BIGINT,
  "conversation_id" BIGINT
);

Puffin

@kaggle.thedevastator_all_gpt_4_synthetic_chat_datasets.puffin
  • 6.39 MB
  • 13988 rows
  • 4 columns

CREATE TABLE puffin (
  "message" VARCHAR,
  "message_type" VARCHAR,
  "message_id" BIGINT,
  "conversation_id" BIGINT
);

Riddle Sense Test

@kaggle.thedevastator_all_gpt_4_synthetic_chat_datasets.riddle_sense_test
  • 125.16 KB
  • 2368 rows
  • 4 columns

CREATE TABLE riddle_sense_test (
  "message" VARCHAR,
  "message_type" VARCHAR,
  "message_id" BIGINT,
  "conversation_id" BIGINT
);

Riddle Sense Train

@kaggle.thedevastator_all_gpt_4_synthetic_chat_datasets.riddle_sense_train
  • 420.73 KB
  • 7020 rows
  • 4 columns

CREATE TABLE riddle_sense_train (
  "message" VARCHAR,
  "message_type" VARCHAR,
  "message_id" BIGINT,
  "conversation_id" BIGINT
);

Riddle Sense Validation

@kaggle.thedevastator_all_gpt_4_synthetic_chat_datasets.riddle_sense_validation
  • 120.73 KB
  • 2042 rows
  • 4 columns

CREATE TABLE riddle_sense_validation (
  "message" VARCHAR,
  "message_type" VARCHAR,
  "message_id" BIGINT,
  "conversation_id" BIGINT
);

Science Qa Txt Only Test

@kaggle.thedevastator_all_gpt_4_synthetic_chat_datasets.science_qa_txt_only_test
  • 1 MB
  • 4448 rows
  • 4 columns

CREATE TABLE science_qa_txt_only_test (
  "message" VARCHAR,
  "message_type" VARCHAR,
  "message_id" BIGINT,
  "conversation_id" BIGINT
);

Science Qa Txt Only Train

@kaggle.thedevastator_all_gpt_4_synthetic_chat_datasets.science_qa_txt_only_train
  • 2.89 MB
  • 13016 rows
  • 4 columns

CREATE TABLE science_qa_txt_only_train (
  "message" VARCHAR,
  "message_type" VARCHAR,
  "message_id" BIGINT,
  "conversation_id" BIGINT
);

Science Qa Txt Only Validation

@kaggle.thedevastator_all_gpt_4_synthetic_chat_datasets.science_qa_txt_only_validation
  • 951.78 KB
  • 4288 rows
  • 4 columns

CREATE TABLE science_qa_txt_only_validation (
  "message" VARCHAR,
  "message_type" VARCHAR,
  "message_id" BIGINT,
  "conversation_id" BIGINT
);

Sciq Test

@kaggle.thedevastator_all_gpt_4_synthetic_chat_datasets.sciq_test
  • 320.88 KB
  • 2000 rows
  • 4 columns

CREATE TABLE sciq_test (
  "message" VARCHAR,
  "message_type" VARCHAR,
  "message_id" BIGINT,
  "conversation_id" BIGINT
);

Sciq Train

@kaggle.thedevastator_all_gpt_4_synthetic_chat_datasets.sciq_train
  • 3.58 MB
  • 23358 rows
  • 4 columns

CREATE TABLE sciq_train (
  "message" VARCHAR,
  "message_type" VARCHAR,
  "message_id" BIGINT,
  "conversation_id" BIGINT
);

Sciq Validation

@kaggle.thedevastator_all_gpt_4_synthetic_chat_datasets.sciq_validation
  • 317.03 KB
  • 2000 rows
  • 4 columns

CREATE TABLE sciq_validation (
  "message" VARCHAR,
  "message_type" VARCHAR,
  "message_id" BIGINT,
  "conversation_id" BIGINT
);

Share Gpt Vicuna Unfiltered

@kaggle.thedevastator_all_gpt_4_synthetic_chat_datasets.share_gpt_vicuna_unfiltered
  • 224.77 MB
  • 702151 rows
  • 4 columns

CREATE TABLE share_gpt_vicuna_unfiltered (
  "message" VARCHAR,
  "message_type" VARCHAR,
  "message_id" BIGINT,
  "conversation_id" BIGINT
);

Wizard Vicuna Dataset Unfiltered

@kaggle.thedevastator_all_gpt_4_synthetic_chat_datasets.wizard_vicuna_dataset_unfiltered
  • 63.78 MB
  • 245212 rows
  • 4 columns

CREATE TABLE wizard_vicuna_dataset_unfiltered (
  "message" VARCHAR,
  "message_type" VARCHAR,
  "message_id" BIGINT,
  "conversation_id" BIGINT
);
