Baselight

All GPT-4 Conversations

All GPT-4-generated chat datasets from Hugging Face, converted to the same format

@kaggle.thedevastator_all_gpt_4_synthetic_chat_datasets


About this Dataset


Source: Hugging Face datasets



How to use the dataset

The dataset includes all chat conversations generated by GPT-4 that are hosted in open Hugging Face datasets.
Every source dataset is converted to the same four-column format (message, message_type, message_id, conversation_id), so the datasets can be easily merged and used for large-scale LLM training.
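Because every table shares the same four columns, merging reduces to stacking rows. The sketch below uses Python's built-in sqlite3 module as a stand-in for whatever SQL engine hosts the tables (an assumption; this page does not say how the data is served). It adds a source column so each row keeps track of its original dataset, which also makes it easier to credit the original authors. The toy rows and the "input"/"output" message_type labels are hypothetical, not taken from the dataset.

```python
import sqlite3

# Column layout shared by every table on this page.
COLUMNS = '"message" VARCHAR, "message_type" VARCHAR, "message_id" BIGINT, "conversation_id" BIGINT'


def merge_tables(conn, table_names, target="all_conversations"):
    """Stack same-schema tables into one table, tagging each row with its source table."""
    conn.execute(f'CREATE TABLE {target} ("source" VARCHAR, {COLUMNS})')
    for name in table_names:
        # Table names are interpolated into the SQL, so only pass trusted identifiers.
        conn.execute(
            f"INSERT INTO {target} "
            f"SELECT ?, message, message_type, message_id, conversation_id FROM {name}",
            (name,),
        )
    conn.commit()


conn = sqlite3.connect(":memory:")
for name in ("lima_test", "sciq_test"):
    conn.execute(f"CREATE TABLE {name} ({COLUMNS})")

# Hypothetical toy rows; the "input"/"output" labels are assumptions.
# Both tables reuse conversation_id 0, which is why the source column is kept.
conn.execute("INSERT INTO lima_test VALUES ('Hi!', 'input', 0, 0)")
conn.executemany(
    "INSERT INTO sciq_test VALUES (?, ?, ?, ?)",
    [("What gas do plants absorb?", "input", 0, 0), ("Carbon dioxide.", "output", 1, 0)],
)

merge_tables(conn, ["lima_test", "sciq_test"])
counts = conn.execute(
    "SELECT source, COUNT(*) FROM all_conversations GROUP BY source ORDER BY source"
).fetchall()
print(counts)  # [('lima_test', 1), ('sciq_test', 2)]
```

Keeping the source name alongside conversation_id avoids collisions when two tables number their conversations independently.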

Acknowledgements

This dataset is a collection of several individual chat datasets.
If you use it in your research, please credit the authors of the original source datasets.

License

License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication
No Copyright: you can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission.

Tables

Alpaca Data Cleaned

@kaggle.thedevastator_all_gpt_4_synthetic_chat_datasets.alpaca_data_cleaned
  • 23.76 MB
  • 122,650 rows
  • 4 columns
CREATE TABLE alpaca_data_cleaned (
  "message" VARCHAR,
  "message_type" VARCHAR,
  "message_id" BIGINT,
  "conversation_id" BIGINT
);
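Each row holds a single message, so a conversation has to be reassembled from its rows. A minimal sketch, assuming message_id increases with turn order inside a conversation (the schema does not state this), groups rows by conversation_id and sorts them by message_id. The rows and the "input"/"output" message_type labels below are made up for illustration.

```python
import sqlite3
from collections import defaultdict


def load_conversations(conn, table):
    """Return {conversation_id: [(message_type, message), ...]} with turns in order.

    Assumes message_id increases with turn order inside a conversation.
    """
    rows = conn.execute(
        f"SELECT conversation_id, message_type, message FROM {table} "
        "ORDER BY conversation_id, message_id"
    )
    conversations = defaultdict(list)
    for conv_id, msg_type, message in rows:
        conversations[conv_id].append((msg_type, message))
    return dict(conversations)


conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE alpaca_data_cleaned ("message" VARCHAR, "message_type" VARCHAR, '
    '"message_id" BIGINT, "conversation_id" BIGINT)'
)
# Hypothetical toy rows, deliberately inserted out of order.
conn.executemany(
    "INSERT INTO alpaca_data_cleaned VALUES (?, ?, ?, ?)",
    [
        ("4", "output", 1, 7),
        ("What is 2 + 2?", "input", 0, 7),
        ("Name a primary color.", "input", 0, 8),
        ("Blue.", "output", 1, 8),
    ],
)

convs = load_conversations(conn, "alpaca_data_cleaned")
print(convs[7])  # [('input', 'What is 2 + 2?'), ('output', '4')]
```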

Code Alpaca Data

@kaggle.thedevastator_all_gpt_4_synthetic_chat_datasets.code_alpaca_data
  • 3.39 MB
  • 39,972 rows
  • 4 columns
CREATE TABLE code_alpaca_data (
  "message" VARCHAR,
  "message_type" VARCHAR,
  "message_id" BIGINT,
  "conversation_id" BIGINT
);

Conala Mined

@kaggle.thedevastator_all_gpt_4_synthetic_chat_datasets.conala_mined
  • 54.95 MB
  • 1,187,782 rows
  • 4 columns
CREATE TABLE conala_mined (
  "message" VARCHAR,
  "message_type" VARCHAR,
  "message_id" BIGINT,
  "conversation_id" BIGINT
);

Conala Paired Test

@kaggle.thedevastator_all_gpt_4_synthetic_chat_datasets.conala_paired_test
  • 38.57 kB
  • 1,000 rows
  • 4 columns
CREATE TABLE conala_paired_test (
  "message" VARCHAR,
  "message_type" VARCHAR,
  "message_id" BIGINT,
  "conversation_id" BIGINT
);

Conala Paired Train

@kaggle.thedevastator_all_gpt_4_synthetic_chat_datasets.conala_paired_train
  • 170.26 kB
  • 4,758 rows
  • 4 columns
CREATE TABLE conala_paired_train (
  "message" VARCHAR,
  "message_type" VARCHAR,
  "message_id" BIGINT,
  "conversation_id" BIGINT
);
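When a source ships separate train and test tables, it is worth checking that no test message also appears in the training table before merging them into a training corpus. The sketch below is a minimal exact-match check for the conala_paired_train / conala_paired_test pair, run with sqlite3 on hypothetical toy rows (the real contents are not reproduced here); normalizing whitespace or case before comparing would catch more near-duplicates.

```python
import sqlite3


def overlapping_messages(conn, train_table, test_table):
    """Return distinct message texts found in both tables (exact string match)."""
    query = (
        f"SELECT DISTINCT message FROM {test_table} "
        f"WHERE message IN (SELECT message FROM {train_table}) "
        "ORDER BY message"
    )
    return [row[0] for row in conn.execute(query)]


COLUMNS = '"message" VARCHAR, "message_type" VARCHAR, "message_id" BIGINT, "conversation_id" BIGINT'
conn = sqlite3.connect(":memory:")
for name in ("conala_paired_train", "conala_paired_test"):
    conn.execute(f"CREATE TABLE {name} ({COLUMNS})")

# Hypothetical toy rows; one prompt is duplicated across the splits on purpose.
conn.executemany(
    "INSERT INTO conala_paired_train VALUES (?, ?, ?, ?)",
    [("reverse a list `x`", "input", 0, 0), ("x[::-1]", "output", 1, 0)],
)
conn.executemany(
    "INSERT INTO conala_paired_test VALUES (?, ?, ?, ?)",
    [
        ("reverse a list `x`", "input", 0, 0),
        ("list(reversed(x))", "output", 1, 0),
        ("sort a list `y`", "input", 0, 1),
        ("sorted(y)", "output", 1, 1),
    ],
)

leaks = overlapping_messages(conn, "conala_paired_train", "conala_paired_test")
print(leaks)  # ['reverse a list `x`']
```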

Glaive Function Calling

@kaggle.thedevastator_all_gpt_4_synthetic_chat_datasets.glaive_function_calling
  • 37.74 MB
  • 379,782 rows
  • 4 columns
CREATE TABLE glaive_function_calling (
  "message" VARCHAR,
  "message_type" VARCHAR,
  "message_id" BIGINT,
  "conversation_id" BIGINT
);

Goat

@kaggle.thedevastator_all_gpt_4_synthetic_chat_datasets.goat
  • 164.72 MB
  • 5,238,900 rows
  • 4 columns
CREATE TABLE goat (
  "message" VARCHAR,
  "message_type" VARCHAR,
  "message_id" BIGINT,
  "conversation_id" BIGINT
);

Gorilla 16k

@kaggle.thedevastator_all_gpt_4_synthetic_chat_datasets.gorilla_16k
  • 5.57 MB
  • 32,502 rows
  • 4 columns
CREATE TABLE gorilla_16k (
  "message" VARCHAR,
  "message_type" VARCHAR,
  "message_id" BIGINT,
  "conversation_id" BIGINT
);

Gsm8k Main Test

@kaggle.thedevastator_all_gpt_4_synthetic_chat_datasets.gsm8k_main_test
  • 402.61 kB
  • 2,638 rows
  • 4 columns
CREATE TABLE gsm8k_main_test (
  "message" VARCHAR,
  "message_type" VARCHAR,
  "message_id" BIGINT,
  "conversation_id" BIGINT
);

Gsm8k Main Train

@kaggle.thedevastator_all_gpt_4_synthetic_chat_datasets.gsm8k_main_train
  • 2.2 MB
  • 14,946 rows
  • 4 columns
CREATE TABLE gsm8k_main_train (
  "message" VARCHAR,
  "message_type" VARCHAR,
  "message_id" BIGINT,
  "conversation_id" BIGINT
);
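The GSM8K row counts are consistent with two rows (question and answer) per problem: the original GSM8K splits contain 7,473 training and 1,319 test problems, and 14,946 = 2 × 7,473 while 2,638 = 2 × 1,319. To check a table's rows-per-conversation ratio from the data itself, compare COUNT(*) with COUNT(DISTINCT conversation_id). The sketch below does this with sqlite3 on a hypothetical toy table, not the real one.

```python
import sqlite3


def conversation_stats(conn, table):
    """Return (rows, conversations, rows per conversation) for one table."""
    rows, convs = conn.execute(
        f"SELECT COUNT(*), COUNT(DISTINCT conversation_id) FROM {table}"
    ).fetchone()
    return rows, convs, (rows / convs if convs else 0.0)


conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE gsm8k_main_train ("message" VARCHAR, "message_type" VARCHAR, '
    '"message_id" BIGINT, "conversation_id" BIGINT)'
)
# Hypothetical toy table: three question/answer pairs, i.e. two rows per conversation.
conn.executemany(
    "INSERT INTO gsm8k_main_train VALUES (?, ?, ?, ?)",
    [(f"question {i}", "input", 0, i) for i in range(3)]
    + [(f"answer {i}", "output", 1, i) for i in range(3)],
)

stats = conversation_stats(conn, "gsm8k_main_train")
print(stats)  # (6, 3, 2.0)
```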

Gsm8k Socratic Test

@kaggle.thedevastator_all_gpt_4_synthetic_chat_datasets.gsm8k_socratic_test
  • 460.84 kB
  • 2,638 rows
  • 4 columns
CREATE TABLE gsm8k_socratic_test (
  "message" VARCHAR,
  "message_type" VARCHAR,
  "message_id" BIGINT,
  "conversation_id" BIGINT
);

Gsm8k Socratic Train

@kaggle.thedevastator_all_gpt_4_synthetic_chat_datasets.gsm8k_socratic_train
  • 2.52 MB
  • 14,946 rows
  • 4 columns
CREATE TABLE gsm8k_socratic_train (
  "message" VARCHAR,
  "message_type" VARCHAR,
  "message_id" BIGINT,
  "conversation_id" BIGINT
);

Lima Test

@kaggle.thedevastator_all_gpt_4_synthetic_chat_datasets.lima_test
  • 31.04 kB
  • 300 rows
  • 4 columns
CREATE TABLE lima_test (
  "message" VARCHAR,
  "message_type" VARCHAR,
  "message_id" BIGINT,
  "conversation_id" BIGINT
);

Lima Train

@kaggle.thedevastator_all_gpt_4_synthetic_chat_datasets.lima_train
  • 1.68 MB
  • 2,169 rows
  • 4 columns
CREATE TABLE lima_train (
  "message" VARCHAR,
  "message_type" VARCHAR,
  "message_id" BIGINT,
  "conversation_id" BIGINT
);

Med Alpaca Data

@kaggle.thedevastator_all_gpt_4_synthetic_chat_datasets.med_alpaca_data
  • 771.84 MB
  • 2,694,112 rows
  • 4 columns
CREATE TABLE med_alpaca_data (
  "message" VARCHAR,
  "message_type" VARCHAR,
  "message_id" BIGINT,
  "conversation_id" BIGINT
);
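At 771.84 MB and roughly 2.7 million rows, med_alpaca_data is the largest table here by size, so it may be preferable to stream it rather than load it whole. A sketch using the standard DB-API cursor.fetchmany from Python's sqlite3 module, assuming the table has already been loaded into a local SQLite database (a hypothetical setup; toy rows stand in for the real data). Without an ORDER BY, rows of one conversation can land in different batches, so group by conversation_id downstream if whole conversations are needed.

```python
import sqlite3


def iter_batches(conn, table, batch_size=10_000):
    """Yield lists of (conversation_id, message_id, message_type, message) rows.

    Only one batch of rows is held in Python memory at a time.
    """
    cursor = conn.execute(
        f"SELECT conversation_id, message_id, message_type, message FROM {table}"
    )
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            break
        yield batch


conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE med_alpaca_data ("message" VARCHAR, "message_type" VARCHAR, '
    '"message_id" BIGINT, "conversation_id" BIGINT)'
)
# Hypothetical toy rows standing in for the real 2.7M-row table.
conn.executemany(
    "INSERT INTO med_alpaca_data VALUES (?, ?, ?, ?)",
    [(f"msg {i}", "input" if i % 2 == 0 else "output", i % 2, i // 2) for i in range(25)],
)

sizes = [len(batch) for batch in iter_batches(conn, "med_alpaca_data", batch_size=10)]
print(sizes)  # [10, 10, 5]
```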

Puffin

@kaggle.thedevastator_all_gpt_4_synthetic_chat_datasets.puffin
  • 6.7 MB
  • 13,988 rows
  • 4 columns
CREATE TABLE puffin (
  "message" VARCHAR,
  "message_type" VARCHAR,
  "message_id" BIGINT,
  "conversation_id" BIGINT
);

Riddle Sense Test

@kaggle.thedevastator_all_gpt_4_synthetic_chat_datasets.riddle_sense_test
  • 128.16 kB
  • 2,368 rows
  • 4 columns
CREATE TABLE riddle_sense_test (
  "message" VARCHAR,
  "message_type" VARCHAR,
  "message_id" BIGINT,
  "conversation_id" BIGINT
);

Riddle Sense Train

@kaggle.thedevastator_all_gpt_4_synthetic_chat_datasets.riddle_sense_train
  • 430.82 kB
  • 7,020 rows
  • 4 columns
CREATE TABLE riddle_sense_train (
  "message" VARCHAR,
  "message_type" VARCHAR,
  "message_id" BIGINT,
  "conversation_id" BIGINT
);

Riddle Sense Validation

@kaggle.thedevastator_all_gpt_4_synthetic_chat_datasets.riddle_sense_validation
  • 123.63 kB
  • 2,042 rows
  • 4 columns
CREATE TABLE riddle_sense_validation (
  "message" VARCHAR,
  "message_type" VARCHAR,
  "message_id" BIGINT,
  "conversation_id" BIGINT
);

Science Qa Txt Only Test

@kaggle.thedevastator_all_gpt_4_synthetic_chat_datasets.science_qa_txt_only_test
  • 1.05 MB
  • 4,448 rows
  • 4 columns
CREATE TABLE science_qa_txt_only_test (
  "message" VARCHAR,
  "message_type" VARCHAR,
  "message_id" BIGINT,
  "conversation_id" BIGINT
);

Science Qa Txt Only Train

@kaggle.thedevastator_all_gpt_4_synthetic_chat_datasets.science_qa_txt_only_train
  • 3.03 MB
  • 13,016 rows
  • 4 columns
CREATE TABLE science_qa_txt_only_train (
  "message" VARCHAR,
  "message_type" VARCHAR,
  "message_id" BIGINT,
  "conversation_id" BIGINT
);

Science Qa Txt Only Validation

@kaggle.thedevastator_all_gpt_4_synthetic_chat_datasets.science_qa_txt_only_validation
  • 974.62 kB
  • 4,288 rows
  • 4 columns
CREATE TABLE science_qa_txt_only_validation (
  "message" VARCHAR,
  "message_type" VARCHAR,
  "message_id" BIGINT,
  "conversation_id" BIGINT
);

Sciq Test

@kaggle.thedevastator_all_gpt_4_synthetic_chat_datasets.sciq_test
  • 328.58 kB
  • 2,000 rows
  • 4 columns
CREATE TABLE sciq_test (
  "message" VARCHAR,
  "message_type" VARCHAR,
  "message_id" BIGINT,
  "conversation_id" BIGINT
);

Sciq Train

@kaggle.thedevastator_all_gpt_4_synthetic_chat_datasets.sciq_train
  • 3.76 MB
  • 23,358 rows
  • 4 columns
CREATE TABLE sciq_train (
  "message" VARCHAR,
  "message_type" VARCHAR,
  "message_id" BIGINT,
  "conversation_id" BIGINT
);

Sciq Validation

@kaggle.thedevastator_all_gpt_4_synthetic_chat_datasets.sciq_validation
  • 324.64 kB
  • 2,000 rows
  • 4 columns
CREATE TABLE sciq_validation (
  "message" VARCHAR,
  "message_type" VARCHAR,
  "message_id" BIGINT,
  "conversation_id" BIGINT
);
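Several sources come as separate train, validation, and test tables distinguished only by a name suffix. Assuming that suffix reflects the original split, the sketch below groups the table names listed above so that held-out test tables can be kept out of a merged training set. Tables without a split suffix land in a group labeled "unsplit" (a label chosen here for illustration).

```python
from collections import defaultdict

# Table names as listed on this page.
TABLES = [
    "alpaca_data_cleaned", "code_alpaca_data", "conala_mined",
    "conala_paired_test", "conala_paired_train", "glaive_function_calling",
    "goat", "gorilla_16k", "gsm8k_main_test", "gsm8k_main_train",
    "gsm8k_socratic_test", "gsm8k_socratic_train", "lima_test", "lima_train",
    "med_alpaca_data", "puffin", "riddle_sense_test", "riddle_sense_train",
    "riddle_sense_validation", "science_qa_txt_only_test",
    "science_qa_txt_only_train", "science_qa_txt_only_validation",
    "sciq_test", "sciq_train", "sciq_validation",
]

SPLITS = ("train", "validation", "test")


def group_by_split(tables):
    """Map split name -> table names; names without a split suffix go under 'unsplit'."""
    groups = defaultdict(list)
    for name in tables:
        suffix = name.rsplit("_", 1)[-1]
        groups[suffix if suffix in SPLITS else "unsplit"].append(name)
    return dict(groups)


groups = group_by_split(TABLES)
print({split: len(names) for split, names in groups.items()})
# {'unsplit': 8, 'test': 7, 'train': 7, 'validation': 3}
```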
