Baselight

Kaggle

Data Source

@kaggle

Kaggle hosts community and competition datasets across machine learning, research, public analytics, benchmarks, notebooks, metadata, and structured data projects.

Datasets

Total public datasets added

8,820

Rows

Total rows contributed

5,590,595,304

Popularity

Total times datasets used in queries

316

Stars

Total stars received

38

Tomato Gene Expression Data

Non-Organic Imprints

Healthcare
1 year ago
1
37.5 MB
0

HTTP Header Fields Dataset

How information is encoded and sent/received on the internet

Technology and IT
1 year ago
5
47.15 kB
0

Wikipedia Molecules Properties Dataset

Molecular Properties Dataset from Wikipedia

Other
1 year ago
1
1.83 MB
0

LAMBADA Word Prediction

Evaluating text understanding through word prediction

Other
1 year ago
3
552.45 MB
0

Question-Answering Training And Testing Data

A dataset for training and testing question-answering models

Other
1 year ago
2
83.38 MB
0

LLM Feedback Collection

Induce fine-grained evaluation capabilities into language models

Technology and IT
1 year ago
1
459.52 MB
0

UltraChat 200K

200K Dialogues of Diverse Topics for NLG Research

Academic Research
1 year ago
4
1.63 GB
0

Orca DPO Dialogue Pairs

Orca style for preference training (Intel's DPO dataset)

Other
1 year ago
1
18.88 MB
0

OpenHermes

GPT-4 AI Dataset - 242K Entries

Technology and IT
1 year ago
1
141.81 MB
0

QSAR Molecular Descriptor Predictions

Analyzing Activation Energy in Chemical Compounds

Environmental and Climate Sciences
1 year ago
3
260.56 kB
0

PAWS (Paraphrase Word Scrambling)

A dataset for modeling structure, context, and word order information

Other
1 year ago
6
124.19 MB
0

TinyShakespeare (Shakespeare's Plays)

40,000 lines of Shakespeare from a variety of Shakespeare's plays

Other
1 year ago
2
75.81 kB
0

Reddit: /r/EatCheapAndHealthy

Cost-Effective Nutritional Solutions from the Community

Finance and Economics
1 year ago
1
526.31 kB
0

Lovoo V3 Dating App User Profiles And Statistics

Revealing popular user traits and behavior

Media and Entertainment
1 year ago
3
846.63 kB
0

Crypto, Web3 And Blockchain Jobs

Scraped active crypto jobs listed on cryptojobslist.com

Crypto and Blockchain
1 year ago
110
1.11 MB
0

NFT Top Collections (Timeseries)

Historical data of the top NFT collections

Crypto and Blockchain
1 year ago
2
201.03 kB
0

220k-GPT4Vision Image Captions

220k-GPT4Vision Image Captions

Other
1 year ago
1
44.1 MB
0

RSICD Image Caption Dataset

RSICD Image Caption Dataset

Other
1 year ago
3
1.04 GB
0

Psychedelic Drug Database

Psychotropic and psychedelic drugs database with molecular descriptors

Healthcare
1 year ago
1
237.57 kB
0

Amod Mental Health Counseling Conversations

A dataset of mental health counseling conversations for training models

Healthcare
1 year ago
1
2.3 MB
0

Logical Reasoning Improvement Dataset

Enhancing LLM Logical Reasoning Skills with Platypus2 Models

Technology and IT
1 year ago
1
15.33 MB
0

Glaive Function Calling V2

A Knowledge Base for Trainable Natural Language Processing

Other
1 year ago
1
97.04 MB
0

Alpaca

Alpaca - Training LLMs to follow instructions

Other
1 year ago
1
70.75 MB
0

Tulu V2 Dataset

Assisting Assistive Tasks with Language Data Mixtures

Other
1 year ago
1
561.5 MB
0

Know Saraswati COT

Open Source Logical Reasoning Dataset

Other
1 year ago
1
72.17 MB
0

SlimOrca

OpenOrca (Reproduction of Orca) - Cleverly Sampled

Other
1 year ago
1
484.27 MB
0

Autonomous Transport User Experiences

Rating Vehicle and User Interface Performance in Luxembourg Pilots

Transportation and Logistics
1 year ago
2
87.94 kB
0

Airbnb Listings In Boston

Location, Ratings, and Prices

Finance and Economics
1 year ago
1
100.07 kB
0

Fertilizer Use And Price

1960-2012 data on fertilizer consumption in the United States by plant nutrient

Finance and Economics
1 year ago
1
264.11 kB
0

Predicting The Weather In Indonesia

5 Years of Historical Data

Environmental and Climate Sciences
1 year ago
1
104.95 kB
0

TIMDB - Bollywood Films

A Data-Driven Approach to Bollywood

Other
1 year ago
14
4.3 MB
0

Craigslist Gigs (Boston)

Gigs collected from Craigslist (Boston)

Other
1 year ago
2
127.52 kB
0

Chemistry Problem-Solution

Chemistry Problem-Solution Dataset: 20K pairs across 25 topics and subtopics

Other
1 year ago
1
16.53 MB
0

Openerotica/basilisk-v0.2 Conversations Dataset

Annotated Conversations from openerotica and freedom-rp

Other
1 year ago
1
371.62 MB
0

GPT Roleplay Realm: Enhanced Character

Character Cards and Dialogues for immersive role-playing experiences

Other
1 year ago
2
801.17 MB
0

The Pile Small

A dataset for pretraining general models

Other
1 year ago
1
328.77 MB
0

Mintaka By AmazonScience (Multilingual Q&A)

8 Language Variations with Complex Question Types

Other
1 year ago
3
2.34 MB
0

LongAlpaca-Yukang ML Instructional Outputs

Unlocking the Power of AI

Technology and IT
1 year ago
1
265.85 MB
0

Objaverse-XL: 10M+ 3D Objects, Zero123-XL

For Training AI-Powered 3D Rendering

Technology and IT
1 year ago
1
1.38 GB
0

Synthia-v1.3

Orca-style dataset for following directions and conducting in-depth discussions

Other
1 year ago
1
128.27 MB
0

Air Pollution And Mental Health

Identifying Short-Term Human Impacts of Air Pollution

Healthcare
1 year ago
1
646.57 kB
0

Regional Water Temperatures Over Time

Historical Records of Berlin, Brandenburg and Altmark Lakes

Environmental and Climate Sciences
1 year ago
1
5.29 kB
0

Predicting Portuguese Bank Term Deposit

Identifying Likely Customers for Conversion Optimization

Finance and Economics
1 year ago
2
423.05 kB
0

Smithsonian Butterfly Dataset

Butterfly images and information from the Smithsonian Institution

Other
1 year ago
1
483.38 MB
0

GSM8K - Grade School Math 8K Q&A

A Linguistically Diverse Dataset for Multi-Step Reasoning Question Answering

Demographics and Population Studies
1 year ago
4
5.81 MB
0

MetaMath QA

Mathematical Questions for Large Language Models

Other
1 year ago
1
138.79 MB
0

HelpSteer: AI Alignment Dataset

Real-World Helpfulness Annotated for AI Alignment

Technology and IT
1 year ago
2
30.85 MB
0

Women's Crimes In India

Characteristics, Frequency, and Motives

Demographics and Population Studies
1 year ago
76
5.17 MB
0

Mental Health Chatbot Pairs

AI-based Tailored Support for Mental Health Conversation

Healthcare
1 year ago
1
103.88 kB
0

General Language Understanding Evaluation (GLUE)

The Famous General Language Understanding Evaluation benchmark

Other
1 year ago
34
151.72 MB
0
