Baselight

Kaggle

Data Source

@kaggle

Kaggle hosts community and competition datasets across machine learning, research, public analytics, benchmarks, notebooks, metadata, and structured data projects.

Datasets

Total public datasets added

8,820

Rows

Total rows contributed

5,590,595,304

Popularity

Total times datasets used in queries

316

Stars

Total stars received

38

Accurate Medical Translation Data

Accurate Medical Translation Dataset

Healthcare
1 year ago
1
2.45 MB
0

Textual Entailment Dataset

Textual Entailment Dataset with Labelled Text Pairs

Other
1 year ago
3
51.84 MB
0

WinoBias Coreference Dataset

Gender-biased coreference dataset focused on occupation stereotypes in WinoBias

Demographics and Population Studies
1 year ago
8
271.58 kB
0

WikiANN

Multilingual named entity recognition for LLM training

Technology and IT
1 year ago
528
137.22 MB
0

MLQA - Multilingual Question-Answering

Multilingual Question-Answering Dataset

Other
1 year ago
116
259.57 MB
0

HAREM Portuguese NER Corpus

Portuguese NER Corpus with 10 Classes

Other
1 year ago
3
442.56 kB
0

DBpedia Ontology

Text Classification Dataset with 14 Classes

Technology and IT
1 year ago
2
116 MB
0

Mind2Web: Generalist Agents For Web Tasks

Language-guided Generalist Agents for Web Tasks

Other
1 year ago
1
814.5 MB
0

CAMEL AI: Biology Problems / Solutions

Biology Problem-Solution Pairs for Synthetic Biology

Technology and IT
1 year ago
1
21.86 MB
0

MathInstruct Dataset: Hybrid Math Instruction

A curated dataset for math instruction tuning models

Technology and IT
1 year ago
1
97.66 MB
0

TokenBender: Alpaca Code Generation Instructions

Generating Alpaca-style code from natural language instructions

Other
1 year ago
1
70.75 MB
0

Knowledge Symbolic Correlation With LLMs

Building a Bridge Between Prompts and Knowledge for Large Language Models

Other
1 year ago
1
130.15 kB
0

Self-instruct Starcoder

Instruct dataset generated from StarCoder

Other
1 year ago
4
10.83 MB
0

Ultrafeedback Binarized

Predicting Binary Preferences with SFT, PPO and DPO

Other
1 year ago
6
644.14 MB
0

Empathetic Conversational Model Benchmark

Conversation, Prompts, and Tags

Other
1 year ago
3
7.48 MB
0

Facebook Posts Of Amazon Tourism

Analyzing Consumer Engagement and Content Trends

Ecommerce and Consumer Trends
1 year ago
1
225.48 kB
0

Customer Purchasing Patterns With Market Basket

Identifying Key Associations

Finance and Economics
1 year ago
2
242.6 kB
0

Museo Del Prado Artworks

Pre-1489 Techniques, Dimensions and Origins

Other
1 year ago
1
625.25 kB
0

CommonsenseQA (Multiple-Choice Q&A)

12,102 questions with one correct answer and four distractor answers

Other
1 year ago
3
1.19 MB
0

Newsgroups (Text Classification)

Comprehensive Collection of Text Classification Datasets

Technology and IT
1 year ago
77
64.61 MB
0

Quoref (Q&A For Coreference Resolution)

Resolving Coreferences to Answer Questions

Other
1 year ago
2
9.97 MB
0

Rotten Tomatoes Movie Reviews

Predicting Movie Review Sentiment

Ecommerce and Consumer Trends
1 year ago
3
871.27 kB
0

Mobile Phone Carriers By Country

A dataset of mobile phone carriers

Ecommerce and Consumer Trends
1 year ago
3
17.18 kB
0

SciTail (Multiple-choice Science Exams)

27,026 Multiple-choice science exams and web sentences

Other
1 year ago
12
12.76 MB
0

Marine Institute Buoy Wave Forecast

Significant Wave Heights, Mean Wave Periods, and Wave Power Data

Environmental and Climate Sciences
1 year ago
1
215.36 kB
0

Comparative Analysis Of Airbnb Prices In Barcelona

The Cheapest and Most Expensive Accommodations

Finance and Economics
1 year ago
1
21.98 kB
0

Vibrio Vulnificus Abundance In Ala Wai Canal

Exploring Temporal, Spatial and Environmental Influences

Environmental and Climate Sciences
1 year ago
1
45.46 kB
0

Costs Of Planting Low-Carbon Ecosystems In China

Carbon Sequestration and Investment Potential

Finance and Economics
1 year ago
1
5.64 kB
0

EV Driver Trips In London

Charging Bundle Optimization for EV Adoption

Other
1 year ago
3
319.55 kB
0

Scotland's Health, Housing And Crime Statistics

Exploring Multifaceted Issues with Machine Learning

Healthcare
1 year ago
1
715.14 kB
0

Effects Of On-Farm Hatching On Layer Chicks

Stress, Cognitive Ability, and Weight Gain

Healthcare
1 year ago
31
253.55 kB
0

Geographic Patterns Of NYPD Arrests

Exploring Arrest Locations and Contributing Factors

Other
1 year ago
1
2.54 kB
0

College Football 2022 (Wins, Losses, Rankings)

Team Performance and Game Results

Sports
1 year ago
5
183.12 kB
0

Analysis Of Spanish Apartment Pricing And Size

Investigating the Impact of the Pandemic

Finance and Economics
1 year ago
1
84.29 MB
0

Canadian Baseball

Examining Player Performance by Division, State and School

Sports
1 year ago
13
515.08 kB
0

Drug Indication Data (FAERS)

Drug indications extracted from the FDA Adverse Event Reporting System (FAERS)

Healthcare
1 year ago
1
119.78 kB
0

Altria Financial Ratios Overview

Analyzing Long-Term Stock Performance, Pricing Power, and Capital Allocation

Finance and Economics
1 year ago
1
10.53 kB
0

EEG Alpha Wave Recording

Investigating Brain Activity in a Resting-State Experiment

Healthcare
1 year ago
23
64.65 MB
0

Y-combinator Listed Companies

Companies and their information listed on Y-combinator

Finance and Economics
1 year ago
7
823.47 kB
0

Amazon Brands And Exclusives

Dataset from "Amazon Puts Its Own 'Brands' First Above Better-Rated Products"

Finance and Economics
1 year ago
17
451.78 kB
0

Countries By Gross National Income (GNI)

Economic health by nation

Finance and Economics
1 year ago
5
51.12 kB
0

TinySOL: Isolated Musical Notes Audio Dataset

A Balanced Audio Dataset for Music Information Retrieval

Media and Entertainment
1 year ago
1
43.48 kB
0

Estimating Occupancy Levels In Enclosed Spaces

Estimate occupancy based on CO2

Environmental and Climate Sciences
1 year ago
3
566.8 kB
0

Major League Baseball Game Logs

Historical MLB Game Logs and Player Statistics from 1871-2016

Sports
1 year ago
1
21.42 MB
0

Open Subtitles Multilingual Translation

Train Sequential Neural Networks in Nine Languages

Other
1 year ago
5
641.5 MB
0

Blended Skill Talk

Personality, Empathy, and Knowledge

Other
1 year ago
3
62.47 MB
0

LongAlpaca 16K-Length

Investigating Natural Language Processing Performance

Other
1 year ago
1
125.31 MB
0

Large-Scale Preference Dataset

Training Powerful Reward & Critic Models with Aligned Language Models

Other
1 year ago
1
361.39 MB
0
1 year ago
2
191.59 MB
0

Tamazight-NLP/Pontoon-Translations: Source-Target

Tamazight Translation Dataset: Source-Target Sentences for NLP

Other
1 year ago
1
3.47 MB
0
