Baselight
Filter: Kaggle
Showing 8718 Datasets

Synthia-v1.3

Synthetic training data for LLM development

Technology and IT
11 months ago
1
128.27 MB
0

Kubernetes Commands

kubectl commands and descriptions for Kubernetes

Other
11 months ago
1
3.65 MB
0

Cricket Commentary Dataset

Performance Validation for Cricket Commentary Model

Sports
11 months ago
3
6.95 MB
0

Text Classification For QA Dataset

Text classification dataset for question answering

Technology and IT
11 months ago
3
13.32 MB
0

Accurate Medical Translation Data

Accurate Medical Translation Dataset

Healthcare
11 months ago
1
2.45 MB
0

Textual Entailment Dataset

Textual Entailment Dataset with Labelled Text Pairs

Other
11 months ago
3
51.84 MB
0

WinoBias Coreference Dataset

Gender-biased coreference dataset focused on occupation stereotypes in WinoBias

Demographics and Population Studies
11 months ago
8
271.58 kB
0

WikiANN

Multilingual named entity recognition for LLM training

Technology and IT
11 months ago
528
137.22 MB
0

MLQA - Multilingual Question-Answering

Multilingual Question-Answering Dataset

Other
11 months ago
116
259.57 MB
0

HAREM Portuguese NER Corpus

Portuguese NER Corpus with 10 Classes

Other
11 months ago
3
442.56 kB
0

DBpedia Ontology

Text Classification Dataset with 14 Classes

Technology and IT
11 months ago
2
116 MB
0

Mind2Web: Generalist Agents For Web Tasks

Language-guided Generalist Agents for Web Tasks

Other
11 months ago
1
814.5 MB
0

CAMEL AI: Biology Problems / Solutions

Biology Problem-Solution Pairs for Synthetic Biology

Technology and IT
11 months ago
1
21.86 MB
0

MathInstruct Dataset: Hybrid Math Instruction

A curated dataset for math instruction tuning models

Technology and IT
11 months ago
1
97.66 MB
0

TokenBender: Alpaca Code Generation Instructions

Generating Alpaca-style code from natural language instructions

Other
11 months ago
1
70.75 MB
0

Knowledge Symbolic Correlation With LLMs

Building a Bridge Between Prompts and Knowledge for Large Language Models

Other
11 months ago
1
130.15 kB
0

Self-instruct Starcoder

Instruct dataset generated from starcoder

Other
11 months ago
4
10.83 MB
0

Ultrafeedback Binarized

Predicting Binary Preferences with SFT, PPO and DPO

Other
11 months ago
6
644.14 MB
0

Empathetic Conversational Model Benchmark

Conversation, Prompts, and Tags

Other
11 months ago
3
7.48 MB
0

Facebook Posts Of Amazon Tourism

Analyzing Consumer Engagement and Content Trends

Ecommerce and Consumer Trends
11 months ago
1
225.48 kB
0

Customer Purchasing Patterns With Market Basket

Identifying Key Associations

Finance and Economics
11 months ago
2
242.6 kB
0

Museo Del Prado Artworks

Pre-1489 Techniques, Dimensions and Origins

Other
11 months ago
1
625.25 kB
0

CommonsenseQA (Multiple-Choice Q&A)

12,102 questions with one correct answer and four distractor answers

Other
11 months ago
3
1.19 MB
0

Newsgroups (Text Classification)

Comprehensive Collection of Text Classification Datasets

Technology and IT
11 months ago
77
64.61 MB
0

Quoref (Q&A For Coreference Resolution)

Resolving Coreferences to Answer Questions

Other
11 months ago
2
9.97 MB
0

Rotten Tomatoes Movie Reviews

Predicting Movie Review Sentiment

Ecommerce and Consumer Trends
11 months ago
3
871.27 kB
0

Mobile Phone Carriers By Country

A Dataset of mobile phone carriers

Ecommerce and Consumer Trends
11 months ago
3
17.18 kB
0

SciTail (Multiple-choice Science Exams)

27,026 Multiple-choice science exams and web sentences

Other
11 months ago
12
12.76 MB
0

Marine Institute Buoy Wave Forecast

Significant Wave Heights, Mean Wave Periods, and Wave Power Data

Environmental and Climate Sciences
11 months ago
1
215.36 kB
0

Comparative Analysis Of Airbnb Prices In Barcelona

The Cheapest and Most Expensive Accommodations

Finance and Economics
11 months ago
1
21.98 kB
0

Vibrio Vulnificus Abundance In Ala Wai Canal

Exploring Temporal, Spatial and Environmental Influences

Environmental and Climate Sciences
11 months ago
1
45.46 kB
0

Costs Of Planting Low-Carbon Ecosystems In China

Carbon Sequestration and Investment Potential

Finance and Economics
11 months ago
1
5.64 kB
0

EV Driver Trips In London

Charging Bundle Optimization for EV Adoption

Other
11 months ago
3
319.55 kB
0

Scotland's Health, Housing And Crime Statistics

Exploring Multifaceted Issues with Machine Learning

Healthcare
11 months ago
1
715.14 kB
0

Effects Of On-Farm Hatching On Layer Chicks

Stress, Cognitive Ability, and Weight Gain

Healthcare
11 months ago
31
253.55 kB
0

Geographic Patterns Of NYPD Arrests

Exploring Arrest Locations and Contributing Factors

Other
11 months ago
1
2.54 kB
0

College Football 2022 (Wins, Losses, Rankings)

Team Performance and Game Results

Sports
11 months ago
5
183.12 kB
0

Analysis Of Spanish Apartment Pricing And Size

Investigating the Impact of the Pandemic

Finance and Economics
11 months ago
1
84.29 MB
0

Canadian Baseball

Examining Player Performance by Division, State and School

Sports
11 months ago
13
515.08 kB
0

Drug Indication Data (FAERS)

Drug indications extracted from the FDA Adverse Event Reporting System (FAERS)

Healthcare
11 months ago
1
119.78 kB
0

Altria Financial Ratios Overview

Analyzing Long-Term Stock Performance, Pricing Power, and Capital Allocation

Finance and Economics
11 months ago
1
10.53 kB
0

EEG Alpha Wave Recording

Investigating Brain Activity in a Resting-State Experiment

Healthcare
11 months ago
23
64.65 MB
0

Y-combinator Listed Companies

Companies and their information listed on Y-combinator

Finance and Economics
11 months ago
7
823.47 kB
0

Amazon Brands And Exclusives

Dataset from "Amazon Puts Its Own 'Brands' First Above Better-Rated Products"

Finance and Economics
11 months ago
17
451.78 kB
0

Countries By Gross National Income (GNI)

Economic health by nation

Finance and Economics
11 months ago
5
51.12 kB
0

TinySOL: Isolated Musical Notes Audio Dataset

A Balanced Audio Dataset for Music Information Retrieval

Media and Entertainment
11 months ago
1
43.48 kB
0

Estimating Occupancy Levels In Enclosed Spaces

Estimate occupancy based on CO2

Environmental and Climate Sciences
11 months ago
3
566.8 kB
0

Major League Baseball Game Logs

Historical MLB Game Logs and Player Statistics from 1871-2016

Sports
11 months ago
1
21.42 MB
0

Open Subtitles Multilingual Translation

Train Sequential Neural Networks in Nine Languages

Other
11 months ago
5
641.5 MB
0

Blended Skill Talk

Personality, Empathy, and Knowledge

Other
11 months ago
3
62.47 MB
0
