Baselight

Data Catalog

Explore, analyze, and share quality data.

Showing 58957 Datasets

CAMEL AI: Biology Problems / Solutions

Biology Problem-Solution Pairs for Synthetic Biology

Technology and IT
10 months ago
1
21.86 MB
0

MathInstruct Dataset: Hybrid Math Instruction

A curated dataset for math instruction tuning models

Technology and IT
10 months ago
1
97.66 MB
0

TokenBender: Alpaca Code Generation Instructions

Generating Alpaca-style code from natural language instructions

Other
10 months ago
1
70.75 MB
0

Knowledge Symbolic Correlation With LLMs

Building a Bridge Between Prompts and Knowledge for Large Language Models

Other
10 months ago
1
130.15 kB
0

Self-instruct Starcoder

Instruction dataset generated with StarCoder

Other
10 months ago
4
10.83 MB
0

Ultrafeedback Binarized

Predicting Binary Preferences with SFT, PPO and DPO

Other
10 months ago
6
644.14 MB
0

Empathetic Conversational Model Benchmark

Conversation, Prompts, and Tags

Other
10 months ago
3
7.48 MB
0

Facebook Posts Of Amazon Tourism

Analyzing Consumer Engagement and Content Trends

Ecommerce and Consumer Trends
10 months ago
1
225.48 kB
0

Customer Purchasing Patterns With Market Basket

Identifying Key Associations

Finance and Economics
10 months ago
2
242.6 kB
0

Museo Del Prado Artworks

Pre-1489 Techniques, Dimensions and Origins

Other
10 months ago
1
625.25 kB
0

CommonsenseQA (Multiple-Choice Q&A)

12,102 questions with one correct answer and four distractor answers

Other
10 months ago
3
1.19 MB
0

Newsgroups (Text Classification)

Comprehensive Collection of Text Classification Datasets

Technology and IT
10 months ago
77
64.61 MB
0

Quoref (Q&A For Coreference Resolution)

Resolving Coreferences to Answer Questions

Other
10 months ago
2
9.97 MB
0

Rotten Tomatoes Movie Reviews

Predicting Movie Review Sentiment

Ecommerce and Consumer Trends
10 months ago
3
871.27 kB
0

Mobile Phone Carriers By Country

A dataset of mobile phone carriers

Ecommerce and Consumer Trends
10 months ago
3
17.18 kB
0

SciTail (Multiple-choice Science Exams)

27,026 multiple-choice science exams and web sentences

Other
10 months ago
12
12.76 MB
0

Marine Institute Buoy Wave Forecast

Significant Wave Heights, Mean Wave Periods, and Wave Power Data

Environmental and Climate Sciences
10 months ago
1
215.36 kB
0

Comparative Analysis Of Airbnb Prices In Barcelona

The Cheapest and Most Expensive Accommodations

Finance and Economics
10 months ago
1
21.98 kB
0

Vibrio Vulnificus Abundance In Ala Wai Canal

Exploring Temporal, Spatial and Environmental Influences

Environmental and Climate Sciences
10 months ago
1
45.46 kB
0

Costs Of Planting Low-Carbon Ecosystems In China

Carbon Sequestration and Investment Potential

Finance and Economics
10 months ago
1
5.64 kB
0

EV Driver Trips In London

Charging Bundle Optimization for EV Adoption

Other
10 months ago
3
319.55 kB
0

Scotland's Health, Housing And Crime Statistics

Exploring Multifaceted Issues with Machine Learning

Healthcare
10 months ago
1
715.14 kB
0

Effects Of On-Farm Hatching On Layer Chicks

Stress, Cognitive Ability, and Weight Gain

Healthcare
10 months ago
31
253.55 kB
0

Geographic Patterns Of NYPD Arrests

Exploring Arrest Locations and Contributing Factors

Other
10 months ago
1
2.54 kB
0

College Football 2022 (Wins, Losses, Rankings)

Team Performance and Game Results

Sports
10 months ago
5
183.12 kB
0

Analysis Of Spanish Apartment Pricing And Size

Investigating the Impact of the Pandemic

Finance and Economics
10 months ago
1
84.29 MB
0

Canadian Baseball

Examining Player Performance by Division, State and School

Sports
10 months ago
13
515.08 kB
0

Drug Indication Data (FAERS)

Drug indications extracted from the FDA Adverse Event Reporting System (FAERS)

Healthcare
10 months ago
1
119.78 kB
0

Altria Financial Ratios Overview

Analyzing Long-Term Stock Performance, Pricing Power, and Capital Allocation

Finance and Economics
10 months ago
1
10.53 kB
0

EEG Alpha Wave Recording

Investigating Brain Activity in a Resting-State Experiment

Healthcare
10 months ago
23
64.65 MB
0

Y-combinator Listed Companies

Companies listed on Y Combinator and their details

Finance and Economics
10 months ago
7
823.47 kB
0

Amazon Brands And Exclusives

Dataset from "Amazon Puts Its Own 'Brands' First Above Better-Rated Products"

Finance and Economics
10 months ago
17
451.78 kB
0

Countries By Gross National Income (GNI)

Economic health by nation

Finance and Economics
10 months ago
5
51.12 kB
0

TinySOL: Isolated Musical Notes Audio Dataset

A Balanced Audio Dataset for Music Information Retrieval

Media and Entertainment
10 months ago
1
43.48 kB
0

Estimating Occupancy Levels In Enclosed Spaces

Estimate occupancy based on CO2

Environmental and Climate Sciences
10 months ago
3
566.8 kB
0

Major League Baseball Game Logs

Historical MLB Game Logs and Player Statistics from 1871-2016

Sports
10 months ago
1
21.42 MB
0

Open Subtitles Multilingual Translation

Train Sequential Neural Networks in Nine Languages

Other
10 months ago
5
641.5 MB
0

Blended Skill Talk

Personality, Empathy, and Knowledge

Other
10 months ago
3
62.47 MB
0

LongAlpaca 16K-Length

Investigating Natural Language Processing Performance

Other
10 months ago
1
125.31 MB
0

Large-Scale Preference Dataset

Training Powerful Reward & Critic Models with Aligned Language Models

Other
10 months ago
1
361.39 MB
0
10 months ago
2
191.59 MB
0

Tamazight-NLP/Pontoon-Translations: Source-Target

Tamazight Translation Dataset: Source-Target Sentences for NLP

Other
10 months ago
1
3.47 MB
0

Yahoo Answers Topics Dataset

Yahoo Answers Topics Dataset: Questions and Answers for Various Topics

Other
10 months ago
2
525.17 MB
0

Friends TV Show Dialog Sequences

Media and Entertainment
10 months ago
1
638.07 kB
0

JFLEG: English Grammatical Error Benchmark

English Grammatical Error Correction Dataset

Other
10 months ago
2
290.36 kB
0

ARC: Grade School Science Questions

A Challenge for Advanced Question-Answering Research

Academic Research
10 months ago
6
1.25 MB
0

Nepali Health Q&A Corpus

Investigating Cultural Influences

Healthcare
10 months ago
1
7.69 MB
0

SQL Create Context

Uncovering Implications and Insights

Other
10 months ago
1
6.39 MB
0

OpenAI Summarization Corpus

Training and Validation Data from TL;DR, CNN, and Daily Mail

Other
10 months ago
4
68.93 MB
0

Anthropic Helpfulness-Harmlessness Preference

Iterative Human-in-the-Loop Solutions

Other
10 months ago
2
181.66 MB
0