Baselight

Datasets

Total public datasets added

8,801

Rows

Total rows contributed

5,589,826,419

Popularity

Total times datasets used in queries

307

Stars

Total stars received

37

Amod Mental Health Counseling Conversations

A dataset of mental health counseling conversations for training models

Healthcare
1 year ago
1
2.3 MB
0

Logical Reasoning Improvement Dataset

Enhancing LLM Logical Reasoning Skills with Platypus2 Models

Technology and IT
1 year ago
1
15.33 MB
0

Glaive Function Calling V2

A Knowledge Base for Trainable Natural Language Processing

Other
1 year ago
1
97.04 MB
0

Alpaca

Alpaca - Training LLMs to follow instructions

Other
1 year ago
1
70.75 MB
0

Tulu V2 Dataset

Assisting Assistive Tasks with Language Data Mixtures

Other
1 year ago
1
561.5 MB
0

Know Saraswati COT

Open Source Logical Reasoning Dataset

Other
1 year ago
1
72.17 MB
0

SlimOrca

OpenOrca (Reproduction of Orca) - Cleverly Sampled

Other
1 year ago
1
484.27 MB
0

Autonomous Transport User Experiences

Rating Vehicle and User Interface Performance in Luxembourg Pilots

Transportation and Logistics
1 year ago
2
87.94 kB
0

Airbnb Listings In Boston

Location, Ratings, and Prices

Finance and Economics
1 year ago
1
100.07 kB
0

Fertilizer Use And Price

1960-2012 data on fertilizer consumption in the United States by plant nutrient

Finance and Economics
1 year ago
1
264.11 kB
0

Predicting The Weather In Indonesia

5 Years of Historical Data

Environmental and Climate Sciences
1 year ago
1
104.95 kB
0

TIMDB - Bollywood Films

A Data-Driven Approach to Bollywood

Other
1 year ago
14
4.3 MB
0

Craigslist Gigs (Boston)

Gigs collected from Craigslist (Boston)

Other
1 year ago
2
127.52 kB
0

Chemistry Problem-Solution

Chemistry Problem-Solution Dataset: 20K pairs across 25 topics and subtopics

Other
1 year ago
1
16.53 MB
0

Openerotica/basilisk-v0.2 Conversations Dataset

Annotated Conversations from openerotica and freedom-rp

Other
1 year ago
1
371.62 MB
0

GPT Roleplay Realm: Enhanced Character

Character Cards and Dialogues for immersive role-playing experiences

Other
1 year ago
2
801.17 MB
0

The Pile Small

A dataset for pretraining general models

Other
1 year ago
1
328.77 MB
0

Mintaka By AmazonScience (Multilingual Q&A)

8 Language Variations with Complex Question Types

Other
1 year ago
3
2.34 MB
0

LongAlpaca-Yukang ML Instructional Outputs

Unlocking the Power of AI

Technology and IT
1 year ago
1
265.85 MB
0

Objaverse-XL: 10M+ 3D Objects, Zero123-XL

For Training AI-Powered 3D Rendering

Technology and IT
1 year ago
1
1.38 GB
0

Synthia-v1.3

Orca-style dataset for following directions and conducting in-depth discussions

Other
1 year ago
1
128.27 MB
0

Air Pollution And Mental Health

Identifying Short-Term Human Impacts of Air Pollution

Healthcare
1 year ago
1
646.57 kB
0

Regional Water Temperatures Over Time

Historical Records of Berlin, Brandenburg and Altmark Lakes

Environmental and Climate Sciences
1 year ago
1
5.29 kB
0

Predicting Portuguese Bank Term Deposit

Identifying Likely Customers for Conversion Optimization

Finance and Economics
1 year ago
2
423.05 kB
0

Smithsonian Butterfly Dataset

Butterfly images and information from the Smithsonian Institution

Other
1 year ago
1
483.38 MB
0

GSM8K - Grade School Math 8K Q&A

A Linguistically Diverse Dataset for Multi-Step Reasoning Question Answering

Demographics and Population Studies
1 year ago
4
5.81 MB
0

MetaMath QA

Mathematical Questions for Large Language Models

Other
1 year ago
1
138.79 MB
0

HelpSteer: AI Alignment Dataset

Real-World Helpfulness Annotated for AI Alignment

Technology and IT
1 year ago
2
30.85 MB
0

Women's Crimes In India

Characteristics, Frequency, and Motives

Demographics and Population Studies
1 year ago
76
5.17 MB
0

Mental Health Chatbot Pairs

AI-based Tailored Support for Mental Health Conversation

Healthcare
1 year ago
1
103.88 kB
0

General Language Understanding Evaluation (GLUE)

The Famous General Language Understanding Evaluation benchmark

Other
1 year ago
34
151.72 MB
0

India Air Quality Trend

Comparing 2 Years of Air Quality Data from 2018 - 2020

Environmental and Climate Sciences
1 year ago
1
959.45 kB
0

Pokemon Gen 9 Stats

Understanding the Impact of Each Stat on Pokemon Performance

Media and Entertainment
1 year ago
1
18.35 kB
0

Job Postings In Europe

Exploring Salaries, Job Types and Locations

Finance and Economics
1 year ago
1
37.08 MB
0

Opera Performances

Opera performances and associated data (composers, year written, etc.)

Other
1 year ago
1
618.08 kB
0

GoodReads Best Books

Ratings, Genres, Awards, and More

Media and Entertainment
1 year ago
1
42.19 MB
0

Evol-Instruct-Code-80k-v1

Instructional code snippets with corresponding outputs

Other
1 year ago
1
53.72 MB
0

DailyDialog (Multi-turn Dialog)

Dialogues that reflect everyday communication and cover various topics

Other
1 year ago
3
4.13 MB
0

Online Influencer Marketing

Influencer Engagement and Performance

Ecommerce and Consumer Trends
1 year ago
1
62.88 kB
0

Belgian Statutory Article Retrieval Dataset

Legal Q&A Dataset for Law Information Retrieval

Other
1 year ago
3
4.01 MB
0

ViGGO: Video Game Chatbot Dataset

Conversational data-to-text for video game chatbots

Technology and IT
1 year ago
8
1.2 MB
0

Medical Conversation Corpus (100k+)

Generative Language Modeling for Medical Applications

Healthcare
1 year ago
2
75.7 MB
0

AG News (News Articles)

News Articles Text Classification

Technology and IT
1 year ago
2
19.35 MB
0

Conversations On Coding, Debugging, Storytelling

Conversations on Coding, Debugging, Storytelling & Science

Other
1 year ago
1
2.21 MB
0

ProsocialDialog - Problematic Content Dialogue

Teach conversation agents to respond to problematic topics

Other
1 year ago
3
40.71 MB
0

Comprehensive Medical Q&A Dataset

Unlocking Healthcare Data with Natural Language Processing

Healthcare
1 year ago
1
8.76 MB
0

Chinese Medical Dialogue

Deep Learning for Intelligent Healthcare

Healthcare
1 year ago
6
888.12 MB
0

Housing Prices In San Francisco (Craigslist)

Predicting housing prices based on scraped Craigslist data

Finance and Economics
1 year ago
2
223.9 kB
0

Global Suicide, Mental Health, Substance Use

Analyzing the Impact Across Countries

Healthcare
1 year ago
2
90.11 kB
0

Hourly European Power Market Prices

Price Comparisons by System, Type and Currency

Finance and Economics
1 year ago
1
11.65 MB
0
