Baselight

Datasets

Total public datasets added

8,718

Rows

Total rows contributed

5,557,227,310

Popularity

Total times datasets used in queries

248

Stars

Total stars received

17

Wikipedia Molecules Properties Dataset

Molecular Properties Dataset from Wikipedia

Other
11 months ago
1
1.83 MB
0

LAMBADA Word Prediction

Evaluating text understanding through word prediction

Other
11 months ago
3
552.45 MB
0

Question-Answering Training And Testing Data

A dataset for training and testing question-answering models

Other
11 months ago
2
83.38 MB
0

LLM Feedback Collection

Induce fine-grained evaluation capabilities into language models

Technology and IT
11 months ago
1
459.52 MB
0

UltraChat 200K

200K Dialogues of Diverse Topics for NLG Research

Academic Research
11 months ago
4
1.63 GB
0

Orca DPO Dialogue Pairs

Orca style for preference training (Intel's DPO dataset)

Other
11 months ago
1
18.88 MB
0

OpenHermes

GPT-4 AI Dataset - 242K Entries

Technology and IT
11 months ago
1
141.81 MB
0

QSAR Molecular Descriptor Predictions

Analyzing Activation Energy in Chemical Compounds

Environmental and Climate Sciences
11 months ago
3
260.56 kB
0

PAWS (Paraphrase Word Scrambling)

A dataset for modeling structure, context, and word order information

Other
11 months ago
6
124.19 MB
0

TinyShakespeare (Shakespeare's Plays)

40,000 lines of Shakespeare from a variety of Shakespeare's plays

Other
11 months ago
2
75.81 kB
0

Reddit: /r/EatCheapAndHealthy

Cost-Effective Nutritional Solutions from the Community

Finance and Economics
11 months ago
1
526.31 kB
0

Lovoo V3 Dating App User Profiles And Statistics

Revealing popular user traits and behavior

Media and Entertainment
11 months ago
3
846.63 kB
0

Crypto, Web3 And Blockchain Jobs

Scraped active crypto jobs listed on cryptojobslist.com

Crypto and Blockchain
11 months ago
110
1.11 MB
0

NFT Top Collections (Timeseries)

Historical data of the top NFT collections

Crypto and Blockchain
11 months ago
2
201.03 kB
0

220k-GPT4Vision Image Captions

220k-GPT4Vision Image Captions

Other
11 months ago
1
44.1 MB
0

RSICD Image Caption Dataset

RSICD Image Caption Dataset

Other
11 months ago
3
1.04 GB
0

Psychedelic Drug Database

Psychotropic and psychedelic drugs database with molecular descriptors

Healthcare
11 months ago
1
237.57 kB
0

Amod Mental Health Counseling Conversations

A dataset of mental health counseling conversations for training models

Healthcare
11 months ago
1
2.3 MB
0

Logical Reasoning Improvement Dataset

Enhancing LLM Logical Reasoning Skills with Platypus2 Models

Technology and IT
11 months ago
1
15.33 MB
0

Glaive Function Calling V2

A Knowledge Base for Trainable Natural Language Processing

Other
11 months ago
1
97.04 MB
0

Alpaca

Alpaca - Training LLMs to follow instructions

Other
11 months ago
1
70.75 MB
0

Tulu V2 Dataset

Assisting Assistive Tasks with Language Data Mixtures

Other
11 months ago
1
561.5 MB
0

Know Saraswati COT

Open Source Logical Reasoning Dataset

Other
11 months ago
1
72.17 MB
0

SlimOrca

OpenOrca (Reproduction of Orca) - Cleverly Sampled

Other
11 months ago
1
484.27 MB
0

Autonomous Transport User Experiences

Rating Vehicle and User Interface Performance in Luxembourg Pilots

Transportation and Logistics
11 months ago
2
87.94 kB
0

Airbnb Listings In Boston

Location, Ratings, and Prices

Finance and Economics
11 months ago
1
100.07 kB
0

Fertilizer Use And Price

1960-2012 data on fertilizer consumption in the United States by plant nutrient

Finance and Economics
11 months ago
1
264.11 kB
0

Predicting The Weather In Indonesia

5 Years of Historical Data

Environmental and Climate Sciences
11 months ago
1
104.95 kB
0

TIMDB - Bollywood Films

A Data-Driven Approach to Bollywood

Other
11 months ago
14
4.3 MB
0

Craigslist Gigs (Boston)

Gigs collected from Craigslist (Boston)

Other
11 months ago
2
127.52 kB
0

Chemistry Problem-Solution

Chemistry Problem-Solution Dataset: 20K pairs across 25 topics and subtopics

Other
11 months ago
1
16.53 MB
0

Openerotica/basilisk-v0.2 Conversations Dataset

Annotated Conversations from openerotica and freedom-rp

Other
11 months ago
1
371.62 MB
0

GPT Roleplay Realm: Enhanced Character

Character Cards and Dialogues for immersive role-playing experiences

Other
11 months ago
2
801.17 MB
0

The Pile Small

A dataset for pretraining general models

Other
11 months ago
1
328.77 MB
0

Mintaka By AmazonScience (Multilingual Q&A)

8 Language Variations with Complex Question Types

Other
11 months ago
3
2.34 MB
0

LongAlpaca-Yukang ML Instructional Outputs

Unlocking the Power of AI

Technology and IT
11 months ago
1
265.85 MB
0

Objaverse-XL: 10M+ 3D Objects, Zero123-XL

For Training AI-Powered 3D Rendering

Technology and IT
11 months ago
1
1.38 GB
0

Synthia-v1.3

Orca-style dataset for following directions and conducting in-depth discussions

Other
11 months ago
1
128.27 MB
0

Air Pollution And Mental Health

Identifying Short-Term Human Impacts of Air Pollution

Healthcare
11 months ago
1
646.57 kB
0

Regional Water Temperatures Over Time

Historical Records of Berlin, Brandenburg and Altmark Lakes

Environmental and Climate Sciences
11 months ago
1
5.29 kB
0

Predicting Portuguese Bank Term Deposit

Identifying Likely Customers for Conversion Optimization

Finance and Economics
11 months ago
2
423.05 kB
0

Smithsonian Butterfly Dataset

Butterfly images and information from the Smithsonian Institution

Other
11 months ago
1
483.38 MB
0

GSM8K - Grade School Math 8K Q&A

A Linguistically Diverse Dataset for Multi-Step Reasoning Question Answering

Demographics and Population Studies
11 months ago
4
5.81 MB
0

MetaMath QA

Mathematical Questions for Large Language Models

Other
11 months ago
1
138.79 MB
0

HelpSteer: AI Alignment Dataset

Real-World Helpfulness Annotated for AI Alignment

Technology and IT
11 months ago
2
30.85 MB
0

Women's Crimes In India

Characteristics, Frequency, and Motives

Demographics and Population Studies
11 months ago
76
5.17 MB
0

Mental Health Chatbot Pairs

AI-based Tailored Support for Mental Health Conversation

Healthcare
11 months ago
1
103.88 kB
0

General Language Understanding Evaluation (GLUE)

The Famous General Language Understanding Evaluation benchmark

Other
11 months ago
34
151.72 MB
0

India Air Quality Trend

Comparing 2 Years of Air Quality Data from 2018 - 2020

Environmental and Climate Sciences
11 months ago
1
959.45 kB
0

Pokemon Gen 9 Stats

Understanding the Impact of Each Stat on Pokemon Performance

Media and Entertainment
11 months ago
1
18.35 kB
0