Baselight

Data Catalog

Explore, analyze, and share quality data.

Showing 58957 Datasets

OpenHermes

GPT-4 AI Dataset - 242K Entries

Technology and IT
10 months ago
1
141.81 MB
0

QSAR Molecular Descriptor Predictions

Analyzing Activation Energy in Chemical Compounds

Environmental and Climate Sciences
10 months ago
3
260.56 kB
0

PAWS (Paraphrase Word Scrambling)

A dataset for modeling structure, context, and word order information

Other
10 months ago
6
124.19 MB
0

TinyShakespeare (Shakespeare's Plays)

40,000 lines of Shakespeare from a variety of Shakespeare's plays

Other
10 months ago
2
75.81 kB
0

Reddit: /r/EatCheapAndHealthy

Cost-Effective Nutritional Solutions from the Community

Finance and Economics
10 months ago
1
526.31 kB
0

Lovoo V3 Dating App User Profiles And Statistics

Revealing popular user traits and behavior

Media and Entertainment
10 months ago
3
846.63 kB
0

Crypto, Web3 And Blockchain Jobs

Scraped active crypto jobs listed on cryptojobslist.com

Crypto and Blockchain
10 months ago
110
1.11 MB
0

NFT Top Collections (Timeseries)

Historical data of the top NFT collections

Crypto and Blockchain
10 months ago
2
201.03 kB
0

220k-GPT4Vision Image Captions

220k-GPT4Vision Image Captions

Other
10 months ago
1
44.1 MB
0

RSICD Image Caption Dataset

RSICD Image Caption Dataset

Other
10 months ago
3
1.04 GB
0

Psychedelic Drug Database

Psychotropic and psychedelic drug database with molecular descriptors

Healthcare
10 months ago
1
237.57 kB
0

Amod Mental Health Counseling Conversations

A dataset of mental health counseling conversations for training models

Healthcare
10 months ago
1
2.3 MB
0

Logical Reasoning Improvement Dataset

Enhancing LLM Logical Reasoning Skills with Platypus2 Models

Technology and IT
10 months ago
1
15.33 MB
0

Glaive Function Calling V2

A Knowledge Base for Trainable Natural Language Processing

Other
10 months ago
1
97.04 MB
0

Alpaca

Alpaca - Training LLMs to follow instructions

Other
10 months ago
1
70.75 MB
0

Tulu V2 Dataset

Assisting Assistive Tasks with Language Data Mixtures

Other
10 months ago
1
561.5 MB
0

Know Saraswati COT

Open Source Logical Reasoning Dataset

Other
10 months ago
1
72.17 MB
0

SlimOrca

OpenOrca (Reproduction of Orca) - Cleverly Sampled

Other
10 months ago
1
484.27 MB
0

Autonomous Transport User Experiences

Rating Vehicle and User Interface Performance in Luxembourg Pilots

Transportation and Logistics
10 months ago
2
87.94 kB
0

Airbnb Listings In Boston

Location, Ratings, and Prices

Finance and Economics
10 months ago
1
100.07 kB
0

Fertilizer Use And Price

1960-2012 data on fertilizer consumption in the United States by plant nutrient

Finance and Economics
10 months ago
1
264.11 kB
0

Predicting The Weather In Indonesia

5 Years of Historical Data

Environmental and Climate Sciences
10 months ago
1
104.95 kB
0

TIMDB - Bollywood Films

A Data-Driven Approach to Bollywood

Other
10 months ago
14
4.3 MB
0

Craigslist Gigs (Boston)

Gigs collected from Craigslist (Boston)

Other
10 months ago
2
127.52 kB
0

Chemistry Problem-Solution

Chemistry Problem-Solution Dataset: 20K pairs across 25 topics and subtopics

Other
10 months ago
1
16.53 MB
0

Openerotica/basilisk-v0.2 Conversations Dataset

Annotated Conversations from openerotica and freedom-rp

Other
10 months ago
1
371.62 MB
0

GPT Roleplay Realm: Enhanced Character

Character Cards and Dialogues for immersive role-playing experiences

Other
10 months ago
2
801.17 MB
0

The Pile Small

A dataset for pretraining general models

Other
10 months ago
1
328.77 MB
0

Mintaka By AmazonScience (Multilingual Q&A)

8 Language Variations with Complex Question Types

Other
10 months ago
3
2.34 MB
0

LongAlpaca-Yukang ML Instructional Outputs

Unlocking the Power of AI

Technology and IT
10 months ago
1
265.85 MB
0

Objaverse-XL: 10M+ 3D Objects, Zero123-XL

For Training AI-Powered 3D Rendering

Technology and IT
10 months ago
1
1.38 GB
0

Synthia-v1.3

Orca-style dataset for following directions and conducting in-depth discussions

Other
10 months ago
1
128.27 MB
0

Air Pollution And Mental Health

Identifying Short-Term Human Impacts of Air Pollution

Healthcare
10 months ago
1
646.57 kB
0

Regional Water Temperatures Over Time

Historical Records of Berlin, Brandenburg and Altmark Lakes

Environmental and Climate Sciences
10 months ago
1
5.29 kB
0

Predicting Portuguese Bank Term Deposit

Identifying Likely Customers for Conversion Optimization

Finance and Economics
10 months ago
2
423.05 kB
0

Smithsonian Butterfly Dataset

Butterfly images and information from the Smithsonian Institution

Other
10 months ago
1
483.38 MB
0

GSM8K - Grade School Math 8K Q&A

A Linguistically Diverse Dataset for Multi-Step Reasoning Question Answering

Demographics and Population Studies
10 months ago
4
5.81 MB
0

MetaMath QA

Mathematical Questions for Large Language Models

Other
10 months ago
1
138.79 MB
0

HelpSteer: AI Alignment Dataset

Real-World Helpfulness Annotated for AI Alignment

Technology and IT
10 months ago
2
30.85 MB
0

Women's Crimes In India

Characteristics, Frequency, and Motives

Demographics and Population Studies
10 months ago
76
5.17 MB
0

Mental Health Chatbot Pairs

AI-based Tailored Support for Mental Health Conversation

Healthcare
10 months ago
1
103.88 kB
0

General Language Understanding Evaluation (GLUE)

The Famous General Language Understanding Evaluation benchmark

Other
10 months ago
34
151.72 MB
0

India Air Quality Trend

Comparing 2 Years of Air Quality Data from 2018 - 2020

Environmental and Climate Sciences
10 months ago
1
959.45 kB
0

Pokemon Gen 9 Stats

Understanding the Impact of Each Stat on Pokemon Performance

Media and Entertainment
10 months ago
1
18.35 kB
0

Job Postings In Europe

Exploring Salaries, Job Types and Locations

Finance and Economics
10 months ago
1
37.08 MB
0

Opera Performances

Opera performances and associated data (Composers, Year written, etc)

Other
10 months ago
1
618.08 kB
0

GoodReads Best Books

Ratings, Genres, Awards, and More

Media and Entertainment
10 months ago
1
42.19 MB
0

Evol-Instruct-Code-80k-v1

Instructional code snippets with corresponding outputs

Other
10 months ago
1
53.72 MB
0

DailyDialog (Multi-turn Dialog)

Dialogues that reflect everyday communication and cover various topics

Other
10 months ago
3
4.13 MB
0

Online Influencer Marketing

Influencer Engagement and Performance

Ecommerce and Consumer Trends
10 months ago
1
62.88 kB
0
