Baselight

Data Catalog

Explore, analyze, and share quality data.

Filter: Kaggle
Showing 8660 Datasets

Erotiquant-XL

Enhanced erotica dataset with longer context samples

Other
8 months ago
1
99.99 MB
0

Synthia-v1.3

Synthetic training data for LLM development

Technology and IT
8 months ago
1
128.27 MB
0

Kubernetes Commands

kubectl commands and descriptions for Kubernetes

Other
8 months ago
1
3.65 MB
0

Cricket Commentary Dataset

Performance Validation for Cricket Commentary Model

Sports
8 months ago
3
6.95 MB
0

Text Classification For QA Dataset

Text classification dataset for question answering

Technology and IT
8 months ago
3
13.32 MB
0

Accurate Medical Translation Data

Accurate Medical Translation Dataset

Healthcare
8 months ago
1
2.45 MB
0

Textual Entailment Dataset

Textual Entailment Dataset with Labelled Text Pairs

Other
8 months ago
3
51.84 MB
0

WinoBias Coreference Dataset

Gender-biased coreference dataset focused on occupation stereotypes in WinoBias

Demographics and Population Studies
8 months ago
8
271.58 kB
0

WikiANN

Multilingual named entity recognition for LLM training

Technology and IT
8 months ago
528
137.22 MB
0

MLQA - Multilingual Question-Answering

Multilingual Question-Answering Dataset

Other
8 months ago
116
259.57 MB
0

HAREM Portuguese NER Corpus

Portuguese NER Corpus with 10 Classes

Other
8 months ago
3
442.56 kB
0

DBpedia Ontology

Text Classification Dataset with 14 Classes

Technology and IT
8 months ago
2
116 MB
0

Mind2Web: Generalist Agents For Web Tasks

Language-guided Generalist Agents for Web Tasks

Other
8 months ago
1
814.5 MB
0

CAMEL AI: Biology Problems / Solutions

Biology Problem-Solution Pairs for Synthetic Biology

Technology and IT
8 months ago
1
21.86 MB
0

MathInstruct Dataset: Hybrid Math Instruction

A curated dataset for math instruction tuning models

Technology and IT
8 months ago
1
97.66 MB
0

TokenBender: Alpaca Code Generation Instructions

Generating Alpaca-style code from natural language instructions

Other
8 months ago
1
70.75 MB
0

Knowledge Symbolic Correlation With LLMs

Building a Bridge Between Prompts and Knowledge for Large Language Models

Other
8 months ago
1
130.15 kB
0

Self-instruct Starcoder

Instruct dataset generated from starcoder

Other
8 months ago
4
10.83 MB
0

Ultrafeedback Binarized

Predicting Binary Preferences with SFT, PPO and DPO

Other
8 months ago
6
644.14 MB
0

Empathetic Conversational Model Benchmark

Conversation, Prompts, and Tags

Other
8 months ago
3
7.48 MB
0

Facebook Posts Of Amazon Tourism

Analyzing Consumer Engagement and Content Trends

Ecommerce and Consumer Trends
8 months ago
1
225.48 kB
0

Customer Purchasing Patterns With Market Basket

Identifying Key Associations

Finance and Economics
8 months ago
2
242.6 kB
0

Museo Del Prado Artworks

Pre-1489 Techniques, Dimensions and Origins

Other
8 months ago
1
625.25 kB
0

CommonsenseQA (Multiple-Choice Q&A)

12,102 questions with one correct answer and four distractor answers

Other
8 months ago
3
1.19 MB
0
