Baselight

Datasets

Total public datasets added

8,801

Rows

Total rows contributed

5,589,826,419

Popularity

Total times datasets used in queries

307

Stars

Total stars received

37

Italian Negation Constructions - Tweets

Exploring Language Variation Across 10 Cities

Ecommerce and Consumer Trends
1 year ago
1
21.13 kB
0

Remote Jobs In Spain

Analyzing Roles, Technologies, and Salaries in October 2020

Finance and Economics
1 year ago
1
1.31 MB
0

Reddit: /r/worldnews (Submissions & Comments)

Analyzing Post Engagement

Ecommerce and Consumer Trends
1 year ago
1
268.97 kB
0

Industrial Energy End Use In The U.S.

Facility-Level Combustion Energy Data

Environmental and Climate Sciences
1 year ago
2
1.16 MB
0

Landmark Detection For Tsetse Fly

Accurate Morphometric Data

Other
1 year ago
1
5.25 MB
0

NewChic Product Catalog (Customer Segmentation)

Trends, User Tastes, Brand Segmentation, and Online Shopping Opportunities

Finance and Economics
1 year ago
9
20.63 MB
0

US Tennis Courts: Capacity, Amenities, And

Discovering Court Types, Amenities, and Locations Across the US

Sports
1 year ago
1
1.28 MB
0

Airbnb Listings And Reviews In Washington, DC

Exploring Room Availability, Host Profiles, and Pricing Data

Finance and Economics
1 year ago
2
739.8 kB
0

London's Airbnb

Airbnb listings in London

Ecommerce and Consumer Trends
1 year ago
3
13.61 MB
0

Compounds For Studying Environmental Exposures

PubChemLite: Annotation Categories for Translational and Applied Research

Academic Research
1 year ago
1
73.82 MB
0

COCONUT: The COlleCtion Of Open NatUral ProducTs.

Unlocking Molecule Information

Finance and Economics
1 year ago
1
73.58 MB
0

Comprehensive Literary Greats Dataset

50,000+ Books Rated and Awarded Across Language, Genre, and Format

Media and Entertainment
1 year ago
1
42.19 MB
0

SONYC-UST Audio Tag Dataset

Annotated Real-World Urban Sounds for Multi-Label Audio Tag Prediction

Transportation and Logistics
1 year ago
1
822.85 kB
0

Game Boy Advance Games

Games released for the Game Boy Advance

Other
1 year ago
3
91.05 kB
0

Game Boy Games

Games released for the Game Boy

Other
1 year ago
3
62.08 kB
0

Google's M&A History

How Much, What For, and Where?

Other
1 year ago
12
138.39 kB
0

Submachine Guns

A dataset of known submachine gun models

Other
1 year ago
1
14.32 kB
0

Galaxy Clustering

Iris, Moon, and Circles datasets for Galaxy clustering tutorial

Other
1 year ago
3
20.43 kB
0

Lake Baikal Biomass (Decadal Change)

Investigating Climate Change-Driven Regime Shifts

Environmental and Climate Sciences
1 year ago
2
72.61 kB
0

Insects Flight Dynamics

Drosophila melanogaster, Isoleucinella rotunda, and Calopteron reticulatum

Transportation and Logistics
1 year ago
50
1.25 GB
0

Intel Processors

A Comprehensive Guide

Other
1 year ago
24
184.22 kB
0

The World's Highest Mountains

A Dataset of Peaks with at Least 500m Prominence

Other
1 year ago
3
15.28 kB
0

QASPER: NLP Questions And Evidence

Discovering Answers with Expertise

Other
1 year ago
3
27.9 MB
0

MultiNLI Textual Entailment Corpus

Multi-Genre Natural Language Inference (MultiNLI)

Other
1 year ago
3
215.15 MB
0

Cmrc2018 - Chinese Machine Reading Comprehension

Chinese MRC Dataset with Language Diversities

Other
1 year ago
3
5.48 MB
0

English-Darija Bilingual Text (Moroccan Arabic)

English-Darija Bilingual Corpus for Machine Translation

Other
1 year ago
1
23.28 MB
0

Erotiquant-XL

Enhanced erotica dataset with longer context samples

Other
1 year ago
1
99.99 MB
0

Synthia-v1.3

Synthetic training data for LLM development

Technology and IT
1 year ago
1
128.27 MB
0

Kubernetes Commands

kubectl commands and descriptions for Kubernetes

Other
1 year ago
1
3.65 MB
0

Cricket Commentary Dataset

Performance Validation for Cricket Commentary Model

Sports
1 year ago
3
6.95 MB
0

Text Classification For QA Dataset

Text classification dataset for question answering

Technology and IT
1 year ago
3
13.32 MB
0

Accurate Medical Translation Data

Accurate Medical Translation Dataset

Healthcare
1 year ago
1
2.45 MB
0

Textual Entailment Dataset

Textual Entailment Dataset with Labelled Text Pairs

Other
1 year ago
3
51.84 MB
0

WinoBias Coreference Dataset

Gender-biased coreference dataset focused on occupation stereotypes in WinoBias

Demographics and Population Studies
1 year ago
8
271.58 kB
0

WikiANN

Multilingual named entity recognition for LLM training

Technology and IT
1 year ago
528
137.22 MB
0

MLQA - Multilingual Question-Answering

Multilingual Question-Answering Dataset

Other
1 year ago
116
259.57 MB
0

HAREM Portuguese NER Corpus

Portuguese NER Corpus with 10 Classes

Other
1 year ago
3
442.56 kB
0

DBpedia Ontology

Text Classification Dataset with 14 Classes

Technology and IT
1 year ago
2
116 MB
0

Mind2Web: Generalist Agents For Web Tasks

Language-guided Generalist Agents for Web Tasks

Other
1 year ago
1
814.5 MB
0

CAMEL AI: Biology Problems / Solutions

Biology Problem-Solution Pairs for Synthetic Biology

Technology and IT
1 year ago
1
21.86 MB
0

MathInstruct Dataset: Hybrid Math Instruction

A curated dataset for math instruction tuning models

Technology and IT
1 year ago
1
97.66 MB
0

TokenBender: Alpaca Code Generation Instructions

Generating Alpaca-style code from natural language instructions

Other
1 year ago
1
70.75 MB
0

Knowledge Symbolic Correlation With LLMs

Building a Bridge Between Prompts and Knowledge for Large Language Models

Other
1 year ago
1
130.15 kB
0

Self-instruct Starcoder

Instruct dataset generated from StarCoder

Other
1 year ago
4
10.83 MB
0

Ultrafeedback Binarized

Predicting Binary Preferences with SFT, PPO and DPO

Other
1 year ago
6
644.14 MB
0

Empathetic Conversational Model Benchmark

Conversation, Prompts, and Tags

Other
1 year ago
3
7.48 MB
0

Facebook Posts Of Amazon Tourism

Analyzing Consumer Engagement and Content Trends

Ecommerce and Consumer Trends
1 year ago
1
225.48 kB
0

Customer Purchasing Patterns With Market Basket

Identifying Key Associations

Finance and Economics
1 year ago
2
242.6 kB
0

Museo Del Prado Artworks

Pre-1489 Techniques, Dimensions and Origins

Other
1 year ago
1
625.25 kB
0

CommonsenseQA (Multiple-Choice Q&A)

12,102 questions with one correct answer and four distractor answers

Other
1 year ago
3
1.19 MB
0
