Baselight
1295 results

Glaive Function Calling V2

A Knowledge Base for Trainable Natural Language Processing

Other
11 months ago
1
97.04 MB
0

Alpaca

Alpaca - Training LLMs to follow instructions

Other
11 months ago
1
70.75 MB
0

Tulu V2 Dataset

Assisting Assistive Tasks with Language Data Mixtures

Other
11 months ago
1
561.5 MB
0

Know Saraswati COT

Open Source Logical Reasoning Dataset

Other
11 months ago
1
72.17 MB
0

SlimOrca

OpenOrca (Reproduction of Orca) - Cleverly Sampled

Other
11 months ago
1
484.27 MB
0

TIMDB - Bollywood Films

A Data-Driven Approach to Bollywood

Other
11 months ago
14
4.3 MB
0

Craigslist Gigs (Boston)

Gigs collected from Craigslist (Boston)

Other
11 months ago
2
127.52 kB
0

Chemistry Problem-Solution

Chemistry Problem-Solution Dataset: 20K pairs across 25 topics and subtopics

Other
11 months ago
1
16.53 MB
0

Openerotica/basilisk-v0.2 Conversations Dataset

Annotated Conversations from openerotica and freedom-rp

Other
11 months ago
1
371.62 MB
0

GPT Roleplay Realm: Enhanced Character

Character Cards and Dialogues for immersive role-playing experiences

Other
11 months ago
2
801.17 MB
0

The Pile Small

A dataset for pretraining general models

Other
11 months ago
1
328.77 MB
0

Mintaka By AmazonScience (Multilingual Q&A)

8 Language Variations with Complex Question Types

Other
11 months ago
3
2.34 MB
0

Synthia-v1.3

Orca-style dataset for following directions and conducting in-depth discussions

Other
11 months ago
1
128.27 MB
0

Smithsonian Butterfly Dataset

Butterfly images and information from the Smithsonian Institution

Other
11 months ago
1
483.38 MB
0

MetaMath QA

Mathematical Questions for Large Language Models

Other
11 months ago
1
138.79 MB
0

General Language Understanding Evaluation (GLUE)

The Famous General Language Understanding Evaluation benchmark

Other
11 months ago
34
151.72 MB
0

Opera Performances

Opera performances and associated data (Composers, Year written, etc)

Other
11 months ago
1
618.08 kB
0

Evol-Instruct-Code-80k-v1

Instructional code snippets with corresponding outputs

Other
11 months ago
1
53.72 MB
0

DailyDialog (Multi-turn Dialog)

Dialogues that reflect the way we communicate daily and cover various topics

Other
11 months ago
3
4.13 MB
0

Belgian Statutory Article Retrieval Dataset

Legal Q&A Dataset for Law Information Retrieval

Other
11 months ago
3
4.01 MB
0

Conversations On Coding, Debugging, Storytelling

Conversations on Coding, Debugging, Storytelling & Science

Other
11 months ago
1
2.21 MB
0

ProsocialDialog - Problematic Content Dialogue

Teach conversation agents to respond to problematic topics

Other
11 months ago
3
40.71 MB
0

OpenBookQA (Multi-step Reasoning)

Multi-step Reasoning, Commonsense Knowledge, and Rich Text Comprehension

Other
11 months ago
6
1.37 MB
0

Glaive Python Code QA Dataset

Supporting Intelligent Development of Code Assistants

Other
11 months ago
1
102.52 MB
0

Recipes Dataset

Recipes Dataset for NLP

Other
11 months ago
2
632.41 kB
0

Python Code Instruction

Training Data with Instruction, Input, Output, and Prompt Columns

Other
11 months ago
1
11.13 MB
0

Coding Questions With Solutions

Introductory, Interview and Competition Levels

Other
11 months ago
2
788.73 MB
0

Timeline Of Historical Pandemics

Tracing the Past to Prevent the Future

Other
11 months ago
9
80.14 kB
0

Synthetic Therapy Conversations

Synthetic Therapy Conversations

Other
11 months ago
1
210.39 MB
0

Google Stadia Games

Games released for Google Stadia

Other
11 months ago
2
35.11 kB
0

All GPT-4 Conversations

All chat datasets generated by GPT-4 from Huggingface in the same format

Other
11 months ago
27
1.39 GB
0

Weeds In Cultivation Fields

Ecology, Biogeography, and Red List Status

Other
11 months ago
1
96.2 kB
0

Apple's Historical Financials

Tracking a Decade of Performance

Other
11 months ago
1
10.53 kB
0

Data Breaches

30,000 Records of cyber-security data breaches

Other
11 months ago
1
19.75 kB
0

McDonalds Ice Cream Machines Breaking - Timeseries

Is the McDonald’s ice cream machine broken? [locations & times]

Other
11 months ago
1
701 kB
0

World's Fairs

International universal exhibitions (expos) from 1851 - 2021

Other
11 months ago
1
17.7 kB
0

English Monarchs & Marriages

Names, ages, and marriages of English royals from 850 to the present

Other
11 months ago
1
6.64 kB
0

Paleobiology

The Paleobiology Database is a public database of paleontological data

Other
11 months ago
1
57.66 MB
0

Wikipedia Molecules

All molecules from Wikipedia articles with their molecular properties.

Other
11 months ago
1
1.89 MB
0

US Prisons

The prison boundary feature class contains secure detention facilities

Other
11 months ago
1
3.56 MB
0

Trump Charges

All 91 charges in the 4 indictments of former US President Donald Trump

Other
11 months ago
1
5.69 kB
0

Tornado Tracks

Tornado tracks in the US, Puerto Rico, and the U.S. Virgin Islands from 1950-2013

Other
11 months ago
1
10.08 MB
0

US Federal Holidays

Which days of the week do federal holidays fall on this year?

Other
11 months ago
2
15.51 kB
0

Trillions Of Dollars

Can you visualise how much a trillion dollars is?

Other
11 months ago
5
75.31 kB
0

Lisa's Vegetable Garden

What changed in Lisa's vegetable garden between 2020 and 2021?

Other
11 months ago
6
46.38 kB
0

Room Occupancy Estimation

Estimate the precise number of occupants in a room using multiple environmental sensors

Other
11 months ago
1
156.96 kB
0

Executive Orders

All Executive Orders by US Presidents issued since 1994

Other
11 months ago
1
178.96 kB
0

Leap Days

Which cohort of leap day births is most represented in Wikipedia's data?

Other
11 months ago
3
21.28 kB
0

Trash Wheel Collection Data

What type of trash is collected the most by the trash wheel?

Other
11 months ago
1
37.24 kB
0

National Poll On Healthy Aging (NPHA)

A subset of the NPHA dataset filtered down to develop and validate ML algorithms

Other
11 months ago
1
15.08 kB
0
