Sciphi Textbooks Are All You Need by Kaggle | Demographics and Population Studies

About this Dataset

Sciphi Textbooks Are All You Need

650,000 Unique Samples from K-12 to Grad School

By Hugging Face Hub

This dataset is a comprehensive resource for educational research. It contains 650,000 unique textbook samples covering a wide range of courses, from the earliest years of K-12 to advanced graduate programs, giving you an expansive library for exploring the educational ecosystem.

You can analyze course materials from different perspectives and learning styles using the prompts, the generated completions, and the accompanying notes. Each sample also records the model and the temperature setting used to generate it, so you can focus your work on particular generation settings.

Whether you are a trainer seeking fresh curriculum ideas or a student looking for reference material for history or literature classes, this open-source collection covers a broad range of subjects.


How to use the dataset

This open-source textbook library is a resource for researchers, educators, and students alike. With 650,000 unique samples spanning academic levels from K-12 to graduate school across a variety of courses, it offers a broad view of educational material across subjects and levels.

To use this dataset, consider these key columns: formatted_prompt, completion, first_task, second_task, last_task, notes, title, model, and temperature. Each column helps you understand the textbook samples in the dataset. For example:
- Formatted Prompt: The original prompt used to generate a given textbook sample.
- Completion: The text the model generated from that prompt. Higher temperature settings generally produce more varied output.
- Tasks: Each task column corresponds to a separate part of the generation process (e.g., first_task may have produced an introductory paragraph, while last_task may have summarized key points identified in earlier tasks).
- Notes & Title: These two columns provide descriptive metadata about each sample, including expert notes on further improvements or additions that could be made, and titles assigned by subject matter experts.
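The note that higher temperatures give more varied output can be illustrated with temperature-scaled softmax, the usual way a sampling temperature is applied to a language model's raw scores (logits). This is a generic sketch: the dataset does not document how the models in the `model` column apply temperature internally, and the logits below are made-up values for four hypothetical candidate tokens.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities after dividing them by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max before exponentiating, for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; a higher value means a flatter, more varied distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


# Hypothetical next-token scores for four candidate tokens.
logits = [2.0, 1.0, 0.5, 0.1]
for t in (0.2, 0.7, 1.5):
    probs = softmax_with_temperature(logits, t)
    print(f"T={t}: top prob={max(probs):.3f}, entropy={entropy(probs):.3f}")
```

As the temperature rises, the most likely token loses probability mass and the entropy grows, so sampling picks less predictable words more often.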

With these data points available in one cohesive dataset, users can reproduce results or start their own exploration for their drafting and programming needs.
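To reproduce or compare results, a reasonable first step is to see which generation settings occur in the data. The minimal sketch below assumes each row is a dictionary keyed by the column names listed under Columns (for example, rows from Python's `csv.DictReader`). The function name `generation_settings` and the model names in the sample rows are hypothetical placeholders, not values taken from the dataset.

```python
from collections import Counter


def generation_settings(rows):
    """Count samples per (model, temperature) pair.

    `rows` is any iterable of dicts keyed by the train.csv column names.
    """
    counts = Counter()
    for row in rows:
        temp = row.get("temperature")
        # csv.DictReader yields strings; treat missing or blank values as None.
        temp = float(temp) if temp not in (None, "") else None
        counts[(row.get("model"), temp)] += 1
    return counts


# Hypothetical rows; real model names and temperatures come from the data.
sample = [
    {"model": "model-a", "temperature": "0.1"},
    {"model": "model-a", "temperature": "0.1"},
    {"model": "model-b", "temperature": "0.7"},
]
for (model, temp), n in generation_settings(sample).most_common():
    print(model, temp, n)
```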

Research Ideas

Text classification for automatically assigning courses and topics to a given body of text.
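As a starting point for this idea, a simple baseline is a multinomial Naive Bayes classifier with add-one smoothing. This is a swapped-in, generic technique, not something the dataset provides. The dataset has no explicit course column, so the sketch assumes you have already derived a label per sample (for example, from `title`); the class name `NaiveBayes`, the training sentences, and the labels below are hypothetical toy data.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and keep runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def predict(self, text):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            score = math.log(n / n_docs)  # log prior of the class
            for tok in tokenize(text):
                count = self.word_counts[label][tok]
                # Smoothed log-likelihood of the token given the class.
                score += math.log((count + 1) / (self.total_words[label] + v))
            if score > best_score:
                best, best_score = label, score
        return best


texts = [
    "the mitochondria produces energy for the cell",
    "photosynthesis converts light into chemical energy in plants",
    "the treaty ended the war between the two empires",
    "the revolution overthrew the monarchy and the king",
]
labels = ["biology", "biology", "history", "history"]
clf = NaiveBayes().fit(texts, labels)
print(clf.predict("cell energy and plants"))  # → biology
```

Naive Bayes is only a baseline; with 650,000 samples, a learned embedding or fine-tuned model would likely do better, but this shows the shape of the task.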

Generating natural language summaries of textbooks or educational material, such as short document descriptors for search engine optimization (SEO) purposes.
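A crude extractive baseline for such descriptors is to keep the leading sentences that fit within a length budget. The sketch below is a heuristic, not a learned summarizer. The function name `meta_description` is hypothetical, and the 160-character default is an assumed rule of thumb for search snippets, not a fixed standard.

```python
import re


def meta_description(text, max_chars=160):
    """Build a short descriptor from the leading sentences of a text.

    Keeps whole sentences while they fit in max_chars; if even the first
    sentence is too long, cuts it at the last word boundary and adds '...'.
    """
    text = " ".join(text.split())  # collapse newlines and repeated spaces
    if not text:
        return ""
    sentences = re.split(r"(?<=[.!?])\s+", text)
    out = ""
    for s in sentences:
        candidate = f"{out} {s}".strip()
        if len(candidate) > max_chars:
            break
        out = candidate
    if not out:  # the first sentence alone exceeds the budget
        out = text[: max_chars - 3].rsplit(" ", 1)[0] + "..."
    return out


print(meta_description("Cells are the basic unit of life. They contain organelles. " * 5))
```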

Devising new tasks on which to train machine learning models, such as predicting the completed form of incomplete sentences to enable more accurate auto-fill when composing documents.
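One possible way to build training data for sentence completion is to split each sentence of a text (for example, a value from the `completion` column) into every (prefix, continuation) pair past a minimum prefix length. The sketch below is an illustration under that assumption. The function name `completion_pairs` and its `min_prefix_words` parameter are hypothetical, and the regex sentence splitter is a rough approximation that will mis-split abbreviations such as "e.g.".

```python
import re


def completion_pairs(text, min_prefix_words=3):
    """Split each sentence into (prefix, continuation) training pairs.

    For a sentence of n words, cuts after word k for
    k = min_prefix_words .. n-1, so every continuation is non-empty.
    """
    pairs = []
    for sentence in re.split(r"(?<=[.!?])\s+", " ".join(text.split())):
        words = sentence.split()
        for k in range(min_prefix_words, len(words)):
            pairs.append((" ".join(words[:k]), " ".join(words[k:])))
    return pairs


for prefix, target in completion_pairs("Water boils at 100 degrees Celsius at sea level."):
    print(f"{prefix!r} -> {target!r}")
```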

Acknowledgements

If you use this dataset in your research, please credit the original authors.

License

License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication
No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission.

Columns

File: train.csv

Column name	Description
formatted_prompt	A prompt that has been formatted for use in the dataset. (String)
completion	The completion of the prompt. (String)
first_task	The first task associated with the prompt. (String)
second_task	The second task associated with the prompt. (String)
last_task	The last task associated with the prompt. (String)
notes	Any additional notes associated with the prompt. (String)
title	The title of the sample. (String)
model	The model used to generate the completion. (String)
temperature	The temperature used to generate the completion. (Float)
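Given the 1.17 GB size of train.csv, it makes sense to stream rows rather than load them all at once. The sketch below uses Python's standard `csv` module, assumes the file's header row matches the column names above, and converts `temperature` to a float. The helper name `read_rows` is hypothetical. Because generated completions can be long, it raises the `csv` module's per-field size limit, which defaults to 131,072 characters.

```python
import csv
import io

EXPECTED_COLUMNS = [
    "formatted_prompt", "completion", "first_task", "second_task",
    "last_task", "notes", "title", "model", "temperature",
]

# Raise the per-field limit for long completions; 2**31 - 1 fits in a C long
# on all platforms, unlike sys.maxsize on Windows.
csv.field_size_limit(2**31 - 1)


def read_rows(f):
    """Yield dict rows from an open train.csv file, converting temperature to float."""
    reader = csv.DictReader(f)
    missing = set(EXPECTED_COLUMNS) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    for row in reader:
        temp = row["temperature"]
        row["temperature"] = float(temp) if temp else None
        yield row


# Tiny in-memory stand-in for train.csv; the quoted field contains a newline.
demo = io.StringIO(
    "formatted_prompt,completion,first_task,second_task,last_task,notes,title,model,temperature\n"
    '"Write a lesson","Line one\nLine two",t1,t2,t3,,Intro,some-model,0.1\n'
)
rows = list(read_rows(demo))
print(rows[0]["temperature"], repr(rows[0]["completion"]))
```

For the real file, open it with `open("train.csv", newline="", encoding="utf-8")`, as the `csv` documentation recommends, so that newlines inside quoted fields are preserved.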

Acknowledgements

If you use this dataset in your research, please credit Hugging Face Hub.

Tables

Train

@kaggle.thedevastator_open_source_educational_textbook_library.train

1.17 GB
681845 rows
9 columns


CREATE TABLE train (
  "formatted_prompt" VARCHAR,
  "completion" VARCHAR,
  "first_task" VARCHAR,
  "second_task" VARCHAR,
  "last_task" VARCHAR,
  "notes" VARCHAR,
  "title" VARCHAR,
  "model" VARCHAR,
  "temperature" DOUBLE
);
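The schema above is plain enough to run unchanged in SQLite, which gives VARCHAR columns TEXT affinity and DOUBLE columns REAL affinity. The sketch below uses Python's built-in `sqlite3` module to create the table from that statement, insert a couple of placeholder rows, and summarize samples by generation setting. It does not use whatever query engine the `@kaggle...` table identifier refers to, and the helper name `load_rows` and the row values are hypothetical.

```python
import sqlite3

# The CREATE TABLE statement from above, unchanged.
SCHEMA = """
CREATE TABLE train (
  "formatted_prompt" VARCHAR,
  "completion" VARCHAR,
  "first_task" VARCHAR,
  "second_task" VARCHAR,
  "last_task" VARCHAR,
  "notes" VARCHAR,
  "title" VARCHAR,
  "model" VARCHAR,
  "temperature" DOUBLE
);
"""

COLUMNS = [
    "formatted_prompt", "completion", "first_task", "second_task",
    "last_task", "notes", "title", "model", "temperature",
]


def load_rows(conn, rows):
    """Insert dicts keyed by column name; missing keys are stored as NULL."""
    placeholders = ", ".join("?" for _ in COLUMNS)
    conn.executemany(
        f"INSERT INTO train VALUES ({placeholders})",
        ([row.get(col) for col in COLUMNS] for row in rows),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
# Placeholder rows; the string "0.7" is stored as REAL because of the DOUBLE column.
load_rows(conn, [
    {"title": "Sample A", "model": "model-a", "temperature": 0.1, "completion": "Short text."},
    {"title": "Sample B", "model": "model-b", "temperature": "0.7", "completion": "Another text."},
])

query = """
SELECT model, temperature, COUNT(*) AS n, AVG(LENGTH(completion)) AS avg_chars
FROM train
GROUP BY model, temperature
ORDER BY n DESC
"""
for row in conn.execute(query):
    print(row)
```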