Dataset of books with title, author, description, rating, thumbnail, and more

Do we really need another dataset of books?

My initial plan was to build a toy example for a recommender system article I was writing. After a bit of googling, I found a few datasets. Sadly, most of them had issues that made them unusable for me (e.g., missing book descriptions, a mix of languages with no column specifying the language of each row, or weird delimiters).

So I decided to make a dataset that would match my purposes.

First, I got ISBNs from Soumik's Goodreads-books dataset. Using those identifiers, I crawled the Google Books API to extract the books' information.
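As a rough illustration of that lookup step, here is a minimal sketch that queries the public Google Books volumes endpoint with an `isbn:` search and keeps a few fields from the first match. The function names and the output column names are made up for this example, and this is not necessarily how the actual crawler works.

```python
import json
import urllib.parse
import urllib.request

# Public Google Books volumes endpoint.
API_URL = "https://www.googleapis.com/books/v1/volumes"


def build_isbn_url(isbn):
    """Build the search URL for a single ISBN (hypothetical helper)."""
    query = urllib.parse.urlencode({"q": f"isbn:{isbn}"})
    return f"{API_URL}?{query}"


def parse_volume(payload):
    """Extract a flat record from an API response, or None if no match.

    The output keys are illustrative, not the dataset's real column names.
    """
    items = payload.get("items") or []
    if not items:
        return None
    info = items[0].get("volumeInfo", {})
    return {
        "title": info.get("title"),
        "authors": "; ".join(info.get("authors", [])),
        "description": info.get("description"),
        "categories": "; ".join(info.get("categories", [])),
        "thumbnail": info.get("imageLinks", {}).get("thumbnail"),
        "average_rating": info.get("averageRating"),
    }


def fetch_book(isbn, timeout=10):
    """Fetch one ISBN over the network and parse the response."""
    with urllib.request.urlopen(build_isbn_url(isbn), timeout=timeout) as resp:
        return parse_volume(json.load(resp))
```

Calling `fetch_book` with an ISBN from the source dataset returns either a flat record or `None`, which is one way to end up with fewer rows than the original dataset.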

Then I merged those results with some of the original columns from that dataset and, after some cleaning, got the dataset you see here.

What can I do with this?

You can use it for exploratory data analysis, clustering books by topic or category, or building content-based recommendation engines that use different fields, such as the book's description.
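To give a feel for the recommendation use case, here is a small sketch of content-based similarity using TF-IDF weights and cosine similarity over descriptions, written with the standard library only. The three descriptions are invented for the example and do not come from the dataset, and the function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(descriptions):
    """Turn each description into a sparse {term: weight} TF-IDF vector."""
    docs = [Counter(tokenize(d)) for d in descriptions]
    n = len(docs)
    # Document frequency: in how many descriptions each term appears.
    df = Counter(term for doc in docs for term in doc)
    return [
        # Smoothed IDF so that no term gets a zero or negative weight.
        {t: count * (math.log((1 + n) / (1 + df[t])) + 1) for t, count in doc.items()}
        for doc in docs
    ]


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def most_similar(index, vectors):
    """Rank all other books by similarity to the book at `index`."""
    scores = [
        (other, cosine(vectors[index], vec))
        for other, vec in enumerate(vectors)
        if other != index
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Invented toy descriptions, for illustration only.
toy_descriptions = [
    "A wizard school adventure with magic and dragons",
    "Young wizard learns magic at school",
    "A guide to personal finance and investing",
]
toy_ranking = most_similar(0, tfidf_vectors(toy_descriptions))
```

Here `toy_ranking` lists the other two books by similarity to the first one. On the real dataset you would pass the description column instead, and probably remove stop words first.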

Why is this dataset smaller than Soumik's Goodreads-books?

Many of the ISBNs in that dataset did not return valid results from the Google Books API. I plan to update this in the future by using more fields (e.g., title, author) in the API requests, so as to build a bigger dataset.
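One possible shape for that fallback is sketched below, using the `intitle:` and `inauthor:` search keywords that the Google Books API supports. The helper name is hypothetical, and how well such queries match multi-word titles is something I have not verified.

```python
import urllib.parse

API_URL = "https://www.googleapis.com/books/v1/volumes"


def build_title_author_url(title, author):
    """Build a search URL from title and author (hypothetical helper)."""
    query = f"intitle:{title} inauthor:{author}"
    return f"{API_URL}?{urllib.parse.urlencode({'q': query})}"
```

A title and author search can return several candidate volumes, so the results would still need to be checked against the original row before merging.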

What did you use to build this dataset?

Check out the repository here: Google Books Crawler

Acknowledgements

This dataset relied heavily on Soumik's Goodreads-books dataset.
