7k Books

Dataset of books with title, author, description, rating, thumbnail, and more

@kaggle.dylanjcastillo_7k_books_with_metadata

About this Dataset

Do we really need another dataset of books?

My initial plan was to build a toy example for a recommender system article I was writing. After a bit of googling, I found a few datasets. Sadly, most of them had issues that made them unusable for me (e.g., missing book descriptions, a mix of languages with no column specifying the language of each row, or weird delimiters).

So I decided to make a dataset that would match my purposes.

First, I got ISBNs from Soumik's Goodreads-books dataset. Using those identifiers, I crawled the Google Books API to extract the books' information.
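As a rough illustration of that lookup step, here is a minimal sketch of querying the Google Books volumes endpoint by ISBN and flattening the first hit into a row. It is not the code of the actual crawler: the function names, the choice of output columns, and the `;` separator for multi-valued fields are assumptions made for this example.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

# Public Google Books volumes endpoint; the `isbn:` keyword limits the search to one ISBN.
GOOGLE_BOOKS_URL = "https://www.googleapis.com/books/v1/volumes"


def build_isbn_url(isbn: str) -> str:
    """Return the lookup URL for a single ISBN."""
    return f"{GOOGLE_BOOKS_URL}?{urlencode({'q': f'isbn:{isbn}'})}"


def parse_volume(payload: dict) -> Optional[dict]:
    """Flatten the first search hit into one row, or return None if there was no match."""
    items = payload.get("items")
    if not items:
        return None
    info = items[0].get("volumeInfo", {})
    return {
        "title": info.get("title"),
        # Multi-valued fields are joined with ';' (an assumption of this sketch).
        "authors": ";".join(info.get("authors", [])),
        "description": info.get("description"),
        "categories": ";".join(info.get("categories", [])),
        "thumbnail": info.get("imageLinks", {}).get("thumbnail"),
    }


def fetch_book(isbn: str) -> Optional[dict]:
    """Query the API for one ISBN (performs a network request)."""
    with urlopen(build_isbn_url(isbn), timeout=10) as response:
        return parse_volume(json.load(response))
```

Calling `fetch_book` once per ISBN and keeping the rows that are not `None` gives a table that can then be joined back to the source dataset. A real crawler would also need rate limiting and error handling, which this sketch leaves out.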

Then, I merged those results with some of the original columns from that dataset and, after some cleaning, got the dataset you see here.

What can I do with this?

You can run different kinds of exploratory data analysis, cluster books by topic or category, or build a content-based recommendation engine using different fields from the book's description.
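To make the recommendation idea concrete, here is a minimal, standard-library-only sketch of a content-based recommender: it weights the words of each description with a simple smoothed TF-IDF and ranks the other books by cosine similarity. The function names and the exact weighting formula are choices made for this example, and the inputs are assumed to be parallel lists of titles and descriptions loaded from the dataset.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(documents: list[str]) -> list[dict[str, float]]:
    """Build one sparse TF-IDF vector (term -> weight) per document."""
    token_lists = [tokenize(doc) for doc in documents]
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in token_lists for term in set(tokens))
    n_docs = len(documents)
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        # Smoothed IDF; terms that occur in every document get weight 0.
        vectors.append({
            term: count * math.log((1 + n_docs) / (1 + doc_freq[term]))
            for term, count in counts.items()
        })
    return vectors


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(titles: list[str], descriptions: list[str],
              query_index: int, top_n: int = 3) -> list[str]:
    """Return the titles whose descriptions are most similar to the query book's."""
    vectors = tfidf_vectors(descriptions)
    scored = [
        (cosine(vectors[query_index], vector), titles[i])
        for i, vector in enumerate(vectors)
        if i != query_index
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [title for _, title in scored[:top_n]]
```

For the full dataset, a library vectorizer and a sparse similarity search would scale better than this pure-Python loop, but the ranking logic stays the same.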

Why is this dataset smaller than Soumik's Goodreads-books?

Many of the ISBNs in that dataset did not return valid results from the Google Books API. I plan to update this in the future by using more fields (e.g., title, author) in the API requests to get a bigger dataset.
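One possible shape for such a fallback request is sketched below, using the `intitle:` and `inauthor:` search keywords that the Google Books API supports. The function name is hypothetical, and this is only an illustration of the idea, not the planned implementation.

```python
from urllib.parse import urlencode

GOOGLE_BOOKS_URL = "https://www.googleapis.com/books/v1/volumes"


def build_title_author_url(title: str, author: str) -> str:
    """Return a search URL that matches on title and author instead of ISBN."""
    query = f"intitle:{title} inauthor:{author}"
    # urlencode escapes the colons and turns spaces into '+'.
    return f"{GOOGLE_BOOKS_URL}?{urlencode({'q': query})}"
```

Because a title and author search can return several editions, the results would need an extra matching step (for example, comparing publication year or page count) before being merged.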

What did you use to build this dataset?

Check out the repository here: Google Books Crawler

Acknowledgements

This dataset relied heavily on Soumik's Goodreads-books dataset.
