7k Books
@kaggle.dylanjcastillo_7k_books_with_metadata
My initial plan was to build a toy example for a recommender system article I was writing. After a bit of googling, I found a few datasets. Sadly, most of them had issues that made them unusable for me (e.g., a missing book description, a mix of languages with no column specifying the language of each row, or weird delimiters).
So I decided to make a dataset that would match my purposes.
First, I got ISBNs from Soumik's Goodreads-books dataset. Using those identifiers, I crawled the Google Books API to extract the books' information.
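A lookup of this kind can be sketched as follows. This is a minimal illustration, not the crawler's actual code: the function names `build_isbn_url`, `parse_volume` and `fetch_book` are hypothetical, and the set of fields kept is an assumption about which metadata is of interest. It queries the public Google Books volumes endpoint with the `isbn:` search keyword.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://www.googleapis.com/books/v1/volumes"


def build_isbn_url(isbn):
    """Build a Google Books volumes query for a single ISBN."""
    return API_URL + "?" + urllib.parse.urlencode({"q": f"isbn:{isbn}"})


def parse_volume(payload):
    """Return selected fields from the first result, or None if there is none."""
    items = payload.get("items") or []
    if not items:
        return None  # this ISBN did not return a valid result
    info = items[0].get("volumeInfo", {})
    # Field selection is an assumption made for this sketch.
    return {
        "title": info.get("title"),
        "authors": info.get("authors", []),
        "categories": info.get("categories", []),
        "description": info.get("description"),
        "published_date": info.get("publishedDate"),
    }


def fetch_book(isbn, timeout=10):
    """Fetch and parse one ISBN (performs a network request)."""
    with urllib.request.urlopen(build_isbn_url(isbn), timeout=timeout) as resp:
        return parse_volume(json.load(resp))
```

Keeping the parsing step separate from the network call makes it easy to skip ISBNs that return nothing and to test the parsing on saved responses.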
Then I merged those results with some of the original columns from that dataset and, after some cleaning, got the dataset you see here.
Possible uses: exploratory data analysis, clustering of books by topic or category, and content-based recommendation engines that use different fields from the book's description.
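As one illustration of the content-based idea, the sketch below ranks books by the similarity of their description text using TF-IDF weights and cosine similarity. It is a toy under stated assumptions: the function names are hypothetical, the tokenizer is deliberately naive, the IDF is a smoothed variant chosen for this example, and only the standard library is used. A real project would more likely use a library vectorizer.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Smoothed IDF so that no weight is zero or undefined.
        vectors.append(
            {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
        )
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(
        sum(w * w for w in b.values())
    )
    return dot / norm if norm else 0.0


def recommend(index, docs, top_n=3):
    """Return indices of the top_n documents most similar to docs[index]."""
    vectors = tfidf_vectors(docs)
    scores = [
        (cosine(vectors[index], v), i) for i, v in enumerate(vectors) if i != index
    ]
    return [i for _, i in sorted(scores, reverse=True)[:top_n]]
```

Here `docs` would be the list of description strings, and `recommend(i, docs)` returns the row positions of the books whose descriptions are closest to book `i`.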
Many of the ISBNs in that dataset did not return valid results from the Google Books API. I plan to update this in the future by using more fields (e.g., title, author) in the API requests, so as to build a bigger dataset.
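A fallback query of that kind might look like the sketch below. It is only a guess at how the planned update could work, not existing crawler code: `build_fallback_url` is a hypothetical helper, and it relies on the `intitle:` and `inauthor:` search keywords of the Google Books volumes endpoint. Passing multi-word values straight through is a simplification.

```python
import urllib.parse

API_URL = "https://www.googleapis.com/books/v1/volumes"


def build_fallback_url(title, author=None):
    """Build a title/author query for books whose ISBN lookup failed."""
    terms = [f"intitle:{title}"]
    if author:
        terms.append(f"inauthor:{author}")
    # urlencode escapes ':' and turns spaces into '+'.
    return API_URL + "?" + urllib.parse.urlencode({"q": " ".join(terms)})
```

Because a title and author search can match several editions, the results would still need to be checked against the original row before being accepted.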
Check out the repository here: Google Books Crawler
This dataset relied heavily on Soumik's Goodreads-books dataset.