7k Books
Dataset of books with title, author, description, rating, thumbnail, and more
@kaggle.dylanjcastillo_7k_books_with_metadata
My initial plan was to build a toy example for a recommender system article I was writing. After a bit of googling, I found a few datasets. Sadly, most of them had issues that made them unusable for me (e.g., missing book descriptions, a mix of languages with no column specifying the language of each row, or weird delimiters).
So I decided to make a dataset that would match my purposes.
First, I got ISBNs from Soumik's Goodreads-books dataset. Using those identifiers, I crawled the Google Books API to extract the books' information.
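A lookup like this could be sketched roughly as follows. This is a minimal illustration, not the actual crawler: the function names, the `;` separator for multi-valued fields, and the choice to keep only the first search hit are all assumptions. The endpoint and the `volumeInfo` field names come from the public Google Books API.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

API_URL = "https://www.googleapis.com/books/v1/volumes"


def build_isbn_query(isbn: str) -> str:
    """Return the volumes-search URL for one ISBN."""
    query = urlencode({"q": f"isbn:{isbn}"})
    return f"{API_URL}?{query}"


def parse_volume(payload: dict) -> Optional[dict]:
    """Flatten the first search hit into one row, or None if there is no hit."""
    items = payload.get("items") or []
    if not items:
        return None
    info = items[0].get("volumeInfo", {})
    date = info.get("publishedDate") or ""
    return {
        "title": info.get("title"),
        "subtitle": info.get("subtitle"),
        # Assumption: multi-valued fields are joined with ';'.
        "authors": ";".join(info.get("authors", [])),
        "categories": ";".join(info.get("categories", [])),
        "thumbnail": info.get("imageLinks", {}).get("thumbnail"),
        "description": info.get("description"),
        # publishedDate may be "2004" or "2004-05-01"; keep only the year.
        "published_year": int(date[:4]) if date[:4].isdigit() else None,
        "num_pages": info.get("pageCount"),
    }


def fetch_volume(isbn: str) -> Optional[dict]:
    """Query the API for one ISBN (performs a network request)."""
    with urlopen(build_isbn_query(isbn), timeout=10) as resp:
        return parse_volume(json.load(resp))
```

ISBNs for which `parse_volume` returns `None` are the ones that would drop out of the final dataset.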
Then I merged those results with some of the original columns from that dataset and, after some cleaning, got the dataset you see here.
Possible uses include exploratory data analysis, clustering books by topic or category, and building a content-based recommendation engine that uses different fields from the book's description.
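As one illustration of the content-based idea, here is a toy sketch that ranks books by TF-IDF cosine similarity over their descriptions. It uses only the standard library; the tokenizer, the smoothed IDF formula, and the function names are illustrative choices, and a real project would more likely reach for a dedicated library.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse {term: weight} dict per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: in how many descriptions each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Smoothed IDF so terms present in every document keep a small weight.
        vectors.append(
            {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
        )
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(index, docs, k=3):
    """Return the indices of the k descriptions most similar to docs[index]."""
    vectors = tfidf_vectors(docs)
    scores = [
        (cosine(vectors[index], v), i) for i, v in enumerate(vectors) if i != index
    ]
    scores.sort(key=lambda pair: pair[0], reverse=True)
    return [i for _, i in scores[:k]]
```

With the `description` column loaded into a list, `recommend(i, descriptions)` gives the row positions of the closest matches for book `i`.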
Many of the ISBNs in that dataset did not return valid results from the Google Books API. I plan to update this in the future by using more fields (e.g., title, author) in the API requests, so as to get a bigger dataset.
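A fallback request of that kind might be built as below. This is only a sketch of the idea: `build_fallback_query` is a hypothetical helper, and wrapping each value in quotation marks for phrase matching is an assumption. The `intitle:` and `inauthor:` keywords are part of the Google Books search syntax.

```python
from urllib.parse import urlencode

API_URL = "https://www.googleapis.com/books/v1/volumes"


def build_fallback_query(title, author=None):
    """Build a search URL from title (and optionally author) instead of ISBN."""
    # Assumption: quoting each value asks for a phrase match.
    terms = [f'intitle:"{title}"']
    if author:
        terms.append(f'inauthor:"{author}"')
    return f"{API_URL}?{urlencode({'q': ' '.join(terms)})}"
```

A title and author search can return several editions, so the results would still need to be matched back to the intended ISBN.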
Check out the repository here: Google Books Crawler
This dataset relied heavily on Soumik's Goodreads-books dataset.
CREATE TABLE books (
"isbn13" BIGINT,
"isbn10" VARCHAR,
"title" VARCHAR,
"subtitle" VARCHAR,
"authors" VARCHAR,
"categories" VARCHAR,
"thumbnail" VARCHAR,
"description" VARCHAR,
"published_year" DOUBLE,
"average_rating" DOUBLE,
"num_pages" DOUBLE,
"ratings_count" DOUBLE
);
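To try the schema out, the sketch below creates the table in an in-memory SQLite database and runs a simple query. SQLite is used only because it ships with Python; the schema may have been written for a different engine. The three rows are made-up placeholders, not records from the dataset, and `top_rated` is a hypothetical helper.

```python
import sqlite3

SCHEMA = """
CREATE TABLE books (
    "isbn13" BIGINT,
    "isbn10" VARCHAR,
    "title" VARCHAR,
    "subtitle" VARCHAR,
    "authors" VARCHAR,
    "categories" VARCHAR,
    "thumbnail" VARCHAR,
    "description" VARCHAR,
    "published_year" DOUBLE,
    "average_rating" DOUBLE,
    "num_pages" DOUBLE,
    "ratings_count" DOUBLE
);
"""

# Placeholder rows for illustration only (not real dataset records).
DEMO_ROWS = [
    (9780000000001, "Example Book A", 2001.0, 4.2, 300.0, 500.0),
    (9780000000002, "Example Book B", 1999.0, 4.8, 150.0, 20.0),
    (9780000000003, "Example Book C", 2010.0, 3.9, 420.0, 1000.0),
]


def load_demo_db():
    """Create the books table in memory and insert the placeholder rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO books (isbn13, title, published_year, average_rating,"
        " num_pages, ratings_count) VALUES (?, ?, ?, ?, ?, ?)",
        DEMO_ROWS,
    )
    return conn


def top_rated(conn, min_ratings=100, limit=5):
    """Return (title, rating, year) for the best-rated books with enough ratings."""
    # published_year is declared DOUBLE, so cast it to get a whole-number year.
    return conn.execute(
        "SELECT title, average_rating, CAST(published_year AS INTEGER)"
        " FROM books WHERE ratings_count >= ?"
        " ORDER BY average_rating DESC LIMIT ?",
        (min_ratings, limit),
    ).fetchall()
```

Because `published_year`, `num_pages`, and `ratings_count` are declared as `DOUBLE`, they come back as floats unless cast, which is worth keeping in mind when grouping or joining on them.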