IMDb Genre-wise Movies Dataset And Sparse Matrices

Contains 75k+ movies collected from the IMDb website: 22 genre datasets plus 1 master dataset.

@kaggle.soumyasacharya_imdb_movies_dataset

About this Dataset

Context

IMDb stores information on more than 6 million titles (of which almost 500,000 are feature films) and has been owned by Amazon since 1998.

Content

The master dataset includes about 75k movies with attributes such as movie description, average rating, number of votes, genre, etc.

The movies are also split by genre into 22 genre-specific datasets. The master dataset, which contains all the movies, is the file all_df.csv.

Each dataset has its own sparse matrix containing the TF-IDF scores produced by TfidfVectorizer(analyzer='word', ngram_range=(1, 3), stop_words='english').
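As a rough illustration of how such a matrix could be produced, here is a minimal sketch using scikit-learn's TfidfVectorizer with the settings listed above. The three short descriptions are made-up placeholders, not rows from the dataset, and the sketch assumes the vectorizer was applied to the movie description text, which the dataset page does not state explicitly.

```python
from scipy import sparse
from sklearn.feature_extraction.text import TfidfVectorizer

# Placeholder texts standing in for movie descriptions (hypothetical, not from the dataset).
descriptions = [
    "A retired detective returns for one last case.",
    "Two friends travel across the country in a stolen car.",
    "A detective and a thief become unlikely friends.",
]

# Same settings as described above: word-level tokens, 1- to 3-grams,
# with scikit-learn's built-in English stop-word list removed.
vectorizer = TfidfVectorizer(
    analyzer="word",
    ngram_range=(1, 3),
    stop_words="english",
)

# fit_transform learns the vocabulary and returns a SciPy sparse matrix
# with one row per description and one column per n-gram.
tfidf = vectorizer.fit_transform(descriptions)

print(sparse.issparse(tfidf))  # the result is stored in sparse form
print(tfidf.shape)             # (number of descriptions, vocabulary size)
```

Because the n-gram range goes up to trigrams, the vocabulary grows quickly with the number of descriptions, which is presumably why the scores are kept in sparse form rather than as a dense table.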

The data was scraped from the publicly available website https://www.imdb.com.
