Baselight

IMDb Genre-wise Movies Dataset And Sparse Matrices

Contains 75k+ movies collected from the IMDb website: 22 genre datasets plus 1 master dataset.

@kaggle.soumyasacharya_imdb_movies_dataset

About this Dataset

Context

IMDb stores information on more than 6 million titles (of which almost 500,000 are feature films) and has been owned by Amazon since 1998.

Content

The master dataset includes about 75k movies with attributes such as movie description, average rating, number of votes, and genre.

The movies are also divided by genre, for a total of 22 genre datasets. The master dataset, which contains all the movies, is the file all_df.csv.
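
As a rough illustration of how a master file could be divided by genre, the sketch below groups rows by each genre listed in the "genre" column. It assumes that a movie's genres are stored as a comma-separated string (which would be consistent with the per-genre row counts adding up to more than the master total); the sample rows and the `by_genre` name are hypothetical and not taken from the dataset.

```python
import csv
import io
from collections import defaultdict

# Hypothetical rows shaped like all_df.csv (only 3 of the 24 columns shown).
sample_csv = io.StringIO(
    "imdb_title_id,title,genre\n"
    'tt0000001,Example One,"Action, Drama"\n'
    "tt0000002,Example Two,Comedy\n"
    'tt0000003,Example Three,"Drama, Romance"\n'
)

# Map each genre name to the ids of the movies that list it.
by_genre = defaultdict(list)
for row in csv.DictReader(sample_csv):
    # Assumption: genres are comma-separated, so one movie can land in
    # several genre groups.
    for genre in row["genre"].split(","):
        by_genre[genre.strip()].append(row["imdb_title_id"])

for genre in sorted(by_genre):
    print(genre, by_genre[genre])
```

Under this assumption a movie tagged with two genres appears in both groups, so the genre tables overlap rather than partition the master dataset.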

Each dataset has its own sparse matrix containing the TF-IDF scores obtained by applying TfidfVectorizer(analyzer='word', ngram_range=(1, 3), stop_words='english').
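
A minimal sketch of how such a matrix might be built with scikit-learn, using the vectorizer settings quoted above. The three descriptions are made-up stand-ins, and the choice of the "description" column as the input text is an assumption; the page does not say which column was vectorized.

```python
from scipy.sparse import issparse
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical stand-ins for the "description" column of one genre table.
descriptions = [
    "A retired detective returns for one last case.",
    "Two friends travel across the country in a stolen car.",
    "A detective and a thief form an unlikely friendship.",
]

# Word-level unigrams, bigrams and trigrams with English stop words removed.
vectorizer = TfidfVectorizer(
    analyzer="word",
    ngram_range=(1, 3),
    stop_words="english",
)

# fit_transform returns a SciPy sparse matrix: one row per description,
# one column per n-gram in the learned vocabulary.
tfidf = vectorizer.fit_transform(descriptions)

print(tfidf.shape)
print(issparse(tfidf))
```

The result can be written to disk with `scipy.sparse.save_npz` and read back with `scipy.sparse.load_npz`; the format in which the dataset's own matrices are stored is not stated here.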

The data was scraped from the publicly available website https://www.imdb.com.

Tables

Action Df

@kaggle.soumyasacharya_imdb_movies_dataset.action_df
  • 4.35 MB
  • 11562 rows
  • 24 columns

CREATE TABLE action_df (
  "imdb_title_id" VARCHAR,
  "title" VARCHAR,
  "original_title" VARCHAR,
  "year" BIGINT,
  "date_published" TIMESTAMP,
  "genre" VARCHAR,
  "duration" BIGINT,
  "country" VARCHAR,
  "language" VARCHAR,
  "director" VARCHAR,
  "writer" VARCHAR,
  "production_company" VARCHAR,
  "actors" VARCHAR,
  "description" VARCHAR,
  "avg_vote" DOUBLE,
  "votes" BIGINT,
  "budget" VARCHAR,
  "usa_gross_income" VARCHAR,
  "worlwide_gross_income" VARCHAR,
  "metascore" DOUBLE,
  "reviews_from_users" DOUBLE,
  "reviews_from_critics" DOUBLE,
  "description_words" BIGINT,
  "movie_title" VARCHAR
);

Adventure Df

@kaggle.soumyasacharya_imdb_movies_dataset.adventure_df
  • 2.63 MB
  • 6748 rows
  • 24 columns

CREATE TABLE adventure_df (
  "imdb_title_id" VARCHAR,
  "title" VARCHAR,
  "original_title" VARCHAR,
  "year" BIGINT,
  "date_published" TIMESTAMP,
  "genre" VARCHAR,
  "duration" BIGINT,
  "country" VARCHAR,
  "language" VARCHAR,
  "director" VARCHAR,
  "writer" VARCHAR,
  "production_company" VARCHAR,
  "actors" VARCHAR,
  "description" VARCHAR,
  "avg_vote" DOUBLE,
  "votes" BIGINT,
  "budget" VARCHAR,
  "usa_gross_income" VARCHAR,
  "worlwide_gross_income" VARCHAR,
  "metascore" DOUBLE,
  "reviews_from_users" DOUBLE,
  "reviews_from_critics" DOUBLE,
  "description_words" BIGINT,
  "movie_title" VARCHAR
);

All Df

@kaggle.soumyasacharya_imdb_movies_dataset.all_df
  • 28.1 MB
  • 74889 rows
  • 24 columns

CREATE TABLE all_df (
  "imdb_title_id" VARCHAR,
  "title" VARCHAR,
  "original_title" VARCHAR,
  "year" VARCHAR,
  "date_published" VARCHAR,
  "genre" VARCHAR,
  "duration" BIGINT,
  "country" VARCHAR,
  "language" VARCHAR,
  "director" VARCHAR,
  "writer" VARCHAR,
  "production_company" VARCHAR,
  "actors" VARCHAR,
  "description" VARCHAR,
  "avg_vote" DOUBLE,
  "votes" BIGINT,
  "budget" VARCHAR,
  "usa_gross_income" VARCHAR,
  "worlwide_gross_income" VARCHAR,
  "metascore" DOUBLE,
  "reviews_from_users" DOUBLE,
  "reviews_from_critics" DOUBLE,
  "description_words" BIGINT,
  "movie_title" VARCHAR
);

Animation Df

@kaggle.soumyasacharya_imdb_movies_dataset.animation_df
  • 777.21 KB
  • 1881 rows
  • 24 columns

CREATE TABLE animation_df (
  "imdb_title_id" VARCHAR,
  "title" VARCHAR,
  "original_title" VARCHAR,
  "year" BIGINT,
  "date_published" TIMESTAMP,
  "genre" VARCHAR,
  "duration" BIGINT,
  "country" VARCHAR,
  "language" VARCHAR,
  "director" VARCHAR,
  "writer" VARCHAR,
  "production_company" VARCHAR,
  "actors" VARCHAR,
  "description" VARCHAR,
  "avg_vote" DOUBLE,
  "votes" BIGINT,
  "budget" VARCHAR,
  "usa_gross_income" VARCHAR,
  "worlwide_gross_income" VARCHAR,
  "metascore" DOUBLE,
  "reviews_from_users" DOUBLE,
  "reviews_from_critics" DOUBLE,
  "description_words" BIGINT,
  "movie_title" VARCHAR
);

Biography Df

@kaggle.soumyasacharya_imdb_movies_dataset.biography_df
  • 773.2 KB
  • 1781 rows
  • 24 columns

CREATE TABLE biography_df (
  "imdb_title_id" VARCHAR,
  "title" VARCHAR,
  "original_title" VARCHAR,
  "year" VARCHAR,
  "date_published" VARCHAR,
  "genre" VARCHAR,
  "duration" BIGINT,
  "country" VARCHAR,
  "language" VARCHAR,
  "director" VARCHAR,
  "writer" VARCHAR,
  "production_company" VARCHAR,
  "actors" VARCHAR,
  "description" VARCHAR,
  "avg_vote" DOUBLE,
  "votes" BIGINT,
  "budget" VARCHAR,
  "usa_gross_income" VARCHAR,
  "worlwide_gross_income" VARCHAR,
  "metascore" DOUBLE,
  "reviews_from_users" DOUBLE,
  "reviews_from_critics" DOUBLE,
  "description_words" BIGINT,
  "movie_title" VARCHAR
);

Comedy Df

@kaggle.soumyasacharya_imdb_movies_dataset.comedy_df
  • 9.75 MB
  • 25200 rows
  • 24 columns

CREATE TABLE comedy_df (
  "imdb_title_id" VARCHAR,
  "title" VARCHAR,
  "original_title" VARCHAR,
  "year" VARCHAR,
  "date_published" VARCHAR,
  "genre" VARCHAR,
  "duration" BIGINT,
  "country" VARCHAR,
  "language" VARCHAR,
  "director" VARCHAR,
  "writer" VARCHAR,
  "production_company" VARCHAR,
  "actors" VARCHAR,
  "description" VARCHAR,
  "avg_vote" DOUBLE,
  "votes" BIGINT,
  "budget" VARCHAR,
  "usa_gross_income" VARCHAR,
  "worlwide_gross_income" VARCHAR,
  "metascore" DOUBLE,
  "reviews_from_users" DOUBLE,
  "reviews_from_critics" DOUBLE,
  "description_words" BIGINT,
  "movie_title" VARCHAR
);

Crime Df

@kaggle.soumyasacharya_imdb_movies_dataset.crime_df
  • 3.67 MB
  • 9715 rows
  • 24 columns

CREATE TABLE crime_df (
  "imdb_title_id" VARCHAR,
  "title" VARCHAR,
  "original_title" VARCHAR,
  "year" VARCHAR,
  "date_published" VARCHAR,
  "genre" VARCHAR,
  "duration" BIGINT,
  "country" VARCHAR,
  "language" VARCHAR,
  "director" VARCHAR,
  "writer" VARCHAR,
  "production_company" VARCHAR,
  "actors" VARCHAR,
  "description" VARCHAR,
  "avg_vote" DOUBLE,
  "votes" BIGINT,
  "budget" VARCHAR,
  "usa_gross_income" VARCHAR,
  "worlwide_gross_income" VARCHAR,
  "metascore" DOUBLE,
  "reviews_from_users" DOUBLE,
  "reviews_from_critics" DOUBLE,
  "description_words" BIGINT,
  "movie_title" VARCHAR
);

Documentary Df

@kaggle.soumyasacharya_imdb_movies_dataset.documentary_df
  • 18.14 KB
  • 1 row
  • 24 columns

CREATE TABLE documentary_df (
  "imdb_title_id" VARCHAR,
  "title" VARCHAR,
  "original_title" VARCHAR,
  "year" BIGINT,
  "date_published" TIMESTAMP,
  "genre" VARCHAR,
  "duration" BIGINT,
  "country" VARCHAR,
  "language" VARCHAR,
  "director" VARCHAR,
  "writer" VARCHAR,
  "production_company" VARCHAR,
  "actors" VARCHAR,
  "description" VARCHAR,
  "avg_vote" DOUBLE,
  "votes" BIGINT,
  "budget" VARCHAR,
  "usa_gross_income" VARCHAR,
  "worlwide_gross_income" VARCHAR,
  "metascore" VARCHAR,
  "reviews_from_users" DOUBLE,
  "reviews_from_critics" DOUBLE,
  "description_words" BIGINT,
  "movie_title" VARCHAR
);

Drama Df

@kaggle.soumyasacharya_imdb_movies_dataset.drama_df
  • 15.54 MB
  • 41004 rows
  • 24 columns

CREATE TABLE drama_df (
  "imdb_title_id" VARCHAR,
  "title" VARCHAR,
  "original_title" VARCHAR,
  "year" BIGINT,
  "date_published" TIMESTAMP,
  "genre" VARCHAR,
  "duration" BIGINT,
  "country" VARCHAR,
  "language" VARCHAR,
  "director" VARCHAR,
  "writer" VARCHAR,
  "production_company" VARCHAR,
  "actors" VARCHAR,
  "description" VARCHAR,
  "avg_vote" DOUBLE,
  "votes" BIGINT,
  "budget" VARCHAR,
  "usa_gross_income" VARCHAR,
  "worlwide_gross_income" VARCHAR,
  "metascore" DOUBLE,
  "reviews_from_users" DOUBLE,
  "reviews_from_critics" DOUBLE,
  "description_words" BIGINT,
  "movie_title" VARCHAR
);

Family Df

@kaggle.soumyasacharya_imdb_movies_dataset.family_df
  • 1.42 MB
  • 3487 rows
  • 24 columns

CREATE TABLE family_df (
  "imdb_title_id" VARCHAR,
  "title" VARCHAR,
  "original_title" VARCHAR,
  "year" BIGINT,
  "date_published" TIMESTAMP,
  "genre" VARCHAR,
  "duration" BIGINT,
  "country" VARCHAR,
  "language" VARCHAR,
  "director" VARCHAR,
  "writer" VARCHAR,
  "production_company" VARCHAR,
  "actors" VARCHAR,
  "description" VARCHAR,
  "avg_vote" DOUBLE,
  "votes" BIGINT,
  "budget" VARCHAR,
  "usa_gross_income" VARCHAR,
  "worlwide_gross_income" VARCHAR,
  "metascore" DOUBLE,
  "reviews_from_users" DOUBLE,
  "reviews_from_critics" DOUBLE,
  "description_words" BIGINT,
  "movie_title" VARCHAR
);

Fantasy Df

@kaggle.soumyasacharya_imdb_movies_dataset.fantasy_df
  • 1.36 MB
  • 3356 rows
  • 24 columns

CREATE TABLE fantasy_df (
  "imdb_title_id" VARCHAR,
  "title" VARCHAR,
  "original_title" VARCHAR,
  "year" BIGINT,
  "date_published" TIMESTAMP,
  "genre" VARCHAR,
  "duration" BIGINT,
  "country" VARCHAR,
  "language" VARCHAR,
  "director" VARCHAR,
  "writer" VARCHAR,
  "production_company" VARCHAR,
  "actors" VARCHAR,
  "description" VARCHAR,
  "avg_vote" DOUBLE,
  "votes" BIGINT,
  "budget" VARCHAR,
  "usa_gross_income" VARCHAR,
  "worlwide_gross_income" VARCHAR,
  "metascore" DOUBLE,
  "reviews_from_users" DOUBLE,
  "reviews_from_critics" DOUBLE,
  "description_words" BIGINT,
  "movie_title" VARCHAR
);

Film Noir Df

@kaggle.soumyasacharya_imdb_movies_dataset.film_noir_df
  • 196.77 KB
  • 578 rows
  • 24 columns

CREATE TABLE film_noir_df (
  "imdb_title_id" VARCHAR,
  "title" VARCHAR,
  "original_title" VARCHAR,
  "year" BIGINT,
  "date_published" TIMESTAMP,
  "genre" VARCHAR,
  "duration" BIGINT,
  "country" VARCHAR,
  "language" VARCHAR,
  "director" VARCHAR,
  "writer" VARCHAR,
  "production_company" VARCHAR,
  "actors" VARCHAR,
  "description" VARCHAR,
  "avg_vote" DOUBLE,
  "votes" BIGINT,
  "budget" VARCHAR,
  "usa_gross_income" VARCHAR,
  "worlwide_gross_income" VARCHAR,
  "metascore" DOUBLE,
  "reviews_from_users" DOUBLE,
  "reviews_from_critics" DOUBLE,
  "description_words" BIGINT,
  "movie_title" VARCHAR
);

History Df

@kaggle.soumyasacharya_imdb_movies_dataset.history_df
  • 845.19 KB
  • 1970 rows
  • 24 columns

CREATE TABLE history_df (
  "imdb_title_id" VARCHAR,
  "title" VARCHAR,
  "original_title" VARCHAR,
  "year" BIGINT,
  "date_published" TIMESTAMP,
  "genre" VARCHAR,
  "duration" BIGINT,
  "country" VARCHAR,
  "language" VARCHAR,
  "director" VARCHAR,
  "writer" VARCHAR,
  "production_company" VARCHAR,
  "actors" VARCHAR,
  "description" VARCHAR,
  "avg_vote" DOUBLE,
  "votes" BIGINT,
  "budget" VARCHAR,
  "usa_gross_income" VARCHAR,
  "worlwide_gross_income" VARCHAR,
  "metascore" DOUBLE,
  "reviews_from_users" DOUBLE,
  "reviews_from_critics" DOUBLE,
  "description_words" BIGINT,
  "movie_title" VARCHAR
);

Horror Df

@kaggle.soumyasacharya_imdb_movies_dataset.horror_df
  • 3.17 MB
  • 8611 rows
  • 24 columns

CREATE TABLE horror_df (
  "imdb_title_id" VARCHAR,
  "title" VARCHAR,
  "original_title" VARCHAR,
  "year" BIGINT,
  "date_published" TIMESTAMP,
  "genre" VARCHAR,
  "duration" BIGINT,
  "country" VARCHAR,
  "language" VARCHAR,
  "director" VARCHAR,
  "writer" VARCHAR,
  "production_company" VARCHAR,
  "actors" VARCHAR,
  "description" VARCHAR,
  "avg_vote" DOUBLE,
  "votes" BIGINT,
  "budget" VARCHAR,
  "usa_gross_income" VARCHAR,
  "worlwide_gross_income" VARCHAR,
  "metascore" DOUBLE,
  "reviews_from_users" DOUBLE,
  "reviews_from_critics" DOUBLE,
  "description_words" BIGINT,
  "movie_title" VARCHAR
);

Musical Df

@kaggle.soumyasacharya_imdb_movies_dataset.musical_df
  • 676.7 KB
  • 1755 rows
  • 24 columns

CREATE TABLE musical_df (
  "imdb_title_id" VARCHAR,
  "title" VARCHAR,
  "original_title" VARCHAR,
  "year" BIGINT,
  "date_published" TIMESTAMP,
  "genre" VARCHAR,
  "duration" BIGINT,
  "country" VARCHAR,
  "language" VARCHAR,
  "director" VARCHAR,
  "writer" VARCHAR,
  "production_company" VARCHAR,
  "actors" VARCHAR,
  "description" VARCHAR,
  "avg_vote" DOUBLE,
  "votes" BIGINT,
  "budget" VARCHAR,
  "usa_gross_income" VARCHAR,
  "worlwide_gross_income" VARCHAR,
  "metascore" DOUBLE,
  "reviews_from_users" DOUBLE,
  "reviews_from_critics" DOUBLE,
  "description_words" BIGINT,
  "movie_title" VARCHAR
);

Music Df

@kaggle.soumyasacharya_imdb_movies_dataset.music_df
  • 209.18 KB
  • 460 rows
  • 24 columns

CREATE TABLE music_df (
  "imdb_title_id" VARCHAR,
  "title" VARCHAR,
  "original_title" VARCHAR,
  "year" BIGINT,
  "date_published" TIMESTAMP,
  "genre" VARCHAR,
  "duration" BIGINT,
  "country" VARCHAR,
  "language" VARCHAR,
  "director" VARCHAR,
  "writer" VARCHAR,
  "production_company" VARCHAR,
  "actors" VARCHAR,
  "description" VARCHAR,
  "avg_vote" DOUBLE,
  "votes" BIGINT,
  "budget" VARCHAR,
  "usa_gross_income" VARCHAR,
  "worlwide_gross_income" VARCHAR,
  "metascore" DOUBLE,
  "reviews_from_users" DOUBLE,
  "reviews_from_critics" DOUBLE,
  "description_words" BIGINT,
  "movie_title" VARCHAR
);

Mystery Df

@kaggle.soumyasacharya_imdb_movies_dataset.mystery_df
  • 1.8 MB
  • 4701 rows
  • 24 columns

CREATE TABLE mystery_df (
  "imdb_title_id" VARCHAR,
  "title" VARCHAR,
  "original_title" VARCHAR,
  "year" BIGINT,
  "date_published" TIMESTAMP,
  "genre" VARCHAR,
  "duration" BIGINT,
  "country" VARCHAR,
  "language" VARCHAR,
  "director" VARCHAR,
  "writer" VARCHAR,
  "production_company" VARCHAR,
  "actors" VARCHAR,
  "description" VARCHAR,
  "avg_vote" DOUBLE,
  "votes" BIGINT,
  "budget" VARCHAR,
  "usa_gross_income" VARCHAR,
  "worlwide_gross_income" VARCHAR,
  "metascore" DOUBLE,
  "reviews_from_users" DOUBLE,
  "reviews_from_critics" DOUBLE,
  "description_words" BIGINT,
  "movie_title" VARCHAR
);

Romance Df

@kaggle.soumyasacharya_imdb_movies_dataset.romance_df
  • 4.67 MB
  • 12376 rows
  • 24 columns

CREATE TABLE romance_df (
  "imdb_title_id" VARCHAR,
  "title" VARCHAR,
  "original_title" VARCHAR,
  "year" BIGINT,
  "date_published" TIMESTAMP,
  "genre" VARCHAR,
  "duration" BIGINT,
  "country" VARCHAR,
  "language" VARCHAR,
  "director" VARCHAR,
  "writer" VARCHAR,
  "production_company" VARCHAR,
  "actors" VARCHAR,
  "description" VARCHAR,
  "avg_vote" DOUBLE,
  "votes" BIGINT,
  "budget" VARCHAR,
  "usa_gross_income" VARCHAR,
  "worlwide_gross_income" VARCHAR,
  "metascore" DOUBLE,
  "reviews_from_users" DOUBLE,
  "reviews_from_critics" DOUBLE,
  "description_words" BIGINT,
  "movie_title" VARCHAR
);

Sci Fi Df

@kaggle.soumyasacharya_imdb_movies_dataset.sci_fi_df
  • 1.26 MB
  • 3261 rows
  • 24 columns

CREATE TABLE sci_fi_df (
  "imdb_title_id" VARCHAR,
  "title" VARCHAR,
  "original_title" VARCHAR,
  "year" BIGINT,
  "date_published" TIMESTAMP,
  "genre" VARCHAR,
  "duration" BIGINT,
  "country" VARCHAR,
  "language" VARCHAR,
  "director" VARCHAR,
  "writer" VARCHAR,
  "production_company" VARCHAR,
  "actors" VARCHAR,
  "description" VARCHAR,
  "avg_vote" DOUBLE,
  "votes" BIGINT,
  "budget" VARCHAR,
  "usa_gross_income" VARCHAR,
  "worlwide_gross_income" VARCHAR,
  "metascore" DOUBLE,
  "reviews_from_users" DOUBLE,
  "reviews_from_critics" DOUBLE,
  "description_words" BIGINT,
  "movie_title" VARCHAR
);

Sport Df

@kaggle.soumyasacharya_imdb_movies_dataset.sport_df
  • 393.4 KB
  • 913 rows
  • 24 columns

CREATE TABLE sport_df (
  "imdb_title_id" VARCHAR,
  "title" VARCHAR,
  "original_title" VARCHAR,
  "year" BIGINT,
  "date_published" TIMESTAMP,
  "genre" VARCHAR,
  "duration" BIGINT,
  "country" VARCHAR,
  "language" VARCHAR,
  "director" VARCHAR,
  "writer" VARCHAR,
  "production_company" VARCHAR,
  "actors" VARCHAR,
  "description" VARCHAR,
  "avg_vote" DOUBLE,
  "votes" BIGINT,
  "budget" VARCHAR,
  "usa_gross_income" VARCHAR,
  "worlwide_gross_income" VARCHAR,
  "metascore" DOUBLE,
  "reviews_from_users" DOUBLE,
  "reviews_from_critics" DOUBLE,
  "description_words" BIGINT,
  "movie_title" VARCHAR
);

Thriller Df

@kaggle.soumyasacharya_imdb_movies_dataset.thriller_df
  • 3.88 MB
  • 10248 rows
  • 24 columns

CREATE TABLE thriller_df (
  "imdb_title_id" VARCHAR,
  "title" VARCHAR,
  "original_title" VARCHAR,
  "year" BIGINT,
  "date_published" TIMESTAMP,
  "genre" VARCHAR,
  "duration" BIGINT,
  "country" VARCHAR,
  "language" VARCHAR,
  "director" VARCHAR,
  "writer" VARCHAR,
  "production_company" VARCHAR,
  "actors" VARCHAR,
  "description" VARCHAR,
  "avg_vote" DOUBLE,
  "votes" BIGINT,
  "budget" VARCHAR,
  "usa_gross_income" VARCHAR,
  "worlwide_gross_income" VARCHAR,
  "metascore" DOUBLE,
  "reviews_from_users" DOUBLE,
  "reviews_from_critics" DOUBLE,
  "description_words" BIGINT,
  "movie_title" VARCHAR
);

War Df

@kaggle.soumyasacharya_imdb_movies_dataset.war_df
  • 814.19 KB
  • 1998 rows
  • 24 columns

CREATE TABLE war_df (
  "imdb_title_id" VARCHAR,
  "title" VARCHAR,
  "original_title" VARCHAR,
  "year" BIGINT,
  "date_published" TIMESTAMP,
  "genre" VARCHAR,
  "duration" BIGINT,
  "country" VARCHAR,
  "language" VARCHAR,
  "director" VARCHAR,
  "writer" VARCHAR,
  "production_company" VARCHAR,
  "actors" VARCHAR,
  "description" VARCHAR,
  "avg_vote" DOUBLE,
  "votes" BIGINT,
  "budget" VARCHAR,
  "usa_gross_income" VARCHAR,
  "worlwide_gross_income" VARCHAR,
  "metascore" DOUBLE,
  "reviews_from_users" DOUBLE,
  "reviews_from_critics" DOUBLE,
  "description_words" BIGINT,
  "movie_title" VARCHAR
);

Western Df

@kaggle.soumyasacharya_imdb_movies_dataset.western_df
  • 511.31 KB
  • 1438 rows
  • 24 columns

CREATE TABLE western_df (
  "imdb_title_id" VARCHAR,
  "title" VARCHAR,
  "original_title" VARCHAR,
  "year" BIGINT,
  "date_published" TIMESTAMP,
  "genre" VARCHAR,
  "duration" BIGINT,
  "country" VARCHAR,
  "language" VARCHAR,
  "director" VARCHAR,
  "writer" VARCHAR,
  "production_company" VARCHAR,
  "actors" VARCHAR,
  "description" VARCHAR,
  "avg_vote" DOUBLE,
  "votes" BIGINT,
  "budget" VARCHAR,
  "usa_gross_income" VARCHAR,
  "worlwide_gross_income" VARCHAR,
  "metascore" DOUBLE,
  "reviews_from_users" DOUBLE,
  "reviews_from_critics" DOUBLE,
  "description_words" BIGINT,
  "movie_title" VARCHAR
);
