Baselight

IMDb Movies

IMDb Movies Dataset (Sorted by popularity)

@kaggle.elvinrustam_imdb_movies_dataset

About this Dataset

Start Date: November 29, 2023

Finish Date: December 1, 2023

This dataset was scraped from IMDb in order of movie popularity, from most to least popular.

The dataset contains 9083 movies in total.

!UNCLEAN VERSION: IMDbMovies

About Features:

Title: The name of the movie.

Summary: A brief overview of the movie's plot.

Director: The person responsible for overseeing the creative aspects of the film.

Writer: The individual who crafted the screenplay and story for the movie.

Main Genres: The primary categories or styles that the movie falls under.

Motion Picture Rating: The age-appropriate classification for viewers.

Motion Picture Rating Categories:

  • G (General Audience): Suitable for all ages; no offensive content.

  • PG (Parental Guidance): May contain mild language, violence, or thematic elements; parental guidance advised.

  • PG-13 (Parents Strongly Cautioned): Some material may be inappropriate for those under 13; more intense violence, language, or suggestive content.

  • R (Restricted): Viewers under 17 require an accompanying parent or adult guardian; may contain adult themes, strong language, sexual content, or violence.

  • NC-17 (Adults Only): No one 17 or under admitted; may contain explicit sexual content or graphic violence.

Runtime: The total duration of the movie.

Release Year: The year in which the movie was officially released.

Rating: The average score given to the movie by viewers.

Number of Ratings: The total count of ratings submitted by viewers.

Budget: The estimated cost of producing the movie.

Gross in US & Canada: The total earnings from the movie's screening in the United States and Canada.

Gross worldwide: The overall worldwide earnings of the movie.

Opening Weekend Gross in US & Canada: The amount generated during the initial weekend of the movie's release in the United States and Canada.

!CLEAN VERSION: IMDbMovies-Clean

What I did:

  • I kept all missing values. In most cases, missing values stem from a lack of information on the website; in a few cases, they stem from the scraper. For example, some movies will be released in 2024, so they have no runtimes or ratings yet.

  • I changed the format of the 'Runtime', 'Rating', 'Number of Ratings', 'Budget', 'Gross in US & Canada', 'Gross worldwide', and 'Opening Weekend Gross in US & Canada' columns, converting them from text to numeric values with the unit stated in the column name.

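  • In some cases, I used the information from a single column to create two separate columns. Judging by the table schemas, the raw opening_weekend_gross_in_us_canada column corresponds to opening_weekend_in_us_canada and gross_opening_weekend_in_millions in the clean table.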
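
The cleaning steps listed above might look roughly like the following Python sketch. This is not the author's actual code. The raw string formats (`"2h 22m"` for runtimes, `"$25,000,000"` for money, `"$727,327 (Sep 25, 1994)"` for the opening weekend) and all three helper names are assumptions made for illustration; the real scraped values may be formatted differently. Missing or unparseable values are returned as `None`, in keeping with the decision to keep missing values.

```python
import re
from typing import Optional, Tuple


def parse_runtime_minutes(raw: Optional[str]) -> Optional[float]:
    """Convert a runtime string such as '2h 22m' (assumed format) to minutes."""
    if not raw:
        return None  # keep missing values missing
    match = re.fullmatch(r"\s*(?:(\d+)\s*h)?\s*(?:(\d+)\s*m)?\s*", raw)
    if match is None or not any(match.groups()):
        return None  # unrecognized format: treat as missing
    hours = int(match.group(1) or 0)
    minutes = int(match.group(2) or 0)
    return float(hours * 60 + minutes)


def parse_money_millions(raw: Optional[str]) -> Optional[float]:
    """Convert a money string such as '$25,000,000' (assumed format) to millions."""
    if not raw:
        return None
    digits = re.sub(r"[^\d.]", "", raw)  # drop currency symbols and commas
    try:
        return float(digits) / 1_000_000
    except ValueError:
        return None  # nothing numeric left to parse


def split_opening_weekend(raw: Optional[str]) -> Tuple[Optional[str], Optional[float]]:
    """Split '$727,327 (Sep 25, 1994)' (hypothetical format) into (date, gross in millions)."""
    if not raw:
        return None, None
    match = re.fullmatch(r"\s*(\$[\d,.]+)\s*\(([^)]*)\)\s*", raw)
    if match is None:
        return None, None
    date_text = match.group(2).strip() or None
    return date_text, parse_money_millions(match.group(1))
```

Each helper takes one raw cell and returns the cleaned value, so it can be applied column by column; the third helper illustrates how one raw column can yield two clean ones.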

Tables

Imdbmovies Clean

@kaggle.elvinrustam_imdb_movies_dataset.imdbmovies_clean
  • 1.57 MB
  • 9083 rows
  • 15 columns

CREATE TABLE imdbmovies_clean (
  "title" VARCHAR,
  "summary" VARCHAR,
  "director" VARCHAR,
  "writer" VARCHAR,
  "main_genres" VARCHAR,
  "motion_picture_rating" VARCHAR,
  "release_year" DOUBLE,
  "runtime_minutes" DOUBLE,
  "rating_out_of_10" DOUBLE,
  "number_of_ratings_in_thousands" DOUBLE,
  "budget_in_millions" DOUBLE,
  "gross_in_us_canada_in_millions" DOUBLE,
  "gross_worldwide_in_millions" DOUBLE,
  "opening_weekend_in_us_canada" VARCHAR,
  "gross_opening_weekend_in_millions" DOUBLE
);
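
Because the clean table stores its units in the column names (thousands, millions) and keeps missing values, a reader may want to scale values back to base units without turning gaps into zeros. The sketch below is one way to do that in plain Python; the helper names `to_base_units` and `as_year` and the sample row are hypothetical and not part of the dataset.

```python
import math
from typing import Optional


def to_base_units(value: Optional[float], scale: float) -> Optional[float]:
    """Scale a value (e.g. millions -> dollars); None and NaN stay missing."""
    if value is None or math.isnan(value):
        return None
    return value * scale


def as_year(value: Optional[float]) -> Optional[int]:
    """release_year is stored as DOUBLE; convert it to an int when present."""
    if value is None or math.isnan(value):
        return None
    return int(value)


# Hypothetical row using the clean table's column names.
row = {
    "release_year": 1994.0,
    "number_of_ratings_in_thousands": 12.5,
    "budget_in_millions": 25.0,
    "gross_worldwide_in_millions": None,  # missing value kept as-is
}

year = as_year(row["release_year"])
rating_count = to_base_units(row["number_of_ratings_in_thousands"], 1_000)
budget_dollars = to_base_units(row["budget_in_millions"], 1_000_000)
gross_dollars = to_base_units(row["gross_worldwide_in_millions"], 1_000_000)
```

Returning `None` instead of `0` keeps a missing gross distinguishable from a movie that earned nothing.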

Imdbmovies

@kaggle.elvinrustam_imdb_movies_dataset.imdbmovies
  • 1.67 MB
  • 9083 rows
  • 14 columns

CREATE TABLE imdbmovies (
  "title" VARCHAR,
  "summary" VARCHAR,
  "director" VARCHAR,
  "writer" VARCHAR,
  "main_genres" VARCHAR,
  "motion_picture_rating" VARCHAR,
  "runtime" VARCHAR,
  "release_year" DOUBLE,
  "rating" VARCHAR,
  "number_of_ratings" VARCHAR,
  "budget" VARCHAR,
  "gross_in_us_canada" VARCHAR,
  "gross_worldwide" VARCHAR,
  "opening_weekend_gross_in_us_canada" VARCHAR
);
