IMDb Movies

IMDb Movies Dataset (Sorted by popularity)

@kaggle.elvinrustam_imdb_movies_dataset

About this Dataset

Start Date: November 29, 2023

Finish Date: December 1, 2023

This dataset was scraped from IMDb, with movies ordered by popularity (from highest to lowest).

The dataset contains 9,083 movies in total.

UNCLEAN VERSION: IMDbMovies

About the Features:

Title: The name of the movie.

Summary: A brief overview of the movie's plot.

Director: The person responsible for overseeing the creative aspects of the film.

Writer: The individual who crafted the screenplay and story for the movie.

Main Genres: The primary categories or styles that the movie falls under.

Motion Picture Rating: The age-appropriate classification for viewers.

Motion Picture Rating Categories:

  • G (General Audiences): Suitable for all ages; no offensive content.

  • PG (Parental Guidance): May contain mild language, violence, or thematic elements; parental guidance advised.

  • PG-13 (Parents Strongly Cautioned): Some material may be inappropriate for children under 13; may contain more intense violence, language, or suggestive content.

  • R (Restricted): Viewers under 17 must be accompanied by a parent or adult guardian; may contain adult themes, strong language, sexual content, or violence.

  • NC-17 (Adults Only): No one 17 or under is admitted; may contain explicit sexual content or graphic violence.
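Because the five categories form an ordered scale from least to most restrictive, one way to filter rows by rating is to compare positions on that scale. The sketch below is illustrative only: `RATING_ORDER` and `within_rating` are hypothetical names, and it assumes the rating column stores these exact labels. Any other value (a missing rating, or a label outside the five categories) is returned as `None` rather than guessed.

```python
from typing import Optional

# The five categories above, ordered from least to most restrictive.
RATING_ORDER = ["G", "PG", "PG-13", "R", "NC-17"]


def within_rating(rating: Optional[str], ceiling: str) -> Optional[bool]:
    """Return True if `rating` is no more restrictive than `ceiling`.

    Returns None when `rating` is missing or is not one of the five
    categories, so the caller decides how to treat those rows.
    Raises ValueError if `ceiling` itself is not a known category.
    """
    if rating not in RATING_ORDER:
        return None
    return RATING_ORDER.index(rating) <= RATING_ORDER.index(ceiling)


# Usage with hypothetical (title, rating) pairs:
movies = [("Movie A", "PG"), ("Movie B", "R"), ("Movie C", None)]
family_friendly = [title for title, r in movies if within_rating(r, "PG-13")]
```

Returning `None` instead of `False` keeps unrated movies distinguishable from movies that are known to exceed the ceiling.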

Runtime: The total duration of the movie.

Release Year: The year in which the movie was officially released.

Rating: The average score, on IMDb's 1 to 10 scale, given to the movie by users.

Number of Ratings: The total count of ratings submitted by viewers.

Budget: The estimated cost of producing the movie.

Gross in US & Canada: The movie's total box-office gross in the United States and Canada.

Gross worldwide: The overall worldwide earnings of the movie.

Opening Weekend Gross in US & Canada: The amount generated during the initial weekend of the movie's release in the United States and Canada.
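To work with the numeric features above while keeping missing values explicit, a row can be modeled as a record whose fields may be `None`. The following is a minimal sketch under stated assumptions: the class name, field names, and types (money as whole US dollars, runtime in minutes) are hypothetical and not taken from the dataset files, and the text columns (summary, director, writer, genres) are left out for brevity. The share calculations assume the worldwide gross includes the US & Canada gross, as worldwide box-office totals normally do.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Movie:
    """One row of the dataset; None marks a missing value.

    Field names and types are illustrative assumptions, not the
    dataset's actual column names or storage format.
    """
    title: str
    release_year: Optional[int] = None
    motion_picture_rating: Optional[str] = None
    runtime_minutes: Optional[int] = None
    rating: Optional[float] = None
    number_of_ratings: Optional[int] = None
    budget: Optional[int] = None
    gross_us_canada: Optional[int] = None
    gross_worldwide: Optional[int] = None
    opening_weekend_us_canada: Optional[int] = None

    def us_canada_share(self) -> Optional[float]:
        """Fraction of the worldwide gross earned in the US and Canada."""
        if self.gross_us_canada is None or not self.gross_worldwide:
            return None  # missing input or zero denominator
        return self.gross_us_canada / self.gross_worldwide

    def opening_weekend_share(self) -> Optional[float]:
        """Fraction of the US & Canada gross earned on the opening weekend."""
        if self.opening_weekend_us_canada is None or not self.gross_us_canada:
            return None  # missing input or zero denominator
        return self.opening_weekend_us_canada / self.gross_us_canada
```

Each derived value propagates missingness: if any input is absent, the result is `None` rather than a misleading zero.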

CLEAN VERSION: IMDbMovies-Clean

What I did:

  • I kept all missing values. In most cases, a missing value reflects information that is not available on the website; for example, some movies are scheduled for release in 2024 and do not yet have runtimes or ratings. In a few cases, missing values stem from the scraper.

  • I reformatted the values in the 'Runtime', 'Rating', 'Number of Ratings', 'Budget', 'Gross in US & Canada', 'Gross worldwide', and 'Opening Weekend Gross in US & Canada' columns.

  • In some cases, I split the information in a single column into two separate columns.
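The reformatting described above is not specified in detail, so the following is only a sketch of what such conversions could look like. It assumes the unclean file stores values the way IMDb pages typically display them (runtimes like "2h 28m", abbreviated counts like "2.4M", dollar amounts like "$160,000,000 (estimated)"); those raw formats and the function names `parse_runtime`, `parse_count`, and `parse_money` are hypothetical. Each parser returns `None` for empty input, which keeps missing values missing.

```python
import re
from typing import Optional

# Hypothetical raw formats, modeled on how IMDb pages display these fields;
# the actual strings in the unclean IMDbMovies file may differ.


def parse_runtime(raw: Optional[str]) -> Optional[int]:
    """Convert a runtime such as '2h 28m', '1h' or '45m' to minutes."""
    if not raw:
        return None  # keep missing values missing
    match = re.fullmatch(r"\s*(?:(\d+)h)?\s*(?:(\d+)m)?\s*", raw)
    if match is None or not any(match.groups()):
        return None  # unrecognized format
    hours, minutes = (int(g) if g else 0 for g in match.groups())
    return hours * 60 + minutes


_SUFFIXES = {"K": 1_000, "M": 1_000_000}


def parse_count(raw: Optional[str]) -> Optional[int]:
    """Convert an abbreviated count such as '2.4M', '512K' or '980' to an int."""
    text = (raw or "").strip().upper().replace(",", "")
    if not text:
        return None
    multiplier = _SUFFIXES.get(text[-1], 1)
    number = text[:-1] if text[-1] in _SUFFIXES else text
    return round(float(number) * multiplier)


def parse_money(raw: Optional[str]) -> Optional[int]:
    """Convert an amount such as '$160,000,000 (estimated)' to an int.

    The currency symbol is discarded, so this assumes all amounts
    share one currency.
    """
    if not raw:
        return None
    match = re.search(r"\d[\d,]*", raw)
    return int(match.group().replace(",", "")) if match else None
```

Because `parse_money` drops the currency symbol, budgets reported in other currencies would need separate handling before comparing them with US dollar figures.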
