This is a collection of metadata about the 10,000 most popular movies on The Movie Database (TMDB). The dataset includes information such as movie titles, release dates, runtime, genres, production companies, budget, and revenue. The data was collected from TMDB's public API using a notebook available here.
A little bit about TMDB
TMDB (The Movie Database) is a popular online database and community platform that provides a vast collection of information about movies, TV shows, and other related content. TMDB allows users to browse and search for movies and TV shows, view information such as cast, crew, synopsis, and ratings, and also contribute to the community by adding their own reviews, ratings, and other content.
Purpose
The dataset is intended for use by data analysts, researchers, and developers who are interested in studying or analyzing the popularity and characteristics of movies. The dataset can be used to perform a wide range of analyses, such as exploring trends in movie genres over time, identifying patterns in movie budgets and revenues, and analyzing the impact of different attributes on a movie's popularity.
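As an illustration of the genre-trend analysis mentioned above, here is a minimal sketch that counts movies per release year and genre using only the Python standard library. The sample rows are hypothetical, and the "-" separator inside `genres` is an assumption about how the genre list is serialized; check the actual file and adjust `sep` if needed.

```python
from collections import Counter
from datetime import date

# Hypothetical sample rows using column names from the attribute list.
# The "-" separator inside `genres` is an assumption, not a documented format.
SAMPLE_ROWS = [
    {"title": "A", "release_date": "2019-05-01", "genres": "Action-Comedy"},
    {"title": "B", "release_date": "2019-11-20", "genres": "Action"},
    {"title": "C", "release_date": "2020-02-14", "genres": "Drama"},
    {"title": "D", "release_date": "", "genres": "Drama"},  # missing date, skipped
]


def genre_counts_by_year(rows, sep="-"):
    """Return a Counter keyed by (release year, genre)."""
    counts = Counter()
    for row in rows:
        raw_date = (row.get("release_date") or "").strip()
        raw_genres = (row.get("genres") or "").strip()
        if not raw_date or not raw_genres:
            continue  # skip rows with no date or no genres
        try:
            year = date.fromisoformat(raw_date).year
        except ValueError:
            continue  # skip dates that are not in YYYY-MM-DD form
        for genre in raw_genres.split(sep):
            genre = genre.strip()
            if genre:
                counts[(year, genre)] += 1
    return counts


counts = genre_counts_by_year(SAMPLE_ROWS)
for (year, genre), n in sorted(counts.items()):
    print(year, genre, n)
```

The same function can be fed rows from `csv.DictReader` once the real file is opened; the resulting counter can then be pivoted or plotted with whatever tooling is preferred.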
Attributes
- id: Unique identifier assigned to each movie in the TMDB database.
- title: Title of the movie.
- release_date: Date on which the movie was released.
- genres: List of genres associated with the movie.
- original_language: Language in which the movie was originally produced.
- vote_average: Average rating given to the movie by TMDB users.
- vote_count: Number of votes cast for the movie on TMDB.
- popularity: Popularity score assigned to the movie by TMDB based on user engagement.
- overview: Brief description or synopsis of the movie.
- budget: Estimated production budget of the movie, in USD.
- production_companies: List of production companies involved in making the movie.
- revenue: Total revenue generated by the movie, in USD.
- runtime: Total runtime of the movie in minutes.
- tagline: Short, memorable phrase associated with the movie, often used in promotional material.
The dataset was created by fetching raw data from TMDB's public API and then cleaning and preprocessing it to improve its quality and make it easier to work with. The cleaning was done in a notebook available here, which outlines the steps taken to transform the raw data into a more usable format.
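When loading the data, the numeric attributes usually need to be converted from text. The following is a rough sketch of one way to do that with the standard library; it is not the notebook's actual cleaning code. The inline CSV is hypothetical sample data, and treating a budget or revenue of 0 as "unknown" is an assumption you may want to disable.

```python
import csv
import io

# Hypothetical sample data with a subset of the columns from the attribute list.
SAMPLE_CSV = """id,title,release_date,vote_average,vote_count,popularity,budget,revenue,runtime
1,Example One,2019-05-01,7.5,1200,55.3,1000000,3500000,110
2,Example Two,2020-02-14,6.1,300,12.0,0,0,
"""

INT_COLUMNS = ("id", "vote_count", "budget", "revenue", "runtime")
FLOAT_COLUMNS = ("vote_average", "popularity")


def to_number(text, cast):
    """Convert text with `cast`, returning None for empty or invalid values."""
    text = (text or "").strip()
    if not text:
        return None
    try:
        return cast(text)
    except (ValueError, OverflowError):
        return None


def load_movies(handle, zero_is_missing=("budget", "revenue")):
    """Read movie rows from an open CSV file and coerce numeric columns."""
    movies = []
    for row in csv.DictReader(handle):
        for col in INT_COLUMNS:
            # int(float(...)) tolerates values written as "110.0"
            row[col] = to_number(row.get(col), lambda s: int(float(s)))
        for col in FLOAT_COLUMNS:
            row[col] = to_number(row.get(col), float)
        for col in zero_is_missing:
            # Assumption: a value of 0 means "not reported", not a real amount.
            if row[col] == 0:
                row[col] = None
        movies.append(row)
    return movies


movies = load_movies(io.StringIO(SAMPLE_CSV))
print(movies[0]["title"], movies[0]["budget"], movies[0]["revenue"])
```

To use it on the real file, pass an open file handle instead of `io.StringIO`, for example `open(path, newline="", encoding="utf-8")`, where `path` is wherever the CSV was saved.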