About this Dataset

Top 10000 Popular Movies Dataset

Context

Recommendation systems are used everywhere nowadays: Netflix, Amazon Prime, YouTube, online shopping sites, and so on. Datasets like this are a great way to start working on a recommendation system.
The dataset was created from the official API provided by TMDB.

Content

What's inside is more than just rows and columns. This dataset covers 10000 popular movies, based on the TMDB ratings. It is an ideal dataset for getting started with recommendation algorithms.
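
As a starting point for the recommendation use case, here is a minimal sketch of a content-based recommender that ranks movies by the Jaccard similarity of their genre sets. The titles and genres are hypothetical, and the genre values are assumed to be already parsed into Python sets; the storage format of the genre column is not described here, so a real run would need its own parsing step.

```python
# Hypothetical movies; titles and genre sets are made up for illustration.
movies = {
    "Movie A": {"Action", "Adventure"},
    "Movie B": {"Action", "Adventure", "Comedy"},
    "Movie C": {"Drama"},
    "Movie D": {"Action", "Thriller"},
}


def jaccard(a, b):
    """Jaccard similarity: size of the intersection over size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(title, catalog, k=2):
    """Return the k titles whose genre sets are most similar to `title`."""
    target = catalog[title]
    scored = [
        (jaccard(target, genres), other)
        for other, genres in catalog.items()
        if other != title
    ]
    # Sort by similarity (descending), then by title for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other for _, other in scored[:k]]


print(recommend("Movie A", movies))
```

Genre overlap is only one possible signal; popularity, vote_average, or the overview text could be combined with it in the same way.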

Column Name	Description
id	Unique ID of the movie.
original_language	Original language of the movie. There are 44 languages in this column in total; 7771 movies have English as their original language. Values are ISO 639-1 language codes, e.g. 'en' for English, 'hi' for Hindi.
original_title	Title of the movie.
popularity	Popularity of the movie. The bigger the number, the higher the popularity.
release_date	Release date of the movie. If no release date is present, the movie has not been released yet.
vote_average	Average rating (vote) for the movie.
vote_count	Number of ratings (votes) recorded for the movie.
genre	Genre of the movie.
overview	Brief description of the movie, as a string.
revenue	Revenue of the movie.
runtime	Runtime of the movie in minutes.
tagline	Tagline of the movie.
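
The column descriptions above can be turned into a small loading routine. The following is a sketch only: it assumes the data is available as a CSV with the column names listed above, that a missing release date appears as an empty field, and that dates use the YYYY-MM-DD form. The sample rows are hypothetical and contain only a subset of the columns.

```python
import csv
import io
from datetime import date

# Hypothetical sample rows; the values are made up for illustration only.
SAMPLE_CSV = """id,original_language,original_title,popularity,release_date,vote_average,vote_count
1,en,Example Movie A,120.5,2019-05-01,7.2,1500
2,hi,Example Movie B,80.0,,0.0,0
3,en,Example Movie C,45.3,2001-11-16,6.8,900
"""


def parse_row(row):
    """Convert the string fields of one CSV row into typed values."""
    raw_date = row["release_date"].strip()
    return {
        "id": int(float(row["id"])),
        "original_language": row["original_language"],  # ISO 639-1 code
        "original_title": row["original_title"],
        "popularity": float(row["popularity"]),
        # Assumption: an empty release_date means the movie is unreleased.
        "release_date": date.fromisoformat(raw_date) if raw_date else None,
        "vote_average": float(row["vote_average"]),
        "vote_count": int(float(row["vote_count"])),
    }


movies = [parse_row(r) for r in csv.DictReader(io.StringIO(SAMPLE_CSV))]

# Movies without a release date are treated as not yet released.
unreleased = [m["original_title"] for m in movies if m["release_date"] is None]
# Filter on the ISO 639-1 code for English.
english = [m for m in movies if m["original_language"] == "en"]
# A bigger popularity number means a more popular movie.
most_popular = max(movies, key=lambda m: m["popularity"])

print(unreleased, len(english), most_popular["original_title"])
```

To read a real export instead, replace the `io.StringIO(SAMPLE_CSV)` argument with an open file handle.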

Origin

The code used to extract this dataset can be found here: Creating Dataset of top 10000 popular movies

Update

Added overview, revenue, runtime, and tagline columns for each movie.

Tables

Top 10000 Movies

@kaggle.omkarborikar_top_10000_popular_movies.top_10000_movies

2.61 MB
10014 rows
13 columns


CREATE TABLE top_10000_movies (
  "unnamed_0" VARCHAR,
  "id" DOUBLE,
  "original_language" VARCHAR,
  "original_title" VARCHAR,
  "popularity" DOUBLE,
  "release_date" TIMESTAMP,
  "vote_average" DOUBLE,
  "vote_count" DOUBLE,
  "genre" VARCHAR,
  "overview" VARCHAR,
  "revenue" DOUBLE,
  "runtime" DOUBLE,
  "tagline" VARCHAR
);
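
To try queries against this schema locally, the sketch below recreates the table in an in-memory SQLite database using Python's standard sqlite3 module. SQLite is a stand-in here, not necessarily the engine that hosts the table, and the inserted rows are hypothetical. It also assumes that a missing release date is stored as NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Same schema as above; SQLite accepts these type names via type affinity.
conn.execute("""
CREATE TABLE top_10000_movies (
  "unnamed_0" VARCHAR,
  "id" DOUBLE,
  "original_language" VARCHAR,
  "original_title" VARCHAR,
  "popularity" DOUBLE,
  "release_date" TIMESTAMP,
  "vote_average" DOUBLE,
  "vote_count" DOUBLE,
  "genre" VARCHAR,
  "overview" VARCHAR,
  "revenue" DOUBLE,
  "runtime" DOUBLE,
  "tagline" VARCHAR
)
""")

# Hypothetical rows, made up for illustration only.
rows = [
    ("0", 1.0, "en", "Example Movie A", 120.5, "2019-05-01", 7.2, 1500.0,
     "Action", "Hypothetical overview.", 1000000.0, 110.0, "Hypothetical tagline."),
    ("1", 2.0, "hi", "Example Movie B", 80.0, None, 0.0, 0.0,
     "Drama", "Hypothetical overview.", 0.0, 95.0, None),
    ("2", 3.0, "en", "Example Movie C", 45.3, "2001-11-16", 6.8, 900.0,
     "Comedy", "Hypothetical overview.", 500000.0, 102.0, None),
]
conn.executemany(
    "INSERT INTO top_10000_movies VALUES (?,?,?,?,?,?,?,?,?,?,?,?,?)", rows
)

# Number of movies per original language, most common first.
per_language = conn.execute("""
SELECT original_language, COUNT(*) AS n
FROM top_10000_movies
GROUP BY original_language
ORDER BY n DESC, original_language
""").fetchall()

# Movies with no release date, i.e. not released yet.
unreleased = conn.execute("""
SELECT original_title
FROM top_10000_movies
WHERE release_date IS NULL
""").fetchall()

print(per_language, unreleased)
```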