Japanese Anime: An In-Depth IMDb Data Set

Unlocking Insights into Popularity, Ratings, and Trends in Japanese Animation

@kaggle.lorentzyeung_all_japanese_anime_titles_in_imdb

About this Dataset

Introduction to the IMDb Anime Dataset (45718 titles)

Methodology

The dataset was fetched on 8 September 2023 at 18:00 London time.

The dataset was generated using a web scraping script written in Python, utilizing the Scrapy library. The script navigates through IMDb's list of animations originating from Japan, scraping relevant information from each listing. The spider starts from the URL https://www.imdb.com/search/title/?genres=Animation&countries=jp and follows the "Next" links to traverse through multiple pages of listings.
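The "follow the Next link" traversal described above can be sketched without any third-party library. This is not the author's Scrapy spider: the HTML structure (`<h3 class="title">`, `<a class="next">`), the `example.test` URLs, and the `fetch` callable are all hypothetical stand-ins, and IMDb's real markup is different. The point is only to show the pagination loop.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ListingParser(HTMLParser):
    """Collect item titles and the "Next" link from one listing page.

    The markup assumed here (<h3 class="title"> and <a class="next">) is
    hypothetical; it does not describe IMDb's actual pages.
    """

    def __init__(self):
        super().__init__()
        self.titles = []
        self.next_href = None
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "h3" and attrs.get("class") == "title":
            self._in_title = True
        elif tag == "a" and attrs.get("class") == "next":
            self.next_href = attrs.get("href")

    def handle_endtag(self, tag):
        if tag == "h3":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title and data.strip():
            self.titles.append(data.strip())


def crawl(start_url, fetch, max_pages=1000):
    """Follow "Next" links from start_url, yielding every title found.

    `fetch` is any callable mapping a URL to an HTML string, so the
    traversal can be exercised without network access.
    """
    url, seen = start_url, set()
    while url and url not in seen and len(seen) < max_pages:
        seen.add(url)
        parser = ListingParser()
        parser.feed(fetch(url))
        yield from parser.titles
        # Resolve a relative "Next" link against the current page URL.
        url = urljoin(url, parser.next_href) if parser.next_href else None


# Two fake pages standing in for a paginated listing.
FAKE_SITE = {
    "https://example.test/list?page=1":
        '<h3 class="title">Title A</h3><h3 class="title">Title B</h3>'
        '<a class="next" href="?page=2">Next</a>',
    "https://example.test/list?page=2":
        '<h3 class="title">Title C</h3>',
}

titles = list(crawl("https://example.test/list?page=1", FAKE_SITE.__getitem__))
```

The `seen` set and `max_pages` cap guard against a listing whose "Next" link loops back on itself. In Scrapy, the same loop is usually expressed by yielding a follow-up request from the parse callback.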

Summary of Results

The dataset provides a comprehensive view of various animations listed on IMDb that are categorized under the genre "Animation" and originate from Japan. It includes details such as the title, genre, user rating, number of votes, runtime, year of release, summary, stars, certificate, metascore, gross earnings, episode flag, and episode title when applicable.

However, the dataset also includes some animations that are not regarded as Japanese anime, e.g. the Toy Story films.
This is because I could only filter titles by region, not by origin of production.

Detailed Column Introduction

Title: The name of the animation.
Genre: The genre(s) under which the animation falls, e.g., Action, Adventure, etc.
User Rating: The IMDb user rating out of 10.
Number of Votes: The total number of IMDb users who have rated the animation.
Runtime: The duration of the animation in minutes.
Year: The year the animation was released or started airing.
Summary: A brief or full summary of the animation's plot. Full summaries are fetched when available.
Stars: List of main actors or voice actors involved in the animation.
Certificate: The certification of the animation, e.g., PG, PG-13, etc.
Metascore: The Metascore rating, if available, which is an aggregated score from various critics.
Gross: The gross earnings or box office collection of the animation.
Episode: A binary flag indicating whether the listing is for an episode of a series (1 for yes, 0 for no).
Episode Title: The title of the episode if the listing is for an episode; otherwise, it will be None.
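As a starting point, the columns above could be read into typed records roughly as follows. This is a sketch under assumptions: the CSV header names are taken to match the column labels listed here, and the raw formats in the sample (`"1,234"`, `24 min`, `(2019)`) are guesses at what scraped values might look like, so check them against the actual file. The sample rows are made up.

```python
import csv
import io
import re


def to_float(text):
    """Return the first number in text as a float, or None if there is none."""
    match = re.search(r"\d+(?:\.\d+)?", (text or "").replace(",", ""))
    return float(match.group()) if match else None


def to_int(text):
    """Return the first number in text as an int, or None if there is none."""
    value = to_float(text)
    return int(value) if value is not None else None


def load_rows(handle):
    """Read the CSV and coerce numeric columns; missing values become None."""
    rows = []
    for raw in csv.DictReader(handle):
        episode_title = raw["Episode Title"]
        rows.append({
            "title": raw["Title"],
            "genres": [g.strip() for g in raw["Genre"].split(",") if g.strip()],
            "rating": to_float(raw["User Rating"]),
            "votes": to_int(raw["Number of Votes"]),
            "runtime_min": to_int(raw["Runtime"]),
            "year": to_int(raw["Year"]),
            "is_episode": raw["Episode"] == "1",
            "episode_title": None if episode_title in ("", "None") else episode_title,
        })
    return rows


# A tiny made-up sample in the assumed layout, not real data.
SAMPLE = io.StringIO(
    'Title,Genre,User Rating,Number of Votes,Runtime,Year,Episode,Episode Title\n'
    'Example Show,"Animation, Action",8.1,"1,234",24 min,(2019),1,Pilot\n'
    'Example Film,"Animation, Drama",,,95 min,2021,0,None\n'
)
rows = load_rows(SAMPLE)
```

For a real file, `open(path, newline="", encoding="utf-8")` would take the place of the in-memory sample.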

Possible Usages

Exploratory Data Analysis (EDA)
Genre Popularity: Analyze which genres are most popular based on user ratings and number of votes.
Year-wise Trends: Examine how the popularity of anime has evolved over the years.
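One way to approach the genre popularity question is a vote-weighted mean rating per genre, so that a title with a handful of votes does not count as much as a widely rated one. The sketch below assumes each record is a dict with hypothetical keys `genres` (list of strings), `rating`, and `votes`; the sample records are invented.

```python
from collections import defaultdict


def genre_popularity(rows, min_votes=0):
    """Return {genre: (vote-weighted mean rating, total votes)}."""
    weighted_sum = defaultdict(float)
    total_votes = defaultdict(int)
    for row in rows:
        rating, votes = row["rating"], row["votes"]
        if rating is None or not votes or votes < min_votes:
            continue  # skip unrated or too thinly rated titles
        for genre in row["genres"]:
            weighted_sum[genre] += rating * votes
            total_votes[genre] += votes
    return {g: (weighted_sum[g] / total_votes[g], total_votes[g])
            for g in total_votes}


# Made-up records for illustration only.
rows = [
    {"genres": ["Action", "Adventure"], "rating": 8.0, "votes": 300},
    {"genres": ["Action"], "rating": 6.0, "votes": 100},
    {"genres": ["Drama"], "rating": None, "votes": None},
]
stats = genre_popularity(rows)
# Rank genres by total votes, most voted first.
ranked = sorted(stats, key=lambda g: stats[g][1], reverse=True)
```

A title with several genres contributes to each of them, which is a deliberate simplification. Grouping by year instead of genre gives a first look at year-wise trends.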

Predictive Modeling
Rating Prediction: Use machine learning algorithms to predict the rating of an anime based on features like genre, runtime, and stars.
Success Prediction: Predict the financial success (Gross earnings) of an anime based on various features.
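Before reaching for a full machine learning model, it helps to have a simple baseline to beat. The sketch below is such a baseline, not a learned regression model: it predicts a title's rating as the average of the mean ratings of its genres, falling back to the global mean for unseen genres, and scores itself with mean absolute error. The data and function names are invented for illustration.

```python
from collections import defaultdict


def fit_genre_means(train):
    """Learn the mean rating per genre plus a global mean as fallback."""
    sums, counts = defaultdict(float), defaultdict(int)
    for genres, rating in train:
        for genre in genres:
            sums[genre] += rating
            counts[genre] += 1
    global_mean = sum(rating for _, rating in train) / len(train)
    return {g: sums[g] / counts[g] for g in sums}, global_mean


def predict(model, genres):
    """Average the learned means of known genres; else use the global mean."""
    genre_means, global_mean = model
    known = [genre_means[g] for g in genres if g in genre_means]
    return sum(known) / len(known) if known else global_mean


def mean_absolute_error(model, data):
    """Average absolute gap between predicted and actual ratings."""
    return sum(abs(predict(model, g) - r) for g, r in data) / len(data)


# Made-up (genres, rating) pairs for illustration only.
train = [(["Action"], 7.0), (["Action", "Comedy"], 8.0), (["Drama"], 6.0)]
held_out = [(["Comedy"], 7.5), (["Horror"], 7.0)]

model = fit_genre_means(train)
mae = mean_absolute_error(model, held_out)
```

Any model built on runtime, stars, or other features is worth keeping only if its held-out error is clearly lower than this baseline's.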

Content Recommendation
Personalized Recommendations: Use user ratings and genre information to build a recommendation system.

Sentiment Analysis
Summary Sentiment: Perform sentiment analysis on the summary to see if the tone of the summary correlates with user ratings or other features.

Network Analysis
Actor Collaboration: Create a network graph to analyze frequent collaborations between actors.
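The edge list for such a collaboration graph can be built by counting how often each pair of people appears in the same Stars list. The sketch below assumes the Stars column has already been split into a list of names per title; the names are placeholders, not real credits.

```python
from collections import Counter
from itertools import combinations


def collaboration_counts(star_lists):
    """Count how often each unordered pair of people is credited together."""
    pairs = Counter()
    for stars in star_lists:
        # Deduplicate and sort so (a, b) and (b, a) map to the same key.
        unique = sorted(set(stars))
        pairs.update(combinations(unique, 2))
    return pairs


# Hypothetical credits for illustration only.
star_lists = [
    ["Actor A", "Actor B", "Actor C"],
    ["Actor B", "Actor A"],
    ["Actor C"],
]
edges = collaboration_counts(star_lists)
top_pair, top_count = edges.most_common(1)[0]
```

Each key in `edges` is an edge and its count is the edge weight, which is the form most graph libraries accept as input.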

Time-Series Analysis
Rating Over Time: Analyze how ratings evolve over time for long-running series.
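For a single long-running series, a simple version of this is the mean episode rating per year. The sketch below assumes records are dicts with hypothetical keys `title`, `is_episode`, `year`, and `rating`, and that the Year value on an episode row refers to that episode rather than to the series start, which should be verified against the data. The sample records are invented.

```python
from collections import defaultdict
from statistics import mean


def yearly_mean_rating(rows, title):
    """Mean episode rating per year for one series, in chronological order."""
    by_year = defaultdict(list)
    for row in rows:
        if (row["title"] == title and row["is_episode"]
                and row["rating"] is not None and row["year"] is not None):
            by_year[row["year"]].append(row["rating"])
    return [(year, mean(ratings)) for year, ratings in sorted(by_year.items())]


# Made-up records for illustration only.
rows = [
    {"title": "Example Show", "is_episode": True, "year": 2020, "rating": 9.0},
    {"title": "Example Show", "is_episode": True, "year": 2019, "rating": 8.0},
    {"title": "Example Show", "is_episode": True, "year": 2019, "rating": 7.0},
    {"title": "Example Show", "is_episode": False, "year": 2019, "rating": 8.5},
    {"title": "Other Show", "is_episode": True, "year": 2019, "rating": 5.0},
]
trend = yearly_mean_rating(rows, "Example Show")
```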

Market Research
Target Audience: Use the certificate and genre information to identify target demographics for marketing anime-related products.

Academic Research
Cultural Impact: Study the cultural impact of anime by analyzing its popularity, genres, and actors.

Data Visualization
Interactive Dashboards: Create dashboards to visualize the data and allow users to filter by various criteria like genre, year, or rating.

Natural Language Processing (NLP)
Topic Modeling: Use NLP techniques to identify common themes or topics in the summaries.

In Python, libraries such as Pandas for data manipulation, Matplotlib and Seaborn for data visualization, and scikit-learn for machine learning cover most of the analyses listed above.
