Anime-Planet Recommendation Database 2020

Recommendation data from 74.000 users and 16.000 animes at Anime-Planet

@kaggle.hernan4444_animeplanet_recommendation_database_2020

About this Dataset

This dataset contains information about 16,621 anime, 175,731 recommendations and the preferences of 74,129 different users, scraped from Anime-Planet. In particular, this dataset contains:

  • Information about each anime, such as tags, synopsis, average score, etc.
  • Lists of anime recommended for a given anime, with the number of users who agreed with each recommendation.
  • HTML pages with anime information for further data scraping. These files contain information such as reviews, synopsis, staff, anime statistics, genre, etc.
  • The anime list of each user, including dropped, watched, want to watch, currently watching, stalled and won't watch.
  • Ratings given by users to the anime they have watched completely.

Warning: this dataset includes information about anime for adults (hentai).

Content

The anime data was scraped between June 4th and June 25th.

  • The "html" folder contains one zip file per anime (16,621 different anime). Each zip contains several HTML pages scraped from Anime-Planet. The scraped pages are:
  1. Main page
  2. Reviews
  3. Recommendations
  4. Characters
  5. Staff

I uploaded only 2 files as examples to avoid increasing the size of this dataset. All HTML files are available at this link: https://drive.google.com/drive/folders/1xIxBRtJR2oTZhJVvjFoTo3qllBFn4aOV?usp=sharing
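Each per-anime zip can be opened with Python's standard zipfile module. This is a minimal sketch that assumes only that the pages are stored as .html entries; the file names inside each zip are not documented here, so nothing is hard-coded, and read_html_pages is a hypothetical helper name.

```python
import zipfile


def read_html_pages(zip_source):
    """Return {entry_name: html_text} for every .html entry in one anime zip.

    zip_source may be a path or a binary file-like object.
    Assumption: pages end in ".html"; their exact names are not documented.
    """
    pages = {}
    with zipfile.ZipFile(zip_source) as archive:
        for name in archive.namelist():
            if name.lower().endswith(".html"):
                # Decode leniently so one stray byte does not abort the read.
                pages[name] = archive.read(name).decode("utf-8", errors="replace")
    return pages
```

Usage (path hypothetical): pages = read_html_pages("html/some_anime.zip"), then hand each page to an HTML parser of your choice.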

  • animelist.csv has the list of all anime registered by each user, with the respective score, watching status and number of episodes watched. This file contains 20 million rows, 16,745 different anime and 74,129 different users. The file has the following columns:
  1. user_id: non identifiable randomly generated user id.
  2. anime_id: Anime-planet ID of the anime. (e.g. 1).
  3. score: score from 1 to 5 given by the user, in steps of 0.5; 0 if the user didn't assign a score. (e.g. 3.5)
  4. watching_status: ID of the status this anime has in the user's anime list. (e.g. 2)
  5. watched_episodes: number of episodes watched by the user. (e.g. 24)
  • watching_status.csv describes every possible value of the "watching_status" column in animelist.csv.
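Because score is 0 when the user did not assign a score, averaging the raw column would pull every anime's mean down. A minimal sketch that skips those entries while streaming rows, assuming animelist.csv has a header row with the column names listed above; mean_score_per_anime is a hypothetical helper, and its output is not claimed to match the Rating Score column of anime.csv.

```python
from collections import defaultdict


def mean_score_per_anime(rows):
    """Average user score per anime_id, ignoring unscored entries (score == 0).

    rows: any iterable of dicts with "anime_id" and "score" keys,
    e.g. csv.DictReader over animelist.csv (header row assumed).
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        score = float(row["score"])
        if score == 0:  # 0 means "no score assigned", not a real rating
            continue
        anime_id = int(row["anime_id"])
        totals[anime_id] += score
        counts[anime_id] += 1
    return {aid: totals[aid] / counts[aid] for aid in totals}
```

Streaming through csv.DictReader keeps memory proportional to the number of anime rather than the 20 million rows.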

  • rating_complete.csv is a subset of animelist.csv. It only includes anime that the user has watched completely (watching_status==1) and has scored (score!=0). This file contains 8 million ratings applied to 15,681 anime by 68,199 users. It has the following columns:

  1. user_id: non identifiable randomly generated user id.
  2. anime_id: Anime-planet ID of the anime. (e.g. 1).
  3. rating: rating that this user has assigned.
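The subset rule above (watching_status==1 and score!=0) can be reproduced with the standard csv module. This sketch only illustrates the filter; it assumes both files have header rows and that rating is simply the original score value, and write_rating_complete is a hypothetical helper, not the script used to build the published file.

```python
import csv


def write_rating_complete(src, dst):
    """Copy completed-and-scored rows from an animelist-style CSV to dst.

    Keeps rows with watching_status == 1 and score != 0, and writes the
    score under the column name "rating". Returns the number of rows written.
    """
    reader = csv.DictReader(src)
    writer = csv.writer(dst)
    writer.writerow(["user_id", "anime_id", "rating"])
    written = 0
    for row in reader:
        if int(row["watching_status"]) == 1 and float(row["score"]) != 0:
            writer.writerow([row["user_id"], row["anime_id"], row["score"]])
            written += 1
    return written
```

Open both files with newline="" as the csv module recommends.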
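  • The recommendation data lists, for each anime, the anime recommended for it and the number of users who agreed with each recommendation. It has the following columns:
  1. Anime: Anime Planet ID of the anime. (e.g. 1).
  2. Recommendation: Anime Planet ID of the recommended anime. (e.g. 1).
  3. Agree Votes: number of users who agreed with the recommendation.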
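The Anime, Recommendation and Agree Votes columns are enough to build a simple "most agreed-with recommendations" lookup. A minimal sketch: it takes rows (for example from csv.DictReader) rather than a path, so it assumes nothing about the file name, and top_recommendations and its k parameter are hypothetical.

```python
from collections import defaultdict


def top_recommendations(rows, k=5):
    """Map each anime ID to its k recommended anime IDs with the most agree votes.

    rows: iterable of dicts with "Anime", "Recommendation" and "Agree Votes".
    """
    by_anime = defaultdict(list)
    for row in rows:
        by_anime[int(row["Anime"])].append(
            (int(row["Agree Votes"]), int(row["Recommendation"]))
        )
    result = {}
    for anime_id, pairs in by_anime.items():
        # Highest votes first; sorted() is stable, so ties keep input order.
        ranked = sorted(pairs, key=lambda p: p[0], reverse=True)
        result[anime_id] = [rec for _, rec in ranked[:k]]
    return result
```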
  • anime.csv contains general information about every anime (16,621 different anime), such as tags, type, studio, synopsis, etc. This file has the following columns:
  1. Anime-PlanetID: Anime Planet ID of the anime. (e.g. 1).
  2. Name: full name of the anime. (e.g. FLCL)
  3. Alternative Name: another name for the anime. (e.g. Furi Kuri)
  4. Rating Score: average score given to the anime by all users in the Anime Planet database. (e.g. 8.78)
  5. Number Votes: number of users who gave a score to the anime. (e.g. 1241)
  6. Tags: comma separated list of tags for this anime. (e.g. Comedy, Mecha, Sci Fi, Outer Space, Original Work)
  7. Content Warning: comma separated list of content warning tags. (e.g. Explicit Violence, Mature Themes, Nudity)
  8. Type: TV, movie, OVA, etc. (e.g. TV).
  9. Episodes: number of episodes. (e.g. 26)
  10. Finished: True if the anime had finished airing when I did the data scraping, False if it was still ongoing at that moment.
  11. Duration: duration of the anime in minutes. (e.g. 60)
  12. StartYear: year when the anime started airing. (e.g. 2016)
  13. EndYear: year when the anime finished airing. (e.g. 2017)
  14. Season: season and year of release (e.g. Fall 2000)
  15. Studios: comma separated list of studios (e.g. Sunrise)
  16. Synopsis: synopsis of the anime
  17. Url: url to the main page of anime in Anime Planet (e.g. https://www.anime-planet.com/anime/vandread)
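Tags, Content Warning and Studios are stored as comma-separated strings, so they need splitting before use. A minimal sketch, assuming the CSV header uses the column names listed above; split_list_field and tag_counts are hypothetical helpers, and a tag that itself contains a comma would be split incorrectly.

```python
from collections import Counter


def split_list_field(value):
    """Split a comma-separated field (Tags, Content Warning, Studios) into items."""
    return [item.strip() for item in (value or "").split(",") if item.strip()]


def tag_counts(rows, column="Tags"):
    """Count how many anime carry each value of a comma-separated column."""
    counts = Counter()
    for row in rows:
        # set() so a duplicated tag in one row is counted once for that anime.
        counts.update(set(split_list_field(row[column])))
    return counts
```

Usage: pass csv.DictReader over anime.csv (opened with newline="" and encoding="utf-8") and call .most_common(10) on the result.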

Acknowledgements

Thanks to:

  1. Anime Planet for providing anime data.

Inspiration

  1. Improve the Anime Recommendation Database 2020 with more data, such as tags, content warnings, an additional synopsis, etc.

  2. Experiment with different types of recommender systems, for instance collaborative filtering or content-based approaches that use tags, synopsis, etc.

  3. Use this information to build a better anime recommender system.

  4. Identify which features allow us to build the best anime recommender system.

  5. Build a second dataset with the anime list of each user.
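As a starting point for the collaborative filtering idea in item 2, here is a toy item-based variant: each anime is treated as a vector of ratings indexed by user, and two anime are compared with cosine similarity. It assumes (user_id, anime_id, rating) triples shaped like rating_complete.csv with one rating per user and anime; raw ratings are used without mean-centering, and item_cosine_similarity is a hypothetical helper. The pairwise loop is quadratic in the number of anime per user, so the full 8 million ratings would call for a sparse-matrix approach instead.

```python
import math
from collections import defaultdict


def item_cosine_similarity(ratings):
    """Item-item cosine similarity from (user_id, anime_id, rating) triples.

    sim(a, b) = dot(a, b) / (|a| * |b|), where each anime is a sparse vector
    of ratings indexed by user. Returns {(a, b): sim} for a < b, only for
    pairs rated by at least one common user.
    """
    by_user = defaultdict(dict)
    for user, anime, rating in ratings:
        by_user[user][anime] = rating  # assumes one rating per (user, anime)
    norms = defaultdict(float)
    dots = defaultdict(float)
    for items in by_user.values():
        for anime, rating in items.items():
            norms[anime] += rating * rating
        ids = sorted(items)
        for i, a in enumerate(ids):
            for b in ids[i + 1:]:
                dots[(a, b)] += items[a] * items[b]
    return {
        (a, b): dot / (math.sqrt(norms[a]) * math.sqrt(norms[b]))
        for (a, b), dot in dots.items()
    }
```

Recommending for a given anime then amounts to sorting its pairs by similarity, much like the agree-vote ranking but learned from user ratings.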
