MyAnimeList Database 2020
Recommendation data from 320,000 users and 16,000 anime at myanimelist.net
This dataset contains information about 17,562 anime and the preferences of 325,772 different users. In particular, this dataset contains:
- The anime list per user, including dropped, completed, plan to watch, currently watching and on hold entries.
- Ratings given by users to the anime that they have watched completely.
- Information about the anime like genre, stats, studio, etc.
- HTML with anime information for practicing data scraping. These files contain information such as reviews, synopsis, information about the staff, anime statistics, genre, etc.
Also, the code used to collect the data is available on GitHub: https://github.com/Hernan4444/MyAnimeList-Database.
Warning: this dataset includes information about anime for adults (hentai).
Content
The data was scraped between February 26th and March 20th.
- The "html" folder contains 1 zip per anime (17,562 different anime). Each zip contains different HTML pages scraped from MyAnimeList. The scraped pages are:
- Main page
- Reviews
- Recommendations
- Stats
- Characters & Staff
I uploaded only 2 files as examples to avoid increasing the size of this dataset. All HTML files are available at this link: https://drive.google.com/drive/folders/12ghJk-sWyXXORoLBUpPirK4YdtIaZPV_?usp=sharing
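As a rough sketch of how the per-anime zips could be read without extracting them to disk, the snippet below uses Python's standard `zipfile` module. The member names inside each archive are not documented here, so the `.html` suffix filter and the file names in the demo archive are assumptions made only for illustration.

```python
import io
import zipfile


def read_html_pages(zip_source):
    """Return {member name: decoded HTML text} for every .html file in the archive.

    zip_source may be a filesystem path or a binary file-like object.
    """
    pages = {}
    with zipfile.ZipFile(zip_source) as archive:
        for name in archive.namelist():
            # Assumption: scraped pages are stored with an .html extension.
            if name.lower().endswith(".html"):
                # errors="replace" keeps going if a page contains stray bytes.
                pages[name] = archive.read(name).decode("utf-8", errors="replace")
    return pages


# Build a tiny in-memory archive so the sketch runs without the real data.
# The member names below are hypothetical.
_buffer = io.BytesIO()
with zipfile.ZipFile(_buffer, "w") as _archive:
    _archive.writestr("details.html", "<html><title>Cowboy Bebop</title></html>")
    _archive.writestr("notes.txt", "not a page")
_buffer.seek(0)

demo_pages = read_html_pages(_buffer)
print(sorted(demo_pages))
```

With the real data, the same function could be called with the path of one of the downloaded zips instead of the in-memory buffer.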
animelist.csv contains the list of all anime registered by each user, with the respective score, watching status and number of episodes watched. This file contains 109 million rows, 17,562 different anime and 325,772 different users. The file has the following columns:
- user_id: non-identifiable, randomly generated user ID.
- anime_id: MyAnimeList ID of the anime. (e.g. 1)
- score: score from 1 to 10 given by the user, or 0 if the user didn't assign a score. (e.g. 10)
- watching_status: status ID of this anime in this user's anime list. (e.g. 2)
- watched_episodes: number of episodes watched by the user. (e.g. 24)
watching_status.csv describes every possible status of the "watching_status" column in animelist.csv.
rating_complete.csv is a subset of animelist.csv. This dataset only considers anime that the user has watched completely (watching_status==2) and given a score (score!=0). This dataset contains 57 million ratings applied to 16,872 anime by 310,059 users. This file has the following columns:
- user_id: non-identifiable, randomly generated user ID.
- anime_id: MyAnimeList ID of the anime that this user has rated.
- rating: rating that this user has assigned.
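The relationship between animelist.csv and rating_complete.csv can be sketched as a simple filter. The example below streams rows with the standard `csv` module, which avoids loading all 109 million rows at once. It assumes the CSV header uses exactly the column names listed for animelist.csv; the sample rows are made up for illustration and are not taken from the dataset.

```python
import csv
import io

# watching_status value for fully watched anime, as stated in the description.
COMPLETED = "2"


def iter_complete_ratings(rows):
    """Yield (user_id, anime_id, rating) for completed entries that have a score."""
    for row in rows:
        if row["watching_status"] == COMPLETED and row["score"] != "0":
            yield int(row["user_id"]), int(row["anime_id"]), int(row["score"])


# Tiny made-up sample in the animelist.csv layout (assumed header names).
sample = io.StringIO(
    "user_id,anime_id,score,watching_status,watched_episodes\n"
    "0,1,10,2,26\n"   # completed and scored: kept
    "0,5,0,2,1\n"     # completed but not scored: dropped
    "1,1,8,1,4\n"     # scored but not completed: dropped
    "1,6,7,2,26\n"    # completed and scored: kept
)
ratings = list(iter_complete_ratings(csv.DictReader(sample)))
print(ratings)  # [(0, 1, 10), (1, 6, 7)]
```

For the real file, `sample` would be replaced by an open file handle, for example `open("animelist.csv", newline="", encoding="utf-8")`.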
anime.csv contains general information about every anime (17,562 different anime) like genre, stats, studio, etc. This file has the following columns:
- MAL_ID: MyAnimeList ID of the anime. (e.g. 1)
- Name: full name of the anime. (e.g. Cowboy Bebop)
- Score: average score of the anime given by all users in the MyAnimeList database. (e.g. 8.78)
- Genres: comma separated list of genres for this anime. (e.g. Action, Adventure, Comedy, Drama, Sci-Fi, Space)
- English name: full name of the anime in English. (e.g. Cowboy Bebop)
- Japanese name: full name of the anime in Japanese. (e.g. カウボーイビバップ)
- Type: TV, movie, OVA, etc. (e.g. TV)
- Episodes: number of episodes. (e.g. 26)
- Aired: broadcast date. (e.g. Apr 3, 1998 to Apr 24, 1999)
- Premiered: season premiere. (e.g. Spring 1998)
- Producers: comma separated list of producers. (e.g. Bandai Visual)
- Licensors: comma separated list of licensors. (e.g. Funimation, Bandai Entertainment)
- Studios: comma separated list of studios. (e.g. Sunrise)
- Source: Manga, Light novel, Book, etc. (e.g. Original)
- Duration: duration of the anime per episode. (e.g. 24 min. per ep.)
- Rating: age rating. (e.g. R - 17+ (violence & profanity))
- Ranked: position based on the score. (e.g. 28)
- Popularity: position based on the number of users who have added the anime to their list. (e.g. 39)
- Members: number of community members that are in this anime's "group". (e.g. 1251960)
- Favorites: number of users who have the anime as "favorites". (e.g. 61,971)
- Watching: number of users who are watching the anime. (e.g. 105808)
- Completed: number of users who have completed the anime. (e.g. 718161)
- On-Hold: number of users who have the anime on hold. (e.g. 71513)
- Dropped: number of users who have dropped the anime. (e.g. 26678)
- Plan to Watch: number of users who plan to watch the anime. (e.g. 329800)
- Score-10: number of users who scored 10. (e.g. 229170)
- Score-9: number of users who scored 9. (e.g. 182126)
- Score-8: number of users who scored 8. (e.g. 131625)
- Score-7: number of users who scored 7. (e.g. 62330)
- Score-6: number of users who scored 6. (e.g. 20688)
- Score-5: number of users who scored 5. (e.g. 8904)
- Score-4: number of users who scored 4. (e.g. 3184)
- Score-3: number of users who scored 3. (e.g. 1357)
- Score-2: number of users who scored 2. (e.g. 741)
- Score-1: number of users who scored 1. (e.g. 1580)
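Two of the column types above usually need a little parsing before use: the comma separated lists (Genres, Producers, Licensors, Studios) and the Score-1 to Score-10 counts. The sketch below shows one possible way to handle both, using the example values from the column list. Treating any non-numeric cell as missing is an assumption, since the description does not say how missing values are encoded.

```python
def split_list(cell):
    """Split a comma separated cell such as Genres into clean names."""
    return [part.strip() for part in cell.split(",") if part.strip()]


def mean_from_histogram(row):
    """Plain average of the Score-1 ... Score-10 counts, or None if there are no votes."""
    total = votes = 0
    for value in range(1, 11):
        cell = str(row.get(f"Score-{value}", "")).replace(",", "")
        # Assumption: anything that is not a plain number counts as missing.
        count = int(float(cell)) if cell.replace(".", "", 1).isdigit() else 0
        total += value * count
        votes += count
    return total / votes if votes else None


# Example row built from the sample values in the column list above.
row = {
    "Genres": "Action, Adventure, Comedy, Drama, Sci-Fi, Space",
    "Score-10": "229170", "Score-9": "182126", "Score-8": "131625",
    "Score-7": "62330", "Score-6": "20688", "Score-5": "8904",
    "Score-4": "3184", "Score-3": "1357", "Score-2": "741", "Score-1": "1580",
}
genres = split_list(row["Genres"])
plain_mean = mean_from_histogram(row)
print(genres)
print(round(plain_mean, 2))
```

The plain average computed this way does not have to match the Score column exactly, because the site may compute its published score differently.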
Acknowledgements
Thanks to:
- MyAnimeList for providing anime data.
- Jikan API for providing user preferences.
- Pontificia Universidad Católica de Chile for providing servers to run the code.
Inspiration
- Have HTML files available to practice scraping without the delay of each request.
- Experiment with different types of recommenders, for instance collaborative filtering or recommenders based on context like stats, genre, seiyuu, reviews, synopsis, etc.
- Use this information to build a better anime recommender system.
- Identify which features allow us to build the best anime recommender system.
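As a starting point for the collaborative filtering idea, here is a minimal sketch of item-to-item cosine similarity over (user_id, anime_id, rating) triples, the same layout as rating_complete.csv. It is only one of many possible approaches, the ratings below are made up, and the pairwise loop is meant for small samples rather than the full 57 million ratings.

```python
import math
from collections import defaultdict


def item_similarities(ratings):
    """Cosine similarity between anime, from (user_id, anime_id, rating) triples."""
    by_item = defaultdict(dict)
    for user_id, anime_id, rating in ratings:
        by_item[anime_id][user_id] = rating

    # Vector length of each anime's rating vector over all of its raters.
    norms = {
        anime: math.sqrt(sum(r * r for r in users.values()))
        for anime, users in by_item.items()
    }

    sims = {}
    items = sorted(by_item)
    for i, a in enumerate(items):
        for b in items[i + 1:]:
            shared = by_item[a].keys() & by_item[b].keys()
            if not shared:
                continue  # no common raters, so no evidence either way
            dot = sum(by_item[a][u] * by_item[b][u] for u in shared)
            sims[(a, b)] = dot / (norms[a] * norms[b])
    return sims


# Made-up ratings for three users and three anime IDs.
toy_ratings = [
    (0, 1, 10), (0, 6, 8),
    (1, 1, 9), (1, 6, 7),
    (2, 1, 3), (2, 20, 5),
]
sims = item_similarities(toy_ratings)
for pair, score in sorted(sims.items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 3))
```

In this toy data, anime 1 and 6 come out as most similar because the same users rated both highly, while anime 6 and 20 share no raters and get no score.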
Ideas for the future
- Build the same dataset with manga and novels.