UFC-Fight Historical Data From 1993 To 2021
Compiled UFC fight, fighter stats and information
@kaggle.rajeevw_ufcdata
This dataset got a lot of love from the community, and many people asked for an updated version, so I have uploaded the latest scraped and processed data (as of 21/03/2021). It is now easy for anyone to get the latest dataset with a single command, so if you need bleeding-edge data or want to see the code, you can look here. Hope this solves all problems!
If there are any issues with the data, please forgive me and write about them in the comments or raise an issue on GitHub. I will pick it up 👍
Thank you everyone for the emails and messages. As usual, have fun! ❤️ 😁
This is a list of every UFC fight in the history of the organisation. Every row contains information about both fighters, the fight details and the winner. The data was scraped from the ufcstats website, which took over after FightMetric ceased to exist. The website held a lot of information about every fight and every event, and there was no existing way to capture all of it. I used BeautifulSoup to scrape the data and pandas to process it. It was a long and arduous process, so please forgive any mistakes. I have provided the raw files in case anybody wants to process them differently. This is my first time creating a dataset, so any suggestions and corrections are welcome! In case anyone wants to check out the work, I have uploaded all the code files, including the scraping module, here.
Have fun!
Each row is a compilation of both fighters' stats. Fighters are represented by 'red' and 'blue' (for the red and blue corners). For instance, the red fighter's columns hold the compiled average stats of all of that fighter's fights except the current one. The stats include damage done by the red fighter to the opponent and damage done by the opponent to the red fighter (marked by 'opp' in the column names), across all the fights this red fighter has had except this one, since from the data's point of view it has not occurred yet. The same information exists for the blue fighter. The target variable is 'Winner', which is the only column that tells you what happened.
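The rule that a row only uses fights that happened before it can be illustrated with a small, dependency-free Python sketch. It is not the author's processing code: the function name `pre_fight_averages`, the input shape (one fighter's per-fight stats as dicts sorted oldest first, all with the same keys) and returning `None` for a fighter's debut are assumptions made for the example; check the processed files to see how debuts are actually represented.

```python
from typing import Dict, List, Optional


def pre_fight_averages(fights: List[Dict[str, float]]) -> List[Optional[Dict[str, float]]]:
    """Return, for each fight, the average of every stat over the EARLIER fights only.

    `fights` holds one fighter's per-fight stats, oldest first; every dict is
    assumed (hypothetically) to have the same keys, e.g. 'kd' and 'opp_kd'.
    """
    averages: List[Optional[Dict[str, float]]] = []
    totals: Dict[str, float] = {}
    for n_previous, fight in enumerate(fights):
        if n_previous == 0:
            averages.append(None)  # debut: no earlier fights to average (assumption)
        else:
            averages.append({key: total / n_previous for key, total in totals.items()})
        # The current fight is added only after its row is built,
        # so it influences later rows but never its own.
        for key, value in fight.items():
            totals[key] = totals.get(key, 0.0) + value
    return averages


# Example usage: three fights for one hypothetical fighter.
history = [
    {"kd": 1, "opp_kd": 0},
    {"kd": 0, "opp_kd": 1},
    {"kd": 2, "opp_kd": 0},
]
print(pre_fight_averages(history))
# [None, {'kd': 1.0, 'opp_kd': 0.0}, {'kd': 0.5, 'opp_kd': 0.5}]
```

Adding the current fight to the running totals only after its row has been emitted is what keeps the outcome of that fight out of its own features.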
Here are some column definitions:
R_ and B_ prefixes signify red and blue corner fighter stats respectively
Columns containing _opp_ are the average damage done by the opponent to the fighter
KD is the number of knockdowns
SIG_STR is the number of significant strikes, in the form 'landed of attempted'
SIG_STR_pct is the significant strike percentage
TOTAL_STR is the total strikes, in the form 'landed of attempted'
TD is the number of takedowns
TD_pct is the takedown percentage
SUB_ATT is the number of submission attempts
PASS is presumably the number of times the guard was passed
REV is the number of reversals landed
HEAD is the number of significant strikes to the head, 'landed of attempted'
BODY is the number of significant strikes to the body, 'landed of attempted'
CLINCH is the number of significant strikes in the clinch, 'landed of attempted'
GROUND is the number of significant strikes on the ground, 'landed of attempted'
win_by is the method of victory
last_round is the last round of the fight (e.g. if it was a KO in the 1st round, this will be 1)
last_round_time is when the fight ended in the last round
Format is the format of the fight (3 rounds, 5 rounds, etc.)
Referee is the name of the referee
date is the date of the fight
location is the location where the event took place
Fight_type is the weight class and whether it is a title bout or not
Winner is the winner of the fight
Stance is the stance of the fighter (orthodox, southpaw, etc.)
Height_cms is the height of the fighter in centimetres
Reach_cms is the reach of the fighter (arm span) in centimetres
Weight_lbs is the weight of the fighter in pounds (lbs)
age is the age of the fighter
title_bout is a Boolean value indicating whether it is a title fight
weight_class is the weight class of the fight (Bantamweight, Heavyweight, Women's Flyweight, etc.)
no_of_rounds is the number of rounds the fight was scheduled for
current_lose_streak is the number of consecutive losses the fighter currently has
current_win_streak is the number of consecutive wins the fighter currently has
draw is the number of draws in the fighter's UFC career
wins is the number of wins in the fighter's UFC career
losses is the number of losses in the fighter's UFC career
total_rounds_fought is the total number of rounds fought by the fighter
total_time_fought(seconds) is the total time spent fighting, in seconds
total_title_bouts is the total number of title bouts the fighter has taken part in
win_by_Decision_Majority is the number of wins by majority decision in the fighter's UFC career
win_by_Decision_Split is the number of wins by split decision in the fighter's UFC career
win_by_Decision_Unanimous is the number of wins by unanimous decision in the fighter's UFC career
win_by_KO/TKO is the number of wins by knockout or technical knockout in the fighter's UFC career
win_by_Submission is the number of wins by submission in the fighter's UFC career
win_by_TKO_Doctor_Stoppage is the number of wins by doctor stoppage in the fighter's UFC career
Inspiration: https://github.com/Hitkul/UFC_Fight_Prediction
It provided ideas on how to store per-fight data. Unfortunately, both the UFC website and the FightMetric website have changed since, so I couldn't reuse any of its code.
Print Progress Bar: https://gist.github.com/aubricus/f91fb55dc6ba5557fbab06119420dd6a
Used to display download progress in the terminal.
You can check out who I am and what I do here
CREATE TABLE data (
"r_fighter" VARCHAR,
"b_fighter" VARCHAR,
"referee" VARCHAR,
"date" TIMESTAMP,
"location" VARCHAR,
"winner" VARCHAR,
"title_bout" BOOLEAN,
"weight_class" VARCHAR,
"b_avg_kd" DOUBLE,
"b_avg_opp_kd" DOUBLE,
"b_avg_sig_str_pct" DOUBLE,
"b_avg_opp_sig_str_pct" DOUBLE,
"b_avg_td_pct" DOUBLE,
"b_avg_opp_td_pct" DOUBLE,
"b_avg_sub_att" DOUBLE,
"b_avg_opp_sub_att" DOUBLE,
"b_avg_rev" DOUBLE,
"b_avg_opp_rev" DOUBLE,
"b_avg_sig_str_att" DOUBLE,
"b_avg_sig_str_landed" DOUBLE,
"b_avg_opp_sig_str_att" DOUBLE,
"b_avg_opp_sig_str_landed" DOUBLE,
"b_avg_total_str_att" DOUBLE,
"b_avg_total_str_landed" DOUBLE,
"b_avg_opp_total_str_att" DOUBLE,
"b_avg_opp_total_str_landed" DOUBLE,
"b_avg_td_att" DOUBLE,
"b_avg_td_landed" DOUBLE,
"b_avg_opp_td_att" DOUBLE,
"b_avg_opp_td_landed" DOUBLE,
"b_avg_head_att" DOUBLE,
"b_avg_head_landed" DOUBLE,
"b_avg_opp_head_att" DOUBLE,
"b_avg_opp_head_landed" DOUBLE,
"b_avg_body_att" DOUBLE,
"b_avg_body_landed" DOUBLE,
"b_avg_opp_body_att" DOUBLE,
"b_avg_opp_body_landed" DOUBLE,
"b_avg_leg_att" DOUBLE,
"b_avg_leg_landed" DOUBLE,
"b_avg_opp_leg_att" DOUBLE,
"b_avg_opp_leg_landed" DOUBLE,
"b_avg_distance_att" DOUBLE,
"b_avg_distance_landed" DOUBLE,
"b_avg_opp_distance_att" DOUBLE,
"b_avg_opp_distance_landed" DOUBLE,
"b_avg_clinch_att" DOUBLE,
"b_avg_clinch_landed" DOUBLE,
"b_avg_opp_clinch_att" DOUBLE,
"b_avg_opp_clinch_landed" DOUBLE,
"b_avg_ground_att" DOUBLE,
"b_avg_ground_landed" DOUBLE,
"b_avg_opp_ground_att" DOUBLE,
"b_avg_opp_ground_landed" DOUBLE,
"b_avg_ctrl_time_seconds" DOUBLE, -- B Avg CTRL Time(seconds)
"b_avg_opp_ctrl_time_seconds" DOUBLE, -- B Avg Opp CTRL Time(seconds)
"b_total_time_fought_seconds" DOUBLE, -- B Total Time Fought(seconds)
"b_total_rounds_fought" BIGINT,
"b_total_title_bouts" BIGINT,
"b_current_win_streak" BIGINT,
"b_current_lose_streak" BIGINT,
"b_longest_win_streak" BIGINT,
"b_wins" BIGINT,
"b_losses" BIGINT,
"b_draw" BIGINT,
"b_win_by_decision_majority" BIGINT,
"b_win_by_decision_split" BIGINT,
"b_win_by_decision_unanimous" BIGINT,
"b_win_by_ko_tko" BIGINT,
"b_win_by_submission" BIGINT,
"b_win_by_tko_doctor_stoppage" BIGINT,
"b_stance" VARCHAR,
"b_height_cms" DOUBLE,
"b_reach_cms" DOUBLE,
"b_weight_lbs" DOUBLE,
"r_avg_kd" DOUBLE,
"r_avg_opp_kd" DOUBLE,
"r_avg_sig_str_pct" DOUBLE,
"r_avg_opp_sig_str_pct" DOUBLE,
"r_avg_td_pct" DOUBLE,
"r_avg_opp_td_pct" DOUBLE,
"r_avg_sub_att" DOUBLE,
"r_avg_opp_sub_att" DOUBLE,
"r_avg_rev" DOUBLE,
"r_avg_opp_rev" DOUBLE,
"r_avg_sig_str_att" DOUBLE,
"r_avg_sig_str_landed" DOUBLE,
"r_avg_opp_sig_str_att" DOUBLE,
"r_avg_opp_sig_str_landed" DOUBLE,
"r_avg_total_str_att" DOUBLE,
"r_avg_total_str_landed" DOUBLE,
"r_avg_opp_total_str_att" DOUBLE,
"r_avg_opp_total_str_landed" DOUBLE,
"r_avg_td_att" DOUBLE,
"r_avg_td_landed" DOUBLE,
"r_avg_opp_td_att" DOUBLE,
"r_avg_opp_td_landed" DOUBLE,
"r_avg_head_att" DOUBLE,
"r_avg_head_landed" DOUBLE,
"r_avg_opp_head_att" DOUBLE
);

CREATE TABLE preprocessed_data (
"winner" VARCHAR,
"title_bout" BOOLEAN,
"b_avg_kd" DOUBLE,
"b_avg_opp_kd" DOUBLE,
"b_avg_sig_str_pct" DOUBLE,
"b_avg_opp_sig_str_pct" DOUBLE,
"b_avg_td_pct" DOUBLE,
"b_avg_opp_td_pct" DOUBLE,
"b_avg_sub_att" DOUBLE,
"b_avg_opp_sub_att" DOUBLE,
"b_avg_rev" DOUBLE,
"b_avg_opp_rev" DOUBLE,
"b_avg_sig_str_att" DOUBLE,
"b_avg_sig_str_landed" DOUBLE,
"b_avg_opp_sig_str_att" DOUBLE,
"b_avg_opp_sig_str_landed" DOUBLE,
"b_avg_total_str_att" DOUBLE,
"b_avg_total_str_landed" DOUBLE,
"b_avg_opp_total_str_att" DOUBLE,
"b_avg_opp_total_str_landed" DOUBLE,
"b_avg_td_att" DOUBLE,
"b_avg_td_landed" DOUBLE,
"b_avg_opp_td_att" DOUBLE,
"b_avg_opp_td_landed" DOUBLE,
"b_avg_head_att" DOUBLE,
"b_avg_head_landed" DOUBLE,
"b_avg_opp_head_att" DOUBLE,
"b_avg_opp_head_landed" DOUBLE,
"b_avg_body_att" DOUBLE,
"b_avg_body_landed" DOUBLE,
"b_avg_opp_body_att" DOUBLE,
"b_avg_opp_body_landed" DOUBLE,
"b_avg_leg_att" DOUBLE,
"b_avg_leg_landed" DOUBLE,
"b_avg_opp_leg_att" DOUBLE,
"b_avg_opp_leg_landed" DOUBLE,
"b_avg_distance_att" DOUBLE,
"b_avg_distance_landed" DOUBLE,
"b_avg_opp_distance_att" DOUBLE,
"b_avg_opp_distance_landed" DOUBLE,
"b_avg_clinch_att" DOUBLE,
"b_avg_clinch_landed" DOUBLE,
"b_avg_opp_clinch_att" DOUBLE,
"b_avg_opp_clinch_landed" DOUBLE,
"b_avg_ground_att" DOUBLE,
"b_avg_ground_landed" DOUBLE,
"b_avg_opp_ground_att" DOUBLE,
"b_avg_opp_ground_landed" DOUBLE,
"b_avg_ctrl_time_seconds" DOUBLE, -- B Avg CTRL Time(seconds)
"b_avg_opp_ctrl_time_seconds" DOUBLE, -- B Avg Opp CTRL Time(seconds)
"b_total_time_fought_seconds" DOUBLE, -- B Total Time Fought(seconds)
"b_total_rounds_fought" BIGINT,
"b_total_title_bouts" BIGINT,
"b_current_win_streak" BIGINT,
"b_current_lose_streak" BIGINT,
"b_longest_win_streak" BIGINT,
"b_wins" BIGINT,
"b_losses" BIGINT,
"b_draw" BIGINT,
"b_win_by_decision_majority" BIGINT,
"b_win_by_decision_split" BIGINT,
"b_win_by_decision_unanimous" BIGINT,
"b_win_by_ko_tko" BIGINT,
"b_win_by_submission" BIGINT,
"b_win_by_tko_doctor_stoppage" BIGINT,
"b_height_cms" DOUBLE,
"b_reach_cms" DOUBLE,
"b_weight_lbs" DOUBLE,
"r_avg_kd" DOUBLE,
"r_avg_opp_kd" DOUBLE,
"r_avg_sig_str_pct" DOUBLE,
"r_avg_opp_sig_str_pct" DOUBLE,
"r_avg_td_pct" DOUBLE,
"r_avg_opp_td_pct" DOUBLE,
"r_avg_sub_att" DOUBLE,
"r_avg_opp_sub_att" DOUBLE,
"r_avg_rev" DOUBLE,
"r_avg_opp_rev" DOUBLE,
"r_avg_sig_str_att" DOUBLE,
"r_avg_sig_str_landed" DOUBLE,
"r_avg_opp_sig_str_att" DOUBLE,
"r_avg_opp_sig_str_landed" DOUBLE,
"r_avg_total_str_att" DOUBLE,
"r_avg_total_str_landed" DOUBLE,
"r_avg_opp_total_str_att" DOUBLE,
"r_avg_opp_total_str_landed" DOUBLE,
"r_avg_td_att" DOUBLE,
"r_avg_td_landed" DOUBLE,
"r_avg_opp_td_att" DOUBLE,
"r_avg_opp_td_landed" DOUBLE,
"r_avg_head_att" DOUBLE,
"r_avg_head_landed" DOUBLE,
"r_avg_opp_head_att" DOUBLE,
"r_avg_opp_head_landed" DOUBLE,
"r_avg_body_att" DOUBLE,
"r_avg_body_landed" DOUBLE,
"r_avg_opp_body_att" DOUBLE,
"r_avg_opp_body_landed" DOUBLE,
"r_avg_leg_att" DOUBLE,
"r_avg_leg_landed" DOUBLE
);

CREATE TABLE raw_fighter_details (
"fighter_name" VARCHAR,
"height" VARCHAR,
"weight" VARCHAR,
"reach" VARCHAR,
"stance" VARCHAR,
"dob" TIMESTAMP,
"slpm" DOUBLE,
"str_acc" VARCHAR,
"sapm" DOUBLE,
"str_def" VARCHAR,
"td_avg" DOUBLE,
"td_acc" VARCHAR,
"td_def" VARCHAR,
"sub_avg" DOUBLE
);

CREATE TABLE raw_total_fight_data (
"r_fighter" VARCHAR,
"b_fighter" VARCHAR,
"r_kd" BIGINT,
"b_kd" BIGINT,
"r_sig_str" VARCHAR, -- R SIG STR.
"b_sig_str" VARCHAR, -- B SIG STR.
"r_sig_str_pct" VARCHAR,
"b_sig_str_pct" VARCHAR,
"r_total_str" VARCHAR, -- R TOTAL STR.
"b_total_str" VARCHAR, -- B TOTAL STR.
"r_td" VARCHAR,
"b_td" VARCHAR,
"r_td_pct" VARCHAR,
"b_td_pct" VARCHAR,
"r_sub_att" BIGINT,
"b_sub_att" BIGINT,
"r_rev" BIGINT,
"b_rev" BIGINT,
"r_ctrl" VARCHAR,
"b_ctrl" VARCHAR,
"r_head" VARCHAR,
"b_head" VARCHAR,
"r_body" VARCHAR,
"b_body" VARCHAR,
"r_leg" VARCHAR,
"b_leg" VARCHAR,
"r_distance" VARCHAR,
"b_distance" VARCHAR,
"r_clinch" VARCHAR,
"b_clinch" VARCHAR,
"r_ground" VARCHAR,
"b_ground" VARCHAR,
"win_by" VARCHAR,
"last_round" BIGINT,
"last_round_time" VARCHAR,
"format" VARCHAR,
"referee" VARCHAR,
"date" VARCHAR,
"location" VARCHAR,
"fight_type" VARCHAR,
"winner" VARCHAR
);
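The raw tables store several numeric quantities as text: `raw_total_fight_data` keeps strikes and takedowns (described above as 'landed of attempted'), percentages and control time as VARCHAR, and `raw_fighter_details` does the same for height, weight and reach, whereas the processed tables use numeric columns such as `b_avg_sig_str_landed`, `b_avg_ctrl_time_seconds` and `b_height_cms`. The Python sketch below shows one way to convert those strings. It is not the author's preprocessing code: the helper names are hypothetical, and the exact string formats ('14 of 35', '40%', '4:32', 5' 11", 72", '155 lbs.', and dashes for missing values) are assumptions that should be checked against the CSV files before relying on them.

```python
import re
from typing import Optional, Tuple

CM_PER_INCH = 2.54  # exact inch-to-centimetre factor


def parse_landed_of_attempted(value: str) -> Tuple[int, int]:
    """Split a string such as '14 of 35' into (landed, attempted)."""
    landed, attempted = value.split(" of ")
    return int(landed), int(attempted)


def parse_percentage(value: str) -> float:
    """Turn a string such as '40%' into a fraction (0.4)."""
    value = value.strip()
    if value in ("", "--", "---"):
        return 0.0  # assumption: dashes mean no attempts were made
    return float(value.rstrip("%")) / 100


def mmss_to_seconds(value: str) -> Optional[int]:
    """Convert an 'm:ss' string (e.g. r_ctrl, last_round_time) to seconds."""
    if ":" not in value:
        return None  # assumption: placeholders such as '--' mark missing data
    minutes, seconds = value.strip().split(":")
    return int(minutes) * 60 + int(seconds)


def height_to_cms(value: str) -> Optional[float]:
    """Convert an assumed height format such as 5' 11" to centimetres."""
    match = re.match(r"\s*(\d+)'\s*(\d+)\"", value)
    if match is None:
        return None  # missing or unrecognised value
    feet, inches = int(match.group(1)), int(match.group(2))
    return (feet * 12 + inches) * CM_PER_INCH


def reach_to_cms(value: str) -> Optional[float]:
    """Convert an assumed reach format such as 72" to centimetres."""
    match = re.match(r"\s*(\d+(?:\.\d+)?)\"", value)
    return None if match is None else float(match.group(1)) * CM_PER_INCH


def weight_to_lbs(value: str) -> Optional[float]:
    """Convert an assumed weight format such as '155 lbs.' to pounds."""
    match = re.match(r"\s*(\d+(?:\.\d+)?)\s*lbs", value)
    return None if match is None else float(match.group(1))


# Example usage on hypothetical raw values.
print(parse_landed_of_attempted("14 of 35"))  # (14, 35)
print(parse_percentage("40%"))                # 0.4
print(mmss_to_seconds("4:32"))                # 272
print(round(height_to_cms("5' 11\""), 2))     # 180.34
print(weight_to_lbs("155 lbs."))              # 155.0
```

Returning `None` for unrecognised values, rather than raising, keeps a single malformed row from stopping a whole conversion pass; those rows can then be counted and inspected separately.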