Cricket Data From 2000 To 2020 For South Africa
Cricket matches, players, innings, bat, bowl, venues, etc
@kaggle.bizzyvinci_south_africa_cricket_data_from_2000_to_2020
I was looking for something challenging when I stumbled upon a paper titled 'INCREASED PREDICTION ACCURACY IN THE GAME OF
CRICKET USING MACHINE LEARNING' and decided to implement the project. The first step was understanding the paper and cricket 😁.
The next step was getting the data. I wrote scripts to scrape espncricinfo. The scripts and the paper can be found on GitHub.
The data contains matches played by South Africa from 2000 to 2020 and can be used for prediction as well as exploratory analysis.
Note: NaN values are represented as -99. Also, match is commonly abbreviated as mat.
The data consists of 6 tables:
CREATE TABLE bat (
"player" BIGINT,
"mat" BIGINT,
"runs" BIGINT,
"ball" BIGINT,
"m" BIGINT,
"n__4s" BIGINT, -- 4s
"n__6s" BIGINT, -- 6s
"strike_rate" DOUBLE
);

CREATE TABLE bowl (
"player" BIGINT,
"mat" BIGINT,
"overs" DOUBLE,
"m" BIGINT,
"runs" BIGINT,
"wicket" BIGINT,
"econ" DOUBLE,
"n__0s" BIGINT, -- 0s
"n__4s" BIGINT, -- 4s
"n__6s" BIGINT, -- 6s
"wide" BIGINT,
"no_ball" BIGINT
);

CREATE TABLE ground (
"ground_id" BIGINT,
"ground_name" VARCHAR,
"country" VARCHAR
);

CREATE TABLE mat (
"match_id" BIGINT,
"odi_no" BIGINT,
"opposition" BIGINT,
"ground" BIGINT,
"match_date" TIMESTAMP,
"toss" VARCHAR,
"series" VARCHAR,
"result" VARCHAR,
"match_days" VARCHAR
);

CREATE TABLE opposition (
"opp_id" BIGINT,
"opp_name" VARCHAR,
"rating" BIGINT
);

CREATE TABLE player (
"player_id" BIGINT,
"name" VARCHAR,
"odi_debut" TIMESTAMP,
"playing_role" VARCHAR,
"batting_style" VARCHAR,
"bowling_style" VARCHAR,
"fielding_position" VARCHAR
);
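
Because missing values are stored as -99, aggregates over the raw columns can be skewed unless the sentinel is converted first. The following is a minimal sketch of one way to handle that while joining batting records to player names. It uses Python's built-in sqlite3 module with a reduced version of the bat and player tables. The sample rows are hypothetical and not taken from the dataset, and the join assumes that bat.player refers to player.player_id, which the schema does not state explicitly.

```python
import sqlite3

# In-memory database with a reduced subset of the bat and player columns.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE player (player_id INTEGER, name TEXT);
CREATE TABLE bat (player INTEGER, mat INTEGER, runs INTEGER, ball INTEGER, strike_rate REAL);
""")

# Hypothetical rows for illustration only; they are not real dataset values.
conn.executemany("INSERT INTO player VALUES (?, ?)", [(1, "Player A"), (2, "Player B")])
conn.executemany(
    "INSERT INTO bat VALUES (?, ?, ?, ?, ?)",
    [
        (1, 100, 50, 40, 125.0),
        (1, 101, -99, -99, -99.0),  # missing record encoded with the -99 sentinel
        (2, 100, 30, 60, 50.0),
    ],
)

# NULLIF(x, -99) turns the sentinel into NULL, which SUM and COUNT skip.
# Assumption: bat.player is a reference to player.player_id.
query = """
SELECT p.name,
       SUM(NULLIF(b.runs, -99))   AS total_runs,
       COUNT(NULLIF(b.runs, -99)) AS innings_with_runs
FROM bat AS b
JOIN player AS p ON p.player_id = b.player
GROUP BY p.name
ORDER BY p.name
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Without the NULLIF conversion, the -99 row would be subtracted from Player A's total. The same pattern should carry over to the other numeric columns, such as ball or strike_rate.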