A 6-figure Prize By Soccer Prediction

Clean, yet rich dataset of 7300 soccer matches

@kaggle.analystmasters_earn_your_6_figure_prize

About this Dataset

Live Feed

Please comment in the "Live Feed" discussion above and I will share the details with you in a message. I hope we won't exceed the server limits.

Context

I have been recording various types of data on soccer matches since 2012, live 24/7. The whole database contains more than 350,000 soccer matches played all around the world by over 27,000 teams from more than 180 countries. An all-in-one package including the servers, algorithms and database is now part of the "Analyst Masters" research platform. The app, which provides its predictions, is also free for everyone on the Android Play Store. How could it be useful for a data scientist?

Did you know that,

  • more than 1000 soccer matches are played in a week?
  • the average annual return of the stock market since its beginning has been less than 10%, while you can earn at least 10% on a single match in 2 hours and collect your profit in cash?

This is one of the very rare datasets where you do not need to prove to other companies that your method is the most accurate in order to get the prize :). Nor do you have to classify every data point to be rewarded. Just tune your model to correctly classify only 1% of the matches and there you go! Let me give you a few simple hints on how easily this can be framed as a classification problem rather than a time series prediction:

Example 1: Who wins based on the number of wins in their head-to-head history?

Q) Consider two teams from Denmark, Midtjylland and Randers. They have played against each other for a very long time. Midtjylland has beaten Randers 8 times in their past 10 matches over a 4-year span. Forget any complicated algorithm and simply predict: who wins this match?

A) That is easy! However, I am also gathering a lot more information than just their history. You can check their head-to-head history, and the odds you could get for predicting this match are "1.73" (check here).
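
The head-to-head rule from Example 1 can be sketched as a selective classifier: it predicts only when the history is lopsided and abstains otherwise, which matches the idea of classifying only a small share of matches. The function name, the argument layout and the 0.8 threshold are illustrative assumptions, not something defined by the dataset.

```python
from typing import Optional


def predict_from_head_to_head(
    team1_wins: int,
    team2_wins: int,
    matches_played: int,
    min_win_share: float = 0.8,  # assumed threshold, tune it on past matches
) -> Optional[str]:
    """Return "team1", "team2", or None (abstain) from head-to-head counts."""
    if matches_played <= 0:
        return None  # no history, no prediction
    if team1_wins / matches_played >= min_win_share:
        return "team1"
    if team2_wins / matches_played >= min_win_share:
        return "team2"
    return None  # history is not one-sided enough


# Example 1: Midtjylland won 8 of the last 10 meetings with Randers.
# The other 2 are counted as Randers wins here purely for illustration.
print(predict_from_head_to_head(8, 2, 10))  # team1
print(predict_from_head_to_head(5, 4, 10))  # None
```

Raising `min_win_share` makes the rule fire on fewer matches, trading coverage for confidence.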

Example 2: Number of goals based on their history?

Q) Consider two teams from Argentina, "San Martin S.J." and "Rosario Central". Their odds for winning "Team 1 (Home)", "Draw" and "Team 2 (Away)" are [3.16, 3.2, 2.25] respectively. They rank 22nd and 13th in their league and have won 45% and 35% of their past 14 matches, respectively. Their head-to-head goals in their last 7 matches averaged 1.3 at full time (FT) and 0.3 at half-time (HT). How many goals do you think they will score in this match? (Note that the safe options for the number of goals in soccer betting are Over 0.5 goals in HT, Under 1.5 goals in HT, Over 1.5 goals in FT and Under 3.5 goals in FT.) Which one do you choose?

A) For sure Under 1.5 goals in HT (you get 35%) and Under 3.5 goals in FT (you get 30%). Bingo, you get 65% on a single match in 2 hours.

Example 3: Who wins the match based on the money placed on each team?

Q) "Memmingen" and "Munich 1860" are well known in Germany. One of our reliable sources of data is the ratio of money placed on each team from 10 hours before the match until it starts. Assume that the ratio of bets on "Munich 1860" to bets on "Memmingen" is recorded every hour as below. Which team do you think will win?

[bets in $ on Munich 1860]/[bets in $ on Memmingen] : {1.01, 1.02, 1.04, 1.1, 1.2, 1.4, 1.58, 2.3, 2.6, 2.8}

A) In 10 hours the ratio of money placed on Munich 1860 winning versus Memmingen winning increased from 1.01 to 2.8, so who is the winner? Easy again: Munich 1860, which gives you 160% as stated here.
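
The reasoning in Example 3 can be written as a small rule over the hourly ratio series: pick a team only when the money has moved steadily toward it and the final ratio is clearly one-sided. This is a minimal sketch; the function name and the cut-off of 2.0 are assumptions made for illustration.

```python
from typing import Optional, Sequence


def pick_from_money_ratio(
    ratios: Sequence[float],
    min_final_ratio: float = 2.0,  # assumed cut-off, not from the dataset
) -> Optional[str]:
    """Pick a side from hourly [money on team A] / [money on team B] ratios.

    Returns "A" if money flowed steadily toward A, "B" for the mirror case,
    and None when the trend is mixed or too weak.
    """
    if len(ratios) < 2:
        return None
    steps = list(zip(ratios, ratios[1:]))
    rising = all(later >= earlier for earlier, later in steps)
    falling = all(later <= earlier for earlier, later in steps)
    if rising and ratios[-1] >= min_final_ratio:
        return "A"
    if falling and ratios[-1] <= 1.0 / min_final_ratio:
        return "B"
    return None


# Ratios from Example 3, with A = Munich 1860 and B = Memmingen.
munich_vs_memmingen = [1.01, 1.02, 1.04, 1.1, 1.2, 1.4, 1.58, 2.3, 2.6, 2.8]
print(pick_from_money_ratio(munich_vs_memmingen))  # A
```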

Try the dataset and test every strategy you can come up with; I gave you three reliable examples above. Just perform well enough to predict 15 matches correctly in a row, start with $1000 and you are a millionaire. If you can't be that accurate, use the Kelly Criterion to divide your money into smaller stakes.
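
For reference, the Kelly Criterion mentioned above sizes each stake from the win probability and the odds. The sketch below uses the standard formula f = (b * p - q) / b, where b is the net profit per unit staked, p the win probability and q = 1 - p. The 70% win probability in the usage example is an assumed figure for illustration, not a measured one.

```python
def kelly_fraction(win_probability: float, decimal_odds: float) -> float:
    """Fraction of the bankroll to stake under the Kelly criterion.

    decimal_odds is the total payout per unit staked (e.g. 1.73), so the
    net profit per unit is b = decimal_odds - 1. Kelly gives
    f = (b * p - q) / b with q = 1 - p; a non-positive value means no bet.
    """
    if not 0.0 <= win_probability <= 1.0:
        raise ValueError("win_probability must be between 0 and 1")
    if decimal_odds <= 1.0:
        raise ValueError("decimal_odds must be greater than 1")
    b = decimal_odds - 1.0
    q = 1.0 - win_probability
    return max(0.0, (b * win_probability - q) / b)


# Odds of 1.73 as in Example 1, with an assumed (not measured) 70% win chance.
stake_share = kelly_fraction(0.70, 1.73)
print(f"{stake_share:.1%} of bankroll")  # 28.9% of bankroll
```

If the estimated win probability is too low for the odds on offer, the function returns 0.0, meaning the match should be skipped.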

Let me do the math for you. Suppose you get only 90% accuracy on 1% of the data points (10 out of 1000 matches a week) and your average profit on each match is only 20%. You earn 9 * 20% = 180% and lose 100% on your one wrong prediction out of 10. Your net profit would be 80% in a week, or approximately 11% a day. If you risk only 33% of your money on each match, the daily net profit becomes about 4%. I guess you can easily calculate how fast you can progress at 4% daily compounded profit.
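
The arithmetic above can be checked with a few lines. Simple division of the 80% weekly figure gives about 11.4% a day, and about 3.8% a day at a 33% stake, in line with the roughly 4% daily figure quoted. The 30-day horizon at the end is an arbitrary choice made for illustration.

```python
matches_per_week = 10        # 1% of roughly 1000 weekly matches
accuracy = 0.90              # 9 correct predictions out of 10
profit_per_win = 0.20        # average profit on a winning stake
stake_share = 0.33           # fraction of the bankroll risked per match

wins = round(matches_per_week * accuracy)
losses = matches_per_week - wins

# Net weekly return in units of one full stake: 9 * 0.20 - 1 * 1.00 = 0.80
weekly_net = wins * profit_per_win - losses * 1.0
daily_net = weekly_net / 7                 # simple division, no compounding
daily_net_scaled = daily_net * stake_share

print(f"weekly net: {weekly_net:.0%}")                      # 80%
print(f"daily net: {daily_net:.1%}")                        # 11.4%
print(f"daily net at 33% stake: {daily_net_scaled:.1%}")    # 3.8%

# Compounding the rounded 4% daily figure over 30 days (assumed horizon).
growth_30_days = 1.04 ** 30
print(f"bankroll multiple after 30 days at 4%: {growth_30_days:.2f}")  # 3.24
```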

Of course, one needs a live data feed to predict the outcome before the match. If everything goes well and enough users are interested, I will open the live data feed for you as CSV files in a shared Dropbox folder.

Content

Here is what the dataset contains for 'n' matches:

  • names6.csv
    : team names in "home-away" order, separated by "/" ; size : (n x 1)

  • results6.csv*
    : scores recorded during the match; every 2 rows hold the scores for one match ; size : (2n x 14)

  • fresults6.csv*
    : Final scores after full-time ; size : (n x 2)

  • odds6.csv
    : odds in the order of: Home-Draw-Away ; size : (n x 3)

  • dollars6.csv*
    : ratio of the money placed on the two teams at 15-minute intervals ; size : (n x 76)

  • ranks6.csv
    : their ranks in the league on the day of the match, respectively ; size : (n x 2)

  • winrate6.csv
    : their win rates in their last (maximum 14) matches in 2017, respectively ; size : (n x 2)

  • country6.csv
    : their country, as some countries are difficult to analyze, e.g. Belarus ; size : (n x 1)

  • wins6.csv
    : number of wins in their last (maximum 6) head to head matches ; size : (n x 1)

  • FT_HT6.csv
    : average total goals in their last (maximum 6) head-to-head matches, FT and HT ; size : (n x 2)

*. recorded every 15 minutes
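
The file layout above can be read with the standard library alone. The sketch below splits the "home/away" names and groups the two-rows-per-match score file into pairs. It assumes comma-separated files with no header row, which the description does not confirm, and it runs on a tiny made-up sample rather than the real files. The helper names are hypothetical.

```python
import csv
import io
from typing import List, Tuple


def split_team_names(cell: str) -> Tuple[str, str]:
    """Split a "home/away" cell from names6.csv into (home, away)."""
    home, away = cell.split("/", 1)
    return home.strip(), away.strip()


def pair_score_rows(rows: List[List[str]]) -> List[Tuple[List[str], List[str]]]:
    """Group the 2n rows of results6.csv into n (first_row, second_row) pairs."""
    if len(rows) % 2 != 0:
        raise ValueError("expected an even number of rows (two per match)")
    return [(rows[i], rows[i + 1]) for i in range(0, len(rows), 2)]


# Tiny made-up sample standing in for the real files (assumed: no header row,
# comma-separated values). Real rows of results6.csv have 14 columns.
names_text = "Midtjylland/Randers\nMemmingen/Munich 1860\n"
results_text = "0,0,1\n0,0,0\n0,1,1\n0,0,2\n"

names = [split_team_names(row[0]) for row in csv.reader(io.StringIO(names_text))]
scores = pair_score_rows(list(csv.reader(io.StringIO(results_text))))

for (home, away), (row_a, row_b) in zip(names, scores):
    print(home, "vs", away, row_a, row_b)
```

To use the real data, replace the `io.StringIO(...)` objects with files opened via `open("names6.csv", newline="")` and `open("results6.csv", newline="")`, after checking whether the files carry a header row.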

Try to predict the matches as in the examples above using the data in the zip file. I will upload the corresponding data from mid-August to mid-September later on.

Acknowledgements

I developed various data scrapers and classifiers running on multiple servers worldwide and never published a paper about them due to their sensitivity. You may refer to this database by mentioning the "Analyst Masters" research package.

Inspiration

10 years ago I invented the world's first home-size cooking robot in my father's basement, but in the end, after cooking for us for 2 years, it came to nothing. So you, as a data scientist, can earn money for yourself using this live data stream if you can predict accurately. You do not need to outperform others in a competition; just reach an acceptable accuracy and you are good to go :)

For more information on the overall platform and its live, pre-match and in-play analysis, read www.analystmasters.com or download the app for FREE to get easy predictions at 5% profit per week. More details on how the app operates are available at https://youtu.be/fqlu0YEyqc0
