Using Data To Predict The Outcome Of Football Matches

in #sports, 7 years ago

Can big data be used to predict the outcome of football matches with accuracy that is better than a random crapshoot? For the past 2 weeks, I've been testing a model that takes a bunch of statistics on squad, form and injuries and generates an output on which side to bet on.

The model usually spits out 12-14 matches per week, but this week there were only 3! I hope this is down to a one-off scheduling issue and the fact that we're coming off international week.

Modifications to the rules

The model worked well for the English, German, French and Italian leagues, but it was terrible when it came to the Japanese league. This week, I'll be testing the signal on the Spanish and Dutch leagues as well, mainly because there were so few signals.

Predictions

| S/N | Match | Selection | Odds |
|---|---|---|---|
| 1 | Hoffenheim vs Frankfurt | Hoffenheim | 2.03 |
| 2 | Hertha vs B. Monchengladbach | B. Monchengladbach | 2.15 |
| 3 | Udinese vs Cagliari | Udinese | 1.85 |
| 4 | Groningen vs Vitesse | Vitesse | 2.03 |
| 5 | Espanyol vs Valencia | Valencia | 2.03 |

Fingers crossed!

Thoughts

Gathering data for this project is either time-consuming or expensive. If I get a decent win rate over the next couple of weeks (above 65%), I'll invest some time in building a scraper and gathering data for a proper Random Forest model.

By the way, I have to put a disclaimer here: please don't go and bet on these. They are for eventually building a data model / product.

That's amazing! How would you fit draws into the equation?
The first match was a draw I think.

Good point, and draws are indeed a problem. For betting purposes, putting money on the +0 handicap (where the stake is refunded on a draw) would be a possible solution. The returns are lower though.

For me, because of bookmaker limitations, I look at odds between 1.85 and 2.5, and if the model can get more than 50% of predictions right then it should be fine long term.
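The "more than 50%" rule of thumb can be checked with a quick break-even calculation. The sketch below is not part of the model; the function names are made up for illustration, and it assumes flat stakes, decimal odds, and that every non-winning bet (draws included) loses the full stake. Under those assumptions the break-even win rate is simply 1 / odds, so it is about 54% at odds of 1.85 and 40% at odds of 2.5. A 50% hit rate is therefore only profitable when the average odds sit above 2.0.

```python
def break_even_win_rate(decimal_odds: float) -> float:
    """Win rate at which flat-stake bets at these decimal odds break even."""
    if decimal_odds <= 1.0:
        raise ValueError("decimal odds must be greater than 1.0")
    return 1.0 / decimal_odds


def expected_profit_per_unit(win_rate: float, decimal_odds: float) -> float:
    """Expected profit per 1 unit staked.

    Assumption: a win pays (odds - 1) units and anything else loses the
    whole stake, so draws and refunds are not modelled here.
    """
    return win_rate * (decimal_odds - 1.0) - (1.0 - win_rate)


if __name__ == "__main__":
    # Low end, a typical pick from the table, and high end of the odds range.
    for odds in (1.85, 2.03, 2.5):
        print(
            f"odds {odds:.2f}: "
            f"break-even {break_even_win_rate(odds):.1%}, "
            f"EV at 50% hit rate {expected_profit_per_unit(0.5, odds):+.3f}"
        )
```

Running it for the low end, a typical pick and the high end of the range shows how sensitive the result is to the odds actually taken.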

I see your point there. How has it worked with betting? Are those profits coming in??
