Soccer Predictions using Python (part 2)
In my previous article we scraped some results data to a .csv file. Now let's see if we can make some predictions.
First, we'll add a couple of new imports, most importantly the NumPy library, which will provide our Poisson distribution.
I've left out the scrapeseason() function here to keep the post a bit shorter.
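Before the full listing, here is a minimal sketch of what NumPy's Poisson sampler returns. The expected-goals value and the seed are made up for illustration and are not taken from the scraped data.

```python
import numpy as np

# Fix the seed so the sketch is repeatable; the value itself is arbitrary.
np.random.seed(0)

# Hypothetical expected goals for one team in one game.
expected_goals = 1.5

# Draw 100,000 simulated goal counts for that team.
goals = np.random.poisson(expected_goals, 100000)

# Each entry is a whole number of goals; the sample mean sits close to 1.5.
print(goals[:10])
print(goals.mean())
```

Each simulated game is just one draw for the home team compared against one draw for the away team, which is what the function below does.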
import pandas as pd
from bs4 import BeautifulSoup as bs
from selenium import webdriver
import datetime
from os import path
import numpy as np
def scrapeseason(country, comp, season):
    ...
def poissonpredict(df, gamedate):
    # set the number of simulations to run on each game
    simulatedgames = 100000
    # only use games before the date we want to predict
    historical = df.loc[df["date"] < str(gamedate)]
    # make sure we only use games that have valid scores
    historical = historical.loc[historical["homeScore"] > -1]
    # games to predict
    topredict = df.loc[df["date"] == str(gamedate)]
    # get average home and away scores for entire competition
    homeAvg = historical["homeScore"].mean()
    awayAvg = historical["awayScore"].mean()
    # loop through the games we want to predict
    for i in topredict.index:
        # .loc is used throughout because .ix was removed in pandas 1.0
        ht = topredict.loc[i, "homeTeam"]
        at = topredict.loc[i, "awayTeam"]
        # get average goals scored and conceded for home team
        homeTeamHomeAvgFor = historical.loc[historical["homeTeam"] == ht, "homeScore"].mean()
        homeTeamHomeAvgAgainst = historical.loc[historical["homeTeam"] == ht, "awayScore"].mean()
        # divide averages for team by averages for competition to get attack and defence strengths
        homeTeamAttackStrength = homeTeamHomeAvgFor / homeAvg
        homeTeamDefenceStrength = homeTeamHomeAvgAgainst / awayAvg
        # repeat for away team
        awayTeamAwayAvgFor = historical.loc[historical["awayTeam"] == at, "awayScore"].mean()
        awayTeamAwayAvgAgainst = historical.loc[historical["awayTeam"] == at, "homeScore"].mean()
        awayTeamAttackStrength = awayTeamAwayAvgFor / awayAvg
        awayTeamDefenceStrength = awayTeamAwayAvgAgainst / homeAvg
        # calculate expected goals using attack strength * defence strength * average
        homeTeamExpectedGoals = homeTeamAttackStrength * awayTeamDefenceStrength * homeAvg
        awayTeamExpectedGoals = awayTeamAttackStrength * homeTeamDefenceStrength * awayAvg
        # use numpy's poisson distribution to simulate 100000 games between the two teams
        homeTeamPoisson = np.random.poisson(homeTeamExpectedGoals, simulatedgames)
        awayTeamPoisson = np.random.poisson(awayTeamExpectedGoals, simulatedgames)
        # we can now infer some predictions from our simulated games
        # using numpy to count the results and converting to percentage probability
        homeTeamWins = np.sum(homeTeamPoisson > awayTeamPoisson) / simulatedgames * 100
        draws = np.sum(homeTeamPoisson == awayTeamPoisson) / simulatedgames * 100
        awayTeamWins = np.sum(homeTeamPoisson < awayTeamPoisson) / simulatedgames * 100
        # store our prediction into the dataframe
        df.loc[i, "homeWinProbability"] = homeTeamWins
        df.loc[i, "draws"] = draws
        df.loc[i, "awayTeamWins"] = awayTeamWins
    return df
if not path.isfile("data.csv"):
    # set which country and competition we want to use
    # others to try, "Scotland" & "Premiership" or "Europe" & "UEFA Champions League"
    country = "England"
    competition = "Premier League"
    lastseason = 2016
    thisseason = 2017
    lastseasondata = scrapeseason(country, competition, lastseason)
    thisseasondata = scrapeseason(country, competition, thisseason)
    # combine our data to one frame
    data = pd.concat([lastseasondata, thisseasondata])
    data.reset_index(inplace=True, drop=True)
    # save to file so we don't need to scrape multiple times
    data.to_csv("data.csv")
else:
    # load our csv
    data = pd.read_csv("data.csv", index_col=0, parse_dates=True)
gamedate = datetime.date.today()
data = poissonpredict(data, gamedate)
data.to_csv("data.csv")
As you can see, I've added a check to see whether our data already exists and, if it does, load it rather than scrape it again. Also, I have to reiterate that I'm not an expert programmer, so whilst my code may be inelegant, I think it's pretty straightforward.
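To make the attack and defence strength step concrete, here is the same arithmetic as in poissonpredict() worked through with made-up averages. None of these numbers come from the scraped data, and the short variable names are only for this sketch.

```python
# Made-up competition averages (goals per game), purely for illustration.
home_avg, away_avg = 1.5, 1.2

# Made-up averages for the home team in its home games...
home_for, home_against = 1.8, 0.9
# ...and for the away team in its away games.
away_for, away_against = 1.0, 1.65

# Strengths relative to the competition average.
# Attack above 1 means the team scores more than average;
# defence above 1 means it concedes more than average.
home_attack = home_for / home_avg
home_defence = home_against / away_avg
away_attack = away_for / away_avg
away_defence = away_against / home_avg

# Expected goals: own attack * opponent's defence * competition average.
home_xg = home_attack * away_defence * home_avg
away_xg = away_attack * home_defence * away_avg

print(round(home_xg, 2), round(away_xg, 2))
```

With these inputs the home side comes out at roughly 1.98 expected goals and the away side at roughly 0.75, and those two rates are what get passed to the Poisson sampler.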
I've only calculated probabilities for Home Win, Draw and Away Win here, but it should be reasonably easy to add other predictions such as Total Goals, Over/Under, Both To Score etc.
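As a rough sketch of how those extra markets could be read off the same simulated scores: the helper name extra_markets, its dictionary keys, the 2.5 goal line and the two Poisson rates are all my own assumptions for illustration, not part of the script above.

```python
import numpy as np

def extra_markets(home_goals, away_goals, line=2.5):
    """Percentage probabilities for a few extra markets from simulated scores."""
    home_goals = np.asarray(home_goals)
    away_goals = np.asarray(away_goals)
    total = home_goals + away_goals
    return {
        # share of simulated games with total goals over / under the line
        "over": np.mean(total > line) * 100,
        "under": np.mean(total < line) * 100,
        # share of simulated games in which both teams scored at least once
        "bothToScore": np.mean((home_goals > 0) & (away_goals > 0)) * 100,
        # average total goals per simulated game
        "avgTotalGoals": total.mean(),
    }

# Hypothetical expected goals for the two teams.
np.random.seed(1)
home = np.random.poisson(1.6, 100000)
away = np.random.poisson(1.1, 100000)
print(extra_markets(home, away))
```

Inside poissonpredict() the same function could be called on homeTeamPoisson and awayTeamPoisson, with the results stored in new columns alongside the three outcome probabilities.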
So, the predictions for today's games? Here they are.
- Crystal Palace v Southampton: Away
- Huddersfield Town v Leicester City: Home
- Liverpool v Burnley: Home
- Newcastle United v Stoke City: Home
- Tottenham Hotspur v Swansea City: Home
- Watford v Manchester City: Away
- West Bromwich Albion v West Ham: Home
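One way to turn the three probability columns into a single pick per game is to take the most probable outcome. This is a minimal sketch under that assumption. The teams and percentages are invented, and only the column names match what poissonpredict() writes.

```python
import pandas as pd

# Invented fixtures and probabilities, for illustration only.
games = pd.DataFrame({
    "homeTeam": ["Team A", "Team B"],
    "awayTeam": ["Team C", "Team D"],
    "homeWinProbability": [55.0, 20.0],
    "draws": [25.0, 30.0],
    "awayTeamWins": [20.0, 50.0],
})

# Map each probability column to a readable label.
labels = {"homeWinProbability": "Home", "draws": "Draw", "awayTeamWins": "Away"}

# idxmax(axis=1) gives the name of the column holding the largest value in each row.
games["pick"] = games[list(labels)].idxmax(axis=1).map(labels)

for row in games.itertuples(index=False):
    print(f"{row.homeTeam} v {row.awayTeam}: {row.pick}")
```

Picking the single most likely outcome rarely lands on a draw, since the draw is seldom the largest of the three probabilities.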
This was a bit rushed as I wanted to get some predictions out before today's games started. I'll improve the code and add extra functionality in the next article. Any comments, tips etc. are welcome. I'm off to the bookies now. :-)