Soccer Predictions using Python (part 2)

in #programming, 7 years ago

In my previous article we scraped some results data to a .CSV file. Now we can see whether we can make some predictions from it.

First, we'll add a couple of new imports.  The most important is the NumPy library, which will provide our Poisson distribution.

I've left out the scrapeseason() function here to keep the post a bit shorter. 

import pandas as pd
from bs4 import BeautifulSoup as bs
from selenium import webdriver
import datetime
from os import path
import numpy as np

def scrapeseason(country, comp, season):
   ...

def poissonpredict(df, gamedate):
   # set the number of simulations to run on each game
   simulatedgames = 100000

   # only use games before the date we want to predict
   historical = df.loc[df["date"] < str(gamedate)]

   # make sure we only use games that have valid scores
   historical = historical.loc[historical["homeScore"] > -1]

   # games to predict
   topredict = df.loc[df["date"] == str(gamedate)]

   # get average home and away scores for entire competition
   homeAvg = historical["homeScore"].mean()
   awayAvg = historical["awayScore"].mean()

   # loop through the games we want to predict
   for i in topredict.index:
       # .loc replaces .ix, which has been removed from recent pandas versions
       ht = topredict.loc[i, "homeTeam"]
       at = topredict.loc[i, "awayTeam"]

       # get average goals scored and conceded for home team
       homeTeamHomeAvgFor = historical.loc[historical["homeTeam"] == ht, "homeScore"].mean()
       homeTeamHomeAvgAgainst = historical.loc[historical["homeTeam"] == ht, "awayScore"].mean()

       # divide averages for team by averages for competition to get attack and defence strengths
       homeTeamAttackStrength = homeTeamHomeAvgFor/homeAvg
       homeTeamDefenceStrength = homeTeamHomeAvgAgainst/awayAvg

       # repeat for away team
       awayTeamAwayAvgFor = historical.loc[historical["awayTeam"] == at, "awayScore"].mean()
       awayTeamAwayAvgAgainst = historical.loc[historical["awayTeam"] == at, "homeScore"].mean()
       awayTeamAttackStrength = awayTeamAwayAvgFor/awayAvg
       awayTeamDefenceStrength = awayTeamAwayAvgAgainst/homeAvg

       # calculate expected goals using attack strength * defence strength * average
       homeTeamExpectedGoals = homeTeamAttackStrength * awayTeamDefenceStrength * homeAvg
       awayTeamExpectedGoals = awayTeamAttackStrength * homeTeamDefenceStrength * awayAvg

       # use numpy's poisson distribution to simulate 100000 games between the two teams
       homeTeamPoisson = np.random.poisson(homeTeamExpectedGoals, simulatedgames)
       awayTeamPoisson = np.random.poisson(awayTeamExpectedGoals, simulatedgames)

       # we can now infer some predictions from our simulated games
       # using numpy to count the results and converting to percentage probability
       homeTeamWins = np.sum(homeTeamPoisson > awayTeamPoisson) / simulatedgames * 100
       draws = np.sum(homeTeamPoisson == awayTeamPoisson) / simulatedgames * 100
       awayTeamWins = np.sum(homeTeamPoisson < awayTeamPoisson) / simulatedgames * 100

       # store our prediction into the dataframe
       df.loc[i, "homeWinProbability"] = homeTeamWins
       df.loc[i, "draws"] = draws
       df.loc[i, "awayTeamWins"] = awayTeamWins

   return df


if not path.isfile("data.csv"):
   # set which country and competition we want to use
   # others to try, "Scotland" & "Premiership" or "Europe" & "UEFA Champions League"
   country = "England"
   competition = "Premier League"
   lastseason = 2016
   thisseason = 2017

   lastseasondata = scrapeseason(country, competition, lastseason)
   thisseasondata = scrapeseason(country, competition, thisseason)

   # combine our data to one frame
   data = pd.concat([lastseasondata, thisseasondata])
   data.reset_index(inplace=True, drop=True)

   # save to file so we don't need to scrape multiple times
   data.to_csv("data.csv")
else:
   # load our csv
   data = pd.read_csv("data.csv", index_col=0, parse_dates=True)

gamedate = datetime.date.today()
data = poissonpredict(data, gamedate)

data.to_csv("data.csv")
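
To see the expected-goals step of poissonpredict() in isolation, here is a minimal sketch run on a tiny made-up table. The function name expected_goals(), the teams "A", "B" and "C" and all the scores are invented for illustration; only the column names and the attack/defence strength arithmetic are taken from the code above.

```python
import pandas as pd

def expected_goals(historical, ht, at):
    """Return (home, away) expected goals using attack/defence strengths.

    Hypothetical helper: it mirrors the arithmetic in poissonpredict()
    but is not part of the original script.
    """
    # competition-wide averages
    home_avg = historical["homeScore"].mean()
    away_avg = historical["awayScore"].mean()

    # home team's home games and away team's away games
    home_games = historical[historical["homeTeam"] == ht]
    away_games = historical[historical["awayTeam"] == at]

    # team average divided by competition average gives a strength
    home_attack = home_games["homeScore"].mean() / home_avg
    home_defence = home_games["awayScore"].mean() / away_avg
    away_attack = away_games["awayScore"].mean() / away_avg
    away_defence = away_games["homeScore"].mean() / home_avg

    return (home_attack * away_defence * home_avg,
            away_attack * home_defence * away_avg)

# made-up results, purely for illustration
games = pd.DataFrame({
    "homeTeam":  ["A", "A", "B", "C", "B", "C"],
    "awayTeam":  ["B", "C", "A", "B", "C", "A"],
    "homeScore": [3, 1, 2, 2, 1, 0],
    "awayScore": [1, 1, 0, 0, 2, 2],
})

home_xg, away_xg = expected_goals(games, "A", "B")
print(round(home_xg, 2), round(away_xg, 2))
```

With these numbers the competition averages are 1.5 home goals and 1.0 away goals, so team A at home against team B works out at roughly 3.33 expected goals to 0.5. A team with no home (or away) games in the historical data would produce NaN here, just as it would in the full script.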

As you can see, I've added a check that loads our data from file if it already exists rather than scraping it again.  Also, I have to reiterate that I'm not an expert programmer, so whilst my code may be inelegant, I think it's pretty straightforward.

I've only calculated probabilities for Home Win, Draw and Away Win here, but it should be reasonably easy to add other predictions such as Total Goals, Over/Under, Both To Score, etc.
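
As a rough sketch of how those extra markets might be counted from the same kind of simulated scores: the function name extra_markets(), the 2.5 goal line and the dictionary keys below are my own assumptions, and it uses NumPy's newer default_rng() generator instead of np.random.poisson so the run can be seeded.

```python
import numpy as np

def extra_markets(home_xg, away_xg, simulations=100000, line=2.5, seed=None):
    """Estimate a few extra markets from simulated scores.

    Hypothetical helper, not part of the original script.
    """
    rng = np.random.default_rng(seed)

    # simulate goals for each side, as in poissonpredict()
    home = rng.poisson(home_xg, simulations)
    away = rng.poisson(away_xg, simulations)
    total = home + away

    return {
        # percentage of simulated games above / below the goal line
        "over": float(np.mean(total > line) * 100),
        "under": float(np.mean(total < line) * 100),
        # percentage of simulated games where both sides scored
        "bothToScore": float(np.mean((home > 0) & (away > 0)) * 100),
        # mean number of goals per simulated game
        "averageTotalGoals": float(total.mean()),
    }

markets = extra_markets(1.5, 1.0, seed=0)
for name, value in markets.items():
    print(name, round(value, 2))
```

The expected goals of 1.5 and 1.0 are placeholder inputs; in the full script they would be homeTeamExpectedGoals and awayTeamExpectedGoals for each game.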

So, the predictions for today's games?  Here they are.

  • Crystal Palace v Southampton: Away
  • Huddersfield Town v Leicester City: Home
  • Liverpool v Burnley: Home
  • Newcastle United v Stoke City: Home
  • Tottenham Hotspur v Swansea City: Home
  • Watford v Manchester City: Away
  • West Bromwich Albion v West Ham: Home
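
The script stores three probabilities per game rather than a single pick. One plausible way to get a Home/Draw/Away label, assuming the pick is simply the outcome with the highest simulated probability, is sketched below. The add_pick() function, the "pick" column and the example probabilities are invented for illustration; the three probability column names are the ones written by poissonpredict().

```python
import pandas as pd

# map the probability columns written by poissonpredict() to short labels
LABELS = {"homeWinProbability": "Home", "draws": "Draw", "awayTeamWins": "Away"}

def add_pick(predictions):
    """Add a 'pick' column naming the most likely outcome (hypothetical helper)."""
    out = predictions.copy()
    # idxmax(axis=1) gives, for each row, the column with the largest value
    out["pick"] = out[list(LABELS)].idxmax(axis=1).map(LABELS)
    return out

# made-up probabilities, purely for illustration
example = pd.DataFrame({
    "homeTeam": ["Team A", "Team C"],
    "awayTeam": ["Team B", "Team D"],
    "homeWinProbability": [55.0, 20.0],
    "draws": [25.0, 30.0],
    "awayTeamWins": [20.0, 50.0],
})

picked = add_pick(example)
print(picked[["homeTeam", "awayTeam", "pick"]])
```

Rows that have no prediction yet (NaN in all three columns) would need filtering out before calling this.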

This was a bit rushed as I wanted to get some predictions out before today's games started.  I'll improve the code and add extra functionality in the next article.  Any comments, tips, etc. are welcome.  I'm off to the bookies now. :-)
