After updating my FPL bot for Reddit, I've started working on adding new features to it. In my previous post I mentioned I wanted to add some sort of command that can be used by Reddit users to call the bot and share some information, e.g. a player's performance against a particular team. And that's exactly what I've been working on in the last week!
Getting the data
A player's performance can be defined in multiple ways, like how many goals and assists they score, or how many clean sheets they have kept (for defensive players). Another way that is becoming very popular is defining a player's performance by their expected performance, for example how many goals they were expected to score (xG) or assist (xA) in a particular match. Normally this data is quite expensive, since it is used in all kinds of analysis, but luckily I found a website called Understat where this is all readily available, for free.
Unfortunately, after emailing them, I found out that they don't have an API, and so I started searching through all the nooks and crannies of their website to find a possible source of the data. Thankfully it didn't take me too long to find out that there is a
Scraping with BeautifulSoup
The function used to scrape and parse the
Once I found this, I set out to use BeautifulSoup to scrape the page and get the content of the variable playersData. The way I did this was by iterating over each <script> in the page and trying to match the regular expression r"var\s+playersData\s+=\s+JSON.parse\(\'(.*?)\'\);" against its content. If it matches, the first capturing group holds the information needed. It took quite a while to get right, but thankfully with the help of @espoem I was able to figure it out.
Unfortunately, that was not the end of it. I now had the data I needed, but it still wasn't in a readable or useful format. Simply trying to parse it with json.loads() didn't work, and neither did the many other options I tried. In the end I found a StackOverflow question where someone used the codecs module to do something similar, so I tried it out, and thankfully it worked!
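The two steps above (matching the regex against each script, then decoding the captured string) can be sketched roughly as follows. This is a minimal sketch and not the bot's actual function: the helper name extract_players_data, the sample script contents, and the assumption that the payload is ASCII text with \xNN escapes are all mine.

```python
import codecs
import json
import re

# Same pattern as in the text; the capturing group is the escaped JSON string.
PLAYERS_DATA = re.compile(r"var\s+playersData\s+=\s+JSON.parse\(\'(.*?)\'\);")


def extract_players_data(script_texts):
    """Return the decoded playersData from the first script that contains it."""
    for text in script_texts:
        # A script without inline content may be None, so fall back to "".
        match = PLAYERS_DATA.search(text or "")
        if match:
            # Assumption: the payload is ASCII with \xNN escapes, so
            # unicode_escape turns it back into plain JSON text.
            raw = match.group(1).encode("utf-8")
            return json.loads(codecs.decode(raw, "unicode_escape"))
    return None


# Hypothetical script contents standing in for a scraped page.
scripts = [
    "console.log('unrelated');",
    r"var playersData = JSON.parse('\x5B\x7B\x22id\x22\x3A\x221\x22\x7D\x5D');",
]
players = extract_players_data(scripts)
```

With BeautifulSoup, script_texts could be something like (tag.string for tag in soup.find_all("script")), which is why the sketch tolerates None.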
Part of the player name and team mapping
All that was left was manually changing some of the data, e.g. a player's name and team, to match that of their FPL counterparts. For example, instead of Tottenham, the Fantasy Premier League uses Spurs. This was pretty tedious work, as I had to check which names were too different, then manually create a mapping to fix this. Obviously the same was done for the teams, but that was less work, fortunately.
A player's performance
The function used to get each player's individual data
The information in the
playersData variable was not enough, however. It contains each player's xG, xA etc. for the entire season, but doesn't distinguish between opponents. Thankfully each player also has their own page (https://understat.com/player/<player_id>), and so I used the lessons learned when I made fpl asynchronous to retrieve all this information.
This function has changed a lot since its creation, for a couple of reasons, but ended up being quite similar to the function used to get the general players data. It also uses a regex to find the variable, and codecs to load the JSON. However, because all the players are requested at the same time, sometimes something like <closed.php> was returned instead. I didn't really understand what it was, but I guessed it was caused by a hidden rate limit, and so I simply added a try except. It tries to match the regex, and if that fails, the function calls itself recursively. This way it finds the variable eventually (as long as the real page does come back at some point), and it seems to work as expected.
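The retry idea can be sketched like this. Note that this is a variation rather than the bot's code: it uses a bounded loop with a small back-off instead of unbounded recursion, and the names fetch_variable, fake_fetch and matchesData are hypothetical stand-ins.

```python
import asyncio
import re


async def fetch_variable(fetch, url, pattern, retries=5, delay=0.0):
    """Fetch url until pattern matches, giving up after `retries` attempts."""
    for attempt in range(1, retries + 1):
        html = await fetch(url)
        match = pattern.search(html)
        if match:
            return match.group(1)
        # Wait a little longer after each failed attempt.
        await asyncio.sleep(delay * attempt)
    raise RuntimeError(f"no match for {url} after {retries} attempts")


# Hypothetical fetcher that fails twice before returning a real page.
responses = iter(
    ["<closed.php>", "<closed.php>", "var matchesData = JSON.parse('[]');"]
)


async def fake_fetch(url):
    return next(responses)


# Assumed variable name; the real page may use a different one.
pattern = re.compile(r"var\s+matchesData\s+=\s+JSON\.parse\('(.*?)'\);")
result = asyncio.run(
    fetch_variable(fake_fetch, "https://understat.com/player/<player_id>", pattern)
)
```

Capping the number of attempts means a page that never comes back raises an error instead of recursing forever.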
Saving the data
Now that I had the data, I needed to save it in the database (only what was needed). This also went through multiple iterations, but in the end I decided to do the following: create text indexes, and then simply use MongoDB's find() to try and find the correct player. I can't really put into words how irritating implementing this part of the bot was, as things seemingly went wrong for no reason, but eventually I got it right (I think).
The reason for doing it this way is that the base database is first populated with data from the Fantasy Premier League website, and as mentioned before, the names differ from those on Understat's website. Manually creating a mapping for all 600 players would take way too long, and so I created text indexes for each player's team. The bot then uses MongoDB's find() to get a list of players that might match the given name (and team), sorts them by score and takes the first player in line. This should be the correct player, and so their Understat attributes are simply added to the document!
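A text-index lookup along these lines could look as follows with PyMongo. This is only a sketch under assumptions: the field names name and team, and the helper names, are hypothetical, and collection is expected to be a PyMongo collection.

```python
def ensure_text_index(collection):
    # Assumed field names; one compound text index covers both fields.
    collection.create_index([("name", "text"), ("team", "text")])


def find_best_match(collection, name, team):
    """Return the highest-scoring player document for name and team, or None."""
    cursor = (
        collection.find(
            {"$text": {"$search": f"{name} {team}"}},
            # Ask MongoDB to include the relevance score in each result.
            {"score": {"$meta": "textScore"}},
        )
        .sort([("score", {"$meta": "textScore"})])
        .limit(1)
    )
    return next(iter(cursor), None)
```

The returned document can then be updated with the Understat attributes, for example via update_one with a $set on its _id.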
Updating the bot
After implementing a way to get the data and save it to a database, it was time to work on the Reddit bot part of things. The way I imagined it working was the following: a person can leave a comment on a subreddit with a command such as !fplbot sergio aguero vs. chelsea. I also wanted them to be able to limit the number of games, for example with !fplbot aguero vs. chelsea 2 (which would limit it to the most recent two matches). The bot would then create a Markdown table containing all the necessary information and reply to the comment. Pretty simple, right?
Creating the regex
I also used a regex to see if someone used something similar to the command, and once again @espoem helped me a lot with this. Basically it needed to do the following: get a player's name (up to 2 separate words, including letters with an umlaut, hyphens (Korean names) etc.) and the opponent (just a team for now), which can obviously only contain letters and spaces.
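To give an idea of what such a pattern might look like, here is a simplified sketch. It is not the regex the bot uses (that one went through many iterations); the group names and the parse_command helper are my own, and it assumes nothing follows the command in the comment.

```python
import re

# One name word: letters (including accented ones), optionally joined by - or '.
WORD = r"[^\W\d_]+(?:[-'][^\W\d_]+)*"

COMMAND = re.compile(
    rf"!fplbot\s+(?P<player>{WORD}(?:\s+{WORD})?)"   # up to two words
    r"\s+vs\.?\s+"
    r"(?P<opponent>[A-Za-z]+(?:\s+[A-Za-z]+)*)"      # letters and spaces only
    r"(?:\s+(?P<limit>\d+))?",                       # optional match limit
    re.IGNORECASE,
)


def parse_command(text):
    """Return (player, opponent, limit) or None if the command is absent."""
    match = COMMAND.search(text)
    if match is None:
        return None
    limit = match.group("limit")
    return match.group("player"), match.group("opponent"), int(limit) if limit else None
```

For example, parse_command("!fplbot aguero vs. chelsea 2") gives ("aguero", "chelsea", 2). One known limitation of this sketch is that extra words after the team name are swallowed into the opponent group.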
Creating the table
With the regex complete (not really; it went through multiple iterations and changed a lot over the course of developing this feature), a function was needed that could create the Markdown table. This is very similar to creating the Markdown table for the price changes, so similar logic was used here. Before creating the table, the function needs to find the player in the database and figure out which team is actually meant as the opponent.
Function used for handling the comment's content
For finding the player, the aforementioned text indexes were once again used. For finding the relevant fixtures, a lot of data cleaning was done beforehand to make sure that the team names used in the Fantasy Premier League could be used consistently. Mapping alternative names to the actual team names was done manually for the sake of usability. Basically, I went on Wikipedia, found out every team's nickname, short name etc. and then created a dictionary that could be used to map e.g. "The Cherries" to Bournemouth. So most of the work was actually done by preparing the data!
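Such an alias dictionary could be as simple as the sketch below. Only the "The Cherries" and Tottenham/Spurs examples come from this post; the lowercase keys and the resolve_team helper are assumptions about how one might do the lookup.

```python
# Lowercase alias -> team name as used by Fantasy Premier League.
TEAM_ALIASES = {
    "bournemouth": "Bournemouth",
    "the cherries": "Bournemouth",
    "tottenham": "Spurs",
    "spurs": "Spurs",
}


def resolve_team(text):
    """Map user input to an FPL team name, or None if it is unknown."""
    # Normalise case and collapse repeated whitespace before the lookup.
    key = " ".join(text.lower().split())
    return TEAM_ALIASES.get(key)
```

Returning None for unknown input lets the bot skip a comment instead of replying with the wrong team.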
Creating the Markdown table
Once the function has determined which fixtures are relevant (given by the opponent's name and match limit), they are passed to another function that simply creates the Markdown table and returns this.
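Building the table itself is mostly string joining. The following is a minimal sketch, not the bot's function; the column names and the numbers in the example call are made up for illustration.

```python
def markdown_table(headers, rows):
    """Build a Reddit-flavoured Markdown table with centred columns."""
    lines = [
        "|" + "|".join(headers) + "|",
        "|" + "|".join(":-:" for _ in headers) + "|",
    ]
    for row in rows:
        lines.append("|" + "|".join(str(cell) for cell in row) + "|")
    return "\n".join(lines)


# Hypothetical fixture data.
table = markdown_table(["Opponent", "xG", "xA"], [["Chelsea", 0.42, 0.1]])
```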
Replying to the comment
And now for the easy part. The bot basically streams all comments made to the subreddit, checks if part of the content of the comment matches the regex, and if it does, replies to it and saves the comment's ID to the database, if it's not already in there.
Saving and checking the comment
Since I was already using MongoDB I decided to store the comment IDs in a collection here as well. Also, checking if a comment has already been replied to is remarkably easy.
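Checking and saving a comment ID takes only a couple of calls. This is a sketch assuming a PyMongo collection and a hypothetical comment_id field, not necessarily how the bot stores it.

```python
def already_replied(collection, comment_id):
    """True if this comment ID is already stored in the collection."""
    return collection.find_one({"comment_id": comment_id}) is not None


def mark_replied(collection, comment_id):
    # The upsert inserts the ID once and is a no-op on later calls.
    collection.update_one(
        {"comment_id": comment_id},
        {"$set": {"comment_id": comment_id}},
        upsert=True,
    )
```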
Streaming and handling the comments
Actually streaming the comments is also very easy, since this is all done by PRAW. And as you would expect, checking whether a comment's content matches a regex is also easy. As I mentioned before, most of the difficulty of implementing this came from making sure the data can be matched - for the teams I simply created a list of possible teams (names used in Fantasy Premier League) and checked against that. I've tested it on my own personal subreddit and it seems to be working great!
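The main loop can be sketched as below. It is a simplified stand-in: the pattern only checks that the command is present, seen_ids is a plain set standing in for the MongoDB collection, and build_reply is a hypothetical callback that returns the Markdown table.

```python
import re

# Simplified stand-in for the real command regex.
COMMAND = re.compile(r"!fplbot\s+\S", re.IGNORECASE)


def handle_comments(comments, build_reply, seen_ids):
    """Reply once to every comment that contains the command."""
    for comment in comments:
        if comment.id in seen_ids or not COMMAND.search(comment.body):
            continue
        comment.reply(build_reply(comment.body))
        # Remember the ID so the same comment is never answered twice.
        seen_ids.add(comment.id)
```

With PRAW, comments would be something like reddit.subreddit("<subreddit>").stream.comments(), which keeps yielding new comments, so the loop runs until the process is stopped.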
The bot replying
(The column names probably seem foreign to people who have never heard of Understat before, haha. I'll add some information about this to the README in the future.)
Before I set the bot loose on the FantasyPL subreddit I want to add a similar command, but for comparing the performance of player A vs. the performance of player B. With all the preparation done in this update this shouldn't be too difficult to implement!
Usage & installation
FPLbot uses MongoDB to store players in a database, and so it is required to have MongoDB installed. Other than that, it uses fpl to retrieve information from Fantasy Premier League's API, and thus requires Python 3.6+.
git clone email@example.com:amosbastian/FPLbot.git
cd FPLbot
pip install -r requirements.txt
Once installed you should create a config.json file like the above example, but with the correct values. Once you have done this, you can schedule a cron job to run the bot whenever you want!
I've created a Discord server for people interested in programming FPL related things, so if you are interested in helping out, or simply want to know more, then don't hesitate to join! Otherwise you can simply create an issue on GitHub.