Adding new features to my FPL bot for Reddit

in #fpl, 5 years ago






https://github.com/amosbastian/FPLbot

After updating my FPL bot for Reddit, I've started working on adding new features to it. In my previous post I mentioned I wanted to add some sort of command that can be used by Reddit users to call the bot and share some information, e.g. a player's performance against a particular team. And that's exactly what I've been working on in the last week!

https://github.com/amosbastian/FPLbot/pull/2

Getting the data

A player's performance can be defined in multiple ways, like how many goals and assists they score, or how many clean sheets they have kept (for defensive players). Another way that is becoming very popular is defining a player's performance by their expected performance, for example how many goals they were expected to score (xG) or assist (xA) in a particular match. Normally this data is quite expensive, since it is used in all kinds of analysis, but luckily I found a website called Understat where this is all readily available, for free.




The playersData variable

Unfortunately, after emailing them, I found out that they don't have an API, so I started searching through all the nooks and crannies of their website to find a possible source of the data. Thankfully it didn't take me long to find a <script> containing a JavaScript variable with all the players' data, as you can see above.

Scraping with BeautifulSoup




The function used to scrape and parse the playersData variable

Once I found this, I set out to use BeautifulSoup to scrape the page and get the content of the playersData variable. Basically, I iterate over each <script> in the page and try to match the regular expression r"var\s+playersData\s+=\s+JSON.parse\(\'(.*?)\'\);" against its content. If it matches, the first capturing group contains the information needed. It took quite a while to get right, but thankfully with the help of @espoem I was able to figure it out.

Unfortunately, that wasn't the end of it. I now had the data I needed, but it still wasn't in a readable or useful format. Simply parsing it with json.loads() didn't work, and neither did the many other options I tried. In the end I found a StackOverflow question where someone used the codecs module to do something similar, so I tried it out, and thankfully it worked!
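To make the two steps above a bit more concrete (matching the regex, then decoding with codecs), here is a minimal sketch. It assumes the text of the <script> has already been pulled out of the page, and the sample string and its fields (player_name, xG) are made up for illustration; the real page may be encoded differently, and this is not necessarily how the bot does it.

```python
import codecs
import json
import re

# The regular expression quoted above, unchanged.
PLAYERS_DATA = re.compile(r"var\s+playersData\s+=\s+JSON.parse\(\'(.*?)\'\);")

# Hypothetical <script> content: the JSON is stored with \xNN escapes,
# which is why json.loads() cannot read it directly.
SAMPLE_SCRIPT = (
    r"var playersData = JSON.parse('\x5B\x7B\x22player_name\x22\x3A"
    r"\x22Sergio Aguero\x22,\x22xG\x22\x3A\x221.5\x22\x7D\x5D');"
)


def parse_players_data(script_text):
    """Return the decoded playersData list, or None if the variable is absent."""
    match = PLAYERS_DATA.search(script_text)
    if match is None:
        return None
    # Turn the literal escape sequences (\x7B, \x22, ...) into real characters.
    decoded = codecs.decode(match.group(1).encode("utf-8"), "unicode_escape")
    return json.loads(decoded)


players = parse_players_data(SAMPLE_SCRIPT)
print(players[0]["player_name"], players[0]["xG"])
```

In the real bot the same function would be called once per <script> found by BeautifulSoup, stopping at the first one that returns something other than None.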




Part of the player name and team mapping

All that was left was manually changing some of the data, e.g. a player's name and team, to match that of their FPL counterparts. For example, instead of Tottenham, the Fantasy Premier League uses Spurs. This was pretty tedious work, as I had to check which names were too different, then manually create a mapping to fix this. Obviously the same was done for the teams, but that was less work, fortunately.

A player's performance




The function used to get each player's individual data

However, the information in the playersData variable was not enough. It contains each player's xG, xA etc. for the entire season, but doesn't distinguish between opponents. Thankfully each player also has their own page (https://understat.com/player/<player_id>), so I used the lessons learned when I made fpl asynchronous to retrieve all this information.

This function has changed a lot since its creation, for a couple of reasons, but ended up being quite similar to the function used to get the general players data. It also uses a regex to find the variable, and codecs to load the JSON. However, because all the players are requested at the same time, sometimes something like <closed.php> was returned instead. I didn't really understand what it was, but I guessed it was caused by a hidden rate limit, so I simply added a try/except. Basically, it tries to match the regex, and if that fails, it calls itself recursively. This way it keeps retrying until it finds the variable, and it seems to work as expected.
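The retry idea can be sketched roughly like this. Note the assumptions: the variable name matchesData is a guess at what the player page contains, fetch is a hypothetical coroutine standing in for the real HTTP request, and the sketch swaps the recursion for a bounded loop so that it gives up after a few attempts instead of retrying forever.

```python
import asyncio
import codecs
import json
import re

# Assumed variable name on a player's page; adjust it to whatever the page uses.
MATCHES_DATA = re.compile(r"var\s+matchesData\s+=\s+JSON\.parse\('(.*?)'\);")


async def get_matches(fetch, player_id, max_attempts=5, delay=0.0):
    """Fetch a player's page and decode its match data, retrying on bad responses."""
    for _ in range(max_attempts):
        html = await fetch(player_id)
        match = MATCHES_DATA.search(html)
        if match:
            decoded = codecs.decode(match.group(1).encode("utf-8"), "unicode_escape")
            return json.loads(decoded)
        # Probably rate limited (e.g. "<closed.php>"): wait, then try again.
        await asyncio.sleep(delay)
    raise RuntimeError(f"no match data for player {player_id}")


def make_flaky_fetch(failures):
    """Hypothetical stand-in for the HTTP call: fails a few times, then succeeds."""
    calls = {"count": 0}

    async def fetch(player_id):
        calls["count"] += 1
        if calls["count"] <= failures:
            return "<closed.php>"
        return (
            r"<script>var matchesData = JSON.parse("
            r"'\x5B\x7B\x22xG\x22\x3A\x220.5\x22\x7D\x5D');</script>"
        )

    return fetch


# asyncio.run() needs Python 3.7+; the player ID is arbitrary here.
result = asyncio.run(get_matches(make_flaky_fetch(2), 619))
print(result)
```

A small delay between attempts is probably kinder to the website than retrying immediately, which is why it is a parameter here.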

Saving the data




The new update_players() function

Now that I had the data, I needed to save it in the database (only what was needed). This also went through multiple iterations, but in the end I decided to do the following: create text indexes, and then simply use MongoDB's find() to try to find the correct player. I can't really put into words how irritating implementing this part of the bot was, as things seemingly went wrong for no reason, but eventually I got it right (I think).

The reason for doing it this way is that the base database is first populated with data from the Fantasy Premier League website, and as mentioned before, the names differ from those on Understat's website. Manually creating a mapping for all 600 players would take way too long, so I created text indexes for each player's web_name, first_name, second_name and team. The bot then uses MongoDB's find() to get a list of players that might match the given name (and team), sorts them by score and takes the first one. This should be the correct player, so their Understat attributes are simply added to the document!
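Here is a minimal sketch of that lookup, written against any PyMongo-style collection object. The helper names (text_index_spec, best_match) are hypothetical, and it assumes a single compound text index over the four fields, since MongoDB allows only one text index per collection.

```python
TEXT_FIELDS = ("web_name", "first_name", "second_name", "team")


def text_index_spec(fields=TEXT_FIELDS):
    """Key list for collection.create_index(): one compound text index."""
    return [(field, "text") for field in fields]


def best_match(collection, search_terms):
    """Return the highest-scoring document for the search terms, or None."""
    score = {"$meta": "textScore"}
    cursor = (
        collection.find({"$text": {"$search": search_terms}}, {"score": score})
        .sort([("score", score)])
        .limit(1)
    )
    return next(iter(cursor), None)


# With a real database this could be used as follows (names are assumptions):
#   from pymongo import MongoClient
#   players = MongoClient().fpl.players
#   players.create_index(text_index_spec())
#   player = best_match(players, "Aguero Man City")
```

Sorting on the textScore metadata is what puts the most likely player first, so a wrong match is still possible when two names are very similar.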

Updating the bot

After implementing a way to get the data and save it to a database, it was time to work on the Reddit bot part of things. The way I imagined it working was the following: a person leaves a comment on a subreddit with a command such as !fplbot sergio aguero vs. chelsea. I also wanted them to be able to limit the number of games, by using the command like this: !fplbot aguero vs. chelsea 2 (which would limit it to the two most recent matches). The bot would then create a Markdown table containing all the necessary information and reply to the comment. Pretty simple, right?

Creating the regex

I also used a regex to see if someone used something similar to the command, and once again @espoem helped me a lot with this. Basically it needed to do the following: get a player's name (up to 2 separate words, including letters with an umlaut, hyphens (Korean names) etc.) and the opponent (just a team for now), which can obviously only contain letters and spaces.

r"!fplbot\s+([^\W\d]+(?:[\s-][^\W\d]+)*)\s+(?:vs.|vs)\s+([a-zA-Z ]+)(\d+)?"
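To show what the three groups capture, here is a small sketch that applies the regex above to a comment. The parse_command helper is hypothetical and only there for illustration.

```python
import re

# The regex from above, unchanged.
COMMAND = re.compile(
    r"!fplbot\s+([^\W\d]+(?:[\s-][^\W\d]+)*)\s+(?:vs.|vs)\s+([a-zA-Z ]+)(\d+)?"
)


def parse_command(comment_body):
    """Return (player, opponent, limit), or None if the comment has no command."""
    match = COMMAND.search(comment_body)
    if match is None:
        return None
    player, opponent, limit = match.groups()
    # The opponent group can capture a trailing space before the number.
    return player, opponent.strip(), int(limit) if limit else None


print(parse_command("!fplbot sergio aguero vs. chelsea"))
print(parse_command("!fplbot aguero vs. chelsea 2"))
print(parse_command("Who should I captain this week?"))
```

The opponent group allows spaces, so it swallows the space in front of the optional number; stripping it afterwards keeps the team lookup simple.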

Creating the table

With the regex complete (not really, it went through multiple iterations and changed a lot over the course of developing this feature) a function was needed that could create the Markdown table. This is obviously very similar to creating the Markdown table for the price changes, so a similar logic was used here. Before creating the table, the function obviously needs to find the player in the database, and figure out which team is actually meant as the opponent.




Function used for handling the comment's content

For finding the player the aforementioned text indexes were once again used. For finding the relevant fixtures a lot of data cleaning was done beforehand to make sure that the team names used in the Fantasy Premier League could be used consistently. Mapping certain names to their actual names was done manually for the sake of usability. Basically, I went on Wikipedia, found out every team's nickname, short name etc. and then created a dictionary that could be used to map e.g. "The Cherries" to Bournemouth. So most of the work was actually done by preparing the data!
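Such a dictionary could look something like the sketch below. Only the two mappings mentioned in this post are included, and the names TEAM_ALIASES and resolve_team are hypothetical; the real dictionary is a lot longer.

```python
# Lower-cased alias -> team name as used by Fantasy Premier League.
TEAM_ALIASES = {
    "the cherries": "Bournemouth",
    "bournemouth": "Bournemouth",
    "tottenham": "Spurs",
    "spurs": "Spurs",
}


def resolve_team(name):
    """Map whatever the user typed to an FPL team name, or None if unknown."""
    return TEAM_ALIASES.get(name.strip().lower())


print(resolve_team("The Cherries"))
print(resolve_team(" tottenham "))
print(resolve_team("Atlantis"))
```

Normalising the input (stripping whitespace and lower-casing) before the lookup means each alias only has to be listed once.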




Creating the Markdown table

Once the function has determined which fixtures are relevant (based on the opponent's name and the match limit), they are passed to another function that simply creates the Markdown table and returns it.
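A table builder along these lines would do the job. This is a minimal sketch: the column names and the zero values in the sample fixture are placeholders, not real Understat data, and the actual function in the bot may be structured differently.

```python
def markdown_table(fixtures, columns):
    """Build a Reddit-style Markdown table from a list of fixture dicts."""
    header = "|".join(columns)
    # Reddit needs a divider row; ":-" left-aligns each column.
    divider = "|".join(":-" for _ in columns)
    rows = [
        "|".join(str(fixture.get(column, "")) for column in columns)
        for fixture in fixtures
    ]
    return "\n".join([header, divider] + rows)


# Placeholder fixture, purely to show the shape of the output.
sample_fixtures = [{"Fixture": "Chelsea (H)", "xG": 0.0, "xA": 0.0}]
print(markdown_table(sample_fixtures, ["Fixture", "xG", "xA"]))
```

Passing the columns explicitly keeps their order fixed, regardless of how the fixture dictionaries were built.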

Replying to the comment

And now for the easy part. The bot basically streams all comments made to the subreddit, checks if part of the content of the comment matches the regex, and if it does, replies to it and saves the comment's ID to the database, if it's not already in there.




Saving and checking the comment

Since I was already using MongoDB I decided to store the comment IDs in a collection here as well. Also, checking if a comment has already been replied to is remarkably easy.




Streaming and handling the comments

Actually streaming the comments is also very easy, since this is all done by PRAW. And as you'd expect, checking whether the comment's content matches a regex is also easy. As I mentioned before, most of the difficulty of implementing this came from making sure the data can be matched - for the teams I simply created a list of possible teams (names used in Fantasy Premier League) and check against that. I've tested it on my own personal subreddit and it seems to be working great!
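The overall loop can be sketched as follows. To keep it runnable without Reddit credentials or a database, the sketch uses a hypothetical FakeComment class in place of PRAW's comment objects (which also have id, body and reply()) and a plain set in place of the MongoDB collection of comment IDs.

```python
import re

COMMAND = re.compile(
    r"!fplbot\s+([^\W\d]+(?:[\s-][^\W\d]+)*)\s+(?:vs.|vs)\s+([a-zA-Z ]+)(\d+)?"
)


def handle_comments(comments, replied_ids, build_reply):
    """Reply once to every comment that contains the command."""
    for comment in comments:
        if comment.id in replied_ids:
            continue  # already handled, e.g. after a restart
        match = COMMAND.search(comment.body)
        if match is None:
            continue
        comment.reply(build_reply(match))
        replied_ids.add(comment.id)


class FakeComment:
    """Stand-in for a PRAW comment, for illustration only."""

    def __init__(self, comment_id, body):
        self.id = comment_id
        self.body = body
        self.replies = []

    def reply(self, text):
        self.replies.append(text)


comments = [
    FakeComment("a1", "!fplbot aguero vs. chelsea 2"),
    FakeComment("a2", "Great goal!"),
    FakeComment("a1", "!fplbot aguero vs. chelsea 2"),  # same ID seen twice
]
replied = set()
handle_comments(comments, replied, lambda match: f"Stats for {match.group(1)}")
print(replied, comments[0].replies)
```

With PRAW, the comments argument would be something like reddit.subreddit("FantasyPL").stream.comments(), which yields new comments as they are posted.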




The bot replying

(The column names probably seem foreign to people who have never heard of Understat before, haha. I'll add some information about this to the README in the future.)

Roadmap

Before I set the bot loose on the FantasyPL subreddit I want to add a similar command, but for comparing the performance of player A vs. the performance of player B. With all the preparation done in this update this shouldn't be too difficult to implement!

Usage & installation

FPLbot uses MongoDB to store players in a database, and so it is required to have MongoDB installed. Other than that, it uses fpl to retrieve information from Fantasy Premier League's API, and thus requires Python 3.6+.

git clone git@github.com:amosbastian/FPLbot.git
cd FPLbot
pip install -r requirements.txt



Once installed, create a config.json file like the example above, filled in with the correct values. Once you have done this, you can schedule a cron job to run the bot whenever you want!

Contributing



I've created a Discord server for people interested in programming FPL related things, so if you are interested in helping out, or simply want to know more, then don't hesitate to join! Otherwise you can simply create an issue on GitHub.

  • Great article with all of the quality storytelling elements.
  • Good format, images, code samples and explanations.
  • Awesome separation of concerns with your commits and code is nicely commented.


Thank you for your review, @helo! Keep up the good work!


I saw that you implemented age-weighting sorting. Thanks!
