@Checky 0.1.0 - Suggestions, Delayed Comments and More!
I would like to start this post by thanking you all for receiving @checky so well. Since its deployment, the support for this bot has been overwhelmingly positive. It received thank-you messages and upvotes on a lot of its comments and was barely flagged. Some of you even shared good feedback and suggestions that led to improvements to the bot's code and features. In case you missed @checky's introduction, you can read this short post to learn what @checky is and why it exists.
This post covers the changes made to @checky since its introduction. I feel like I can't do much more to improve this bot's features. I still have one or two ideas in mind, but that's about it; they are discussed at the end of this post. Once I feel like nothing more can be added to the bot's features, I'll start working on an add-on, as requested by some of you. I've never actually done that before, but there is a first time for everything, right? Anyway, let's go through the changes, starting with the most important one.
Suggesting existing usernames
The biggest thing missing from @checky's comments was a suggestion of an existing username close to the one used in a wrong mention. I tested several correction algorithms to implement that feature and ended up using Peter Norvig's spelling corrector algorithm. It is more of a brute-force approach to spell checking than anything else, but it gave the best results in my tests. It works by generating a set of variations of a wrong username in the hope that one of them corresponds to an existing username. First, it generates all the words that are one edit away from the wrong username. An edit is either a removed character ("useer" becomes "user"), a replaced character ("usar" becomes "user"), an added character ("usr" becomes "user") or two swapped adjacent characters ("usre" becomes "user"). If no known username is found, it generates all the words that are two edits away from the wrong username. If none of them is an existing username, it gives up and returns `null`, which results in the regular "maybe you made a mistake" message.
In the original algorithm, if more than one word is found, it returns the word that is the most common in the language those words come from. Since @checky deals with usernames, it can be more specific and uses the following logic. If a found username appears somewhere in the post, it returns that one. Otherwise, if one of the found usernames has been mentioned at least once by the author in their other posts (saved in the `mentioned` property of user objects), it returns that one. If neither condition is met, it returns the found username that has been mentioned the most on Steem by all users combined (saved in the `occurrences` property of user objects). Thanks to @nairadaddy for the idea! (commit 1 - commit 2)
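The steps described above can be sketched roughly as follows. This is only an illustration, not @checky's actual code: the `ALPHABET` constant, the function names, the `users` Map and the `postMentions` and `author` parameters are assumptions made for the example. Only the `mentioned` and `occurrences` property names come from the description above.

```javascript
// Characters used to build candidates (assumption: lowercase letters,
// digits, dashes and dots).
const ALPHABET = 'abcdefghijklmnopqrstuvwxyz0123456789-.';

// Every string that is exactly one edit away from `word`.
function edits1(word) {
  const results = new Set();
  for (let i = 0; i <= word.length; i++) {
    const left = word.slice(0, i);
    const right = word.slice(i);
    // Removed character
    if (right.length > 0) results.add(left + right.slice(1));
    // Two swapped adjacent characters
    if (right.length > 1) results.add(left + right[1] + right[0] + right.slice(2));
    for (const c of ALPHABET) {
      // Replaced character
      if (right.length > 0) results.add(left + c + right.slice(1));
      // Added character
      results.add(left + c + right);
    }
  }
  results.delete(word); // replacing a character by itself is not an edit
  return results;
}

// Keep only the candidates that are known usernames.
// `users` is a hypothetical Map: username -> { occurrences: number }.
function known(candidates, users) {
  return [...candidates].filter((name) => users.has(name));
}

// Returns a suggested username or null.
// `postMentions`: valid usernames found elsewhere in the post (hypothetical).
// `author`: hypothetical user object with a `mentioned` array.
function suggest(wrong, postMentions, author, users) {
  let found = known(edits1(wrong), users);
  if (found.length === 0) {
    // Nothing one edit away: try two edits
    const twoAway = new Set();
    for (const first of edits1(wrong)) {
      for (const second of edits1(first)) twoAway.add(second);
    }
    found = known(twoAway, users);
  }
  if (found.length === 0) return null;

  // 1. Prefer a username that appears in the post itself
  const inPost = found.find((name) => postMentions.includes(name));
  if (inPost) return inPost;
  // 2. Then a username the author has already mentioned
  const mentionedBefore = found.find((name) => author.mentioned.includes(name));
  if (mentionedBefore) return mentionedBefore;
  // 3. Otherwise the most mentioned username overall
  return found.reduce((best, name) =>
    users.get(name).occurrences > users.get(best).occurrences ? name : best);
}
```

Generating every candidate two edits away is expensive, which is why the sketch only does it when the first level finds nothing.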
Delay @checky's comments!
Some steemians like to do a final check on their post after broadcasting it to make sure that all the links work correctly. It can be frustrating for them to instantly get a comment from a bot checking their mentions, which is why there is now a command to delay @checky's post checking. It still checks posts instantly by default, but this command makes @checky wait a specified amount of time (in minutes) before checking your posts. To set a delay, reply to any of @checky's comments with `!delay minutes`. For example, `!delay 5` would set @checky to wait 5 minutes before checking the mentions in your posts. Thanks to @bashadow for the idea! (commit 1 - commit 2)
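As an illustration, reading such a reply could look like the sketch below. The `parseDelay` and `minutesToMs` helpers are hypothetical names, and the exact format the bot accepts (whitespace, upper or lower case) is an assumption.

```javascript
// Extract the number of minutes from a "!delay <minutes>" reply.
// Returns null when the reply is not a delay command.
function parseDelay(reply) {
  const match = /^\s*!delay\s+(\d+)\s*$/i.exec(reply);
  if (!match) return null;
  return Number(match[1]);
}

// Convert the delay to milliseconds, e.g. for use with setTimeout.
function minutesToMs(minutes) {
  return minutes * 60 * 1000;
}
```

With this sketch, `parseDelay('!delay 5')` gives `5`, while a reply that is not a delay command gives `null`.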
Supporting all the Steem apps
When introduced, @checky had no way of its own to know which users were mentioned in a post. It relied entirely on apps setting the `users` array in the post's `json_metadata`, which resulted in posts made through some apps simply being ignored by the bot. While this was already a problem in and of itself, a bigger problem was that metadata coming from another app should never be trusted. One particular app (cough Busy cough) was populating the `users` array with impossible mentions (too long, ending with dots, containing unauthorized characters, etc.), forcing @checky to filter out all those impossible mentions. Neither problem should exist anymore since @checky now searches for mentions in the post body. (commit)
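A minimal sketch of searching the body for mentions might look like this. The function names are hypothetical, and the validity check is a simplified approximation of Steem account-name rules (3 to 16 characters, starting with a letter, ending with a letter or digit), not the bot's real pattern.

```javascript
// Simplified plausibility check (assumption, not the full Steem rules).
function isPlausibleUsername(name) {
  return /^[a-z][a-z0-9.-]{1,14}[a-z0-9]$/.test(name);
}

// Collect the distinct plausible mentions found in a post body.
function extractMentions(body) {
  const mentions = new Set();
  for (const match of body.matchAll(/@([a-z0-9.-]+)/gi)) {
    // Lowercase and drop trailing punctuation such as a sentence-ending dot
    const name = match[1].toLowerCase().replace(/[.-]+$/, '');
    if (isPlausibleUsername(name)) mentions.add(name);
  }
  return [...mentions];
}
```

Because the body is parsed directly, the result no longer depends on which app published the post.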
Ignoring some username variations
Some users like to sign their posts with a variation of their username. While it's impossible for @checky to account for every possible human behavior, it should do its best to avoid flagging the most common username variations used as end-of-post signatures. It now does so for usernames containing multiple parts (separated by dashes or dots) or numbers. For example, if your username is @the-best-user15, signing your post with @the-best, @best-user, @thebestuser, @the-best-user, @user15, etc. won't get you a comment from @checky because those are all variations of @the-best-user15. (commit)
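One possible way to recognize such variations is sketched below. The rule used here is an assumption chosen to fit the examples above, not necessarily the one @checky implements: the username is split into parts (on dashes, dots and letter/digit boundaries), and a mention counts as a variation when, with its separators removed, it equals a run of consecutive parts. Both function names are hypothetical.

```javascript
// Split a username into its parts: "the-best-user15" -> the, best, user, 15
function splitParts(username) {
  return username
    .toLowerCase()
    .split(/[-.]/)
    .flatMap((part) => part.match(/[a-z]+|[0-9]+/g) || []);
}

// True when `mention` equals a run of consecutive parts of `username`,
// ignoring dashes and dots in the mention.
function isVariation(mention, username) {
  const parts = splitParts(username);
  const flat = mention.toLowerCase().replace(/[-.]/g, '');
  for (let start = 0; start < parts.length; start++) {
    let joined = '';
    for (let end = start; end < parts.length; end++) {
      joined += parts[end];
      if (joined === flat) return true;
    }
  }
  return false;
}
```

Note that this rule also accepts a single part on its own (such as @best), which may be more permissive than what the bot really does.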
The social network checking has been improved!
Previously, social-network-related words were looked for in a range of 600 characters around a wrong mention (300 characters on each side). The range is now 40 words, which is much better since it accounts for variations in word length. A few social-network-related words have been added to the list of words to look for: Facebook, Golos, Discord, Minds, IG, RT, FB, EOS (not a social network but it had to be added), t.me and t.co. The checks are case-insensitive, meaning that "Word" is considered the same as "word" and "wOrD".
A very specific bug has also been fixed, one that had me scratching my head for a while. Let's say that your username is @username1 but that your Twitter handle is @username. In most cases, @username would be considered a wrong mention until the check for social-network-related words identified it as a Twitter handle. However, in some cases where @username1 was mentioned before @username, the search would match @username1 since it contains @username. The mention would then still be considered wrong even though it clearly is a Twitter handle. This has been fixed by checking that the matched username isn't directly followed by a letter or a digit. (commit 1 - commit 2)
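The word window and the fix for the @username / @username1 bug could be sketched like this. Everything here is illustrative: the function names are hypothetical, "twitter" is assumed to be part of the original word list, and splitting the 40 words into 20 on each side of the mention is an assumption.

```javascript
// New words from this update, plus "twitter" (assumed to already be listed).
const SOCIAL_WORDS = ['twitter', 'facebook', 'golos', 'discord', 'minds',
  'ig', 'rt', 'fb', 'eos', 't.me', 't.co'];

function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// True when `text` contains "@username" NOT directly followed by a letter
// or a digit, so "@username1" does not count as "@username".
function mentionsExactly(text, username) {
  const pattern = new RegExp('@' + escapeRegExp(username) + '(?![a-z0-9])', 'i');
  return pattern.test(text);
}

// Case-insensitive comparison, ignoring surrounding punctuation.
// Domains such as t.me are also looked for inside longer words (links).
function isSocialWord(word) {
  const cleaned = word.toLowerCase().replace(/^[^a-z0-9]+|[^a-z0-9]+$/g, '');
  return SOCIAL_WORDS.some((social) =>
    cleaned === social || (social.includes('.') && cleaned.includes(social)));
}

// True when a social-network-related word appears within `wordsPerSide`
// words of a mention of `username`.
function looksLikeSocialHandle(body, username, wordsPerSide = 20) {
  const words = body.split(/\s+/).filter(Boolean);
  return words.some((word, index) => {
    if (!mentionsExactly(word, username)) return false;
    const start = Math.max(0, index - wordsPerSide);
    return words.slice(start, index + wordsPerSide + 1).some(isSocialWord);
  });
}
```

Counting words instead of characters keeps the window meaningful whether the surrounding text uses short or long words.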
Moderator commands
I realized pretty quickly that I would have to set @checky to `off` myself for some users because they would neither flag @checky's comments nor reply to them with `!mode off`. Instead of editing the users.json file, I decided to add a way for a moderator to type commands on behalf of other users. Don't worry about abuse of this feature; it would be pretty stupid for me to try and sabotage my own project. Every moderator command follows this simple pattern: `(for:username) command`. For example, a moderator who wanted to set the delay to 5 minutes for an account would write `(for:account) !delay 5`. To be as transparent as possible about the commands I use, I've created a post on @checky's account that will be used for typing moderator commands. Every moderator command will also be followed by a line briefly explaining why it was used. (commit)
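Parsing the `(for:username) command` pattern could look like the sketch below. The `parseModeratorCommand` name and the returned object shape are hypothetical, and the sketch assumes the command sits on the first line with the explanation on the lines after it.

```javascript
// Parse "(for:username) command" from the first line of a comment.
// Returns { target, command } or null when the pattern does not match.
function parseModeratorCommand(comment) {
  const firstLine = comment.trim().split('\n')[0];
  const match = /^\(for:([a-z0-9.-]+)\)\s+(!\S.*)$/i.exec(firstLine);
  if (!match) return null;
  return { target: match[1].toLowerCase(), command: match[2].trim() };
}
```

The extracted `command` can then be handled exactly like a command typed by the target user.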
Ignoring images and popular domain extensions
Any mention ending with a dot followed by png, jpg, jpeg or gif is ignored by @checky. This avoids matching parts of image names when looking for wrong mentions. Likewise, mentions ending with one of the following popular domain extensions are ignored: com, co, io, org, net. This avoids matching sentences such as "you can follow me @website.com", which are quite common. (commit)
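A small sketch of that filter, with a hypothetical `hasIgnoredExtension` helper and the extension list taken from the paragraph above:

```javascript
// Image and domain extensions listed above.
const IGNORED_EXTENSIONS = ['png', 'jpg', 'jpeg', 'gif', 'com', 'co', 'io', 'org', 'net'];

// True when the text after the last dot of the mention is an ignored extension.
function hasIgnoredExtension(mention) {
  const lastDot = mention.lastIndexOf('.');
  if (lastDot === -1) return false;
  return IGNORED_EXTENSIONS.includes(mention.slice(lastDot + 1).toLowerCase());
}
```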
Avoiding a few tags
While I'm well aware that these tags will probably only be used by me, I've made the bot ignore posts containing certain tags. For a post to be ignored by @checky, it must have a tag starting with #nobot, #no-bot, #nocheck or #no-check. This update post contains the #nochecky tag and should therefore not be checked by @checky. (commit)
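The tag check boils down to a prefix test, sketched here with a hypothetical `shouldSkipPost` function that takes the post's tags as an array of strings (an assumption about how the tags are available):

```javascript
const IGNORED_TAG_PREFIXES = ['nobot', 'no-bot', 'nocheck', 'no-check'];

// True when at least one tag starts with an ignored prefix.
function shouldSkipPost(tags) {
  return tags.some((tag) => {
    const name = tag.toLowerCase().replace(/^#/, '');
    return IGNORED_TAG_PREFIXES.some((prefix) => name.startsWith(prefix));
  });
}
```

Since only the start of the tag is compared, #nochecky is covered by the `nocheck` prefix.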
Fixed comments not being broadcast due to the 20-second limit
What's coming next?
- Suggesting a hashtag instead of an at sign: quite a lot of steemians mix up these two characters when talking about contests, so @checky should be able to tell whether a mention was supposed to be a tag.
- Improved suggestions: before checking for level 2 edits, the script should first check against an API whether any of the generated words actually exists as a username. Currently it only checks against the usernames it has already encountered.
- Making use of replies to @checky's comments: some users reply to @checky with the username of the wrongly mentioned user. This shouldn't go to waste and will be used to make the username correction algorithm better.
- Weekly statistics: as requested by some steemians, I'll add automated statistics posts to @checky. I still have to find some good statistics to generate though.
- Add-on: once the work on the bot is over, I'll work on an add-on to check your mentions before posting.
If you want to contribute to this project or discuss an issue, feel free to visit its GitHub page. You can also clone it and follow the instructions written there to get it running (although this is not recommended since @checky already runs the script). My social media accounts are listed at the end of the README file. If you add me on Steam, tell me why on my wall, otherwise I won't accept your friend request.