@Checky 0.2.1 - Highlighted Differences, Better Mention Detection and More!

in #utopian-io, 6 years ago

Click here to get redirected to the GitHub repository of this project.

It's been a while since @checky's last update post. These past two weeks, I've been focusing exclusively on @checky, fixing a few bugs here and there and adding some features that I thought might be useful to some of you. I also put some effort into improving @checky's performance in terms of speed and memory usage. This post is probably the last update post for @checky, considering there isn't much left to do, as you will see in the "What's coming next?" section of this post.

Before we start looking at the changes made to this bot, I want to once again thank you all for the support you gave @checky these past two weeks, whether it was by upvoting, commenting or simply choosing not to flag its comments when it failed at doing its job. Finally, I want to give special thanks to @mineopoly (Absolutely Nothing to Post (Comedy open Mic 33)) and @eoj (@checky Tribute Post) for making me laugh by talking about @checky in their posts! OK, now I'm done, let's get started!

Highlighted differences

When @checky made suggestions for lengthy mentions, it was sometimes hard to see at first glance how your mentions differed from the ones it suggested. This should now be much easier thanks to difference highlighting. Any added, replaced or transposed character is written in bold in the suggested mention so that you can instantly spot the differences and correct them. Deleted characters can't be bold though, since they don't appear in the suggestion, so a solution for that remains to be added to @checky. I have two ideas to show that a character has been deleted. The first one would be to highlight the two characters that surrounded it in the suggested mention (@rageapeanut would get @ragepeanut as a suggested mention, with the "e" and "p" in bold). The second one would be to highlight the deleted character in the supposedly wrong mention (@rageapeanut, with its second "a" in bold, would get @ragepeanut as a suggested mention). If you have any preference, make sure to tell me in the comments so I can see what most steemians prefer. (commit)
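To give a rough idea of how this kind of highlighting can work, here is a minimal sketch. It is not @checky's actual code: the function name highlightDifferences is hypothetical, relying on a longest common subsequence is an assumption, and the ** markers simply stand for Markdown bold. With this approach a transposition only gets one of its two characters highlighted, and a deleted character leaves no trace in the suggestion, which is the limitation described above.

```javascript
// Hypothetical helper (not taken from @checky's repository): wraps in ** every
// character of `suggested` that is not part of a longest common subsequence
// (LCS) shared with the mention that was actually typed.
function highlightDifferences(typed, suggested) {
  const m = typed.length;
  const n = suggested.length;
  // lcs[i][j] = length of the LCS of typed.slice(i) and suggested.slice(j)
  const lcs = Array.from({ length: m + 1 }, () => new Array(n + 1).fill(0));
  for (let i = m - 1; i >= 0; i--) {
    for (let j = n - 1; j >= 0; j--) {
      lcs[i][j] = typed[i] === suggested[j]
        ? lcs[i + 1][j + 1] + 1
        : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
    }
  }
  let t = 0; // position in the typed mention
  let s = 0; // position in the suggested mention
  let out = '';
  while (s < n) {
    if (t < m && typed[t] === suggested[s]) {
      out += suggested[s]; // character common to both mentions
      t++;
      s++;
    } else if (t < m && lcs[t + 1][s] >= lcs[t][s + 1]) {
      t++; // character only present in the typed mention (a deletion)
    } else {
      out += '**' + suggested[s] + '**'; // character only present in the suggestion
      s++;
    }
  }
  return out;
}

console.log(highlightDifferences('chekky', 'checky'));
console.log(highlightDifferences('rageapeanut', 'ragepeanut'));
```

The second call returns the suggestion without any bold character, since the only difference is a deletion.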

Improved mention detection

Since version 0.1.0 of this bot, special cases that weren't thought of when coding @checky's mention detection have started appearing. The one I should have thought of from the very beginning is mentions detected in code sections. It has been fixed by simply ignoring any mention found in code sections. The same applies to quote blocks, since the text in them isn't supposed to come from the author (commit 1 - commit 2). Mentions were also detected by the bot in image names (commit) and in links when they were preceded by a hashtag; this has also been fixed. One particular case that I couldn't possibly have thought of when coding @checky was supposedly wrong mentions being found in referenced posts' titles. It has been fixed but may sometimes require the bot to make an additional API call (commit).

Up until three days ago, @checky's mention detection was case insensitive (meaning it made no distinction between @checky and @ChECky). This was the case because I had realized that a few steemians liked to capitalize parts of the usernames of other steemians they mentioned. However, it ended up being more of a problem than anything because most of the capitalized mentions @checky found were not supposed to be Steem usernames. This is why I decided to make the mention detection case sensitive.

Finally, a new app that you may have already heard of was released a few weeks ago: Share2Steem (which aims to do the opposite of Tweem). While I have no problem whatsoever with this app, it being a crossposting app means that mentions in its posts are most likely not Steem usernames. This is why @checky will ignore any post made through this service from now on (commit).
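The filtering described in this section can be sketched roughly as follows. This is only an illustration under assumptions, not the code from the commits: the function name extractMentions and all the regular expressions are made up for the example, and the username pattern is a simplified take on Steem's naming rules. It covers code sections, quote blocks, image markup and case sensitivity, but not referenced post titles or Share2Steem posts.

```javascript
// Hypothetical sketch: strips the parts of a Markdown body in which mentions
// should be ignored, then collects the remaining lowercase mentions.
function extractMentions(body) {
  const cleaned = body
    .replace(/`{3}[\s\S]*?`{3}/g, ' ')         // fenced code sections
    .replace(/`[^`\n]*`/g, ' ')                // inline code
    .replace(/^[ \t]*>.*$/gm, ' ')             // quote blocks
    .replace(/!\[[^\]]*\]\([^)]*\)/g, ' ');    // images (alt text and file name)
  // Simplified username pattern (assumption): 3 to 16 characters, starting with
  // a lowercase letter. The match is case sensitive, so "@ChECky" is skipped.
  const pattern = /(?:^|[^\w\/])@([a-z][a-z0-9.-]{1,14}[a-z0-9])(?![\w-])/g;
  const found = Array.from(cleaned.matchAll(pattern), match => match[1]);
  return [...new Set(found)]; // each mention only needs to be checked once
}

const fence = '`'.repeat(3);
const sample = [
  'Hello @checky and @ChECky.',
  fence,
  '@insidecode',
  fence,
  '> thanks @quoted',
  '![@imagename](https://example.com/picture.png)'
].join('\n');
console.log(extractMentions(sample));
```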

Get some details on the mentions detected by @checky

Perhaps you already got a reply from @checky and couldn't find where the supposedly wrong mention was in your post. It can quickly get frustrating to know that you made a typo somewhere in your post without knowing exactly where. That's where the !where command comes into play. By simply replying !where to @checky's suggestion comment, you will be able to see the 30 words surrounding each supposedly wrong mention in the post. Be aware though that this command is still in its infancy, so you may end up getting some weird results, mostly markup getting cut in half. (commit 1 - commit 2 - commit 3)
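As an illustration of what such a command has to do, here is a minimal sketch that returns the words around each occurrence of a mention. The function name findMentionContexts and the radius parameter (15 words on each side by default) are assumptions made for the example, and splitting on whitespace is exactly the kind of shortcut that can cut markup in half.

```javascript
// Hypothetical sketch: returns, for every occurrence of `mention` in `body`,
// the mention surrounded by up to `radius` words on each side.
function findMentionContexts(body, mention, radius = 15) {
  const words = body.split(/\s+/).filter(Boolean);
  const contexts = [];
  words.forEach((word, index) => {
    // Pull the username out of the word, ignoring trailing punctuation.
    const match = word.match(/@([A-Za-z0-9.-]+)/);
    const username = match && match[1].replace(/[.-]+$/, '');
    if (username === mention) {
      const start = Math.max(0, index - radius);
      contexts.push(words.slice(start, index + radius + 1).join(' '));
    }
  });
  return contexts;
}

console.log(findMentionContexts('a b c @cheky d e f @cheky, g', 'cheky', 2));
```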

Suggested tags

After running this bot for a while, it has become apparent that more steemians than I had expected confuse the "@" character with the "#" character when talking about tags. To make @checky helpful to those users, I had to add a few more checks. If no suggested mention could be generated for a supposedly wrong mention, the bot now tries to see if the mention was actually supposed to be a tag. To do that, it first looks for the supposedly wrong mention in the post's tags and then, if it doesn't find it there, in the 1000 top trending tags. If it still doesn't find it, it looks at the tags previously used by the author of the post (even though the API call for that seems to be deprecated). (commit 1 - commit 2)
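The lookup order described above can be sketched like this. The function name suggestTag and its parameters are hypothetical, and the three tag lists are passed in as plain arrays to keep the example self-contained, whereas a real bot would presumably only fetch the trending tags and the author's tags when the previous step found nothing.

```javascript
// Hypothetical sketch of the tag fallback: the sources are tried in order and
// the first one containing the supposedly wrong mention wins.
function suggestTag(mention, postTags, trendingTags, authorTags) {
  const sources = [postTags, trendingTags, authorTags];
  for (const tags of sources) {
    if (tags.includes(mention)) return '#' + mention;
  }
  return null; // the mention doesn't look like a tag either
}

console.log(suggestTag('steemdev', ['utopian-io', 'bot'], ['life', 'steemdev'], []));
```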

Speed, speed, speed

A major change has been made to @checky's code in order to boost its performance in terms of speed and memory usage: the use of JavaScript sets instead of arrays to store all the edits of supposedly wrong mentions. The old process would create an array for each type of edit (deletes, inserts, transposes and replaces) and merge them into one massive array from which correct mentions would be extracted. The new process only requires one set that it passes through the four edit functions. Since sets only hold unique values, there are fewer elements at the end of the edits in a set than in an array. Indeed, some combinations of edits give the same result (for example, a transpose can be equal to a replace in some situations). To reduce the size of the set even further, usernames generated by second-generation edits need to be validated as correctly structured usernames before being added to the set. The use of a set instead of arrays may not make a big difference for short usernames, but it made a huge difference for long usernames, which went from being fully processed in a few minutes with arrays to at most 20 seconds with sets. (commit)
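Here is a minimal sketch of the single-set approach, in the spirit of Peter Norvig's spelling corrector. It is not the code from the commit: every function name, the ALPHABET constant and the simplified username check are assumptions made for the example.

```javascript
// Characters allowed in the generated usernames (simplified assumption).
const ALPHABET = 'abcdefghijklmnopqrstuvwxyz0123456789-.';

// Each edit function adds its results to the same shared Set.
function deletes(word, edits, accept) {
  for (let i = 0; i < word.length; i++) {
    const edit = word.slice(0, i) + word.slice(i + 1);
    if (accept(edit)) edits.add(edit);
  }
}
function transposes(word, edits, accept) {
  for (let i = 0; i < word.length - 1; i++) {
    const edit = word.slice(0, i) + word[i + 1] + word[i] + word.slice(i + 2);
    if (accept(edit)) edits.add(edit);
  }
}
function replaces(word, edits, accept) {
  for (let i = 0; i < word.length; i++) {
    for (const c of ALPHABET) {
      const edit = word.slice(0, i) + c + word.slice(i + 1);
      if (accept(edit)) edits.add(edit);
    }
  }
}
function inserts(word, edits, accept) {
  for (let i = 0; i <= word.length; i++) {
    for (const c of ALPHABET) {
      const edit = word.slice(0, i) + c + word.slice(i);
      if (accept(edit)) edits.add(edit);
    }
  }
}

// Passes one Set through the four edit functions; duplicates vanish for free.
function addEdits(word, edits, accept = () => true) {
  for (const edit of [deletes, transposes, replaces, inserts]) {
    edit(word, edits, accept);
  }
  return edits;
}

// Simplified structure check (assumption): 3 to 16 characters, starts with a
// lowercase letter, ends with a lowercase letter or a digit.
function isValidUsername(name) {
  return /^[a-z][a-z0-9.-]{1,14}[a-z0-9]$/.test(name);
}

// First-generation edits plus validated second-generation edits.
function editsWithinTwo(word) {
  const first = addEdits(word, new Set());
  const all = new Set(first);
  for (const candidate of first) {
    addEdits(candidate, all, isValidUsername);
  }
  return all;
}

const firstGeneration = addEdits('chekky', new Set());
console.log(firstGeneration.has('checky'), firstGeneration.size);
```

For "chekky" the four functions generate 505 strings in total, but the set ends up smaller because, for example, deleting either "k" gives the same "cheky".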

Better stability

A few days ago I came across a post on #steemdev from @fullnodeupdate that intrigued me. @fullnodeupdate is a service/bot created by @holger80 that updates a list of fully working nodes every 3 hours in its account's json_metadata. This is great for stability since I have no guarantee that the nodes I'm using for @checky will keep working forever. I'm watching @checky closely for now, but there will come a time when I will just have to keep it running without looking at its activity every hour to make sure everything is working as expected. In preparation for that moment, I have changed the code to use @fullnodeupdate as a source of nodes and removed the node-related properties from the config.json file. Just to be extra safe, a fail-safe node (the Steemit one) has been added to the config file. (commit 1 - commit 2)
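The fallback logic can be sketched as follows. This is a hedged illustration, not the bot's code: the function name selectNodes is hypothetical, the assumption that the metadata exposes an array under a nodes key has not been checked against @fullnodeupdate's actual format, and fetching the account itself is left out.

```javascript
// Hypothetical sketch: builds the node list from an account's json_metadata
// string and always keeps the fail-safe node as a last resort.
function selectNodes(jsonMetadata, failSafeNode) {
  let nodes = [];
  try {
    const parsed = JSON.parse(jsonMetadata);
    // Assumption: the working nodes are stored in an array named "nodes".
    if (Array.isArray(parsed.nodes)) {
      nodes = parsed.nodes.filter(url => typeof url === 'string');
    }
  } catch (err) {
    // Malformed or missing metadata: fall back to the fail-safe node only.
  }
  return nodes.includes(failSafeNode) ? nodes : [...nodes, failSafeNode];
}

// "https://node.example" is a placeholder, not a real node.
console.log(selectNodes('{"nodes":["https://node.example"]}', 'https://api.steemit.com'));
```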

Other changes

  • Improved the way the bot finds suggestions. It now checks the generated usernames against the Steem API after having checked them against the already encountered usernames. (commit)
  • The bot now checks for the existence of supposedly wrong mentions appended by "the". (commit)
  • A number of mostly unnecessary checks on operations have been removed. (commit)
  • Most of the Lodash functions used by the bot have been replaced by more fitting functions written in utils/helper.js. (commit)
  • A test_environment setting and a log_error setting have been added to config.json to avoid committing commented code in the future. (commit 1 - commit 2)
  • A very small risk of concurrent file access on data/users.json has been fixed. (commit)
  • Started using steno to make use of atomic file writing (avoiding corrupted files if the bot unexpectedly crashes). (commit)
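To illustrate the idea behind atomic file writing mentioned in the last item, here is a minimal sketch of the usual technique: write to a temporary file, then rename it over the target. It does not use steno and makes no claim about how steno works internally. The function name writeFileAtomic and the temporary file naming are assumptions.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical sketch: the data is first written to a temporary file in the
// same directory, then renamed over the target. If the process crashes during
// the write, the target file still holds its previous, complete content.
function writeFileAtomic(file, data) {
  const tmp = path.join(path.dirname(file), '.' + path.basename(file) + '.tmp');
  fs.writeFileSync(tmp, data);
  fs.renameSync(tmp, file); // replaces the target in a single step
}
```

A call such as writeFileAtomic('data/users.json', JSON.stringify(users)) would then replace the previous file in one step, where users stands for whatever object the bot keeps in memory.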

What's coming next?

The only thing I'm sure of is that I will add a way to delete @checky's comments, something like automatically deleting the comment if the supposedly wrong mentions in a post have been corrected by its author in the 24 hours following @checky's comment. I fear however that this is going to cause @checky's comments to never be interacted with, which would mean no upvotes to help me with the server's cost. If @checky ever becomes expensive to run, I will work on generating weekly statistics posts. I hope it won't ever have to come to that though.

I am still debating whether or not making an add-on for @checky is a good idea. The biggest problem is that such an add-on would be pretty RAM-heavy due to Peter Norvig's brute-force spelling corrector algorithm, and I'm afraid that it could even freeze some steemians' browsers, which is obviously not what I want. I also want to start focusing some of my time on a bigger project of mine. To be honest, I'd rather see mention checking added to the @steem-plus add-on than make a separate add-on just for that.

I almost forgot: in a few days I should be launching a logo contest (more like a profile picture contest actually) for @checky. Currently, its profile picture gives off a cold vibe and I'd like to make @checky look more friendly.

Contributions

If you want to contribute to this project or talk about an issue it has, feel free to visit its GitHub page. You can also clone it and follow the instructions written there to get it running (although this is not recommended since @checky already runs the script). My social media accounts are listed at the end of the README file. If you add me on Steam, tell me why on my wall, otherwise I won't accept your friend request.

Proof of Work Done

https://github.com/RagePeanut/

Thank you for your contribution. I really like what you are doing with @checky, it has come a long way. Again, a great contribution with a great write-up about it. One piece of feedback I have is that you could start using branches on GitHub and create a pull request for every contribution, which will help you track releases.


Your contribution has been evaluated according to Utopian policies and guidelines, as well as a predefined set of questions pertaining to the category.

To view those questions and the relevant answers related to your post, click here.


Need help? Write a ticket on https://support.utopian.io/.
Chat with us on Discord.
[utopian-moderator]

Thanks for the review! Yeah, that's a bad habit of mine that I'm trying to work on. I often just get carried away and forget to create branches which, as you noted, confuses me with the release indicators and makes the project history harder to read. Will try my best to get rid of that habit!

Thank you for your review, @codingdefined!

So far this week you've reviewed 1 contributions. Keep up the good work!

I have to confess I like @checky. It's not like the grammar nazi. That one frustrates me because I like to "misspell" words and use incorrect grammar to make a point.

@checky focuses on one specific point: the mention of usernames. There is a user named @cheeky who hasn't stopped blogging yet. If I mentioned @cheky, would the bot alert me that no such username exists?

This might be useful when mentioning Steemians in a post. If the writer edits the comment and your original @checky comment disappears then wouldn't it be a good idea to replace it with a new comment saying something like: "Our username bot @checky is honored that you took notice of its correction. To be fair @checky has given you a small upvote for your efforts."

Thanks for the shout out @mineopoly

Thanks for your feedback!

@checky doesn't care whether a steemian is actually active or not, it only wants to know if an account with that username exists. So if you mention @cheeky, it won't consider it a nonexistent username, and the same goes for @cheky. But if you write @chekky (which doesn't exist), it will suggest that you write @checky (one edit away: replacing 'k' with 'c') instead of @cheeky (one edit away: replacing 'k' with 'e') or @cheky (one edit away: removing 'k') because it is the most mentioned of the three usernames. @checky also remembers all the usernames you have mentioned since its first active day so that it can refine its suggestions. If you had mentioned @cheeky in the past but never @checky, then it would actually have suggested @cheeky.
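The choice between several existing usernames can be sketched like this. It is only an illustration of the rule described above, with hypothetical names (pickSuggestion, authorMentions, mentionCounts) and made-up mention counts.

```javascript
// Hypothetical sketch: usernames the author has already mentioned take
// priority, then the most mentioned username overall wins.
function pickSuggestion(candidates, authorMentions, mentionCounts) {
  const known = candidates.filter(name => authorMentions.has(name));
  const pool = known.length > 0 ? known : candidates;
  let best = null;
  for (const name of pool) {
    if (best === null || (mentionCounts[name] || 0) > (mentionCounts[best] || 0)) {
      best = name;
    }
  }
  return best;
}

// Made-up counts, only there to reproduce the @chekky example.
const counts = { checky: 500, cheeky: 40, cheky: 3 };
const candidates = ['checky', 'cheeky', 'cheky'];
console.log(pickSuggestion(candidates, new Set(), counts));
console.log(pickSuggestion(candidates, new Set(['cheeky']), counts));
```

With an empty history the first call picks checky. Once the author has mentioned @cheeky before, the second call picks cheeky instead.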

I like the idea of rewarding people for editing their posts, but I fear that it may get abused pretty quickly. I guess that @checky could upvote 9 random posts a day (one of them is already used on @fullnodeupdate's posts as a way of thanking the service) out of all the posts commented on that day. The randomness of those votes would hopefully limit abuse of the system as much as possible. That would mean that, looking at @checky's current stats, if all the commented posts get edited, about 10% of them would get an upvote from @checky. I could also increase the chances of getting an upvote for people who upvoted @checky's comment, which may help with covering the costs in the future (even though they are currently ridiculously low).

As for replacing the comments, I'm not so sure I like that idea. I feel like if a user doesn't interact with @checky, it's because they don't want to see its comment under their post. I don't want @checky's comments to be treated as "one more bot commenting on my post" by steemians. However, now that I think about it, changing the comment to something like what you said (minus the upvote part) would be a good idea if it can't be deleted anyway (because it has been upvoted or replied to).

Well, thanks for this comment, it's been a really productive one!

The problem with the most mentioned username is that it's usually not the one people are looking for.

May I suggest two options to add?
When retrieving the suggested replacement, include the start date of the account you are suggesting. That way, whether a user is mentioning a whale who's been on the platform for a couple of years or a newbie, they implicitly know if it could be the wrong username.

And how about a command that returns the list of usernames that differ from the mention by one degree of freedom?

On another note:
I would like to point out that a majority of people are on Microsoft Windows, where case is ignored, and I always type @RichAtVNS because you can read it, just like RJSilver and PayoutBot.
But all of them are seen on Steem as lowercase versions, so you need to be case insensitive.

Thank you for your feedback and ideas!

About the most mentioned user not being the user people are looking for, that's why @checky first checks its suggestions against steemians previously mentioned by the author and then, if it doesn't find any match, it defaults to suggesting the most mentioned on the blockchain. It will also soon check against the followers/following of the author before the most mentioned to refine even more its suggestions.

I don't really understand your first suggestion so I won't comment on it yet. Your second suggestion though is a really good idea! I will start working on a !more or !suggestions command to see all of @checky's suggestions for a supposedly wrong mention.

While I see where you are coming from with the bot needing to be case insensitive, it just wasn't worth it, especially with HF20 and the RC cost of comments being pretty high. Most of the mentions (about 80%) it gave suggestions for were not Steem usernames even though there are lots of checks in the code to avoid usernames from other platforms as much as possible. I will however soon add a command to the bot to let users decide whether or not they want @checky to be case insensitive when checking their posts, this should help steemians that like to capitalize Steem usernames.

Hi @ragepeanut, I'm @checky ! While checking the mentions made in this post I noticed that @rageapeanut doesn't exist on Steem. Did you mean to write @ragepeanut ?

If you found this comment useful, consider upvoting it to help keep this bot running. You can see a list of all available commands by replying with !help.

The mention @rageapeanut has been detected in this part of the post:

one would be to highlight the two characters that surrounded it in the suggested mention (@rageapeanut would get @ragepeanut as a suggested mention), the second one would be to highlight the deleted

character in the supposedly wrong mention (@rageapeanut would get @ragepeanut as a suggested mention). If you have any preference, make sure to

This post has been just added as new item to timeline of Checky on Steem Projects.

If you want to be notified about new updates from this project, register on Steem Projects and add Checky to your favorite projects.

Hi @ragepeanut!

Your post was upvoted by @steem-ua, new Steem dApp, using UserAuthority for algorithmic post curation!
Your post is eligible for our upvote, thanks to our collaboration with @utopian-io!
Feel free to join our @steem-ua Discord server

Hi, I really like your content have an upvote.

This post has been upvoted by a voting bot.

Hey, @ragepeanut!

Thanks for contributing on Utopian.
We’re already looking forward to your next contribution!

Get higher incentives and support Utopian.io!
Simply set @utopian.pay as a 5% (or higher) payout beneficiary on your contribution post (via SteemPlus or Steeditor).

Want to chat? Join us on Discord https://discord.gg/h52nFrV.

Vote for Utopian Witness!
