Hunting bots with Python (and love)

in #programming, 8 years ago (edited)

Steemit has a love/hate relationship with bots. There are good bots like @cheetah that try to make Steemit more enjoyable for everyone, and evil bots like @rickydevil that downvote en masse.

NOTE: I DON'T HAVE ANYTHING AGAINST BOTS. THIS WAS JUST A FUN, INTELLECTUAL CHALLENGE!

In my previous post I presented some graphs that showed the hourly distribution of upvoting by whales for each day of the week.

The graphs for the known bots (example above) were clearly different from those of regular users, so I wondered: is it possible to identify bots using Python?

Needle in a haystack?

Instead of looking at upvote distribution for a single user over a long time period, which is dreadfully slow, I compared the sum of all upvotes by all users using the median absolute deviation to find outliers. This can be run against much smaller datasets, making it much faster.
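As a rough illustration of the idea (not the code from the repository), each voter's upvote count can be scored by how far it sits from the median, measured in units of the median absolute deviation. The function name, the sample votes, and the cut-off of 5 are all assumptions made for this sketch.

```python
from collections import Counter
from statistics import median


def mad_scores(counts):
    """Return {voter: |count - median| / MAD} for a {voter: count} mapping."""
    values = list(counts.values())
    med = median(values)
    # Median absolute deviation: the median distance from the median.
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # More than half the voters share one count, so the scale is
        # undefined. Score nothing rather than divide by zero.
        return {voter: 0.0 for voter in counts}
    return {voter: abs(c - med) / mad for voter, c in counts.items()}


# Hypothetical data: one entry per upvote, holding the voter's name.
votes = ["alice"] * 2 + ["bob"] * 3 + ["carol"] * 1 + ["dave"] * 4 + ["botty"] * 30
counts = Counter(votes)

THRESHOLD = 5  # assumed cut-off; tune it for your data
scores = mad_scores(counts)
outliers = sorted(voter for voter, s in scores.items() if s > THRESHOLD)
print(outliers)
```

Because the median and the MAD barely move when one account votes thirty times, the heavy voter stands out even in a tiny sample, which is why this works on a short block range.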

Running this over all blocks in a fifteen-minute window (approximately 300 blocks), I got the following results:

Outliers (candidate bots) are shown in red. The top graph shows the outliers based on a median absolute deviation that includes all samples; the bottom graph is based on a median absolute deviation that excludes all upvote counts under the mean. This removes low-value outliers, improving the results slightly.
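The difference between the two graphs can be sketched as a filtering step applied before scoring. This is a hedged illustration with made-up counts: `centre_and_mad` is a hypothetical helper, and reading "under the mean" as "users whose upvote count is below the mean count" is an assumption.

```python
from statistics import mean, median


def centre_and_mad(values):
    """Return (median, median absolute deviation) of a list of counts."""
    med = median(values)
    return med, median(abs(v - med) for v in values)


# Hypothetical per-user upvote counts: mostly light voters, two heavy ones.
counts = [1] * 10 + [2] * 8 + [3] * 4 + [4] * 4 + [5] * 2 + [13, 30]

# Variant 1: use every user.
med_all, mad_all = centre_and_mad(counts)

# Variant 2: drop users whose count is below the mean before scoring.
cutoff = mean(counts)
kept = [c for c in counts if c >= cutoff]
med_kept, mad_kept = centre_and_mad(kept)

print(f"all users:  n={len(counts)} median={med_all} MAD={mad_all}")
print(f"above mean: n={len(kept)} median={med_kept} MAD={mad_kept}")
```

Dropping the light voters moves the median up, so an account only scores highly if it stands out from other active voters, not merely from people who voted once or twice.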

There were 886 upvotes by 351 users; the data is skewed to the right. This is what you'd expect if most users are human - they read some articles and upvote, then get on with other things.

So how many bots did it find? In total, 11 (potential) bots were found when upvote counts under the mean were included; only two when they were excluded:

Getting stats from block 3989048 to block 3989348

             count   mad  mad2
voter                         
activcat        30  14.0  13.0
boy             13   5.5   4.5
bue             13   5.5   4.5
bue-witness     13   5.5   4.5
bunny           13   5.5   4.5
daniel.pan      13   5.5   4.5
healthcare      13   5.5   4.5
helen.tan       14   6.0   5.0
mini            13   5.5   4.5
moon            14   6.0   5.0
vukasin         41  19.5  18.5
11 (potential) bots found

          count   mad  mad2
voter                      
activcat     30  14.0  13.0
vukasin      41  19.5  18.5
2 (potential) bots found
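A quick arithmetic check on the printed rows: every mad value equals (count - 2) / 2 and every mad2 value equals (count - 4) / 2. That pattern is an observation from the numbers alone, not something the post states, so treat it as a hint about the scoring and not a specification. The cut-off of 5 below is likewise an assumption that happens to reproduce both lists.

```python
# (voter, count, mad, mad2) rows copied from the first table above.
rows = [
    ("activcat", 30, 14.0, 13.0),
    ("boy", 13, 5.5, 4.5),
    ("bue", 13, 5.5, 4.5),
    ("bue-witness", 13, 5.5, 4.5),
    ("bunny", 13, 5.5, 4.5),
    ("daniel.pan", 13, 5.5, 4.5),
    ("healthcare", 13, 5.5, 4.5),
    ("helen.tan", 14, 6.0, 5.0),
    ("mini", 13, 5.5, 4.5),
    ("moon", 14, 6.0, 5.0),
    ("vukasin", 41, 19.5, 18.5),
]

# Observed pattern (an inference from the numbers, not from the post).
pattern_holds = all(
    mad == (count - 2) / 2 and mad2 == (count - 4) / 2
    for _, count, mad, mad2 in rows
)

CUTOFF = 5  # assumed; any value from 5.0 up to just under 5.5 gives the same lists
flagged_all = [voter for voter, _, mad, _ in rows if mad > CUTOFF]
flagged_above_mean = [voter for voter, _, _, mad2 in rows if mad2 > CUTOFF]

print(pattern_holds, len(flagged_all), flagged_above_mean)
```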

Are they really bots?

Verification isn't easy. Some users upvote often by hand to try to get as many curation rewards as possible, and these lie on the cut-off point. Looking at their activity on steemd.com, they all have similar profiles that are dominated by upvotes:

Perhaps the most interesting result is what happens when you look at the distribution graphs for these users using the code from the previous post.

The two users found by removing the upvote counts below the mean don't behave like bots at all:

But all the users found when upvote counts below the mean are included (apart from the two users above, who were also in this set) look an awful lot like bots - THEY RARELY SLEEP (not all graphs are shown)!
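One way to put a number on "rarely sleep" is to look at which hours of the day an account votes in and at its longest pause between votes. This is a minimal sketch with invented timestamps; the function names and both sample accounts are hypothetical, and it is not the plotting code from the previous post.

```python
from datetime import datetime, timedelta


def active_hours(timestamps):
    """Return the set of hours of the day (0-23) with at least one vote."""
    return {t.hour for t in timestamps}


def longest_quiet_gap(timestamps):
    """Return the longest gap between consecutive votes as a timedelta."""
    ordered = sorted(timestamps)
    gaps = (later - earlier for earlier, later in zip(ordered, ordered[1:]))
    return max(gaps, default=timedelta(0))


start = datetime(2016, 8, 1)  # arbitrary start date for the made-up data

# Hypothetical bot: one vote every hour for three days.
bot = [start + timedelta(hours=h) for h in range(72)]

# Hypothetical human: a few votes a day, none overnight.
human = [
    start + timedelta(days=d, hours=h)
    for d in range(3)
    for h in (8, 12, 19, 22)
]

for name, stamps in (("bot", bot), ("human", human)):
    print(name, len(active_hours(stamps)), longest_quiet_gap(stamps))
```

An account that covers all 24 hours and never pauses for more than an hour or two is a stronger bot candidate than one with a regular overnight gap, though insomniacs and round-the-clock curation teams can look similar.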

And these bot lookalikes also appear to be voting in an identical manner. Are they owned by the same user? Are they coincidentally just watching the same set of authors, waiting to upvote? And what the hell is going on during the weekend?
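Whether two accounts vote "in an identical manner" could be checked by comparing the sets of posts they upvoted. The sketch below uses Jaccard similarity for that; the account names and post identifiers are invented, and this comparison is a suggestion and not something the post's code does.

```python
def jaccard(a, b):
    """Return the overlap of two collections: |A and B| / |A or B|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0  # two empty vote histories tell us nothing
    return len(a & b) / len(a | b)


# Hypothetical vote histories, keyed by account, as sets of "author/permlink".
voted_on = {
    "bot_a": {"author1/post1", "author2/post1", "author3/post2"},
    "bot_b": {"author1/post1", "author2/post1", "author3/post2"},
    "human": {"author1/post1", "author9/post4"},
}

same = jaccard(voted_on["bot_a"], voted_on["bot_b"])
different = jaccard(voted_on["bot_a"], voted_on["human"])
print(same, different)
```

A similarity close to 1 across many posts would suggest a shared owner or a shared author list; it cannot tell those two explanations apart on its own.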

It's also possible that the first two users are relatively new bots that haven't done enough upvoting yet to look like bots. Only time will tell.

How many bots are there?

Now this is the burning question! Based on this data, which is a really, really small sample size and doesn't tell the whole picture by any stretch of the imagination, 2% of the users were identified as potential bots.

I would love to see some proper statistics done on the blockchain to find this out!

Show me the code

You can find the code for everything in the jonblack/steem-data repository. This is much easier than posting it here. Fork to your heart's content!


Like my post? Don't forget to follow me!

Big shout out to @klye for the banner image. Fantastic stuff!


Awesome article man. Thank you for illustrating with #klyeart :)

I'll be upvoting this on the 30 minute mark.

good job Klye!!

Cheers, Thank you for the love Ma'am.

@bitcalm you may find my weekly report interesting. Check out my latest here: https://steemit.com/steemit/@me-tarzan/the-weekly-steemit-bull-report (be sure to follow)

Good work! In my observation, there are two types of upvote bots: those tied to certain authors, voting at a certain time after a post goes up, and stalker bots which follow whales around. I'd love to see some analysis of who the most stalked whales are. :)

It would help to see at what time after posting the upvotes come in; for bots it's always the same time. The same goes for whether there are bots who upvote the moment after a whale does. And of course, the defining characteristic is that bots don't sleep, although some might, to regenerate Voting Power. Finally, some whales may be hiring curation teams that work round the clock.

How deep does the rabbit hole go? :)

A lot of great points. As for Voting Power I think they just vote consistently (every hour) but not a lot, in general. I think that way they maintain a good average and cover all hours. Maybe.

The smart bots certainly do that. There were some daft bots previously who would be hanging on with 2% voting power, but those are mostly gone now. @wang seems to be maintaining a 35% voting power, I guess playing a game of numbers.

Another great post! Nice job weeding out bots from real people, determining that could be a real pain. Perhaps a future improvement to this is running this through a machine learning algorithm to better determine which are bots, especially since they're becoming smarter. Great job!

Yeah, machine learning would be a much better approach, but a lot more work. Bot authors can read this and code around it, though they would probably have to forfeit some curation reward (e.g. pretend sleeping).

I just thought of something. How about taking into account if they have a valid proof photo as a metric? That would mean looking into their posts for something tagged introduceyourself, then parsing it to look for a photo. It gets a bit hairy while looking for the steemit written in paper, but I guess that's the general gist. I know not everyone has a verification post (I certainly don't) but it could help weeding out bots.

The irony that at 10 minutes a bunch of upvote bots arrive.

More interesting analysis. Some people are insomniacs though so may seem like bots but might actually be trying to occupy themselves. I know of several friends who actually sleep with their phone next to them for this reason. It's probably awful for your health though.

I'd be surprised if they could vote consistently that way. Definitely not good for their health. I wonder where they lie on the graph (pun intended) :)

Interesting facts and good to know about the bots. Clearly these people aren't real, everyone needs sleep. However, from my understanding it seems that bots will have much lower reputation and eventually have no power or say in anything that happens in Steemit. So should we worry about them? I'm not sure. Great job ;) Alla

Thanks Alla! When people downvote posts and comments by bots their reputation goes down. But if a bot only ever upvotes you need to pay close attention to see that it's happening. I wouldn't worry about bots...not yet anyway. :)

Thanks for this. Bookmarking: $r.dev $r.python $r.steemdev $r.development And, no I am not a bot just a human who thinks like one :p

"2% of the users were identified as potential bots"
You're missing a very important word:
"2% of the ACTIVE users were identified as potential bots"
Is it worth mentioning the number of users with at least one post?

VERY good point and important to make the distinction.

There are a number of ways to improve the algorithm. Number of posts is difficult because some users write posts with the same account that their bots upvote with.

Great Article.
What else should I write so that I don't sound like a bot?

Realise that bots want to appear human. So if you appear human, you won't be mistaken for a bot. Simple. :)
