Evolved08gsr's SteemSQL Analysis - Post #001 - Working to Identify and Eradicate Spam Accounts

in #steemsql7 years ago (edited)


There is so much that we can learn from analyzing the Steem BlockChain!

First off, I need to Sincerely Thank @arcange for developing / hosting / providing this public version of the Steem BlockChain. I can honestly say that since I've started "playing around" in the database, I have learned a great deal more about how and what data a BlockChain stores than when I started this process.


Quick Background About Me: I have an investigative mind and nearly a decade of professional experience in a field where I spend many of my days analyzing SQL Server Databases.


My Mission is to use the data in the Steem SQL Database in order to provide insights to All Steemians, from Whales to Minnows
(or whatever levels you classify as the highest and lowest Steem tiers)


The First Topic I want to Cover is Potential Spammers

I think that we have all experienced much more frequent spamming of comments and posts by various accounts lately. I would like to set up a "Coalition" of sorts where we identify these spam accounts and eradicate them from the Steemit Platform. This will ideally kill off the pests, and stop new ones from forming because they realize we are united in keeping Steemit a Legitimate Community.

Once we get consensus that the process is working, I'll create a new tag that everyone can reference and use to down-vote / flag the spammers!

Disclaimer: I am running a systematic process based on trends in the Steem Blockchain. These processes are not guaranteed to be 100% accurate, however, they do provide insights into potential Spam Accounts.

My first pass, included below, is on the more conservative side. If I see a frequent pattern of posts (the same comment/post, used 15 or more times in a 24 hour period) from a single account, I have included that account with some brief Summary Statistics.

After 1 hour 54 minutes and 17 seconds ... (I need to make this query a bit faster) ...

Topline Stats:

  • 1,582 Accounts have at least 15 Repeated or Similar Posts in a 24 Hour Period (since the beginning of time on Steemit)
  • 818 / 1,582 Accounts posted their 15+ Repeated or Similar Posts during only a single 24 hour period
  • 296 / 1,582 Accounts posted their 15+ Repeated or Similar Posts during two (2) 24 hour periods
  • 111 / 1,582 Accounts posted their 15+ Repeated or Similar Posts during three (3) 24 hour periods
  • 74 / 1,582 Accounts posted their 15+ Repeated or Similar Posts during four (4) 24 hour periods
  • 56 / 1,582 Accounts posted their 15+ Repeated or Similar Posts during five (5) 24 hour periods
  • 32 / 1,582 Accounts posted their 15+ Repeated or Similar Posts during six (6) 24 hour periods
  • 62 / 1,582 Accounts posted their 15+ Repeated or Similar Posts during seven (7) 24 hour periods
  • 22 / 1,582 Accounts posted their 15+ Repeated or Similar Posts during eight (8) 24 hour periods
  • 15 / 1,582 Accounts posted their 15+ Repeated or Similar Posts during nine (9) 24 hour periods
  • The remaining 96 / 1,582 Accounts posted their 15+ Repeated or Similar Posts during 10 or more 24 hour periods (a maximum of [94] 24 hour periods)

Since many of these results include Accounts that I would consider as valid individuals and useful content providers, I filtered the results again to only see the Accounts with 5 or fewer Unique Repeated or Similar Posts. This added filter took me down to 1,277 total Accounts.

I had already included a "White List" of Accounts (accounts who frequently post similar content, but I am not considering as spam bots):

  • cheetah, steemcleaners, bottymcbotface, welcomebot, steemitlotteries, steemitboard, randowhale, minnowsupport, bot-helper, steemit.tips, booster

Here is a sample of the first 50 Accounts (from the 1,277 Accounts mentioned above):


I know that my process is at least partially working, because I was able to find this Account, which I'm nearly certain is a Spam Account:

As well as this account, which I also believe to be a Spam Account:

Please work with me to improve my logic! How else would you identify Spam Accounts?!


_____________________________________________________

Thanks for reading, I hope you have a great day!

_____________________________________________________

If you want to read more, please take a look at some of my other posts:

Sort:  

Updated with Preliminary Results! I'm looking for anyone to provide some more insight to make my process more accurate.

Thanks In Advance!

Great initiative. Why not include your SQL? Doing SQL for 25 years, willing to help you optimise queries or constructing some funky new ones :-)

I wasn't sure if I wanted to include the logic to start off with, but once I have a polished version I do plan to include the code. Also, with the amount of indenting, spacing and comments that I use when I write code, it can get to be a very lengthy post, so I'll have to think of a clever way to display it.

My optimization technique right now (that is working AMAZINGLY) is setting up temporary tables first, Indexing them, then running my Queries. Since I don't have admin rights to the DB (only read access), I can't add indexing that will specifically optimize my queries.

Thanks for the reply, and the offer! I will definitely keep you in the loop with my logic and take you up on advice if I run into any issues.

I posted this early.. stay tuned.. my query is running..

Fantastic work you do here! Unfortunately I can't say anything about improving logic, but I'll resteem it so that more users would see and may be join the project

Thanks, @inber! I have a few more ideas on how to split out the good posts from the bad. I also figured out how to make my process run much faster :)

excellent initiative
finding repeat comments if definitely a good start
although knowing spammers, they will just start randomizing things enough to fool detection. But this should at least get a base list.

The bigger question is what to do for those accounts? What is the reporting mechanism?

The best we can do is Mass Flag them. If we build enough of a following who are interested in stopping the Spam, I can start using a specific tag (in addition to the Spam tag) to allow our users to effectively down-vote all of the spammers so their posts no longer show up, and their rewards are removed.

Coin Marketplace

STEEM 0.20
TRX 0.14
JST 0.030
BTC 67978.59
ETH 3270.89
USDT 1.00
SBD 2.65