Biggest Spammers Report

steemreports (64) in #steemdev • 6 years ago (edited)

http://www.steemreports.com/sincerity-biggest-spammers/

Here's a user-friendly view of the Sincerity API's Biggest Spammers list.

It focuses mostly on comment spam. The list updates automatically to show the accounts with the maximum spammer score, sorted by the number of comments made in the last calculation period (currently 14 days).
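As a rough illustration of that selection, here is a minimal sketch: keep only the accounts sharing the highest spam score, then order them by comment count. The `AccountScore` record and its field names are hypothetical and are not taken from the Sincerity API's actual response format.

```python
from dataclasses import dataclass


@dataclass
class AccountScore:
    # Hypothetical record; field names are assumptions, not the real API schema.
    name: str
    spam_score: float    # assumed classifier output, higher means more spam-like
    comment_count: int   # comments made in the last calculation period


def biggest_spammers(accounts, limit=10):
    """Return accounts with the maximum spam score, busiest commenters first."""
    if not accounts:
        return []
    top_score = max(a.spam_score for a in accounts)
    flagged = [a for a in accounts if a.spam_score == top_score]
    flagged.sort(key=lambda a: a.comment_count, reverse=True)
    return flagged[:limit]


if __name__ == "__main__":
    sample = [
        AccountScore("alice", 1.0, 5),
        AccountScore("bob", 0.4, 900),
        AccountScore("carol", 1.0, 42),
    ]
    for account in biggest_spammers(sample):
        print(account.name, account.comment_count)
```

Note that "bob" is left out despite the high comment count, because only accounts at the maximum score are listed.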

The list is generated by a machine learning algorithm, which may produce a few 'false positives' where non-spammer accounts are detected as spammers. If you find any such accounts, please leave a comment about them, as that will help me improve the spam classification algorithm.

#steem #spam #steem-sincerity

$59.72

cardboard (61) 6 years ago (edited)

I need to dig into the API and use it to validate the vote buyers :) tipuvote!

$0.08

2 votes

tipu (67) 6 years ago

Sorry, @tipU is currently recovering voting power. Please try again later! ;)

$0.00

cardboard (61) 6 years ago

Hush, I have special privileges!

$0.03

1 vote

steemreports (64) 6 years ago (edited)

You might need to watch your spam score if you go around talking to bots! ;)

$0.00

1 vote

cardboard (61) 6 years ago

Better than to myself, lol.

$0.00

cardboard (61) 6 years ago

True!

$0.03

1 vote

basicstoliving (58) 6 years ago

I hate those types of responses. I hate to use up voting power to flag them, but I guess I am going to have to, as they are becoming more and more prevalent in many posts.

$0.05

1 vote

daan (68) 6 years ago

I have to admit that I've never actually downvoted someone (except 1 phishing comment). With just 250 SP, it wouldn't really have that big of an effect.

Might have to rethink that, if they're really just earning a couple of cents every 100 posts or so.

$0.05

1 vote

bobcastleman (37) 6 years ago

I was wondering what data points you are using to flag something as spam. One that I thought might be useful is mean time between posts divided by word count, or some variant. I saw an account today that was posting a 500-word article every 10 minutes or so. Sorry, but nobody writes that fast and ends up with quality. These had to be cut and paste, maybe even blatant plagiarism.
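A minimal sketch of that metric, assuming each post is available as a (timestamp, word count) pair. The function name and input shape are made up for illustration and say nothing about which features the actual classifier uses.

```python
from datetime import datetime, timedelta


def seconds_per_word(posts):
    """Average seconds available per word written, across consecutive posts.

    posts: list of (timestamp, word_count) tuples (a hypothetical input shape).
    Returns None when there is not enough data to compute a rate.
    """
    if len(posts) < 2:
        return None
    ordered = sorted(posts, key=lambda p: p[0])
    elapsed = (ordered[-1][0] - ordered[0][0]).total_seconds()
    # Words in the first post were written before the window started,
    # so only the later posts count against the elapsed time.
    words = sum(word_count for _, word_count in ordered[1:])
    if words == 0:
        return None
    return elapsed / words


if __name__ == "__main__":
    start = datetime(2018, 1, 1, 12, 0)
    # Three 500-word posts, ten minutes apart, as in the account described above.
    history = [(start + timedelta(minutes=10 * i), 500) for i in range(3)]
    print(seconds_per_word(history))  # 1.2
```

At 1.2 seconds per word, the account would be producing about 50 words a minute continuously, including thinking and editing time. Where to put a cut-off is a judgment call that this sketch does not try to make.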

In theory, you could trap plagiarism by comparing consecutive posts and seeing if there is linguistic consistency between them. Someone cutting and pasting content from other sites would show variations in vocabulary, sentence structure and other linguistic markers. These markers would be similar if the posts were all written by the same person.
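A very crude sketch of the idea: describe each post by how often it uses a handful of common function words, then compare consecutive posts with cosine similarity. The word list and the choice of cosine similarity are assumptions made for illustration; real stylometric work uses far richer features, and this is not something the Sincerity classifier is known to do.

```python
import math
import re
from collections import Counter

# Small illustrative list of function words; an assumption, not a standard set.
FUNCTION_WORDS = ("the", "and", "of", "to", "a", "in",
                  "that", "is", "it", "for", "but", "with")


def style_vector(text):
    """Relative frequency of each function word in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    total = len(words) or 1
    counts = Counter(words)
    return [counts[w] / total for w in FUNCTION_WORDS]


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def consecutive_similarity(posts):
    """Similarity between each post and the one that follows it."""
    vectors = [style_vector(p) for p in posts]
    return [cosine(a, b) for a, b in zip(vectors, vectors[1:])]
```

A run of low scores between consecutive posts from one account would hint at inconsistent authorship, though short posts give noisy vectors, so any threshold would need testing on real data.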

I know there is academic work that has done this very thing, but I suspect it would be a difficult task to do in real time.

Anyway, good work on this project. Keep it up!

$0.04

1 vote
