SteemReports - Experimental Spam Classifier

in #steemit, 7 years ago (edited)

I've been working with a technology called 'Machine Learning' lately. This is an application of artificial intelligence based on the idea that we should be able to give machines access to data and let them learn for themselves.

Well, we have lots of public data in the blockchain, but it's obviously not quite as simple as that. We as programmers need to define the patterns and features that these algorithms can look for and learn from. In the case of Steem account classification, such features might include the comments that each account makes and how it votes.
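
As a rough illustration of what such features could look like, here is a minimal Python sketch. The function name `extract_features`, the input shapes, and the four feature names are assumptions made up for this example; they are not the features the classifier actually uses.

```python
from collections import Counter


def extract_features(comments, votes):
    """Turn an account's raw activity into numeric features.

    comments: list of comment body strings (hypothetical input shape)
    votes:    list of (author, weight) tuples (hypothetical input shape)
    """
    n_comments = len(comments)
    n_votes = len(votes)

    # Count comments that appear more than once, ignoring case and padding.
    counts = Counter(c.strip().lower() for c in comments)
    repeated = sum(n for n in counts.values() if n > 1)

    # Count comments that contain a link.
    with_links = sum(1 for c in comments if "http://" in c or "https://" in c)

    # Distinct authors this account has voted for.
    authors = {author for author, _ in votes}

    return {
        "avg_comment_length": sum(len(c) for c in comments) / n_comments if n_comments else 0.0,
        "repeated_comment_ratio": repeated / n_comments if n_comments else 0.0,
        "link_ratio": with_links / n_comments if n_comments else 0.0,
        "distinct_author_ratio": len(authors) / n_votes if n_votes else 0.0,
    }


# Made-up activity for a single account, for illustration only.
features = extract_features(
    comments=["Nice post!", "nice post!", "See http://x.example"],
    votes=[("a", 100), ("a", 100), ("b", 50), ("c", 10)],
)
print(features)
```

Each account ends up as a small dictionary of numbers, which is the kind of input a learning algorithm can work with.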


[Image: spampic.jpg]
http://www.steemreports.com/account-classifier/

If you have time to try a few names, I'd appreciate any feedback on accuracy in the comments. The most helpful comments would include the account name(s) you tried, and what you think the proper category should have been if the classification was wrong.


What I really need to do next, though, is collect more accounts that have been properly classified by reliable humans and feed these into the classifier software. Then I can refine it to include more features and see how accurate this makes it.
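
The loop described above (train on human-labelled accounts, then measure accuracy on accounts held back from training) can be sketched in a few lines of Python. This is only a sketch under stated assumptions: the post does not say which algorithm the classifier uses, so a simple nearest-centroid rule stands in for it here, and the feature vectors and labels are invented.

```python
import math
from collections import defaultdict


def train_centroids(examples):
    """examples: list of (feature_vector, label). Returns {label: mean vector}."""
    sums = {}
    counts = defaultdict(int)
    for vec, label in examples:
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def classify(centroids, vec):
    """Return the label whose centroid is closest to vec (Euclidean distance)."""
    return min(centroids, key=lambda label: math.dist(centroids[label], vec))


def accuracy(centroids, examples):
    """Fraction of labelled examples that the centroids classify correctly."""
    correct = sum(1 for vec, label in examples if classify(centroids, vec) == label)
    return correct / len(examples)


# Invented data: [repeated_comment_ratio, link_ratio] per account, labelled by a human.
labelled = [
    ([0.9, 0.8], "spammer"),
    ([0.8, 0.9], "spammer"),
    ([0.1, 0.1], "human"),
    ([0.2, 0.0], "human"),
]
held_out = [
    ([0.85, 0.7], "spammer"),
    ([0.05, 0.2], "human"),
]

centroids = train_centroids(labelled)
score = accuracy(centroids, held_out)
print(score)
```

Keeping some labelled accounts out of training matters: accuracy measured on the same accounts the classifier learned from would look better than it really is.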

Any help with collecting and classifying accounts would be greatly appreciated. The dream is a front-end where our discussions and search aren't filled with this horrible time-wasting stuff!


Please vote, resteem and follow us for more reports and services, and visit our website:
http://www.steemreports.com


Great, I was just classified as a Human Content Creator. I have a question for you: what programming languages should one learn to access all the blockchain data? I would appreciate it if you could guide me through all of this, as I've started learning some basic coding and will be starting a new career as a coder. Thanks.

Thanks. I don't have time to guide you through it all, I'm afraid, but I'd suggest that Python is probably the best language to start with. I am biased, but it is used by beginners and experts alike. JavaScript is another good choice, but not so easy to understand IMO.

I use only Linux, so I can't really explain how to set things up on other operating systems, but you'll probably need to download and install Python and then install steem-python. After that, a search might help you further.

Good luck!

Thanks for the reply. Your answer actually answered my question, and that's what I wanted to know. Thanks again!

I looked at many bots, also in #nsfw, and it is pretty good at classification already.

Thanks for the feedback, I'll go through those and see if it gives me some more ideas. The classification algorithm doesn't look at voting patterns at this stage, so that explains why those don't work yet, but it's good to know about them!

I've tried it with 7 accounts so far. Every result was accurate.
Really useful tool. I will use it in the future.

Cool. Will give it a try.

Great feature. I tried it and it says I am classified as a human content creator. I am so glad, lol. I like the other things you have already developed too, including the open mic music report at the top.

Great, thanks! It's far from perfect at the moment, so it'll probably offend somebody shortly ;)

@originalworks / @unprovoked maybe you guys have knowledge to share?

Looks like the bot is having a hiccup, as it voted but didn't comment...? Or is steemreports not classified as a Human Content Creator?

Not sure about that. I took some of the top paragraph from something else I wrote, which, to be honest, I may have paraphrased from the web at the time; it's just about the optimal definition. The image was taken from the web, but required no attribution. I'd like to think that it would still be considered original content, but maybe that's the reason? ;)
