I created a bot whose purpose is to detect spam comments on the Steem blockchain. It uses the Multinomial Naive Bayes algorithm. It can reply to a spam comment or downvote it. I built it for the #polish community, but it can be adapted to any tag (or all tags); it's only a matter of the training file.
Log from console:
$ POSTING_KEY=<posting_key> spam_detector.py config.json
The private posting key is passed as an environment variable.
All parameters are stored in the config.json file.
|Parameter|Description|
|---|---|
|account|account used by bot|
|nodes|list of Steem nodes|
|tags|tags which are observed|
|probability_threshold|threshold to classify as spam|
|training_file|input training file|
|reply_mode|0 - without reply, 1 - with reply|
|vote_mode|0 - without vote, 1 - with vote|
|vote_weight|weight of the vote from range [-100.0, 100.0]|
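To make the parameters more concrete, here is a minimal sketch of how such a config could be loaded and checked. It is not taken from the bot's source: the helper names (`load_config`, `load_posting_key`, `planned_actions`), the example values, the assumption that `probability_threshold` lies in [0, 1], and the rule "act when the spam probability is at or above the threshold" are all mine.

```python
import json
import os

# Parameter names come from the table above; every value here is made up.
EXAMPLE_CONFIG = {
    "account": "example-bot",              # hypothetical account name
    "nodes": ["https://api.steemit.com"],  # example node URL
    "tags": ["polish"],
    "probability_threshold": 0.8,          # assumed to be a probability in [0, 1]
    "training_file": "training.txt",       # hypothetical file name
    "reply_mode": 1,
    "vote_mode": 1,
    "vote_weight": -10.0,                  # negative weight = downvote
}

REQUIRED_KEYS = set(EXAMPLE_CONFIG)


def load_config(path):
    """Read a config.json-style file and run basic sanity checks."""
    with open(path, encoding="utf-8") as f:
        config = json.load(f)
    missing = REQUIRED_KEYS - set(config)
    if missing:
        raise ValueError("missing config keys: " + ", ".join(sorted(missing)))
    if not 0.0 <= config["probability_threshold"] <= 1.0:
        raise ValueError("probability_threshold must be in [0, 1]")
    if not -100.0 <= config["vote_weight"] <= 100.0:
        raise ValueError("vote_weight must be in [-100.0, 100.0]")
    for mode in ("reply_mode", "vote_mode"):
        if config[mode] not in (0, 1):
            raise ValueError(mode + " must be 0 or 1")
    return config


def load_posting_key(environ=None):
    """Fetch the private posting key from the POSTING_KEY environment variable."""
    environ = os.environ if environ is None else environ
    key = environ.get("POSTING_KEY")
    if not key:
        raise RuntimeError("POSTING_KEY environment variable is not set")
    return key


def planned_actions(config, spam_probability):
    """Decide what to do with a comment (assumed rule: act at or above the threshold)."""
    is_spam = spam_probability >= config["probability_threshold"]
    return {
        "reply": is_spam and config["reply_mode"] == 1,
        "vote_weight": config["vote_weight"] if is_spam and config["vote_mode"] == 1 else None,
    }
```

With the example values, a comment scored at 0.9 would get both a reply and a vote of weight -10.0, while one scored at 0.5 would be left alone.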
The training file contains rows that each start with a label (ham or spam), like below:
ham For years I am interested in the Anglo Boer war and their fight against the Commonwealth. W. Churchill's military career was full of hits and misses but he gained huge popularity in England after the war. Great post once again.
ham The development of a language can be a very boring topic, but sometimes it can also be exciting. While I was never interested in the development of the English language, I often liked to learn some facts about the development of the German language (maybe since it's my native language). It's nice to see that German has influenced such a young language. Maybe I'd even be able to understand some Afrikaans and maybe it's even worth a try to learn it.
ham This country is a cultural boiling pot like none, so many interesting stories ; struggle for independence, new discoveries, many different influences makes it a unique place , so also the language carries all this exciting details in it. I will definitely start learning Afrikaans .. one day ;)
ham I am South African born. I also grew up in an Afrikaanse town. Despite my background and being able to speak the language, I was not aware of the history. Probably should have paid more attention in school, lol. Thanks for the lesson!
ham I thought as much. So, you've been in this game since April 2016. I must be joking to think I've arrived when I haven't even started. I'm throwing away my white collar to settle for this. Should you need a minnow to mentor, please, let me be the number one to be considered.
ham Great interview, guys! We haven't met in person yet, but I'd say Tom's one of the most grounded people I've ever come to know. I love your focus and straightforwardness, combined with your endless generousity. For many reasons you're one of the most successful but especially most admirable people on this platform.
spam Upvote, follow, resteem
spam Followed and resteemed
spam Great photo
spam Follow me I will follow you
spam Hey Beautyfull love your blog... UPVOTED and RESTEEMED
spam Hi there, i RESTEEMED & UPVOTED for you! Have a nice day.
spam Cool pic bro
spam Thank you for sharing i will resteem it
spam good post
spam Super
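To show how rows like these turn into a spam probability, here is a small, dependency-free sketch of Multinomial Naive Bayes with Laplace smoothing. It is only an illustration: the bot itself relies on scikit-learn, and the names `parse_rows`, `tokenize` and `TinyMultinomialNB`, the whitespace separator between label and text, and the simple regex tokenizer are my assumptions, not the bot's actual code.

```python
import math
import re
from collections import Counter, defaultdict

# A few rows copied from the training sample above.
TRAINING_ROWS = """\
ham For years I am interested in the Anglo Boer war and their fight against the Commonwealth. W. Churchill's military career was full of hits and misses but he gained huge popularity in England after the war. Great post once again.
ham I am South African born. I also grew up in an Afrikaanse town. Despite my background and being able to speak the language, I was not aware of the history. Probably should have paid more attention in school, lol. Thanks for the lesson!
spam Upvote, follow, resteem
spam Followed and resteemed
spam Great photo
spam Follow me I will follow you
spam Cool pic bro
spam good post
spam Thank you for sharing i will resteem it
"""


def parse_rows(lines):
    """Split each 'label text' row into a (label, text) pair (separator assumed to be whitespace)."""
    rows = []
    for line in lines:
        parts = line.strip().split(None, 1)
        if len(parts) == 2 and parts[0] in ("ham", "spam"):
            rows.append((parts[0], parts[1]))
    return rows


def tokenize(text):
    """Lowercase the text and keep runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


class TinyMultinomialNB:
    """Word-count Naive Bayes with additive (Laplace) smoothing."""

    def fit(self, rows, alpha=1.0):
        self.alpha = alpha
        self.doc_counts = Counter()              # comments per label
        self.word_counts = defaultdict(Counter)  # word frequencies per label
        self.vocab = set()
        for label, text in rows:
            tokens = tokenize(text)
            self.doc_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def spam_probability(self, text):
        """Return P(spam | text); words never seen in training are ignored."""
        if "spam" not in self.doc_counts:
            return 0.0
        tokens = [t for t in tokenize(text) if t in self.vocab]
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for label, n_docs in self.doc_counts.items():
            denom = sum(self.word_counts[label].values()) + self.alpha * len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for t in tokens:
                score += math.log((self.word_counts[label][t] + self.alpha) / denom)
            log_scores[label] = score
        # Normalise in log space to avoid underflow on long comments.
        top = max(log_scores.values())
        norm = sum(math.exp(s - top) for s in log_scores.values())
        return math.exp(log_scores["spam"] - top) / norm


if __name__ == "__main__":
    model = TinyMultinomialNB().fit(parse_rows(TRAINING_ROWS.splitlines()))
    threshold = 0.8  # illustrative value for probability_threshold
    for comment in ("Followed and resteemed", "Thanks for the lesson about the history"):
        p = model.spam_probability(comment)
        print(f"{p:.3f} {'spam' if p >= threshold else 'ham'}: {comment}")
```

With so few rows the probabilities are crude, which is why the size of the training set matters so much for a classifier like this.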
- libraries: steem-python, scikit-learn, pandas, textblob, bs4
The repository contains a requirements.txt file.
There is still a lot that can be done:
- enlarging the training set
- adding new algorithms such as Neural Network or Support Vector Machine
- taking into account previous comments, not only the current one
- taking into account user reputation
- adding blacklist / whitelist support
Posted on Utopian.io - Rewarding Open Source Contributors