So I'm sure most of you have come across the @originalworks bot before without thinking much about it. Well, as someone who spends most of his time on Steemit fighting plagiarism, I see this bot far more often than I should. I understand the thinking behind the bot and think its goal is noble, but it is badly broken and needs to be shut down and reworked.
What is the Original Works bot?
As I mentioned before, I think the bot was started with good intentions. Its purpose is to verify that the post someone has written is original. The author, or anyone commenting on the post, can simply type @originalworks or !originalworks, and the bot will check the post and leave a comment stating whether the content is original or not.
In theory, this would be a valuable tool. Determining whether something is original can be very difficult and time-consuming. You would have to look at the article and manually do a little digging around to see if there is anything a little too similar already out there. This bot would be quite the time saver and would serve as a visible badge that someone was actually contributing original content. Not only that, but it's kind enough to give you a small upvote just for calling it to your post. Sound a little too good to be true? That's because it is.
Why it doesn't work
Creating a bot to detect plagiarism is incredibly difficult. The current gold standard in plagiarism detection is the Cheetah bot created by @anyx. He put a ton of work into it and it takes a lot to maintain, as he spells out in this FAQ about Cheetah:
I have had to sacrifice accuracy for price, as the cost of running Cheetah is actually quite high. At the current rate, Cheetah has a direct cost of about $150-$200 USD per week, with an indirect cost even higher -- and rising. The funding to pay for her currently comes from Steemcleaners log posts. Development is ongoing, and has never stopped! I mostly aim to improve detection and reduce false positives (as I continuously receive negative flak for any mistake). While I don't expect a reward for the continued development, nor do I post updates about development (as the algorithm I have developed for content detection is effectively a trade secret, and thus sharing updates to it would be silly), I consider my role as a witness ( @anyx ) as the direct support. This keeps the project community driven, rather than sponsored. You can vote for witnesses here.
Even with that kind of effort, Cheetah isn't perfect. It misses things for any number of reasons and occasionally even comes up with a false positive. The big difference is that Cheetah, besides being more accurate, isn't called to a post to prove authenticity. It does its work in the background, checking every post and commenting only when it finds a match. So how much worse is @originalworks? Let's look at a few direct comparisons.
Here is a good example of something I come across often. This is a post from last week: https://steemit.com/warcraft/@cryptopaze/world-of-warcraft-vanilla-pvp-orrim-20171229t93821429z. Scrolling down in the comments you will see this:
In a head-to-head battle of the bots, Cheetah comes out the clear winner. The original post is copied word for word from this post, yet originalworks still certified it as original. Think this is an isolated incident? Here is a post where the author links to a source, Cheetah finds a source, yet originalworks still manages to certify it as original. I could find dozens more examples where people call originalworks to blatantly plagiarized posts in an attempt to legitimize them. Like I said, no detection system is perfect. Here is a post that was missed by both Cheetah and originalworks. The user took the text from a YouTube video and tried to pass it off as their own. The difference is that when Cheetah misses a post, it stays silent and validates nothing; when originalworks misses a post, it actively asserts that the post is original.
Hell, just in the time it's taking to write this post, I picked a random article from an originalworks comment and this is what I saw. https://steemit.com/steemit/@sumansid/why-is-zuckerberg-entering-crypto is a direct ripoff of https://techcrunch.com/2018/01/05/mark-zuckerberg-is-right-to-explore-the-potential-of-the-blockchain-for-facebook/ that originalworks certified as original content. Feel free to take a gander at the latest originalworks comments, and I'm sure you won't have to click through too many before finding something that is clearly not original.
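To make the difference between a silent miss and a false certificate concrete, here is a rough Python sketch. It is not how Cheetah or originalworks actually work (Cheetah's algorithm is not public). The word-shingle comparison, the 0.5 threshold, the comment wording, and every function name are assumptions made up purely for illustration. Both toy bots share the same imperfect detector; the only difference is what they say when the detector finds nothing.

```python
from typing import Iterable, Optional, Set, Tuple

Shingle = Tuple[str, ...]


def shingles(text: str, k: int = 3) -> Set[Shingle]:
    """Split text into overlapping runs of k lowercase words."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def similarity(a: str, b: str, k: int = 3) -> float:
    """Jaccard similarity of the two texts' shingle sets (0.0 to 1.0)."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def best_match(post: str, known_sources: Iterable[str]) -> float:
    """Highest similarity against whatever sources the bot knows about."""
    return max((similarity(post, s) for s in known_sources), default=0.0)


def flag_only_comment(post: str, known_sources: Iterable[str],
                      threshold: float = 0.5) -> Optional[str]:
    """Hypothetical flag-only policy: speak up on a match, else stay silent."""
    if best_match(post, known_sources) >= threshold:
        return "Similar content found elsewhere."
    return None  # a miss leaves no comment, so nothing gets endorsed


def certifying_comment(post: str, known_sources: Iterable[str],
                       threshold: float = 0.5) -> str:
    """Hypothetical certifying policy: always comment, one way or the other."""
    if best_match(post, known_sources) >= threshold:
        return "Similar content found elsewhere."
    return "This post is certified original."  # a miss becomes an endorsement


if __name__ == "__main__":
    copied = "the quick brown fox jumps over the lazy dog near the river bank"
    # The real source is missing from the bot's index, so the detector misses.
    print(flag_only_comment(copied, []))   # None
    print(certifying_comment(copied, []))  # This post is certified original.
```

With an identical detector and an identical miss, the flag-only bot says nothing while the certifying bot hands a copied post a stamp of approval. That is why a certifying bot needs to be far more accurate than a flagging one before its comments can be trusted.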
So what should be done?
First, I think that unless @originalworks can drastically improve the accuracy of his bot, he should shut it down. In my work with @steemcleaners I have come across people who believe that this bot's comments actually mean something, that a comment from it legitimately means a post is original. That is a dangerous message to send considering how often the bot is wrong.
Second, if the bot isn't going to be shut down, at least make it stop upvoting people. That just gives people more incentive to call a broken bot to every post, plagiarized or not.
I really do think this bot was a good idea to begin with, but unfortunately it is far too inaccurate to be worth anything here on Steemit. It not only fails to detect a large number of plagiarized posts but falsely legitimizes them as well. I have no idea if the bot will actually be shut down or reworked, but I hope this post at least gets the message out there that a comment by originalworks in no way means the post is original. When it comes to plagiarism detection, the best method is still just good old-fashioned common sense and a little detective work.
Feel free to share/resteem this message so that more people are aware of just how badly broken the originalworks service is. Leave me any comments or criticisms you might have; I'm happy to respond to them all.