Steem Sincerity - Update and Community Involvement

fraenk (62) 6 years ago (edited)

I am going through some of the "human"-classified accounts to check whether any of them could be misleading as training data. I'll collect the results in comments below:

DISCLAIMER: the following interpretations are only MY subjective opinion, nothing else.
P.S.: this is starting to look a bit spammy in and of itself, sorry, I didn't expect there to be so many so quickly...
P.P.S.: I also started looking at the "spammer"-classified training data, and there are lots of humans and bots in there (imho)... it seems to me the training data could do with a much more thorough vetting process!

$0.28

fraenk (62) 6 years ago

andybets - (&steemreports) has the right classification as human, but to be honest, you should probably remove your own account from the training data and see how your own AI ranks you, just to get a first-hand feel for it...

I would remove this account from the training-set to avoid any subjective in-house-biasing
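A minimal sketch of that hold-out check, under loud assumptions: the account names, the single "repetition" feature, and the midpoint-threshold "classifier" are all made up for illustration and are not how the Sincerity project actually works. It only shows the idea of dropping one account from the training data and then scoring that same account.

```python
def hold_out(training, account):
    """Return a copy of the labelled training set without `account`."""
    return {name: row for name, row in training.items() if name != account}

def train_threshold(training):
    # Toy stand-in for a real classifier (an assumption): the decision
    # threshold is the midpoint between the two class means of one feature.
    humans = [f for f, label in training.values() if label == "human"]
    spammers = [f for f, label in training.values() if label == "spammer"]
    return (sum(humans) / len(humans) + sum(spammers) / len(spammers)) / 2

def classify(feature, threshold):
    return "spammer" if feature > threshold else "human"

# Hypothetical data: account -> (repetition feature, label).
training = {
    "account-a": (0.10, "human"),
    "account-b": (0.20, "human"),
    "account-c": (0.80, "spammer"),
    "account-d": (0.90, "spammer"),
    "own-account": (0.30, "human"),
}

reduced = hold_out(training, "own-account")
threshold = train_threshold(reduced)
# Score the held-out account with a model that never saw its label.
print(classify(training["own-account"][0], threshold))
```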

$0.17

andybets (62) 6 years ago

Thanks for this. It's very helpful, I'll review your suggested changes.

$0.11

andybets (62) 6 years ago

I agree with all the changes you suggested, and have adjusted the data sets. :)

$0.10

fraenk (62) 6 years ago

Awesome, I am glad I could help out!

As mentioned above, I believe there are also quite a few "false positives" among the training spammers... I'll go through some more of those when I find the time.

$0.06

andybets (62) 6 years ago

That'd be great. I have just sent you 2 SBD as a small thanks.

fraenk (62) 6 years ago (edited)

dailytop10open - might be operated by a human, but the pattern looks more like a bot-classification to me, very repetitive content and the comments are primarily "functional"

I would reclassify as bot or just remove it from the training set to avoid ambiguity

$0.09

fraenk (62) 6 years ago (edited)

jehovahwitness - OK, I might be biased, but looking at their comments close up reveals the same set of a dozen or so "inspirational" comments being repeated over and over. I think it's questionable whether this is actually a human, and it may even be seen as spam by some.

I would remove this from the training data due to its ambiguity
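One rough way to put a number on "the same dozen comments repeated over and over" is a repetition ratio. This is only a sketch: the function names and the sample comments are invented for illustration, and it says nothing about which features the Sincerity classifier really uses.

```python
from collections import Counter

def normalize(text):
    # Lowercase and collapse whitespace so trivially different copies match.
    return " ".join(text.lower().split())

def repetition_ratio(comments):
    """Fraction of comments whose normalized text occurs more than once."""
    if not comments:
        return 0.0
    counts = Counter(normalize(c) for c in comments)
    repeated = sum(n for n in counts.values() if n > 1)
    return repeated / len(comments)

def distinct_texts(comments):
    """Number of different comment texts after normalization."""
    return len({normalize(c) for c in comments})

# Made-up sample, not real account data.
sample = ["Stay blessed!", "stay  blessed!", "Nice photo of the harbour", "Stay blessed!"]
print(repetition_ratio(sample))  # 0.75
print(distinct_texts(sample))    # 2
```

A high ratio together with a small number of distinct texts would match the pattern described above, though a human who likes a catchphrase could score the same way.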

fraenk (62) 6 years ago

new-york - I think this is without a doubt spam, and probably bot-spam! The same identical "promotion" comment for a "resteem-service" is being posted over and over and over

I would reclassify this as spam

fraenk (62) 6 years ago

altobot - a self-proclaimed bot posting "manual" reports, probably it should be seen as more bot than human?!

I would remove this from the training data due to its high ambiguity

fraenk (62) 6 years ago

austrobot - self-proclaimed trailing bot that posts manual content (?)

I would remove this from the training data due to ambiguity

fraenk (62) 6 years ago

coin.info - definitely a bot, has no original content and only leaves comments reporting the crypto rates of coins mentioned in the original posts

I would reclassify this as a bot

fraenk (62) 6 years ago

dailypick - curation service, might be manual, could be automated, repetitive comments look very bot-like

I would either reclassify this as a bot or remove from the training data to avoid ambiguity.

fraenk (62) 6 years ago

followforupvotes - self-proclaimed voting bot, randomly voting for its followers and leaving repetitive comments and posts. No question this is a bot

I would reclassify this as a bot
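The clear-cut suggestions from these comments could be applied to a labelled training set roughly like this. It is a sketch only: the dict layout and the label strings "human", "bot" and "spammer" are assumptions, and the accounts where I suggested "either reclassify or remove" are left out because that choice is still open.

```python
REMOVE = "remove"

def apply_review(training, decisions):
    """Return a new training set with review decisions applied.

    training:  account -> label
    decisions: account -> new label, or REMOVE to drop the account
    """
    updated = dict(training)
    for account, decision in decisions.items():
        if account not in updated:
            continue  # nothing to change for accounts we never labelled
        if decision == REMOVE:
            del updated[account]
        else:
            updated[account] = decision
    return updated

# All of these were reviewed from the "human"-classified set.
training = {name: "human" for name in [
    "andybets", "jehovahwitness", "new-york", "altobot",
    "austrobot", "coin.info", "followforupvotes",
]}

decisions = {
    "andybets": REMOVE,          # avoid in-house bias
    "jehovahwitness": REMOVE,    # ambiguous
    "new-york": "spammer",
    "altobot": REMOVE,           # ambiguous
    "austrobot": REMOVE,         # ambiguous
    "coin.info": "bot",
    "followforupvotes": "bot",
}

cleaned = apply_review(training, decisions)
print(cleaned)
```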

$0.07

sherlockholmes (60) 6 years ago

"More of the large amount of data being collected will soon be available in the form of new APIs which relate to characteristics of voting, commenting, etc."

You have me more than curious! Keep up the great work, your tools are already essential to "screening" suspicious activities.

I commend your progress on making this community more transparent!

$0.07

conradino23 (53) 6 years ago

I'm stoked to see its progress!

cardboard (61) 6 years ago

So, Skynet next? :)
@tipu upvote this post with 0.5 sbd

amico (66) 6 years ago (edited)

Hey @cardboard, I love @tipu's smartness! ;)
I just joined it: thank you for this innovative service!

$0.00

fraenk (62) 6 years ago

It's awesome to see this make progress and improve in accuracy ...

It still leaves me with some worries when I check the current Top-Spammers according to the sincerity API:

While some of these accounts are in fact leaving very repetitive comments that may well be seen as spam... they are certainly lacking the volume to be in the ranks of "top-spammers".

At least that's my subjective interpretation of how to define spam... quantity does play a major role here.

Taking into account that there are accounts like @a-0-0 leaving 27k comments in the same timeframe, I think something should be done on that aspect.

OR, if the API purely wants to classify, maybe it just shouldn't publish a "ranking"?!
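To illustrate what I mean by quantity playing a role, here is a minimal sketch of a volume-aware ranking. Everything in it is an assumption: the account names and numbers are made up, and it pretends the classification score is a value between 0 and 1 that can be multiplied by the comment count. It is not how the sincerity API ranks anything.

```python
def estimated_spam_comments(spam_score, comment_count):
    # Treat the score as a probability, so this is an expected count of spammy comments.
    return spam_score * comment_count

def rank_by_volume(accounts):
    """accounts: list of (name, spam_score, comment_count) tuples."""
    return sorted(
        accounts,
        key=lambda a: estimated_spam_comments(a[1], a[2]),
        reverse=True,
    )

# Hypothetical accounts with made-up scores and comment counts.
accounts = [
    ("low-volume-repeater", 0.95, 40),
    ("high-volume-commenter", 0.80, 27000),
    ("ordinary-user", 0.05, 300),
]

for name, score, count in rank_by_volume(accounts):
    print(name, round(estimated_spam_comments(score, count)))
```

With a ranking like this, an account with a slightly lower score but thousands of comments ends up above a near-certain spammer who only left a handful.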

andybets (62) 6 years ago

It's a fair point. I will also add a list of accounts sorted by the most comments made.

I guess that accounts like this, which already have a negative rep, probably aren't interfering with most people's experience anymore, so they aren't being reported as spammers by the community. This software is increasingly using a community average of spammer as its classification definition.

$0.12

fraenk (62) 6 years ago

Great!

"This software is increasingly using a community average of spammer as its classification definition."

I think that's exactly what we need, and I am stoked to see the "accuracy" with which it reflects that increasing.

The classification score for those "top-ranked" spammers does not feel inaccurate to me to be honest, but calling those the top-spammers is taking the result a bit out of context imho.

$0.00

tipu (67) 6 years ago

Hi @andybets! You have received 0.5 SBD @tipU upvote from @cardboard !

@tipU! upvotes with 200% profit and pays 100% profit + 50% curation rewards to investors :)

$0.00