*TrufflePig*: A Bot based on Natural Language Processing and Machine Learning to support Content Curators and Minnows

sorin.cristescu (71) 7 years ago (edited)

Hey @smcaterpillar, this is the Sorin human who has spotted you as an underrated contributor to a better world! I'll be in Berlin on March 15th and 16th; would you perhaps care to meet in person?

A few comments on your post: given widespread payout manipulation, a more complex algorithm will certainly improve the results in the future. You write "The basic idea is to use well paid posts of the past as training examples to teach a Machine Learning Regressor (MLR) how high quality Steemit content looks like" - that is good enough for a first version, but in the long run it is not the approach that adds the most value, because it offers "reflection" / "echo" and encourages "more of the same".

The reference for what "quality" means should be external to the mechanism that Steemit uses to value posts, otherwise it comes very close to what Baron Munchhausen was doing by "pulling himself up by the straps of his boots".

$0.81

smcaterpillar (52) 7 years ago

underrated contributor to a better world!

I'm not so sure if my bot makes the world any better. I'm at least glad that Steemit operates on proof of (delegated) stake and is not as wasteful as BTC. So at least my bot doesn't make the world any worse.

[...] given widespread payout manipulation [...] The reference for what "quality" means should be external to the mechanism that Steemit uses to value posts, otherwise it comes very close to what Baron Munchhausen was doing by "pulling himself up by the straps of his boots".

You do have a point regarding the massive manipulation due to voting bots and services. By the way, irony intended by using such a service for your comment?! :-D

However, I beg to differ here, at least to some degree. If there weren't any correlation between payouts and quality, then Steemit's premise as a curation platform built on proof of brain, and its sheer existence, would be pointless. So the bot's idea is to pull attention back from the voting bots and steer the platform as a whole more towards rewarding quality content, whatever this is.

This brings me directly to my second point. What is quality content? This is really hard to evaluate. Is it something chosen by a jury or by highbrow intellectuals? Or stuff picked directly by the readers themselves? TrufflePig relies on the latter. Yet, if there were some external measure of quality, what would it be?

Of course, the taste of the masses may not cater to the taste of an individual. For instance, I can't stand most of today's popular or chart music :-D. Finding content that is right for you, in particular, is definitely not the aim of @trufflepig. However, I do see the need for more personalized recommendations. To quote the wise words from someone who knows much, much more about this platform than me (@lextenebris):

One of the big problems with Steemit as I see it is the fact that trying to find content that you're interested in is like sipping from a fire hose. One directed straight into your face.

I'm currently experimenting with how I can reuse parts of the bot to create more personalized recommendations. So stay tuned.

Here's my LinkedIn profile

Sure, why not, added you as a connection.

$0.10

sorin.cristescu (71) 7 years ago

"By the way, irony intended by using such a service for your comment?! :-D"

Absolutely! I'm experimenting in order to learn, because the whole mechanism is not only complex, it is also obscure (probably on purpose). I intend the experimentation to go on at several levels - for instance, I've "pumped" my last post over the $100 bar to see if this psychological threshold plays any role here ... not sure, but we'll see. I do believe my post is good though :-)

Then back to the trufflepig discussion - I absolutely agree that the whole idea of rewarding content in Steemit is valid: there definitely IS a correlation between payout and quality! But my argument went to the "second degree" and looked at trufflepig: since the correlation is not 1 and Steemit is ALREADY using this assessment dimension, re-using it in trufflepig is "procyclical" and reinforces whatever bias this dimension has.

On the contrary, introducing another assessment dimension helps to give more balance and offers an alternative. Precisely because it's difficult to say what quality is, and all we know is that the equation "high payout = quality" certainly does NOT hold (not 100%, not for all posts anyway), maybe we can do better by defining quality along more than one axis / dimension.

And the idea is that through the trufflepig YOU, the owner of the bot, are free to define your own assessment dimension. Some people will certainly disagree with your choice of what you consider to be quality, but so what? They are free to create their own trufflepig and train it with their parameters if they wish.
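To make the "more than one axis" idea concrete, here is a minimal sketch, not anything the bot actually does. It assumes a hypothetical `external_score` between 0 and 1 coming from some payout-independent rater, and the names `Post`, `blended_score`, `payout_cap` and `weight_external` are invented for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Post:
    title: str
    predicted_payout: float  # SBD, as estimated by a payout-trained model
    external_score: float    # 0..1, from a hypothetical payout-independent rater


def blended_score(post, payout_cap=100.0, weight_external=0.5):
    """Mix the payout-derived axis with an independent quality axis."""
    # Scale the predicted payout into 0..1 so both axes are comparable.
    payout_score = min(post.predicted_payout, payout_cap) / payout_cap
    return (1.0 - weight_external) * payout_score + weight_external * post.external_score


posts = [
    Post("popular but shallow", predicted_payout=80.0, external_score=0.2),
    Post("overlooked but solid", predicted_payout=20.0, external_score=0.9),
]

# With equal weights the second axis is enough to change the ranking.
ranked = sorted(posts, key=blended_score, reverse=True)
for post in ranked:
    print(f"{post.title}: {blended_score(post):.2f}")
```

Setting `weight_external=0` falls back to ranking by predicted payout alone, which is the "procyclical" case described above.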

$0.06

5 votes

trufflepig (66) 7 years ago

Congratulations! Your post has been selected as a daily Steemit truffle! It is listed on rank 1 of all contributions awarded today. You can find the TOP DAILY TRUFFLE PICKS HERE.

I upvoted your contribution because to my mind your post is worth at least 70 SBD and should receive 170 votes. It's now up to the lovely Steemit community to make this come true.

I am TrufflePig, an Artificial Intelligence Bot that helps minnows and content curators using Machine Learning. If you are curious how I select content, you can find an explanation here!

Have a nice day and sincerely yours,

TrufflePig

smcaterpillar (52) 7 years ago

Hurray! 1337!

This is really not staged, I swear! He came up with this selection by himself!

$0.00

beeyou (59) 7 years ago

This is a very interesting concept. Thank you for stopping by my blog @trufflepig! There are so many posts written by minnows that go unnoticed. We try to find some of these undervalued authors with the #newbieresteemday initiative, but it's a manual search and curation. I will definitely be following along!

phgnomo (64) 7 years ago

smcaterpillar (52) 7 years ago

That does not work (yet) :-)

$0.00

phgnomo (64) 7 years ago

:(
It's a really interesting feature.

$0.00

smcaterpillar (52) 7 years ago (edited)

Yes, I manually downloaded the trained bot from my VPS and let it check this post: it should be worth 70 SBD and 170 votes :-D Yeah!

$0.00

smcaterpillar (52) 7 years ago

Maybe it's getting listed tomorrow as number one truffle :-D
There are some nitty-gritty details, though, that I did not mention: if this post makes more than 10 SBD, it won't be listed as a truffle anymore.
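A rough sketch of that rule, for the curious. Only the 10 SBD cut-off comes from the comment above; the field names and the ranking by the gap between predicted and current payout are assumptions made for illustration, not the bot's actual code.

```python
MAX_CURRENT_PAYOUT_SBD = 10.0  # cut-off mentioned in the comment above


def pick_truffles(posts, top_n=3):
    """Select underpaid posts.

    posts: list of dicts with 'title', 'current_payout' and
    'predicted_payout' in SBD (hypothetical field names).
    """
    # Posts that already earn more than the cut-off are no longer truffles.
    eligible = [p for p in posts if p["current_payout"] <= MAX_CURRENT_PAYOUT_SBD]
    # Assumed ranking: biggest gap between predicted and current payout first.
    eligible.sort(key=lambda p: p["predicted_payout"] - p["current_payout"], reverse=True)
    return eligible[:top_n]


candidates = [
    {"title": "a", "current_payout": 0.5, "predicted_payout": 70.0},
    {"title": "b", "current_payout": 12.0, "predicted_payout": 90.0},
    {"title": "c", "current_payout": 3.0, "predicted_payout": 10.0},
]
print([p["title"] for p in pick_truffles(candidates)])
```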

$0.00

phgnomo (64) 7 years ago

I think this is one of my favorite curation services already. Not because it featured one of my posts, but because it is a really interesting concept.

$0.37

3 votes

minnowpowerup (58) 7 years ago

You have collected your daily Power Up! This post received an upvote worth $0.29.
Learn how to Power Up Smart here!

$0.00

smcaterpillar (52) 7 years ago

Maybe I will include this feature in the batch job instead of making it a service. This means that if you call @trufflepig manually, in the worst case you have to wait 24 hours. Yet, this makes my life much easier because I do not have to deal with the concurrency issues of having the bot upvote and comment under the top list truffles while also commenting and upvoting on demand :-D.

$0.00

phgnomo (64) 7 years ago

That would be awesome, even if it would take 24 hours for the analysis.
As soon as I have enough SP I will definitely delegate to this bot.

$0.02

smcaterpillar (52) 7 years ago (edited)

Nice, thank you! Btw, including it in the batch job has the advantage that the bot won't comment twice under your post in case you did make it into the truffle top list. Moreover, making it into the top list will also yield a higher vote from the bot than calling it manually.
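Roughly, the batch job could merge both sources like this. This is only a sketch of the idea described above: the function name, the reason labels and the two vote weights are hypothetical placeholders, not the real values or code.

```python
def plan_batch_actions(top_list, manual_calls, top_weight=1.0, call_weight=0.5):
    """Return {post_id: (reason, vote_weight)} with exactly one action per post.

    The weights are placeholders; the only assumption taken from the
    discussion is that top-list posts get the higher vote.
    """
    actions = {}
    for post_id in top_list:
        actions[post_id] = ("top_list", top_weight)
    for post_id in manual_calls:
        # setdefault keeps an existing top-list entry, so a post that was
        # both called and ranked is commented on and voted for only once.
        actions.setdefault(post_id, ("manual_call", call_weight))
    return actions


plan = plan_batch_actions(top_list=["a", "b"], manual_calls=["b", "c", "c"])
print(plan)
```

Because everything runs in one pass per day, there is no second process that could comment or vote at the same time.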

$0.00

smcaterpillar (52) 7 years ago

@trufflepig

$0.00

trufflepig (66) 7 years ago

Huh? Seems like I already voted on this post, thanks for calling anyway!

$0.00


smcaterpillar (52) 7 years ago

Let's see if this works. It hasn't been merged to master yet, but the server is operating on a beta branch now. We'll know for sure tomorrow. So, here it goes:

$0.00

long888 (66) 7 years ago

Glad to know @trufflepig

smcaterpillar (52) 7 years ago

Short update on the roadmap:

I want to conduct further experiments with different ML regressors as well as feature encodings. I already made some experiments using Doc2Vec instead of LSI. But this was not very fruitful. A more thorough investigation may improve the bot's judgment further.

I did this and improved the bot slightly. From now on the LSA is not only computed over tokens, but over bigrams of tokens as well.

I also tried trigrams and 4grams as well as skip-grams, but they did not improve the bot's performance.
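For anyone wondering what "tokens plus bigrams" means in practice, here is a small standard-library sketch. The tokenizer regex and the underscore-joined bigram format are assumptions for illustration; the bot's real preprocessing may differ. The resulting list is what would then be turned into a bag-of-words vector and fed to the LSA step.

```python
import re


def tokens_with_bigrams(text):
    """Return lowercase word tokens followed by adjacent-token bigrams."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    # Pair each token with its successor, e.g. "proof", "of" -> "proof_of".
    bigrams = [f"{first}_{second}" for first, second in zip(tokens, tokens[1:])]
    return tokens + bigrams


print(tokens_with_bigrams("Proof of brain"))
```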

I'm currently working on the @trufflepig "call a pig" feature; afterwards I'll focus on the recommendation system.

drmake (55) 7 years ago

I like the idea of calling the bot to any post to make a prediction :)

$0.02

smcaterpillar (52) 7 years ago

Works now :-)

$0.00

vladimir-simovic (67) 7 years ago

Thank you for the contribution. It has been approved.

You can contact us on Discord.
[utopian-moderator]

$0.00

smcaterpillar (52) 7 years ago

Great, thanks!

$0.00

utopian-io (71) 7 years ago

Hey @smcaterpillar I am @utopian-io. I have just upvoted you!

Achievements

You have less than 500 followers. Just gave you a gift to help you succeed!
This is your first accepted contribution here in Utopian. Welcome!

Community-Driven Witness!

I am the first and only Steem Community-Driven Witness. Participate on Discord. Let's GROW TOGETHER!

Vote for my Witness With SteemConnect
Proxy vote to Utopian Witness with SteemConnect
Or vote/proxy on Steemit Witnesses

Up-vote this comment to grow my power and help Open Source contributions like this one. Want to chat? Join me on Discord https://discord.gg/Pc8HG9x

$0.00

smcaterpillar (52) 7 years ago

Thanks a lot :-)

$0.00

sneakyninja (63) 7 years ago

It's a great concept! Nice to see some background and a full feature road-map.

Thanks to @josephsavage, this post was resteemed and highlighted in today's edition of The Daily Sneak.

Thank you for your efforts to create quality content!

$0.00