*TrufflePig*: Introducing the Artificial Intelligence for Content Curation and Minnow Support

guruvaj (57) 8 years ago

I am not a programmer, but the idea is quite impressive.
Do your best in coding.

We minnows badly need that for our posts, since the bots overlook them.

Is it an AI in a bot mode?

$0.05

smcaterpillar (52) 8 years ago

Well, the bot mode is still missing; I am currently working on that, though :-)

$0.00

maddyy (36) 8 years ago

Oh wow
Interesting, bro...
We are going to get a lot of help

3 votes

smcaterpillar (52) 8 years ago

Thanks :-)

$0.00

maddyy (36) 8 years ago

Hey bro, can you upvote my post?
I'm new too, so I don't get votes, and I'd like help from all of you.

$0.00

ammonite (67) 8 years ago

Although I am not a big fan of bots, I think what you are trying to do is commendable.
It's a pity that something like this is even necessary to help find good content among all the nonsense, but I really like your approach and think it could lead to a great curation trail.

cryptoexplorer7 (50) 8 years ago

This will surely help many. Keep it going!

3 votes

tech-mac (43) 8 years ago

Well done!! Thanks a lot for taking such an initiative. We newly joined Steemians were facing such problems a lot.

Upvoted!!

voxhumana (25) 8 years ago

Looks like a cool project, @smcaterpillar. Looking forward to reading more about it!
May I share some thoughts that come to mind?
Have you removed common words like he, she, the, a ...? In your found topics it seems they are still in. Also HTML tags like "href" are in. Removing them could already improve your features.
Have you done, e.g., a 5-fold cross-validation of the training and test set? This often gives a more realistic view of the predicted results.
Did you have a look at the far outliers in the prediction and review them manually? I think this could give interesting insights to improve your features.

Thanks for sharing :)

smcaterpillar (52) 8 years ago (edited)

Hi, these are some good remarks, thank you. Let me address them one by one:

"Have you removed common words like he, she, the, a ...? In your found topics it seems they are still in."
Yes, but rather arbitrarily. I just filtered any word that appears in more than one third of the training set posts. Apparently this has left she and he in there, but at least removed a and the. I have to try to lower the threshold, maybe to 10 or 20% of all documents. Definitely worth trying to find a sweet spot via cross validation.
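The filter described here can be sketched roughly as follows. This is only a minimal illustration, not the code from the notebook: the function name `filter_frequent_words`, the whitespace tokenization, and the toy posts are assumptions made for the example. It counts in how many documents each word appears and drops every word above a chosen fraction.

```python
from collections import Counter

def filter_frequent_words(docs, max_doc_fraction=1 / 3):
    """Drop every word that occurs in more than max_doc_fraction of the documents."""
    # Naive whitespace tokenization (an assumption; a real pipeline would do more).
    tokenized = [doc.lower().split() for doc in docs]
    # Count in how many documents each word appears, not how often overall.
    doc_freq = Counter(word for tokens in tokenized for word in set(tokens))
    limit = max_doc_fraction * len(tokenized)
    too_common = {word for word, n in doc_freq.items() if n > limit}
    return [[w for w in tokens if w not in too_common] for tokens in tokenized]

# Made-up toy posts, only to show the effect of the threshold.
posts = [
    "the pig found a truffle",
    "the bot found a post",
    "she wrote the post",
    "he read a post",
]
print(filter_frequent_words(posts, max_doc_fraction=0.5))
```

With the threshold at one half, "the", "a", and "post" disappear while "she" and "he" survive, which mirrors the effect described above. Lowering `max_doc_fraction` removes more words, so the value is a natural candidate for tuning.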

Also html tags like "href" are in. Removing them could already improve your features.
Yes, damn, I wrote a bunch of regular expression filters but missed href. It will be included in the next version.
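For illustration, a regular expression filter of this kind might look like the sketch below. The patterns and the name `strip_markup` are assumptions for the example and are not the filters actually used. Removing whole tags first takes attributes such as href with them, and a final pass catches stray leftovers.

```python
import re

# Whole tags, including attributes such as href="...".
TAG_RE = re.compile(r"<[^>]+>")
# Bare links left in the text.
URL_RE = re.compile(r"https?://\S+")
# Stray markup words that survive broken or partial tags (hypothetical list).
LEFTOVER_RE = re.compile(r"\b(?:href|src|nbsp)\b", re.IGNORECASE)

def strip_markup(text):
    """Remove HTML tags, links, and leftover markup words, then tidy whitespace."""
    text = TAG_RE.sub(" ", text)
    text = URL_RE.sub(" ", text)
    text = LEFTOVER_RE.sub(" ", text)
    return " ".join(text.split())

print(strip_markup('Read <a href="https://steemit.com/x">my post</a> now'))
```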

Have you done, e.g., a 5-fold cross-validation of the training and test set? This often gives a more realistic view of the predicted results.
I haven't done any cross-validation yet, but I will definitely do so to tune some hyperparameters such as the number of topics or the word filter threshold. I have to see if it makes sense to also tune some forest parameters like max_depth, max_leaf_nodes, or the percentage of features at each split. What I have done, though, is run the model a couple of times with a different RNG seed to see if the results are consistent and robust (they are).
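As a rough sketch of what a 5-fold split involves: the data is shuffled once, cut into five parts, and each part serves as the test set exactly once while the other four are used for training. The helper `k_fold_indices` below is a hypothetical name written for this example; libraries such as scikit-learn ship their own versions (for instance `KFold`), and the thread does not say which one would be used.

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    # Shuffle once with a fixed seed so the split is reproducible.
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of (almost) equal size.
    folds = [indices[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train_idx, test_idx

for train_idx, test_idx in k_fold_indices(10, k=5):
    print(len(train_idx), "training posts,", len(test_idx), "test posts")
```

To tune a setting such as the number of topics, one would train on each training part, score on the matching test part, and compare the average score across folds for every candidate value.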

Did you have a look at the far outliers in the prediction and review them manually? I think this could give interesting insights to improve your features.
I haven't done a very thorough investigation yet. However, the truffles you are seeing in the post above are, by definition, outliers: they have the highest difference between real and predicted payout.
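Picking such outliers amounts to sorting posts by the gap between the two numbers. The sketch below assumes a truffle is a post whose predicted payout exceeds its real one; the field names, the helper `top_truffles`, and the sample values are all invented for the example.

```python
def top_truffles(posts, n=3):
    """Return the n posts with the largest gap between predicted and actual payout."""
    ranked = sorted(posts, key=lambda p: p["predicted"] - p["actual"], reverse=True)
    return ranked[:n]

# Made-up payouts in dollars, purely illustrative.
sample = [
    {"title": "A", "actual": 0.10, "predicted": 5.00},
    {"title": "B", "actual": 20.00, "predicted": 18.00},
    {"title": "C", "actual": 0.50, "predicted": 1.00},
    {"title": "D", "actual": 1.00, "predicted": 9.00},
]
print([p["title"] for p in top_truffles(sample, n=2)])
```

Sorting in the opposite direction would list the posts that earned far more than predicted, which could be the other set of outliers worth reviewing by hand.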

Thanks for the feedback, really appreciated!

$0.00

jeanpi1908 (63) 8 years ago

Not bad, not bad. One thing I still see as a problem is that the posts found there are already quite old.

smcaterpillar (52) 8 years ago

Thanks, and yes, that is true of course. This is the offline prototype. The online bot will probably live in a Docker container on a server, scrape Steemit at regular intervals, and then point out truffles that are roughly 2 to 6 days old, so that the community still has time to place votes.
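The age window mentioned here is easy to express in code. This is a minimal sketch under the assumption that each scraped post carries a timezone-aware creation timestamp; the function name `in_voting_window` and the example dates are hypothetical.

```python
from datetime import datetime, timedelta, timezone

def in_voting_window(created, now,
                     min_age=timedelta(days=2), max_age=timedelta(days=6)):
    """Return True if a post is between min_age and max_age old."""
    age = now - created
    return min_age <= age <= max_age

# Hypothetical reference time for the example.
now = datetime(2018, 1, 10, tzinfo=timezone.utc)
print(in_voting_window(now - timedelta(days=3), now))  # post is 3 days old
print(in_voting_window(now - timedelta(days=1), now))  # post is 1 day old
```

A bot running at regular intervals would apply such a check to every scraped post before ranking, so that only posts still open for votes are announced.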

$0.58

alexdory (59) 8 years ago

Wow. I am a software engineer and I find your work amazing. I just read it, and I guess it was a monumental piece of work. We await more data. I have worked a little with voice recognition and machine learning for the software I work on, and I know how painstaking it is. You deserve the boost you are getting from our common friend :)

smcaterpillar (52) 8 years ago

Thanks, but in fact the monumental work is still ahead. I do a lot of machine learning, it's my daily bread and butter kind of thing, so setting up a Jupyter notebook to do the stuff above took me no more than a few hours. However, turning the prototype into a production-ready bot will take a lot more man-hours of work:

A proper test driven development project
Proper scraping and handling of Steemit posts
Proper and regular retraining of the AI
Regular posts of truffles and voting and commenting on the truffles
Actually getting the trufflepig Steemit account is also a problem: I registered the account more than a week ago and haven't heard anything back yet.
Dockerizing everything and deploying it on a server
Maybe I even have to run a full Steemit node myself; let's see

$0.00

alexdory (59) 8 years ago

I am all buckled up for the ride :)

$0.00

smcaterpillar (52) 8 years ago

The ride is pretty much ongoing and the bot is live and deployed: @trufflepig

$0.00

toddoto (46) 8 years ago (edited)

TrufflePig. Love it. Great combination of technical and writing acumen. Thanks for contributing such a valuable proposition to our community!

thatsweeneyguy (61) 8 years ago

That looks very interesting. I hope your algorithm will exclude rewards given by bots, to arrive at a more accurate evaluation of the public's view of a post.