Scraping and Analyzing the Steemit Trending Page – Blockchain Business Intelligence
Reaching the Steemit trending page is an awesome achievement. Posts there tend to make nice sums of money; however, previous analysis shows that to get on trending a post must already make a nice sum in its first hour. Trending then accelerates the post's exposure.
On the 2nd Aug I scraped the Steemit trending webpage. I did the same again on the 16th October and again on the 21st Dec. Included in this analysis and report are the top 50 posts on trending on each date.
The reason I used web scraping and not SteemSQL as my main data source is that trending is a front-end feature and the data is not stored in the blockchain. There is no way to know from the blockchain which posts were trending and when. It is also worth noting that scraping the data at only one point in time would give nothing to compare to, so for this analysis I have been collecting data since August.
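Because trending is only visible on the front end, each scrape has to be stamped with its date and kept, or there is nothing to compare later. Below is a minimal sketch of that idea using Python's standard `html.parser`. The markup in `SAMPLE_HTML` (the `post`, `author`, `votes` and `payout` class names) is invented for illustration and is not the real Steemit page structure, and `TrendingParser` and `parse_snapshot` are hypothetical names, not the code used for this analysis.

```python
from html.parser import HTMLParser

# Hypothetical markup: these class names are assumptions, not Steemit's real HTML.
SAMPLE_HTML = """
<article class="post"><a class="author">alice</a>
  <span class="votes">120</span><span class="payout">$35.20</span></article>
<article class="post"><a class="author">bob</a>
  <span class="votes">80</span><span class="payout">$12.00</span></article>
"""

FIELDS = {"author", "votes", "payout"}


class TrendingParser(HTMLParser):
    """Collect one dict per <article class="post"> in the assumed markup."""

    def __init__(self):
        super().__init__()
        self.posts = []
        self._post = None   # dict being filled for the current article
        self._field = None  # field whose text we are currently inside

    def handle_starttag(self, tag, attrs):
        css = dict(attrs).get("class") or ""
        if tag == "article" and css == "post":
            self._post = {}
        elif self._post is not None and css in FIELDS:
            self._field = css

    def handle_data(self, data):
        if self._post is not None and self._field:
            self._post[self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "article" and self._post is not None:
            self.posts.append(self._post)
            self._post = None
        self._field = None


def parse_snapshot(html, scrape_date):
    """Return typed rows, each tagged with the date the page was scraped."""
    parser = TrendingParser()
    parser.feed(html)
    parser.close()
    return [
        {
            "scrape_date": scrape_date,
            "author": p["author"],
            "votes": int(p["votes"]),
            "payout_usd": float(p["payout"].lstrip("$")),
        }
        for p in parser.posts
    ]


snapshot = parse_snapshot(SAMPLE_HTML, "Aug")
```

Rows from later scrapes can be appended to the same list or file, so every post carries the date on which it was seen on trending.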
High Level Overview
150 posts were pulled over the 3 scrapings, written by 107 unique authors in total. In August there were 39 unique authors, in October there were 43 and in December there were 40.
The average number of votes per post on the trending page was 392 in August. In October it was 210, and in December the average number of votes on the top 50 posts in trending was 510.
In August the average post payout value was $251. This reduced to $165 in October, and December is coming in at $369.
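For anyone who wants to reproduce this kind of overview, here is a minimal sketch of how the per-scrape figures can be computed. The rows are invented toy values, not the scraped data, and `summarise` is a hypothetical helper name.

```python
from collections import defaultdict
from statistics import mean

# Toy rows (invented): (scrape, author, votes, payout_usd)
rows = [
    ("Aug", "author_a", 400, 250.0),
    ("Aug", "author_b", 300, 150.0),
    ("Aug", "author_a", 500, 350.0),
    ("Oct", "author_c", 200, 100.0),
    ("Oct", "author_d", 100, 200.0),
]


def summarise(rows):
    """Group rows by scrape and return post count, unique authors and averages."""
    grouped = defaultdict(list)
    for scrape, author, votes, payout in rows:
        grouped[scrape].append((author, votes, payout))

    summary = {}
    for scrape, posts in grouped.items():
        summary[scrape] = {
            "posts": len(posts),
            "unique_authors": len({author for author, _, _ in posts}),
            "avg_votes": mean(votes for _, votes, _ in posts),
            "avg_payout": mean(payout for _, _, payout in posts),
        }
    return summary


overview = summarise(rows)
```

With the real data, each scrape would contribute 50 rows and the same function would give the unique author counts and averages per month.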
We can see from the table that some authors on Steemit have multiple posts on trending, in multiple months. 30 of the 107 authors have had more than 1 post in trending. That means 28% of authors appear more than once, while 72% of authors on trending only trended once.
Of the 150 posts, 73 have come from the 28% of authors that appear more than once. That equates to 48% of the posts coming from 28% of the authors.
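The arithmetic behind these two percentages can be sketched as follows. The `concentration` function and the toy author names are illustrative assumptions; only the totals 30, 107, 73 and 150 come from the figures above.

```python
from collections import Counter


def concentration(post_authors):
    """Return (% of authors with more than one post, % of posts by those authors)."""
    counts = Counter(post_authors)
    repeat = {author: n for author, n in counts.items() if n > 1}
    author_pct = 100.0 * len(repeat) / len(counts)
    post_pct = 100.0 * sum(repeat.values()) / len(post_authors)
    return author_pct, post_pct


# Toy input (invented names, not the scraped data): one author appears twice.
toy = ["alice", "bob", "alice", "carol"]
toy_author_pct, toy_post_pct = concentration(toy)

# The same two ratios, taken directly from the totals quoted in the post.
quoted_author_pct = 100.0 * 30 / 107  # about 28.0
quoted_post_pct = 100.0 * 73 / 150    # about 48.7, given as 48% in the post
```

The totals are also consistent with each other: 107 - 30 = 77 authors trended once, and 150 - 73 = 77 posts are left for them.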
Detailed Analysis
The table above shows the number of votes by author and by scraping date. It is clear to see that posts from August and December received a higher number of votes than those from October. We can see @buildteam has the highest number of votes. This relates to just 1 post that @buildteam got on trending in August. @sweetsssj tops the votes in December; however, this vote count is based on two posts.
The tables below show SBD payout value and number of posts per author, first for Aug, then Oct and finally December. @dan and @ned topped the charts in Aug and October, with @adsactly topping the chart in December. In December @adsactly appears for the first time in trending with 4 out of the 40 posts.
Below is a breakdown of the Categories in which the posts belong. It is really good to see such a wide spread of categories reaching trending.
The author reps for the 150 posts scraped over the 3 scrapings are shown below.
With the exception of one account with a rep of -18, all other authors on the trending page have a rep of 49 or higher. This can be seen for each month in the chart below, and the table below that details the average rep of authors in trending for each scraping.
We can see that in Aug, 8 posts have authors with a rep of 73; in Oct, 11 posts have authors with a rep of 68; and in Dec, 7 posts have authors with a rep of 70.
When scraping the trending page, it is possible to get the age of the post. We can see from the table below that the % of posts that were trending on the same day as publishing was 60% in Aug, 78% in October and 46% in December. 84% of posts on trending were published either that day, or the day before.
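As a rough sketch of how the age breakdown can be derived, the code below turns relative age labels into whole days and computes the share of posts in each bucket. The label format ("3 hours ago", "yesterday", "2 days ago") is an assumption about what the page shows, and the sample labels are invented.

```python
import re
from collections import Counter


def age_in_days(label):
    """Map a relative-age label to whole days (0 = treated as same day)."""
    label = label.strip().lower()
    if label == "yesterday":
        return 1
    match = re.match(r"(\d+)\s+(minute|hour|day)s?\s+ago$", label)
    if match is None:
        raise ValueError(f"unrecognised age label: {label!r}")
    value, unit = int(match.group(1)), match.group(2)
    # Minutes and hours are counted as day 0; only "N days ago" adds days.
    return value if unit == "day" else 0


def age_shares(labels):
    """Return {age in days: percentage of posts}, sorted by age."""
    counts = Counter(age_in_days(label) for label in labels)
    total = len(labels)
    return {days: 100.0 * n / total for days, n in sorted(counts.items())}


# Invented sample labels, not the scraped data.
sample = ["3 hours ago", "45 minutes ago", "yesterday", "2 days ago", "1 day ago"]
shares = age_shares(sample)
```

Counting every "hours ago" label as same day is a simplification, since a post from 20 hours ago may have been published on the previous calendar day.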
Just out of interest, here is a word cloud based on the 'clip' of each post shown on the trending page.
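A word cloud is driven by word frequencies, so the underlying counts can be sketched with the standard library alone. The clips and the short stopword list below are invented for illustration, and `word_frequencies` is a hypothetical helper rather than the tool used to draw the cloud.

```python
import re
from collections import Counter

# A deliberately small stopword list (an assumption; real lists are much longer).
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "on", "for", "it", "this"}


def word_frequencies(clips, top_n=10):
    """Count non-stopword words across all clips and return the most common."""
    words = []
    for clip in clips:
        tokens = re.findall(r"[a-z']+", clip.lower())
        words.extend(t for t in tokens if t not in STOPWORDS)
    return Counter(words).most_common(top_n)


# Invented clips, not the scraped data.
clips = [
    "Steem is on the rise",
    "The rise of Steem and Bitcoin",
    "Bitcoin news for Steem",
]
top = word_frequencies(clips, top_n=2)
```

The resulting (word, count) pairs are what a word cloud scales its font sizes by.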
Conclusion
It is very difficult to get a post on trending; we don't need data to tell us that. But what is interesting is that more than 60% of trending posts trend within 24 hours of posting. The faster a post receives high-value votes pushing the payout value up, the easier it is to get that post on trending. This is one of the reasons people use voting bots and bid bots.
A trending page works well to showcase posts for people that have not yet signed up to Steemit. However, when you log in, I believe this page should change and should be based more on the individual user's preferences. That way, everyone's trending page would be different.
As certain data is not included in the blockchain, such as post view count and trending details, I would really like to get my hands on some data from Steemit Inc. Or maybe Steemit Inc could do a data post. I would like to see information such as view count, vote to view count ratio and other view count ratios, geography information, gender information and general traffic information.
What post on Steemit has received the most external traffic? What author brings the most external traffic? There is so much more we could learn from Steemit Inc if we had the data. So anyone at Steemit Inc reading this, I would be happy to work with you on the data and present some information to Steemit users that is not in the blockchain. If you would like to support this idea, please do resteem this post for extra visibility.
What do you think of the data above? What do you think about the Steemit trending page? Please do leave your comments and feedback below.
This post has been set to 50%/50% payment. Any SBD received from this post will be used to purchase STEEM at a lower price.
I am part of a Blockchain Business Intelligence community. We all post under the tag #BlockchainBI. If you have an analysis you would like carried out on Steemit or Blockchain data, please do contact me or any of the #BlockchainBI team and we will do our best to help you.
You can find #BlockchainBI on discord https://discordapp.com/invite/JN7Yv7j
Posted on Utopian.io - Rewarding Open Source Contributors
It was a real surprise for me. I would have expected a "heavier" Power Laws figure, I mean, a bigger area in the head of the curve and a smaller one in the tail. But 48-28 is quite far from a typical 80-20.
100% agree with you; in fact I have a simpler idea: eliminate the Trending tab. Are you a newcomer and need a feed? Build it from scratch, choose some tags from the left side menu, or maybe the front end can put a "seed" showing some random content.
This feature is just reinforcing the Rich-Get-Richer phenomenon here in steemit. It would be nice to see the effect on the current 48-28 distribution.
My network is not so big but... upvoted and resteemed :)
And finally, I "finally" followed your suggestion from an old post and joined Utopian.io yesterday. I like it a lot. Right now I'm about to sign up on Discord to join the #BlockchainBI channel.
Thanks again Paula! :)
Thank you for the upvote and resteem. Will catch you over on discord during the week so :-)
Thanks for the very detailed information
You are doing a great job with your articles
thank you very much @greatvideos
You are welcome 😉
So the ones on top stay on top.
Great system.
I knew something rubbed me the wrong way, but this lays it out plain as day.
A real dynamic system would have different distributions of people reaching trending, not a Pareto distribution of a few and the rest languishing in the "tails".
Guess we know which one Steemit is.
In theory – and note the phrasing – it could be that only a relatively small portion of the Steemit posting community writes articles that the majority of the Steemit reading community find valuable enough to upvote, weighted heavily by those with the greatest stake.
So, even with the most generous reading of what Trending actually says, it translates to "really rich folks who have been around for a while like these articles so you should too."
Maybe it's true that the repeatedly top exposed writers are, in fact, the best writers on the platform. Possibly that's true. Maybe.
In practice, that's very close to ludicrous.
Looking at the word cloud provided immediately reveals some of the incestuous nature of the content as presented. Unfortunately, the Categories displayed really don't capture exactly how much of the content is monocultural. (I'd really like to see a breakdown which takes into account all the tags on a given post that makes it to Trending, because merely the first one doesn't necessarily reveal anything like the number of high traffic tags it may be pulling on.)
What we do see from the straight Category presentation is that unless you are writing about cryptocurrency or about STEEM and Steemit itself, you can probably look forward to never, ever being on Trending at all. (Unless you get lucky and happen to be interested in homesteading, for some reason. Or hit the lotto with your post and get a bunch of upvotes from follow train or distribution bot, which is something you can never come to count on consistently.)
Ultimately, I think @PaulaG has it absolutely in the right when she talks about the necessity for Trending to take into account the expressed interests of an individual who has come to the platform. That is the missing secret sauce. Alternately, Trending can be the same as it is now, and something more useful based on an individual's expressed preferences can come to be equally promoted.
My breath is currently not being held.
Nicely stated.
My core complaint is you should see new people hit trending, often. The very fact this isn't the case refers to the "incestuous" nature of the back-patting-vote system going on here.
Languishing in the "tails" is okay if you have a shot at the big-time, but not if you're doomed to never get out of that purgatory in the first place.
This is why I think Steemit is largely a self-congratulatory exercise where circles of voters use their piles of tokens to reinforce their standings, at the exclusion of everyone else.
That is the definition of "high barrier to entry" and it will doom Steemit unless something changes.
I, also, am not holding my breath.
I'll just watch it happen, and try to support a few minnows on the way down.
I have a very simple suggestion: Eliminate the Trending tab. This is just reinforcing the Rich-Get-Richer phenomenon. It is completely useless once you have built a feed by following people (at least I never use it.)
And let's see what happens with this Pareto figure :)
Is there any way I could warm you up to steemit a little
@talltim ?????
You're a rare jewel in a sea of vote-pandering people, so in a way it is your involvement that keeps me browsing the platform. I'd like it to change, but I have no means to effect that change nor a comprehensive plan to do so.
I leave it in more capable hands, like yourself, to reveal the flaws and potentially suggest paths towards redemption.
Keep doing what you do, you're one of the stellar examples of what Steemit could be if the flaws were addressed.
aweeeee thank you @talltim, thank you very much
Good info
Thank you for the contribution. It has been approved.
You can contact us on Discord.
[utopian-moderator]
a very useful post. thanks
I like this article of yours.
In-depth analysis of the Steemit trending page. It gave some detailed and useful information. I will of course try to use the information to benefit from Steemit. Thanks
Great work Paula. Good to see a comparison over a long period to see how things change.
There is certainly more that could be done to improve the front end experience, both on recommended posts and on individual feeds. I'm certain it will come, and it could potentially be through a non-Steemit development if someone wants to get a jump on the competition.
If I could code I would try myself....yes this would get someone ahead on the competition