SteemReports - The Anatomy of Steem and Steemit

in #steemit, 7 years ago

Usually @steemreports just tries to understand and explain the data stored in the blockchain, and to make it useful to people who aren't data analysts. Recently, however, I've received a few questions that led me to investigate how the system itself works, and here I attempt to explain it a little.


stdiag.png

When somebody uses their PC, tablet or mobile phone to visit steemit.com, their browser asks for the web pages to be sent across the Internet so they can be displayed. Because there are now 300,000 accounts, a Load Balancer shares the large number of page requests received every second between several Steemit Web Servers so that none of them is overloaded. More of these web servers can be added as steemit.com grows.
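To make the load-balancing idea concrete, here is a minimal sketch of one common strategy, round-robin, where each incoming request simply goes to the next server in turn. This is only an illustration: I don't know which strategy Steemit's Load Balancer actually uses, and the server names (`web-1`, etc.) and the `RoundRobinBalancer` class are made up for the example.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hands out backend servers in a fixed, rotating order (hypothetical example)."""

    def __init__(self, servers):
        if not servers:
            raise ValueError("at least one server is required")
        self._servers = list(servers)
        self._rotation = cycle(self._servers)

    def add_server(self, server):
        # Adding capacity restarts the rotation so the new server is included.
        self._servers.append(server)
        self._rotation = cycle(self._servers)

    def pick(self):
        # Return the server that should handle the next request.
        return next(self._rotation)


# Hypothetical server names, not real Steemit hosts.
balancer = RoundRobinBalancer(["web-1", "web-2", "web-3"])
assignments = [balancer.pick() for _ in range(6)]
print(assignments)
```

Six requests end up spread evenly, two per server, and `add_server` shows how more web servers could join the rotation as traffic grows.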

The steemit.com web server does not hold the articles and comments, though. Instead, it asks for this information from one of several RPC nodes: powerful computers that, together with the dozens of Witness Nodes, each hold a copy of all the data in the blockchain. Currently there is around 22GB of data in the blockchain, and this grows every time an article or comment is posted, a vote is cast, or a currency transfer is made. Of course, as the amount of information and the number of users grow, the RPC and Witness network also needs to grow.
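As a rough illustration of what "asking an RPC node" looks like, the sketch below builds the kind of JSON-RPC 2.0 message a web server might send to fetch one article. Treat the details as assumptions: the method name `condenser_api.get_content` is my understanding of the public Steem API and may not match what the nodes exposed when this post was written, `example-permlink` is a placeholder, and the helper `build_get_content_request` is hypothetical. The code only builds the request body and does not contact any node.

```python
import json


def build_get_content_request(author, permlink, request_id=1):
    """Build a JSON-RPC 2.0 request asking an RPC node for one post.

    The method name is an assumption about the Steem API, not something
    confirmed by this article.
    """
    return {
        "jsonrpc": "2.0",
        "method": "condenser_api.get_content",
        "params": [author, permlink],
        "id": request_id,
    }


# "example-permlink" is a placeholder, not a real post identifier.
request = build_get_content_request("steemreports", "example-permlink")
body = json.dumps(request)
print(body)
```

A front-end would POST a body like this to an RPC node over HTTP and receive the article text back in the JSON response.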

Running these computers can be expensive, which is why the reward pool pays not only authors and curators but also, with 15% of the pool, the people who install and run them.

As well as sending the article text back to the user's browser, the Steemit Web Server also sends back some information about how many views an article has had, and allows unread articles and comments to be shown in bold for your convenience. This 'meta-data' is collected in the Steemit Database, the only part of the system which is private. That is why neither @steemreports nor any other third-party service can present charts and graphs about how many views your content has received. It would be great to do so, but it would require special permission and access to the Steemit Database, which is the property of Steemit Inc.

The diagram above also illustrates the cases where people use other 'front-end' services such as busy.org (and in fact steemreports.com) to view the blockchain data. We have separate web servers, which also interact with the RPC nodes.

Apps such as eSteem may access the RPC nodes more directly, without going through a web server. Their more specialised software makes this possible.


The diagram is slightly simplified, but I hope this helps you to understand a little more about how the Steem platform works. Feel free to ask any questions below, and I'll do my best to answer them where I can.

Disclaimer: If anyone with more knowledge has any corrections to make to what I've presented here, please let me know. I'm also on an educational journey, and I haven't actually used busy.org or eSteem yet!


Please vote, resteem and follow us for more reports and services, and visit our website:
http://www.steemreports.com


The biggest problem I see is that the incentive structure is wrongly set up. Running an RPC node is MUCH more expensive than running a witness node, yet there is no direct compensation for doing so. The compensation for running a simple witness is probably about right, depending on how "decentralized" one wants the network to be.

But the heavy lifting is being done by the RPC nodes, for which no reward is designed into the blockchain.

I agree, but I can't think of an effective way to reward people running RPC nodes. The majority of the load probably comes from reads rather than writes, and reads have no impact on the chain, so it isn't possible to calculate rewards automatically.

You could take a statistical approach, using dedicated transactions that reward RPC nodes once a week, say, on the basis of some data analytics: sum up all the transactions in the week, divide them into slices equivalent to the share served by each RPC node, and allocate a part of the reward pool in proportion.

BTW, I re-used your drawing (with attribution) in my latest witness update, I hope you don't mind: https://steemit.com/witness-category/@lux-witness/second-week-full-steem-ahead
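The weekly proportional split suggested in this comment can be sketched in a few lines. This is only a sketch under a big assumption: that a per-node count of relayed transactions is available at all. The node names, the counts, the pool size and the function `split_rpc_rewards` are all invented for illustration.

```python
def split_rpc_rewards(relayed_counts, weekly_pool):
    """Split weekly_pool between RPC nodes in proportion to transactions relayed.

    relayed_counts maps a node name to the number of transactions it is
    assumed to have relayed during the week.
    """
    total = sum(relayed_counts.values())
    if total == 0:
        # Nothing was relayed, so nobody earns a share this week.
        return {node: 0.0 for node in relayed_counts}
    return {
        node: weekly_pool * count / total
        for node, count in relayed_counts.items()
    }


# Invented figures: three nodes relaying 1,000 transactions between them.
counts = {"rpc-a": 600, "rpc-b": 300, "rpc-c": 100}
rewards = split_rpc_rewards(counts, weekly_pool=1000.0)
print(rewards)
```

Because each share depends only on the transaction count, a node that relays more transactions, for whatever reason, receives a larger slice of the pool.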

There are 2 problems I can imagine with that:

  1. I don't think it's currently possible to tell which RPC node relayed each transaction, and adding an extra signature to every transaction would create a lot of extra data.
  2. Under this proposal, abusive parties could submit extra transactions through their own RPC node to artificially inflate their share of the RPC rewards.

  1. This would indeed mean a serious upgrade, but it would be a major upgrade anyway. I would be surprised, though, if something like this didn't already exist, because otherwise 2. would already be possible today if the RPC nodes were "transparent" to the blockchain. Not for rewards, but a malicious RPC node could try all sorts of mischief if the witnesses were not able to tell which RPC node was responsible for which interaction ...
  2. Not sure, see above. In any case it would have to be well discussed and analyzed; every detail will matter in the implementation. I think the risk of the status quo (no rewards for running an RPC node) is bigger than the risk of a botched implementation of RPC rewards.

Great explanatory post. Thank you.

Great post. Thank you!

@OriginalWorks Mention Bot activated by @guttormf. The @OriginalWorks bot has determined this post by @steemreports to be original material and upvoted it!

To call @OriginalWorks, simply reply to any post with @OriginalWorks in your message!


tip!

Hi @steemreports! You have just received a 0.1 SBD tip from @emble!

@tipU - send tips by writing tip! in the comment and get share in service profit :)
By upvoting this comment you support the service - thanks!

Can I run a parallel database, in case something goes wrong with the original?

To have your own copy of the blockchain data you can (if you have sufficient technical knowledge) set up your own RPC node. It will likely cost more than $200 USD per month though, and this cost isn't reimbursed: Witnesses are paid only when they 'produce blocks'.

The approach @steemreports takes is to download some of the data to our server, but not the full blockchain, and also use the public RPC nodes where needed.

This post has received a 9.82 % upvote from @booster thanks to: @andybets.

Of the 300,000 accounts many are inactive.
Phones and tablets use dynamic pages, so only small portions of data need to be transferred to the device.
That's why we sometimes see only part of the page until the next data packet arrives.
It would be interesting to see how much data is being sent and received at present usage rates.

This is true, but quite a few are also unfortunately hyperactive - the bots!

It would certainly be interesting to know those traffic levels, but I think only Steemit Inc. could provide the data, as they control most of the RPC servers.

Interesting idea to think about. Most of the bots that I see seem to be doing something useful apart from Meep. I don't know what the point of that is? Any ideas?

There are quite a few leaving stupid one-word comments; I think it mainly affects higher-value accounts. It seems that some people find it 'cute' and give it votes. I'm not sure why though, I just find it annoying!

Me too. Just wasting bandwidth and clogging up the comments. Makes the place look untidy. :) Here's one that I actually find very handy: tip!

Great post
