Exploring Steem Scalability

in #steem, 6 years ago (edited)

In this post, we will address some of the concerns that have been raised regarding the increasing RAM usage of steemd nodes, as well as our future scaling plans. While the challenges associated with scaling are not something we will ever take lightly, we also think that many of the concerns have been raised due to some misunderstandings about how to properly/optimally operate steemd nodes. We will provide some guidance on this in the sections below, and we will also talk about several changes that we have in the pipeline for addressing our future projected growth.

What is Scalability?

The Steem community is growing rapidly, and the Steem blockchain is growing with it. Growth is great, but it brings scaling challenges. Other projects (such as Bitcoin and Ethereum) have been stuck on their scaling problems for years, unable to adopt any significant changes to meet the demands that increased usage has placed on their blockchains. Steem, on the other hand, has continued to evolve rapidly and is meeting these challenges head on, which enables it to process more transactions than all other blockchains combined. In other words, the majority of blockchain transactions occurring globally are being done on Steem.

We’ve been able to do this because our team is made up of an ever-growing roster of the most talented and innovative blockchain engineers on the planet. This doesn’t make us cocky; it makes us acutely aware of the scaling challenges in front of us, and we want to assure you that we are adequately prepared to deal with them. While we are confident in our strategy, we are also eager to hear your thoughts, objections, and insights in the comments.

A Brief History of Scaling

The most critical decision with respect to scaling is where you start. The more scalable the foundation, the more scalable the stack. A stack’s ability to scale tends to have, at best, an exponential relationship to the starting point. It is incredibly rare for an architecture to go from being able to support 3,000 people to 3,000,000 people overnight. Instead, it goes from 3 to 6 to 12, etc. Starting from an architecture that was already far ahead of the pack in terms of scalability (Graphene) was a critical component of the scaling strategy. Those that failed to make similar decisions now find themselves in the difficult position of having to rebuild their foundation without damaging the entire ecosystem that was built on top of it.

ChainBase and AppBase

The first major scalability-related upgrade was the replacement of Graphene with ChainBase. Thanks to its faster load and exit times, and increased robustness against crashes, ChainBase was critical to enabling Steem to process its current volume of transactions.

The next major improvement that is nearing completion (thanks to the hard work of @vandeberg and the blockchain team) is AppBase, which further improves Steem’s overall scalability through modularization. AppBase will allow many components of the Steem blockchain to run independently, which will permit steemd to take better advantage of the multithreaded nature of computers, and even enable different components of the blockchain to be run on different servers - reducing the need to run the Steem blockchain on individual “high powered, high cost” servers.

Optimizing Steemd Nodes: Block Log + State File

With respect to operating a steemd node currently, it is critical to understand that Steem requires two data stores: the block log and the state file. The block log is the blockchain itself, written to disk. It is accessed infrequently, but is critical to verifying the integrity of new blocks and reindexing the state file if needed.

The state file contains the current state of Steem objects, such as account balances, posts, and votes. It is backed by disk, but accessed via a technique called memory-mapped files. This technique was introduced in December 2016 with the release of ChainBase.
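For readers unfamiliar with memory-mapped files, here is a minimal sketch of the general technique using Python's standard mmap module. It is not steemd's implementation; the file name and sizes are made up for illustration. It shows that a write through the mapping looks like a memory write, yet ends up in the file on disk.

```python
import mmap
import os
import tempfile


def write_and_read_mapped(path: str, size: int = 4096) -> bytes:
    """Create a file of `size` bytes, write to it through a memory map,
    then read the bytes back with ordinary file I/O."""
    # Pre-size the backing file; a mapping cannot extend past the file's length.
    with open(path, "wb") as f:
        f.truncate(size)

    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), size) as view:
            # This looks like a write to memory, but the page belongs to the file.
            view[0:5] = b"steem"
            view.flush()  # ask the OS to write dirty pages back to disk

    # Read through normal file I/O to show the data reached the file.
    with open(path, "rb") as f:
        return f.read(5)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # "state_demo.bin" is a hypothetical name, not a steemd file.
        print(write_and_read_mapped(os.path.join(tmp, "state_demo.bin")))
```

The operating system decides which pages of the file stay in RAM, which is why the speed of the underlying drive matters once the file is larger than available memory.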

Everything RAM?

Many node operators suggest that servers should have enough RAM to hold the entire Steem state file, because Steem's performance drops when the operating system begins “paging” Steem's memory (a common memory management technique). We want to be very clear that it is not required to run a steemd node in this way. Holding the state file in RAM is certainly a valid technique for speeding up reindexing and API calls, but it is only useful in a limited number of cases. In the majority of cases (including witness, seed, and exchange nodes), it is sufficient to store the shared memory file on a fast SSD or NVMe drive instead of in RAM.

Witness and Seed Node RAM Requirements

When running a steemd node with only the witness plugin enabled (the common configuration for witness and seed nodes), Steemit recommends 16 GB of RAM, although 8 GB is likely sufficient if your node does not need to reindex often. If the shared memory file is stored in /dev/shm/, then additional RAM would be needed to hold the entire state file, but this is not a recommended configuration. To avoid the need for extra RAM, the shared memory file can be stored directly on a fast SSD or NVMe drive.

A server with 8-16 GB of RAM will be slow with reindex, but it will function properly as a seed/witness node once it is up to date with the latest block. Running on a 32 GB server is ideal for optimal replay times, but it is not a requirement for a witness/seed node to properly operate.
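The witness/seed guidance above can be summarized as a small lookup. This is only a sketch that restates the figures from this section; the function name and the wording of the returned notes are ours, not part of steemd.

```python
def witness_ram_assessment(ram_gb: int) -> str:
    """Map a server's RAM (in GB) to the witness/seed guidance in this post.

    Thresholds (8, 16, 32 GB) are the figures quoted above for a node
    running only the witness plugin.
    """
    if ram_gb < 8:
        return "below the figures discussed in this post"
    if ram_gb < 16:
        return "likely sufficient if reindexing is rare; reindex will be slow"
    if ram_gb < 32:
        return "recommended; works as a seed/witness node, reindex slower than ideal"
    return "ideal for replay times; not required for normal operation"


if __name__ == "__main__":
    for ram in (4, 8, 16, 32):
        print(f"{ram:>3} GB: {witness_ram_assessment(ram)}")
```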

Shared Memory File Size

The default configuration for a steemd node stores the shared memory file in the data/blockchain directory. As long as this location is on a fast enough (SSD or NVMe) drive with sufficient space, then the default setting should work.

The current recommendation is to have at least 150 GB of fast SSD storage, which includes the block_log (currently around 90 GB) and shared_memory.bin (currently around 33 GB). These amounts will increase over time.
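As a worked example of the arithmetic above, the sketch below computes the headroom left on a 150 GB volume and checks the free space on a given path with Python's standard shutil.disk_usage. The sizes are the ones quoted in this post and will go out of date; the helper names are our own.

```python
import shutil

GB = 1000 ** 3  # decimal gigabytes

# Figures quoted in this post; they will grow over time.
BLOCK_LOG_GB = 90
SHARED_MEMORY_GB = 33
RECOMMENDED_GB = 150


def headroom_gb(volume_gb: float) -> float:
    """Space left on a volume after the block log and the state file."""
    return volume_gb - (BLOCK_LOG_GB + SHARED_MEMORY_GB)


def free_gb(path: str) -> float:
    """Free space, in GB, on the drive that holds `path`."""
    return shutil.disk_usage(path).free / GB


def meets_recommendation(path: str) -> bool:
    """True if the drive has at least the recommended space free."""
    return free_gb(path) >= RECOMMENDED_GB


if __name__ == "__main__":
    print(f"Headroom on a 150 GB volume: {headroom_gb(150)} GB")
    print(f"Free space on this drive: {free_gb('.'):.1f} GB")
```

With the numbers above, a 150 GB drive leaves 27 GB for growth, so it is worth checking free space periodically.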

Until now, whenever the shared memory file grew beyond the size configured in the config.ini file, it was necessary to update the configuration to a larger size and restart the node. The next release (Steem 19.4) will include a change that automatically increases this limit as needed, without restarting the node. This behavior will be configurable, and it can be turned off entirely if you want to keep your state file in /dev/shm.
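To illustrate the idea of growing the limit automatically, here is a sketch of one possible rule: once usage crosses a threshold, raise the limit by a fixed percentage. The parameter names, the 95% threshold, and the 10% growth step are assumptions for illustration only and do not describe how the 19.4 release is configured.

```python
def next_limit(used: int, limit: int,
               full_threshold_bp: int = 9500,
               scale_rate_bp: int = 1000) -> int:
    """Return the new size limit for a file that is `used` out of `limit`.

    Both rates are in basis points (1/100 of a percent) so the arithmetic
    stays in integers. The defaults (95% full, grow by 10%) are
    hypothetical values chosen for this example.
    """
    if used * 10_000 >= limit * full_threshold_bp:
        # Over the threshold: grow the limit by the configured rate.
        return limit + limit * scale_rate_bp // 10_000
    # Still enough room: keep the current limit.
    return limit


if __name__ == "__main__":
    print(next_limit(used=50, limit=100))
    print(next_limit(used=96, limit=100))
```

Setting the growth rate to zero in a scheme like this would leave the limit fixed, which corresponds to turning the feature off.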

“Full Node” Requirements

Nodes that are running additional API plugins (especially account history) will require more RAM to support a larger state file. A “full node” (one that is running all of the plugins) can technically run on a 64 GB server, but it will be extremely slow to reindex, and it will be slow at serving API calls because the operating system paging algorithm does not handle memory-mapped files very well. A node with 64-256 GB RAM and a fast SSD/NVMe drive may be adequate for many use cases, depending on the load.

Increasing Performance on High Use Nodes

For more heavily used nodes, the best way (currently) to increase performance is to have enough RAM to hold the entire database. This skips the need for paging altogether, which technically defeats the purpose of having a memory-mapped file. For a node running all of the plugins except account history, this currently requires 256 GB RAM on a pre-AppBase node.

A technique that we have been using to lower the memory requirements on a “full node” (one with everything including account history), is to split the API node into two servers. One server runs only “account history,” and the other server runs everything else. This allows both servers to use less than 256 GB RAM, instead of running everything on a 512 GB RAM server. We strongly recommend running account history on a dedicated node if you want a complete history for all accounts, since it eliminates the need to have a single 512 GB RAM server.
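The split described above implies that something in front of the two servers must send each request to the right one. A minimal sketch of that routing decision follows. The backend addresses are placeholders, and the set of history-related calls is an assumption kept deliberately small; a real deployment would list every call served by the account history plugin.

```python
# Hypothetical backend addresses; substitute your own servers.
HISTORY_NODE = "http://history-node.example:8090"
GENERAL_NODE = "http://general-node.example:8090"

# Assumed set of calls that need the account history plugin.
HISTORY_METHODS = frozenset({"get_account_history"})


def pick_backend(method: str) -> str:
    """Choose the server for an API call based on its method name."""
    # Accept "api_name.method" as well as a bare method name.
    name = method.rsplit(".", 1)[-1]
    return HISTORY_NODE if name in HISTORY_METHODS else GENERAL_NODE


if __name__ == "__main__":
    for m in ("get_account_history", "get_dynamic_global_properties"):
        print(f"{m} -> {pick_backend(m)}")
```

Because the decision depends only on the method name, clients can keep using a single endpoint while the servers behind it are split or merged.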

Optimizing the use case of a “full node” is a top priority of ours, and one that we will talk about more in the next section. If you only need history for certain accounts though, or only care about certain operations, the hardware requirements may already be significantly reduced.

Future Scaling Plans

We are currently working on several projects that will reduce the memory requirements of “full nodes” by moving much of the API logic into non-consensus plugins such as HiveMind and SBDS. This will allow a lot of the functionality to be run off of SSD storage, rather than in RAM - which will lower the operating costs. By offloading data to HiveMind/SBDS and/or RocksDB (below), we should be able to bring the requirements for a full node down to those of a consensus/seed node, which is an important goal of ours.

RocksDB

In addition to the non-consensus plugins, we have begun research on using alternative data stores and moving away from ChainBase. One such data store that has shown promise is RocksDB.

RocksDB is a fast on-disk data store with an advanced caching layer, which could further reduce latency when reading from and writing to disk, since it is optimized for fast, low-latency storage. Used in production at multiple web-scale enterprises (Facebook, Yahoo, LinkedIn), RocksDB is based on LevelDB but offers increased performance thanks to its ability to exploit multiple CPU cores and SSD storage for input/output-bound workloads. Its use in MyRocks, for example, led to less SSD storage use, longer SSD endurance, and more available IO capacity for handling queries.
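To show why a caching layer in front of disk storage helps, here is a generic read-through cache with least-recently-used eviction. It is a conceptual sketch only and does not represent RocksDB's internals or API; a plain dict stands in for the on-disk store, and the class name is our own.

```python
from collections import OrderedDict


class ReadThroughCache:
    """Serve repeated reads from memory; fall back to the slow store on a miss."""

    def __init__(self, backing: dict, capacity: int):
        self._backing = backing      # stand-in for the on-disk store
        self._capacity = capacity    # maximum number of cached entries
        self._cache = OrderedDict()  # ordered oldest to most recently used
        self.disk_reads = 0          # how often we had to go to "disk"

    def get(self, key):
        if key in self._cache:
            # Cache hit: mark as most recently used and return from memory.
            self._cache.move_to_end(key)
            return self._cache[key]
        # Cache miss: read from the backing store and remember the value.
        self.disk_reads += 1
        value = self._backing[key]
        self._cache[key] = value
        if len(self._cache) > self._capacity:
            # Evict the least recently used entry.
            self._cache.popitem(last=False)
        return value


if __name__ == "__main__":
    store = {"alice": 10, "bob": 20, "carol": 30}
    cache = ReadThroughCache(store, capacity=2)
    for name in ("alice", "alice", "bob", "alice"):
        cache.get(name)
    print("disk reads:", cache.disk_reads)
```

Only the working set has to fit in memory with this approach, which is the property that makes disk-backed stores attractive for lowering RAM requirements.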

Further Modularization

We are also working to modularize the blockchain beyond even what was originally planned for the initial AppBase implementation, for example, by having separate services that can be run on different servers. This will allow processes to be further spread across many small servers, increasing flexibility and decreasing cost.

Conclusion

As blockchain projects continue to become more mainstream, scalability is going to become more and more of a concern. Being a scalable blockchain is not just about being able to make a one-time fix to meet the current resource challenges. It is about being prepared to meet the future challenges as well.

Steem has already proven itself as the fastest and most heavily transacted public blockchain in existence, and scalability continues to remain a top focus of ours. We know that scaling challenges will never completely go away, which is why we plan to continue innovating to ensure that whatever growth comes our way - we'll be ready.

Team Steemit

P.S. Don't forget to share your thoughts, objections, and insights in the comments!


Can you guys please STOP CREATING NEW ACCOUNTS until you can properly mitigate the exploiters that are creating huge botnets and siphoning rewards from the pool.

You want to talk about scaling and resources? Most of the resource problems we have around here are directly related to exploits/spammers...problems that lie directly at the feet of STINC, via the last few hard forks and account creation.

Correct the bad decisions you’ve made and are making that have caused/are causing most of the current problems.

This problem was already brought up to light recently, I believe that blocking new accounts creation is a poor decision, it would just throw out the baby with the bathwater.
Those who are in charge of account approval are doing a terrible job, allowing entire botnets to be created while not letting real users in. They should be relieved of that task. Maybe there should be a new mechanism of approval.
Maybe it's better to make a new user create a post (like, obligatory introduction post, for instance) instead of making them wait months for approval. Then something like committee from known and established users would (cu)rate it, and make the decision if the user should be approved. It's a win-win for both parties.
One of the STEEM foundations is Proof of Brain, right? That's how it's done, no spammer can automate the process and those who can't put two words together will also be kept out. It's not like I'm hating all those Indonesian guys, writing about how happy are they to be on steemit with poor grammar and everything else, but they are not bringing any value anyway. There will probably be need for multiple groups for handling multiple languages, but that's another topic altogether.

“I believe that blocking new accounts creation is a poor decision...”

I’m not saying all account creation needs to be stopped. I’m only asking that those created via STINC’s current process be halted. There is an obvious, major malfunction with their system that needs to be addressed.

This is akin to what governments typically do. They undertake certain tasks/functions, they perform these tasks horribly, resulting in many undesired but predictable consequences, then they propose “solutions” for the very problems that they have in fact created. And all the while, everyone else (in this case users, potential users, and investors) ends up paying the price for their inefficiency and complete ineptitude.

Accounts can still be created. But it’s plainly obvious that whatever methods STINC is using are inadequate and are contributing to the creation of large bot-nets that strain the current resources and siphon rewards from the shared pool. The solution is to shut down the failing system in place until it can be corrected so that further damage is not done.

Compounding the errors/damage when it can be easily avoided is irresponsible. It’s time that STINC acknowledges their role in the mess that they have created...and then they ought to FIX IT.

Agree. Centralized account creation is a bloody hell right now. As all centralized solutions.

Yea, agreed too

Yep, as someone that has been downvoted to negatives by one abusive account holder: @haejin

I would say there is a massive loophole that needs to be fixed before @steemitblog can be taken seriously

Yeah, I can see how competing merchants or businesses would get into silly downvoting wars and whoever has the more goons wins.

And then there are the trolls...

until you can properly mitigate the exploiters

This is like asking a fat person to mitigate other fat peoples food intake ... or something.
¯\_(ツ)_/¯

via recent hard forks

Not so recent, the last one was about a year ago, but I know what you mean.

Yeah, that’s correct. I edited it.

And it should also be noted that they claimed to have HF20 almost ready to go when HF19 was implemented in early June last year. Ten months later...

For reference:

https://steemit.com/steemit/@steemitblog/proposing-hardfork-0-20-0-velocity

nah, first will be 19.4 ...appbase?
Or will they do both, HF20+appbase?
Because, well, why not introduce 2 error factors

Only two would be an improvement. We’re used to a minimum of five at once around here.

Maybe it needs to be an even number? :)

I thought STINC was using a Fibonacci sequence.

Thanks for the feedback. We understand this is an important topic for many users and will be addressing non-scalability related issues in future posts.

Great. Looking forward to it.

But for the record: This is a scalability-related issue.

It's like when you have a design for a bucket.

Don't make a damn if the bucket is 5 gallon or 10 gallon when it is riddled with holes.

Better to worry about patching the holes before increasing the capacity of the bucket.

We are losing users and rewards to scammy bot masters.

Hope they get their priorities straight. I really want this platform to succeed.

So it is more important to make the system more streamlined for the bots? Isn't that like removing part of your brain to make room for the tumor? I really don't understand how this platform works.

Thanks for sharing! ;)

I don't personally know much about any of these issues, yet, but the comment here by @ats-david certainly seems like it's worthy of a reply at least. Yes?

Can we please understand each other we have different views and reasons.

I am new here, but I seen a report somewhere that like 90% of the users are Bots, so he has more than a point if that is true, it seems they have co-opted this platform to enrich themselves.

I am also new. I thought an upvote drained power from your computer? So, if that is correct, then just don't upvote a bot, but assuming your report is true, then I'm probably wrong and it's some other technical mumbo jumbo reason. Or maybe astroturfing from competitors? lol, like facebook or something, or even the CIA who rely on corporations to collect private data.

BOOGA-BOOGA-BOOGA!!

I guess it could be a million things besides greed. what report? Link would be nice. I assume they are allowed here. I wont believe you in case you are a bot ;P

wtf-am-i-rrading.jpg

couldn't agree more sir

Nice comment

It's called load testing ;)

bad/greedy guys will be bad/greedy and you asking wont stop them lol its kinda what they do.

criminals break laws, it does not mean we avoid arresting them, something should be done about the abuse of the upvote system

Give me vote and promote me plz

Hey team, the way you handle so many daily transactions leaves me to think you definitely know what you are doing with scaling. Keep up the good work. All we need is communities or some kind of organizing on this site and it can be a top 100 trafficked site no problem.

There are many things beyond this post that should have been covered. Sure, steem handles an impressive amount of transactions, but does that make it the best blockchain?

we need accountability, on this platform much so with the rampant amount of bots running loose.

Absolutely Agreed !

Agreed.

Congrats, you made the @dtube #steemitminute for today!

Where do I get one of those fancy Dtube shirts?

FWIW, /dev/shm does not imply physical RAM. It is backed by virtual memory which can include swap. A properly configured system with the state file in /dev/shm will require less physical RAM and/or perform better with the same amount of RAM than one using a disk file (at least on Linux; I can't comment on other OSs). The trade-off is lack of persistence of the file across reboots (unless it is explicitly copied).

Steem full node is essentially a data-mart with a task to low-latency response to a fixed set of queries. I believe the industry-best solution to this problem for now is a RDBMS cluster with a Redis cache.
Why don't you use it?

Thank you for this detailed explanation of everything, as a non-techy I found I was able to grasp it much better and I'm as determined as ever that Steem has an amazing team behind it focusing on the important aspects which is the scalability of the blockchain. We are already so far ahead of many other blockchains in terms of real world use and promising apps being built on top of it.

Resteemed.

Glad you think so! Thanks!

Great work, Andrew! I'm so excited to see this regular, professional communication. I know how much work goes into creating something like this, and I really, really appreciate it. It's a huge investment of valuable time, but this communication is so important (IMO).

Thanks Luke, we’re getting better at it. Just another scaling challenge ;)

I do not understand technology. So I follow your opinion. My friends in Indonesia say you are a caring Steemian.

I have a renewed sense of confidence. For a while I've been very concerned about scalability and had some doubts about the future of Steem. Thanks for the node and RAM suggestions, I hope the new AppBase/RocksDB implementations will slingshot Steem to the next level to accommodate the ever increasing user base and transactions.

Thanks to all the dev team for their relentless hard work.

I like the beards master.

the-master-and-the-student.jpg

Thank you for sharing your concerns with us!

This is a great explanation, especially for those a bit newer to Steem/Steemit. I have a couple of questions that the devs may be able to help with.

What is the current size of the state file of a full node minus the account history? and what about for a node running only the account history plugin?

In the current Steemit backend, is the Jussi (reverse proxy server) accepting requests, and redirecting them to RPC nodes running different plugins if they're not in the cache, or is it using the SBDS database for some things already?

Hi @andybets - thanks for your question. Currently a node with only account history uses 204GB for the state file while a node with everything except for account history uses 190GB. Today, Jussi (api.steemit.com) is being used to split requests to the different clusters of nodes if they are not already cached. In the future, many of these requests will be split off to SBDS (for account history) and Hivemind (for tags/follows). RocksDB for account history (and later probably tags/follows) will be an alternate option. Either configuration can be forwarded through Jussi without the frontend (Condenser) needing to be aware of the change.

Thanks very much - that's really useful!

You rock, Justin. I hope you're not working too hard and are positioning yourself to get that vacation we talked about in Lisbon. Thanks for all the work you and the team are putting in and the results we're all enjoying, being part of the most performant blockchain on the planet.

Thanks Luke!

Bitcoin has been here for 9 years, and the core couldn't figure out how to scale well. Operating cost for the Bitcoin blockchain is insane, energy costs, forks, clampdowns, you would think some other chain would take the lead by now.

Steem is two years young, it is working great, we are the busiest blockchain alive, more lives have been changed for the better, anyone who has seen the potential here from the early days is quite happy by now. This is just a beginning. I'm glad STINC is thinking ahead, this just reinforces my hopes for the future. Steem on!

Thanks so much for your comment - do you have to call us 'STINC' though? Just sayin' 🙂

Oh, I thought that was just a short version for SteemitINC, not saying that you stink or something :)

STeemit INC...STINC

Seems like an appropriate abbreviation to me. It beats spelling out “Steemit Inc” every time.

How about SINC ? (shorter to type lol) 🙄

Sounds and looks better.

Yes STEEM is a great platform. It is a paradigm shift. Nothing happens overnight. And the good thing is that if we don't like something about it we can fix it. What if we all focused on fixing the issues along the way?

I would rather have a post on exploring steem development...

Steem team is amazing!
