& 'charlieshrem' Witness Update

11 months ago


We've been working on something simply amazing the past few weeks, but we wanted to post a witness update in the meantime.

The following is written by @garethnelsonuk, CTO of

Hi everyone. As many of you are aware, I'm the developer on and also handle sysadmin duties for the backend there, as well as for Charlie's witness and seed nodes.

Some of you may have noticed growing pains on the @charlieshrem witness, so we both agreed it would be wise to go into detail on the mistakes we've made and how you can avoid them.

Hopefully this will serve as a lesson for other witnesses and as an explanation for others who want to know what happened.

First of all, let's look at the infrastructure we had in place and what we've been moving to.

Old infrastructure and the first incident

The old infrastructure consisted of the following:

  • An SSH jumpbox into the Amazon EC2 VPC with properly paranoid firewall settings
  • A primary seed node hosted at Amazon EC2 in the California region
  • A secondary seed node hosted by EC2 in the EU (Ireland) region
  • A witness node and miner hosted at EC2 without a public IP, connecting to the 2 seed nodes
  • Amazon Route 53 to handle DNS
  • The website hosted at Linode on a small 8GB instance

With the exception of some tweaks on the miner (fixing the thread number issue and switching the ECDSA implementation), all steemd instances were stock, compiled from GitHub and run on top of Ubuntu for maximum compatibility.

The EC2 instances had 16GB of RAM and 8GB of swap, which was all we could run within budget given the cost of EC2.

In order to keep things stable I configured Linux cgroups to prioritise steemd in RAM, swapping out other processes first when the RAM inevitably got full. Sadly this was not enough, and the Linux OOM killer struck, causing the first downtime incident.

Lesson 1 here is to not put off failover configuration. I had been working so hard on features for that failover testing kept being pushed back, and when the OOM killer struck the failover node was not up to date on the blockchain. That required a painful process of manually switching keys until all nodes had caught up.
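As a rough illustration of shielding one process from the OOM killer, here is a minimal sketch. Note that it swaps in a different knob from the cgroups setup described above: it writes to the kernel's per-process `oom_score_adj` file, which accepts values from -1000 to 1000. The function name and the `proc_root` parameter are hypothetical and exist only to keep the sketch testable.

```python
from pathlib import Path

# Range accepted by the Linux kernel for /proc/<pid>/oom_score_adj.
OOM_SCORE_MIN, OOM_SCORE_MAX = -1000, 1000


def set_oom_score_adj(pid: int, score: int, proc_root: str = "/proc") -> Path:
    """Write an OOM score adjustment for one process and return the file written.

    Lower scores make the kernel less likely to pick the process when memory
    runs out. proc_root is a hypothetical parameter, overridable for testing.
    """
    if not OOM_SCORE_MIN <= score <= OOM_SCORE_MAX:
        raise ValueError(
            f"score must be between {OOM_SCORE_MIN} and {OOM_SCORE_MAX}"
        )
    target = Path(proc_root) / str(pid) / "oom_score_adj"
    target.write_text(f"{score}\n")
    return target
```

Lowering the score for a real process typically needs root, and it only changes which process gets killed first. It does not make an undersized machine any bigger.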

On top of this, some of the servers were under so much load trying to swap processes like sshd into RAM that it was extremely difficult to perform administration tasks.

Thankfully the various monitoring systems worked fine and my phone alerted me very quickly to the issues. Since I had appropriate remote access set up I was able to monitor the situation constantly - ConnectBot for Android is highly recommended here ;)

Another issue that was less obvious to most was the latency in our API on Linode. Part of fixing this involved switching to heavy use of memcached and a websockets proxy that automatically routed requests to the most responsive steemd node - this helped keep the service up but a more permanent solution was needed.
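The idea of routing to the most responsive node can be sketched roughly as follows. This is not the actual proxy code: `pick_fastest_node` and the `probe` callable are hypothetical names, and a real proxy would probe in the background rather than on every request.

```python
import time
from typing import Callable, Iterable, Optional


def pick_fastest_node(
    nodes: Iterable[str],
    probe: Callable[[str], None],
    clock: Callable[[], float] = time.monotonic,
) -> Optional[str]:
    """Return the node whose probe finished fastest, or None if all failed."""
    best_node, best_latency = None, float("inf")
    for node in nodes:
        started = clock()
        try:
            # Hypothetical probe: e.g. a cheap API call against the node.
            probe(node)
        except Exception:
            # Treat any failure as "node unavailable" and try the next one.
            continue
        latency = clock() - started
        if latency < best_latency:
            best_node, best_latency = node, latency
    return best_node
```

Passing the probe in as a callable keeps the selection logic independent of whichever websocket client is used, and returning None gives the caller a clear signal to fall back to another set of servers.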

New infrastructure and the second incident

To get a bit of breathing room and to control the rising costs at EC2, Charlie purchased a dedicated server at - and quite a beast of a machine too.

This new machine (with 64GB of RAM on a quad-core i7 with hyperthreading) is quite simply awesome and currently serves as the primary witness node as well as other light duties.

In order to make it even faster, steemd actually runs entirely inside a ramdisk that is synced to disk at regular intervals. On top of that, a few other services run inside a ramdisk and the Linux zram module is used to provide a compressed swap device inside RAM (basically trading spare CPU for more RAM and delaying touching the hard drives).

Even if something does touch the hard drives (note the plural - RAID0, mmm), they're SSDs in RAID to provide high read performance.
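A periodic ramdisk-to-disk sync could look something like the sketch below. It is an illustration under assumptions, not the script actually in use: the function name and directory layout are hypothetical, and it assumes the snapshot directory and its siblings live on the same filesystem so the renames are cheap.

```python
import os
import shutil
from pathlib import Path


def snapshot_ramdisk(ramdisk_dir: str, disk_dir: str) -> Path:
    """Copy ramdisk_dir to disk_dir, swapping the finished copy in last."""
    source, target = Path(ramdisk_dir), Path(disk_dir)
    staging = target.with_name(target.name + ".tmp")
    previous = target.with_name(target.name + ".old")

    # Start from a clean staging area in case an earlier run was interrupted.
    shutil.rmtree(staging, ignore_errors=True)
    shutil.copytree(source, staging)

    # Only a fully written copy is renamed into place as the snapshot.
    shutil.rmtree(previous, ignore_errors=True)
    if target.exists():
        os.rename(target, previous)
    os.rename(staging, target)
    shutil.rmtree(previous, ignore_errors=True)
    return target
```

Something like cron or a simple loop would call this at whatever interval is acceptable to lose on a crash. The sketch does not deal with the data changing while it is being copied, so a copy taken from a running process may not be consistent.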

Put simply, we moved from struggling to keep things running at all on EC2 and overpaying for it, to paying less and getting a much much nicer machine.

Additionally, there are various processes running on both Linode and this new Hetzner box, which are used for R&D work for (some cool features coming, watch this space).

Over time we've moved these processes to operate in a more async, non-blocking manner - handing off tasks to background threads and caching everything that won't change in the rather generous amount of RAM available. This involved rewriting the WSGI handler for the codebase behind the website to use green threads, among other changes, but was worth it. Static content is also served entirely from RAM and is only read from disk at startup.
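Serving static content from RAM after a single read at startup can be sketched with plain WSGI as below. This is an assumption-laden illustration rather than the site's actual handler: `load_static_files` and `make_static_app` are hypothetical names, and the green thread side is left out entirely.

```python
import mimetypes
from pathlib import Path
from typing import Callable, Dict


def load_static_files(static_root: str) -> Dict[str, bytes]:
    """Read every file under static_root once, keyed by its relative path."""
    root = Path(static_root)
    cache = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            cache[path.relative_to(root).as_posix()] = path.read_bytes()
    return cache


def make_static_app(cache: Dict[str, bytes]) -> Callable:
    """Build a WSGI app that answers only from the in-memory cache."""

    def app(environ, start_response):
        key = environ.get("PATH_INFO", "").lstrip("/")
        body = cache.get(key)
        if body is None:
            start_response("404 Not Found", [("Content-Length", "0")])
            return [b""]
        content_type = mimetypes.guess_type(key)[0] or "application/octet-stream"
        start_response(
            "200 OK",
            [("Content-Type", content_type), ("Content-Length", str(len(body)))],
        )
        return [body]

    return app
```

The disk is touched only inside `load_static_files`, so a request never waits on I/O. The trade-off is that changed files are not picked up until the process restarts.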

What caused issues however was trying to make steemd do the heavy lifting...

Lesson 2 - leave steemd alone to do its thing, don't mess around with it, do your own processing in a separate process.

In order to implement a new feature on there was a need to locate custom_json operations in the blockchain involving various configuration changes.
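In the spirit of Lesson 2, locating custom_json operations can be done in a separate process that only reads block data. The sketch below assumes a block has already been fetched as a dictionary in which each transaction carries a list of `[operation_name, operation_data]` pairs. That layout, the function name and the id used in the usage note are assumptions for illustration.

```python
from typing import Any, Dict, Iterator


def iter_custom_json(
    block: Dict[str, Any], wanted_id: str
) -> Iterator[Dict[str, Any]]:
    """Yield the data of each custom_json operation in block with wanted_id."""
    for transaction in block.get("transactions", []):
        # Assumed layout: each operation is a [name, data] pair.
        for op_name, op_data in transaction.get("operations", []):
            if op_name == "custom_json" and op_data.get("id") == wanted_id:
                yield op_data
```

A caller would loop over blocks fetched from a seed node and call `iter_custom_json(block, "example_app")` on each one, where "example_app" is a made-up id. None of this needs a plugin loaded into the witness node.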

Lesson 3 - steemd does not behave like a normal UNIX daemon - do not send it HUP signals, it will die

Because I'm so used to sending a quick HUP to a daemon to reload its configuration, I thought this would be an easy way to load a new plugin into steemd without taking it down, while keeping the nice low-latency access that comes from having the external process work with blockchain data on the same machine as the witness node.

Naturally, I quickly learnt that this was flawed. I realised my mistake and restarted steemd, only to discover to my horror the fourth lesson:

Lesson 4 - steemd LOVES to replay the blockchain slowly

Even after a clean shutdown, sometimes steemd will simply not start talking to the network and will not do so until you kill it and replay the blockchain.

This would not be an issue were it not for the fact that the failover node over at EC2 was running an older version and therefore could not produce blocks.

Lesson 5 - check your failover configuration

Although I make an effort to perform routine healthchecks on all servers I'm responsible for, one thing I neglected to check was whether a failover would actually work. As it happens, I checked that the failover node was running, but I did not realise it was running an older version.
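A version comparison of this kind is easy to automate. The sketch below assumes the version string of each node has already been collected by some other means and is a plain dotted number. The function names and the way versions are gathered are hypothetical.

```python
from typing import Dict, List, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a dotted version string such as "1.2.3" into a comparable tuple."""
    return tuple(int(part) for part in text.strip().lstrip("v").split("."))


def find_outdated_nodes(versions: Dict[str, str], primary: str) -> List[str]:
    """Return the names of nodes running an older version than the primary."""
    primary_version = parse_version(versions[primary])
    return sorted(
        name
        for name, version in versions.items()
        if name != primary and parse_version(version) < primary_version
    )
```

Comparing tuples of integers rather than raw strings avoids the trap where "0.9" sorts after "0.14". A non-empty result is the signal that a "running" failover node is not actually fit to take over.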

Moving forward - the new plan

To prevent these issues moving forward, the new plan is this:

  1. Perform a snapshot and copy of the primary witness node, boot it up with a different key (to avoid double producing blocks) in a different location
  2. Test that the secondary witness node is able to fail over properly if required - create a checklist for the health checks and automate as many checks as possible
  3. Perform manual checks of any automated systems - if there's a configuration fault, it should never be left alone
  4. Use only the seed nodes for API access and R&D work
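Two of the checks implied by the plan above, distinct signing keys and a backup that is caught up on the blockchain, could be automated along these lines. This is a sketch under assumptions: `NodeStatus`, its fields and the default lag threshold are hypothetical, and collecting the status from each node is out of scope.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NodeStatus:
    """Hypothetical summary of one witness node, gathered elsewhere."""
    name: str
    signing_key: str
    head_block: int


def failover_problems(
    primary: NodeStatus, backup: NodeStatus, max_block_lag: int = 10
) -> List[str]:
    """Return human-readable reasons why the backup is not ready to take over."""
    problems = []
    if primary.signing_key == backup.signing_key:
        problems.append(
            f"{backup.name} shares the signing key of {primary.name} "
            "(risk of double producing blocks)"
        )
    lag = primary.head_block - backup.head_block
    if lag > max_block_lag:
        problems.append(f"{backup.name} is {lag} blocks behind {primary.name}")
    return problems
```

An empty list means these two checks passed. It says nothing about the manual checks in step 3, which still need a person to look at them.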

Hopefully all of this will be worth it when we release the new features on, and as the software improves, future downtime incidents will become less and less likely.

Help keep SteemPower running! Voting for us as witness pays for the development of apps and tools for Steem.

Vote for us as a witness the following way: click the arrow next to "charlieshrem"


Most of that was well above my head but thank you for sharing. You never know what you don't know. I would love to learn more about this whole mining/witness thing and have a lot of learning to do.

Do you have a getting started intro page with examples?

ala google maps?


Can you elaborate on what sort of getting started instructions you require?

Using the site itself should be fairly self-explanatory. If you're having trouble using it, let me or Charlie know the exact issue and I'll try to make it clearer.


I'll have to sit down and take a closer look at the site. I'll let you know if I have any questions.

GREAT NEWS! Thanks for sharing these tools with us all. All for one and one for all! Namaste :)

I'm not sure whether there is still something wrong in your infrastructure. Anyway, here are my humble suggestions:

  • Primary block producing node: run "official" "low_mem_node" compiled in "release" mode on a dedicated (no other duty) VPS or server with 8GB RAM (no swap needed) and good network connection. Remove "account_history" plugin from "config.ini".
  • Secondary/backup node: it's OK to use a seed node or a node with other duties as a backup. But if you need to restart it frequently, it's better to have another node running as a second backup.
  • Use different signing keys on each node.
  • The fail-over script can connect to a cli_wallet which connects to public API servers e.g., don't have to connect to your own nodes. It's OK to use your backup nodes if you can make sure they're running when needed.

Hope that you'll have better performance in the future.


Hi there, thanks for the tips.

To be clear, each node does of course use different signing keys and the websocket proxy I wrote actually attempts public API servers if our own servers are down.

With regard to the witness node being low memory: this might in fact make sense, but I'm loath to touch the configuration right now until more hardware becomes available - which should be within a week.

At that point I'll look at swapping the primary witness node to low memory on a dedicated VPS. The Hetzner server can then be used more freely for R&D without causing missed blocks.


Can I ask that you, as I would like to see witnesses in general do, briefly describe where you stand socially and politically on a few issues of your own choice.

In my opinion, this gives a better understanding of the more personal motivations that could be driving a witness at some point in time.