Initial Witness and Full Node Innovations – @gridcoin.science
It is our goal to run a reliable witness, and then go above and beyond by improving how witnesses (and other Steem nodes) are run.
We believe in sharing our implementations once they are stable enough to run on @gridcoin.science so that all Steem nodes can benefit.
Here's what we've released so far:
We would like to gauge the community's interest in which tutorials they want to see next. Check out what else we've done in the "Innovations" section below and let us know what you'd like us to elaborate on and write more about.
Our greatest achievement so far in operational enhancements to Steem has been reducing the amount of memory needed to run a node (witnesses and full nodes alike). Any Steem node can make use of zram to compress the in-memory database of
Since then, the community began adopting zram as a best practice for Steem nodes. We are pleased to see that @themarkymark has begun rolling out zram on his witness nodes and full nodes.
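For anyone trying zram on their own node, the kernel reports how well it is compressing through `/sys/block/zramN/mm_stat`. The first two fields of that file are `orig_data_size` and `compr_data_size` in bytes. Below is a minimal sketch that computes the ratio from those two fields. It assumes a kernel that exposes `mm_stat` and a device named `zram0`; the sample line is made up for illustration and is not a measurement from our nodes.

```python
def zram_compression_ratio(mm_stat: str) -> float:
    """Return orig_data_size / compr_data_size from a zram mm_stat line.

    The first two whitespace-separated fields of /sys/block/zramN/mm_stat
    are orig_data_size and compr_data_size, both in bytes.
    """
    fields = mm_stat.split()
    orig, compr = int(fields[0]), int(fields[1])
    if compr == 0:
        return 0.0  # nothing stored in the device yet
    return orig / compr


def read_zram_ratio(device: str = "zram0") -> float:
    # Reads the live counters; requires a configured zram device.
    with open(f"/sys/block/{device}/mm_stat") as f:
        return zram_compression_ratio(f.read())


if __name__ == "__main__":
    # Illustrative sample line (made-up numbers): 8 GiB stored in 2 GiB.
    sample = "8589934592 2147483648 2200000000 0 2300000000 1024 0 0"
    print(f"{zram_compression_ratio(sample):.2f}x")  # 4.00x
```

The third field, `mem_used_total`, includes allocator overhead. Dividing `orig_data_size` by it instead gives a more conservative figure for how much RAM zram is really saving.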
Our Steem nodes' storage infrastructure is based on ZFS, which offers a lot:
Reduced disk usage: We compress the Steem blockchain to a ratio of about 1.58× (37% space savings) by storing it on an LZ4-compressed ZFS dataset.
One day, the blockchain is going to be 100GiB in size, and under our current setup, we expect to be storing all that in only 63GiB. (The compression ratio hasn't been changing much.)
If 100 active Steem nodes use the same compression at that point in time, they'd collectively save about 3700GiB in storage. This is especially meaningful for people running off SSDs because SSD storage is typically expensive.
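The figures above follow from three relations: on-disk size is raw size divided by the ratio, savings is 1 − 1/ratio, and fleet savings is the per-node savings times the number of nodes. This short sketch reproduces them using the 1.58× ratio from the text. The function names are just for illustration.

```python
def compressed_size(raw_gib: float, ratio: float) -> float:
    """On-disk size after compression at the given ratio (raw / ratio)."""
    return raw_gib / ratio


def space_savings(ratio: float) -> float:
    """Fraction of space saved: 1 - 1/ratio."""
    return 1 - 1 / ratio


def fleet_savings(raw_gib: float, ratio: float, nodes: int) -> float:
    """Total GiB saved if `nodes` machines each store `raw_gib` at `ratio`."""
    return nodes * (raw_gib - compressed_size(raw_gib, ratio))


if __name__ == "__main__":
    ratio = 1.58  # LZ4 ratio reported above
    print(f"savings: {space_savings(ratio):.0%}")                     # savings: 37%
    print(f"100 GiB -> {compressed_size(100, ratio):.0f} GiB")        # 100 GiB -> 63 GiB
    print(f"100 nodes save {fleet_savings(100, ratio, 100):.0f} GiB")  # 100 nodes save 3671 GiB
```

The exact fleet figure is about 3671GiB, which rounds to the "about 3700GiB" quoted above.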
Quick screw-up recovery: Also thanks to ZFS, we are able to roll back to snapshots of our Steem nodes in case we corrupt our copy of the blockchain or even accidentally delete everything on the node.
We've actually had this kind of disaster scenario twice already. Our backup witness took over, we rolled back the primary witness in a few seconds, then the primary witness caught up. The alternative would have been waiting hours to download the blocks and more hours to replay the blockchain, which would certainly have caused us to miss blocks.
As a bonus, because the block log is append-only and ZFS snapshots are copy-on-write, a snapshot only consumes space for data that is overwritten after it is taken, so snapshots hardly take up any additional space at all.
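A snapshot-and-rollback routine like the one described can be scripted around the standard `zfs snapshot` and `zfs rollback -r` commands. Here is a minimal sketch, assuming a hypothetical dataset named `tank/steem`. It is not our actual tooling. The command builders are kept separate so they can be checked without touching a real pool.

```python
import subprocess
from datetime import datetime, timezone

# Hypothetical dataset name; substitute your own pool/dataset or zvol.
DATASET = "tank/steem"


def snapshot_cmd(dataset: str, label: str) -> list:
    """argv for `zfs snapshot dataset@label`."""
    return ["zfs", "snapshot", f"{dataset}@{label}"]


def rollback_cmd(dataset: str, label: str) -> list:
    """argv for `zfs rollback -r dataset@label`.

    -r destroys any snapshots newer than the target, which zfs would
    otherwise refuse to roll back past.
    """
    return ["zfs", "rollback", "-r", f"{dataset}@{label}"]


def take_snapshot(dataset: str = DATASET) -> str:
    # Timestamped label so snapshots sort chronologically.
    label = datetime.now(timezone.utc).strftime("steem-%Y%m%dT%H%M%SZ")
    subprocess.run(snapshot_cmd(dataset, label), check=True)
    return label


def rollback(label: str, dataset: str = DATASET) -> None:
    # Stop steemd (or the VM, if the dataset is a zvol backing it) first;
    # rolling back storage underneath a running process is unsafe.
    subprocess.run(rollback_cmd(dataset, label), check=True)
```

Taking a snapshot right before risky operations, such as upgrades or config changes, is what makes the "rolled back in a few seconds" recovery possible.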
Easy backups: ZFS snapshots also let us back up our Steem nodes with ease. We can stream the datasets (zfs receive) to whatever backup location we want (currently a nearby NAS over NFS). Transfers can also be incremental, which means only the changes since the previous snapshot need to be sent over to backup storage.
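The usual shape of an incremental ZFS backup is `zfs send -i older newer | zfs receive target`. Below is a minimal Python sketch of that pipeline. It assumes hypothetical names (`tank/steem` for the source, `backup/steem` for a ZFS dataset on the receiving side). If the backup location is a plain NFS mount rather than a ZFS pool, you would redirect the send stream to a file there instead of piping it into `zfs receive`.

```python
import subprocess
from typing import Optional


def send_cmd(dataset: str, snap: str, base: Optional[str] = None) -> list:
    """argv for a full or incremental `zfs send`."""
    cmd = ["zfs", "send"]
    if base is not None:
        # -i sends only the blocks that changed between base and snap.
        cmd += ["-i", f"{dataset}@{base}"]
    return cmd + [f"{dataset}@{snap}"]


def receive_cmd(target: str) -> list:
    """argv for `zfs receive -F target` on the backup side."""
    return ["zfs", "receive", "-F", target]


def replicate(dataset: str, snap: str, target: str,
              base: Optional[str] = None) -> None:
    """Equivalent of `zfs send [-i base] ds@snap | zfs receive -F target`."""
    sender = subprocess.Popen(send_cmd(dataset, snap, base), stdout=subprocess.PIPE)
    receiver = subprocess.Popen(receive_cmd(target), stdin=sender.stdout)
    sender.stdout.close()  # lets sender get SIGPIPE if receiver exits early
    recv_rc = receiver.wait()
    send_rc = sender.wait()
    if send_rc != 0 or recv_rc != 0:
        raise RuntimeError(f"replication failed: send={send_rc}, receive={recv_rc}")
```

After the first full send, each later run passes the previous snapshot as `base`, so only the day's changes cross the network.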
Accelerated performance: The best practice advice is to store the blockchain on an SSD, but we are able to achieve acceptable performance on an HDD because of the ZFS Adaptive Replacement Cache (ARC), which speeds up access to the most frequently and most recently used data on disk.
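Whether the ARC is actually carrying an HDD-backed node shows up in its hit ratio. On ZFS on Linux, the counters live in `/proc/spl/kstat/zfs/arcstats`: two header lines followed by `name type data` rows, including `hits` and `misses`. Here is a minimal sketch that parses that format. The demo uses a made-up sample rather than real numbers from our server.

```python
def parse_arcstats(text: str) -> dict:
    """Parse /proc/spl/kstat/zfs/arcstats into {name: value}.

    The file starts with two header lines, then rows of `name type data`.
    """
    stats = {}
    for line in text.splitlines()[2:]:
        parts = line.split()
        if len(parts) == 3 and parts[2].isdigit():
            stats[parts[0]] = int(parts[2])
    return stats


def arc_hit_ratio(stats: dict) -> float:
    """Fraction of ARC lookups served from memory."""
    hits, misses = stats["hits"], stats["misses"]
    total = hits + misses
    return hits / total if total else 0.0


def read_arc_hit_ratio(path: str = "/proc/spl/kstat/zfs/arcstats") -> float:
    # Requires ZFS on Linux to be loaded.
    with open(path) as f:
        return arc_hit_ratio(parse_arcstats(f.read()))


if __name__ == "__main__":
    # Made-up sample in the arcstats layout, for illustration only.
    sample = (
        "13 1 0x01 3 816 1 2\n"
        "name                            type data\n"
        "hits                            4    900\n"
        "misses                          4    100\n"
        "c                               4    1073741824\n"
    )
    print(f"ARC hit ratio: {arc_hit_ratio(parse_arcstats(sample)):.1%}")  # ARC hit ratio: 90.0%
```

A persistently low ratio would suggest giving the host more RAM for the ARC, or moving the hot data to SSD.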
Data corruption prevention: Yet another benefit of ZFS is that it checksums all the data it stores. Even if one (or both!) of our hard drives silently flips some bits and corrupts data, ZFS will very likely be able to recover instantly upon detecting the checksum mismatch on read.
Just 152 hours into service, the server we're hosting on already had a data corruption scenario where some disk sectors were unreadable. The ZFS mirror (equivalent to RAID 1) did exactly what it was supposed to and repaired the unreadable data from the healthy disk. We replaced the failing disk, and the data seamlessly "resilvered" onto the new hard drive.
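The detect-and-repair behavior described above can be modeled in a few lines. The checksum is kept apart from the data. A read tries each copy in the mirror until one matches, then rewrites any copy that is bad. This is a toy model, not ZFS code. ZFS stores each block's checksum in its parent block pointer and defaults to fletcher4, whereas this sketch uses SHA-256 and Python dicts standing in for disks.

```python
import hashlib
from dataclasses import dataclass, field


def checksum(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


@dataclass
class MirroredStore:
    """Toy two-way mirror with checksums stored separately from the data."""
    disks: list = field(default_factory=lambda: [{}, {}])
    sums: dict = field(default_factory=dict)

    def write(self, key: str, data: bytes) -> None:
        self.sums[key] = checksum(data)
        for disk in self.disks:
            disk[key] = data

    def read(self, key: str) -> bytes:
        expected = self.sums[key]
        for disk in self.disks:
            data = disk.get(key)
            if data is not None and checksum(data) == expected:
                # Self-heal: rewrite any copy that is missing or corrupt.
                for other in self.disks:
                    if other.get(key) != data:
                        other[key] = data
                return data
        raise OSError(f"all copies of {key!r} are corrupt")


if __name__ == "__main__":
    store = MirroredStore()
    store.write("block0", b"steem")
    store.disks[0]["block0"] = b"stEem"  # simulate a silent bit flip on disk 0
    print(store.read("block0"))          # b'steem'
    print(store.disks[0]["block0"])      # b'steem' (repaired from disk 1)
```

This also explains why "both" drives failing is usually survivable. Recovery only fails when every copy of the *same* block is bad.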
Virtualization lets us take advantage of the technologies outlined above. Our witness nodes run on virtual machines, which in turn run on dedicated hardware that we control. This lets us connect the software stack to the hardware the way we want.
Aside from facilitating the memory and storage innovations, virtualization also has benefits of its own:
- Isolation: The witness nodes run in their own environments so that they can't interfere with the operational infrastructure below.
- Security: If the nodes are compromised by some unforeseen vulnerability, we can stop it in its tracks, go back in time, and rectify the vulnerability before it's exploited again.
- Overhead: Virtual machines skip the hardware initialization of a dedicated server, so they start up much faster.
- Portability: In combination with ZFS snapshots, we've opened the possibility of migrating the witness to another physical server if we ever need to upgrade.
STEEM Price Feed
Many top witnesses use the same STEEM price feed software, which means that if they're configured similarly, they can all go down at the same time. If that happens, many of the most influential witnesses could end up publishing outdated price feeds at once.
This is not a "better" price feed updater. It's just a different one, and the objective is to introduce a bit of heterogeneity and diversity in price feed updates.
Here's what makes python-steemfeed v0.1.0 different:
- Simple: The script does just one thing: Update your witness's price feed from CoinMarketCap data.
- Uses official STEEM library: The script uses steem-python, the official Python STEEM library, to interface with the wallet and witness.
- Batteries included: For 64-bit Debian and Ubuntu users, Python 3.6 (required by steem-python) and all Python dependencies are bundled in the repository. There are also installation instructions for required Python shared libraries (if necessary) and other distros.
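At its core, a price feed update is a `feed_publish` operation whose `exchange_rate` states the price of 1.000 STEEM in SBD, with both amounts at three decimal places. Below is a minimal sketch of building that field and deciding when to republish. It is not python-steemfeed's code. Fetching the price from CoinMarketCap and submitting the operation through steem-python are left out, and the 3% / 6-hour thresholds in `should_publish` are hypothetical values chosen for illustration.

```python
from decimal import Decimal, ROUND_HALF_UP


def exchange_rate(steem_usd: Decimal) -> dict:
    """Build the `exchange_rate` field of a Steem feed_publish operation.

    The base is the price of 1 STEEM in SBD (treated as ~1 USD),
    rounded to the 3 decimal places Steem assets use.
    """
    base = steem_usd.quantize(Decimal("0.001"), rounding=ROUND_HALF_UP)
    return {"base": f"{base} SBD", "quote": "1.000 STEEM"}


def should_publish(new: Decimal, last: Decimal, age_s: float,
                   max_move: Decimal = Decimal("0.03"),
                   max_age_s: float = 6 * 3600) -> bool:
    """Hypothetical policy: republish if the price moved more than
    `max_move` (as a fraction) or the last feed is older than `max_age_s`."""
    if age_s >= max_age_s:
        return True
    return abs(new - last) / last > max_move


if __name__ == "__main__":
    print(exchange_rate(Decimal("1.23456")))
    # {'base': '1.235 SBD', 'quote': '1.000 STEEM'}
```

Using `Decimal` avoids binary floating-point artifacts when formatting the three-decimal asset strings.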
Even less disk usage: Our early tests have revealed that the blockchain could have a 1.83× compression ratio (45% space savings) with only a small loss in throughput. We are planning on setting this up with Linux kernel 4.14 or newer and the Zstandard compression algorithm.
If the blockchain were 100GiB in size, the current LZ4 compression algorithm would squash it into about 63GiB, but Zstandard could potentially reduce this to 55GiB.
Shared storage: Budget permitting, we'd like to set up distributed/clustered storage on which we'd run our Steem nodes so that physical servers can be taken down while the virtual machines stay up.
Active/active cluster: There could be a way for two or more witnesses to run with the same signing key but not cause a fork. The end result would be that your fastest witness produces the block and your other witnesses that try to produce the same block are gracefully ignored. If implemented, missed blocks could become a thing of the past.
One possibility could be a proxy to the RPC nodes that intercepts the late blocks and rejects them so that the minority witnesses can abort their fork and continue on the accepted chain before they're called upon again to produce a block.
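The proxy idea boils down to a first-wins rule per block height. The first block seen for a given height and witness is forwarded; a different block at that height from the same witness is rejected, and an exact duplicate is harmless. Here is a minimal sketch of just that decision, with a hypothetical `BlockArbiter` class. It leaves out the networking, signature checks, and how rejected nodes would resync, all of which a real proxy would need.

```python
from dataclasses import dataclass, field


@dataclass
class BlockArbiter:
    """Hypothetical first-wins filter for a proxy in front of RPC nodes."""
    accepted: dict = field(default_factory=dict)  # (height, witness) -> block_id

    def submit(self, height: int, witness: str, block_id: str) -> bool:
        """Return True to forward the block, False to reject it as late."""
        first = self.accepted.setdefault((height, witness), block_id)
        # The same block arriving twice is fine; a competing one is not.
        return first == block_id

    def prune(self, below: int) -> None:
        """Forget decisions for heights that are now irreversible."""
        for key in [k for k in self.accepted if k[0] < below]:
            del self.accepted[key]


if __name__ == "__main__":
    arbiter = BlockArbiter()
    print(arbiter.submit(100, "gridcoin.science", "aaa"))  # True: fastest node wins
    print(arbiter.submit(100, "gridcoin.science", "bbb"))  # False: late node is rejected
```

Calling `prune` once blocks become irreversible keeps the table from growing without bound.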
Eliminating blockchain replays: If steemd's in-memory database is preserved, it can be used to resume a new run of the daemon without replaying the blockchain (--replay-blockchain). The problem is that since the database is meant to be in memory, it gets erased on server crash or reboot.
We intend to explore ways to persist the data on disk asynchronously so that steemd can resume without a blockchain replay, even after an unexpected crash.
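One generic pattern for persisting in-memory state asynchronously is a background thread that checkpoints periodically using write-to-temp, fsync, then rename. With that pattern, a crash leaves either the previous checkpoint or the new one on disk, never a half-written file. This is only a sketch of the general technique under stated assumptions, not a design for steemd. It treats the state as an opaque byte string and assumes away the hard part, taking a consistent snapshot of a live database, inside the hypothetical `snapshot()` callable.

```python
import os
import tempfile
import threading
from typing import Callable


def atomic_write(path: str, data: bytes) -> None:
    """Write to a temp file in the same directory, fsync, then rename."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, path)  # atomic on POSIX filesystems
    except BaseException:
        os.unlink(tmp)
        raise
    # Persist the rename itself by syncing the directory entry (Linux).
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)


class Checkpointer:
    """Hypothetical background checkpointer: every `interval` seconds it calls
    `snapshot()` and persists the result without blocking the caller."""

    def __init__(self, path: str, snapshot: Callable[[], bytes], interval: float):
        self.path, self.snapshot, self.interval = path, snapshot, interval
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self) -> None:
        self._thread.start()

    def stop(self) -> None:
        self._stop.set()
        self._thread.join()
        atomic_write(self.path, self.snapshot())  # final flush on clean shutdown

    def _run(self) -> None:
        # Event.wait returns True once stop() is called, ending the loop.
        while not self._stop.wait(self.interval):
            atomic_write(self.path, self.snapshot())
```

On restart, the daemon would load the latest checkpoint and only apply blocks newer than it, instead of replaying from the genesis block.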
If you want to know what specs we've got, here they are:
Dedicated Server
CPU: Intel® Xeon® Processor D-1521 @ 2.40GHz
Memory: 64GB DDR4-2666 ECC
Disk: 2×480GB SSD and 2×2TB HDD
Operating System: Ubuntu 16.04 LTS
Steem Witness Virtual Machine
CPU: 4 logical cores of Intel Xeon D-1521
Memory: 16GiB plus 16GiB zram
Disk: 400GiB ZFS volume
Operating System: Ubuntu 16.04 LTS
@jerrybanfield has identified @gridcoin.science as a low-ranked witness that qualifies for a generous donation.
We are publishing this initial update to show what has already been implemented and what is in the works. @jerrybanfield's donation would help us develop better ways to operate Steem and Graphene nodes, which we would then give back to the community.
The witness @gridcoin.science is at the forefront of all the improvements outlined in this article. To support this witness, visit https://steemit.com/~witnesses and add gridcoin.science to the box at the bottom of the page, click vote, and authorize using your Active Key.
We want to continue innovating and sharing our discoveries. Please let me or @dutch know what other topics you'd like us to explore.