Using Intel Optane for STEEM blockchain seed node

in #steemdev, 6 years ago (edited)

Introduction

Blockchains often store state, consensus information and other metadata in a single file. Recently, mechanisms have been developed to store state data off chain. But the need to read, process and reindex the "blockchain" remains a major challenge when starting new nodes, as it consumes hours. On top of that, most blockchains don't use multithreading, which makes the process even slower.

To avoid I/O bottlenecks, blockchains depend on shared memory and thus on large amounts of RAM. Conventionally, Proof of Stake (PoS) algorithms thus have high space complexity, i.e. high memory requirements. Graphene and DPoS consensus based blockchains like STEEM also address the I/O issue by making use of large amounts of memory. In the recent past, STEEM has reduced its memory requirements by implementing efficient snapshot mechanisms. Other chains with similar challenges include BitShares, SCORUM and maybe EOS.

In this scenario, I got a chance to evaluate the performance of Intel's new Optane NVMe technology as a replacement for RAM.

Problem Statement

As briefly mentioned in the introduction, the requirement for large amounts of RAM is not just a cost barrier: in some cases, such as running full RPC nodes, the limited availability of suitable hardware makes it difficult to get started. The recent developments in storage technology from Intel and the introduction of Optane drives give us a cheaper, more accessible option for running memory intensive workloads.

In this specific case, I wanted to test whether it's possible to run a STEEM full node using Optane NVMe instead of RAM. I was lucky enough to get my hands on reasonable hardware with an Intel Optane 900P drive in it.

Testing

My first attempt to run a full node failed for unknown reasons. I got help from @jesta on CPU types and general guidelines for setting things up. The test is not a very scientific one, and I had some additional services like SNMP running on the machine. To make matters worse, the OS is installed on an HDD.

Environment

A dedicated server with 1 Gb/s network connectivity and minimal firewall settings. The default system services that come with Ubuntu 16.04 LTS were left untouched.


OS: Ubuntu 16.04

Kernel Version: 4.15.0-38-generic

CPU: Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz

Optane Drive : INTEL SSDPED1D480GA ( Intel Optane 900P 480GB )


Number of Open files in the system at the start of the test:

[root@STEEM-Full-NODE:/chain]# lsof | wc -l
3054
[root@STEEM-Full-NODE:/chain]#


The test was done with a minimal seed node configuration with only witness and rc_plugin enabled. The blockchain file was kept under /chain, where the Optane drive is mounted. The shared memory file was also kept under /chain.
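A quick way to double-check that a path such as /chain really sits on the Optane device is to ask which block device backs it. The following is a minimal sketch, assuming GNU df (for the --output option); the helper names backing_device and on_device are made up for illustration and are not part of steemd.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of steemd or its tooling.

# Print the device (or filesystem source) backing the given path.
backing_device() {
    # --output=source is a GNU df option; tail drops the header line.
    df --output=source "$1" | tail -n 1
}

# Succeed only if the path is backed by the expected device.
on_device() {
    [ "$(backing_device "$1")" = "$2" ]
}

# Example (path and device name taken from this post):
#   on_device /chain /dev/nvme0n1 && echo "/chain is on the Optane drive"
```

If the drive is partitioned, the source reported by df would be the partition name rather than the whole disk, so the expected value needs adjusting accordingly.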

Minimal tests and results

@drakos was kind enough to guide me and provide a few handy commands (new to me) and options that may be useful to others too. I appreciate the help :)

At one point I doubted whether this really was an Optane drive, and @drakos suggested lsblk, which was new to me.

[root@STEEM-Full-NODE:/chain/0]# lsblk -io KNAME,TYPE,SIZE,MODEL
KNAME TYPE SIZE MODEL
sda disk X.XT ATXA8123NM0033
sda1 part 953M
sda2 part X.XT
sda3 part 3.8G
nvme0n1 disk 447.1G INTEL SSDPED1D480GA
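lsblk can also report whether the kernel treats a disk as rotational, through its ROTA column. The sketch below filters such output for non-rotational disks; the function name is hypothetical. Note that ordinary NAND SSDs also report ROTA 0, so this only separates the Optane drive from the HDD, and the MODEL string is still what identifies it as an Optane.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads "KNAME ROTA [MODEL...]" lines on stdin and
# prints the kernel names of disks that are flagged as non-rotational.
non_rotational_disks() {
    awk '$2 == 0 { print $1 }'
}

# Example, using whole disks only (-d) and no header line (-n):
#   lsblk -dn -o KNAME,ROTA,MODEL | non_rotational_disks
```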


hdparm with the direct option

From the man(ual) page:

Use the kernel's "O_DIRECT" flag when performing a -t timing test. This bypasses the page cache, causing the reads to go directly from the drive into hdparm's buffers, using so-called "raw" I/O. In many cases, this can produce results that appear much faster than the usual page cache method, giving a better indication of raw device and drive performance.


hdparm -tT --direct /dev/nvme0n1

[root@STEEM-Full-NODE:~]#sync ; echo 3 > /proc/sys/vm/drop_caches ; lsof | grep /chain ;
[root@STEEM-Full-NODE:~]#hdparm -tT --direct /dev/nvme0n1

/dev/nvme0n1:
Timing O_DIRECT cached reads: 4848 MB in 2.00 seconds = 2425.03 MB/sec
Timing O_DIRECT disk reads: 7258 MB in 3.00 seconds = 2418.92 MB/sec
[root@STEEM-Full-NODE:~]#


[root@STEEM-Full-NODE:~]#hdparm -tT /dev/nvme0n1

/dev/nvme0n1:
Timing cached reads: 19652 MB in 1.99 seconds = 9855.17 MB/sec
Timing buffered disk reads: 7680 MB in 3.00 seconds = 2559.88 MB/sec

These figures are in line with the advertised values.
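To compare several hdparm runs side by side, it helps to pull only the MB/sec figures out of the output. This is a small sketch that assumes the line format shown in the runs above; hdparm_rates is a made-up name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads hdparm -tT output on stdin and prints
# "<test name>: <MB/sec>" for every "Timing ..." line.
hdparm_rates() {
    awk -F' = ' '/^ *Timing / {
        split($1, head, ":")            # text before the first colon
        sub(/^ *Timing /, "", head[1])  # drop the "Timing " prefix
        split($2, rate, " ")            # rate[1] is the number before "MB/sec"
        print head[1] ": " rate[1]
    }'
}

# Example:
#   hdparm -tT --direct /dev/nvme0n1 | hdparm_rates
```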


The memory values looked something like this a little while after the replay.

[Screenshot: output of free -h]

I wanted to tweak swappiness and cache pressure, but the sysctl parameters were kept at the values suggested by the STEEM build instructions.

[root@STEEM-Full-NODE:/chain/0]# optimize() { echo 75 | sudo tee /proc/sys/vm/dirty_background_ratio; echo 1000 | sudo tee /proc/sys/vm/dirty_expire_centisecs; echo 80 | sudo tee /proc/sys/vm/dirty_ratio; echo 30000 | sudo tee /proc/sys/vm/dirty_writeback_centisecs; }
[root@STEEM-Full-NODE:/chain/0]#optimize
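Values written under /proc/sys do not survive a reboot. One way to keep them, sketched below, is to put the same four settings into a sysctl configuration file. The function name and the file name in the usage comment are assumptions, not something taken from the STEEM build instructions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the four vm.dirty_* values used by the
# optimize function above, in sysctl.conf syntax.
dirty_sysctl_conf() {
    cat <<'EOF'
vm.dirty_background_ratio = 75
vm.dirty_expire_centisecs = 1000
vm.dirty_ratio = 80
vm.dirty_writeback_centisecs = 30000
EOF
}

# Usage, as root (the file name is an assumption):
#   dirty_sysctl_conf > /etc/sysctl.d/60-steemd.conf
#   sysctl --system    # reload all sysctl configuration files
```

The active values can be read back at any time with sysctl vm.dirty_ratio (and likewise for the other three keys).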

Steemd build options and disk layout

Steem was built with low memory options

cmake -DLOW_MEMORY_NODE=ON -DCLEAR_VOTES=ON -DSTEEM_STATIC_BUILD=ON -DSKIP_BY_TX_ID=O -DCMAKE_BUILD_TYPE=Release SOURCE

Everything related to steem is in /chain

/dev/nvme0n1 440G 201G 216G 49% /chain
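The 49% in the df line can look odd next to 201G used out of 440G. To my understanding, GNU df computes Use% as used / (used + available), rounded up, so space that is neither used nor available to ordinary users (typically blocks reserved for root) is left out. A sketch of that arithmetic follows; df_use_percent is a made-up helper, and the inputs are the rounded figures from the line above, so the result is approximate.

```shell
#!/usr/bin/env bash
# Hypothetical helper: approximates df's Use% column from the "Used" and
# "Avail" columns, rounding up to the next whole percent.
df_use_percent() {
    awk -v used="$1" -v avail="$2" 'BEGIN {
        total = used + avail
        pct = int(100 * used / total)
        if (pct * total < 100 * used) pct++   # round up on any remainder
        print pct "%"
    }'
}

# With the figures above: 201 / (201 + 216) is about 48.2%, rounded up.
#   df_use_percent 201 216
```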

Summary

The replay completed in less than 7 hours and is considerably faster than on an SSD.

Load conditions are normal and no swap was used.


This write-up can be considered a starting point/tutorial for anyone who wants to play with Optane drives. Additional testing is needed to make sure that this setup works in witness mode. I hope to get access to devices with the right hardware and perhaps a faster CPU to conduct further tests.

Keywords : Setting up STEEM seed on Intel Optane, Intel Optane 900P, Optane and blockchain


Excellent results. Sent you a dm.

Thanks


I have heard a lot about Intel Optane, and I also tested the boot time of a system with it; it's really fast. The test was good and nicely documented, but 7 hours is still a lot of time (though much less than with SSDs). What options do we still have to reduce it further?

I would like to see a witness-server running on Optane Memory, to see how it goes. If the tests are successful, then I guess every blockchain project should use Optane, what do you think?

I am aware of @anyx's setup, and the read performance from Optane (i.e. RocksDB is on Optane) is what made me explore this in detail. I have doubts about whether we should keep RocksDB on Optane or not. Maybe we should keep block_log and the index on Optane and RocksDB on SSD.

Yea, Optane is really fast!

I would like to see a witness-server running on Optane Memory, to see how it goes. If the tests are successful, then I guess every blockchain project should use Optane, what do you think?

IMHO we need to do extensive tests before deciding. For example, for the full node we need a lot more testing, as the RocksDB performance looks a little weird to me as per the results here: http://www.lmdb.tech/bench/optanessd/ So in the case of witness nodes too, there may be surprises that we have to test for extensively. But I feel that if we test for STEEM, the results can be generalized to other Graphene based blockchains too.

I have requested access from Packet + Intel, and if we can get some hardware, extensive tests can be done. I think we will have to do:

  1. Extensive tests with a witness node
  2. Tests on the full RPC node, benchmarking the RocksDB storage (engine)

It would be interesting to test again with two Intel Optane drives, one for the block_log (read) and one for the memory state file (write), and see how that affects replay speed.

Memory state = shm file, right?

I tried with the memory state and everything else on the same Optane. Now I am converting the server to my secondary server, and we can continue the testing once the Intel hardware is available. They also have a mechanism where we can treat Optane like RAM, and I think that's going to be the right use case for blockchains like STEEM.

Excellent work. You should have tried @utopian-io tag and mentioned that you are a witness.

@vimukthi - Intel and Packet have a program to help projects test Optane on powerful hardware. I have applied for it here. What are your thoughts / suggestions on making this better?

https://github.com/AccelerateWithOptane/lab/issues/16

I don't really have enough technical knowledge to give much feedback. I read a little bit about Optane and I'm convinced that it'll be very valuable in the long run, especially for blockchain products more than anything. Make a post and share it around, or just try Discord for feedback from those who know the technical side better.

Sure, will do. Right now no other blockchain has tested Optane, so that will give STEEM some value. From what I understand, EOS is using MongoDB for snapshots (not sure), and unless there is a RocksDB engine in there, STEEM's approach is IMHO much faster and more scalable.

STEEM also has the advantage of being able to focus rather than generalize. EOS will have to think about too many variables. Then there is Hashgraph (the only aBFT crypto I know of), which is already valued at 6 billion USD. Thanks to the SEC, only accredited investors could get in. Hashgraph could seriously outcompete EOS.

STEEM has the advantage of having multiple social media platforms accessed with a single account. This is one reason I've sold all my EOS and STEEM is my biggest crypto investment. The ecosystem is kind of like Apple. It's really hard to get out of.

And we got access to the Intel program !

Thank you. I made a few mistakes in this test. I am doing it again and will post an update. The steemd executable and its dependencies live on a hard disk (not an SSD). I want to move them to Optane as well to improve performance.

And here is the next test: this will break optane ;-)

It is cool that you are doing this :)

In the post it says "In this specific case, I wanted to test whether its possible to run a STEEM full node using Optane NVMe instead of RAM", but it also says "The test was done with minimal seed node configuration with only witness and rc_plugin enabled". What was the configuration used that resulted in a 7 hour replay time?

Also, how much RAM was on the system?

:-)

I wanted to test whether its possible to run a STEEM full node using Optane NVMe instead of RAM, but it also says The test was done with minimal seed node configuration with only witness and rc_plugin enabled

So I started off planning to run a full node. But due to insufficient space I couldn't complete the test, so I decided to start with a seed node first, then a witness, and then a full node. So far the full node test has not been done.

What was the configuration used that resulted in a 7 hour replay time?

Will share this - essentially shm file also was on Optane

Also, how much RAM was on the system?

256 GB

With the official hardware from Intel and Packet, IMHO we should limit the available RAM and do the test.


Good stuff! I have been wondering about this for a while, ever since I first heard of Optane through one of the LinusTechTips videos on YouTube.

Sure, Optane compatible servers may be expensive as hell to set up now. But we all know that prices usually go down as a technology becomes more mainstream.

So happy to see the positive results from your test. I'd like to know more down the line if you're considering a full seed node and how Optane handles that.

Even though the prices are high, they have a technology to map Optane like real RAM. That can help reduce the overall cost. I am hoping to test it real soon.

Yes Yes!! Please let us know the results. I'm so waiting for this..
