How to convert the entire Steem blockchain into JSON format using a Python script

Posted in #dawn-network, 7 years ago

It was a bit of a mission to find the right way to do this, but, as I wrote previously, I am going through a couple of processes to convert the Steem block_log into JSON format (the raw format provided by a full steemd RPC node). From there, it is simple to import it into any other database format.

Without further ado, here is the Python script. It depends on the pistonapi module from @xeroc's piston-lib.

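```python
#!/usr/bin/python3
import json
import sys

from pistonapi import SteemNodeRPC

# Connect to the local full node's websocket RPC (empty user and password)
rpc = SteemNodeRPC("ws://127.0.0.1:8090", "", "")

# Stop at the last irreversible block as of start-up, so every block
# written out is final
props = rpc.get_dynamic_global_properties()
last_confirmed_block = props["last_irreversible_block_num"]

for current_block in range(1, last_confirmed_block + 1):
    # Progress goes to stderr so that stdout carries only block data
    print("Block #%d" % current_block, end="\r", file=sys.stderr, flush=True)
    block = rpc.get_block(current_block)
    # json.dumps emits valid JSON, one block per line
    print(json.dumps(block))
```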

This writes the raw JSON to stdout and prints a friendly progress message to stderr showing how far it has got. It keeps running until it reaches the last irreversible block as of the moment you started the script.
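Because each block's `previous` field holds the id of the block before it, a finished dump can be checked for gaps. This is a minimal sketch, assuming the output is one JSON block per line (as `json.dumps` produces, unlike `pprint`) and relying on the Graphene convention, as I understand it, that the first 4 bytes (8 hex digits) of a block id encode that block's number. `check_continuity` and `block_num_from_id` are hypothetical helper names.

```python
import json


def block_num_from_id(block_id):
    # Assumed Graphene convention: the first 4 bytes of a block id,
    # i.e. its first 8 hex digits, are the block number (big-endian)
    return int(block_id[:8], 16)


def check_continuity(lines):
    """Return how many blocks were read; raise ValueError at the first gap.

    Expects one JSON block per line, starting at block 1.
    """
    expected = 1
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        block = json.loads(line)
        prev_num = block_num_from_id(block["previous"])
        if prev_num != expected - 1:
            raise ValueError(
                "block %d: 'previous' points at block %d instead of %d"
                % (expected, prev_num, expected - 1)
            )
        expected += 1
    return expected - 1
```

For example, `check_continuity(open("blocks.jsonl"))` returns the number of blocks read, or raises `ValueError` at the first missing or out-of-order block.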

From here, we will feed this output into another program that reads it in and populates some other kind of database structure, such as another blockchain format.
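As one illustration of that next step, here is a minimal sketch that loads the stream into SQLite using only Python's standard library. It assumes one JSON block per line, in block order starting from block 1; the `blocks` table layout and the `load_blocks` name are hypothetical, and keeping each whole block as JSON text simply defers designing a finer-grained schema.

```python
import json
import sqlite3


def load_blocks(lines, conn):
    """Insert each JSON block from `lines` into a `blocks` table.

    Returns the number of blocks stored. Block numbers are counted from 1
    in input order, which assumes the exporter wrote them in sequence.
    """
    num = 0
    with conn:  # commits on success, rolls back if anything raises
        conn.execute(
            "CREATE TABLE IF NOT EXISTS blocks ("
            " num INTEGER PRIMARY KEY,"
            " timestamp TEXT,"
            " body TEXT NOT NULL)"
        )
        for line in lines:
            line = line.strip()
            if not line:
                continue
            num += 1
            block = json.loads(line)  # parsed only to pull out the timestamp
            conn.execute(
                "INSERT OR REPLACE INTO blocks (num, timestamp, body)"
                " VALUES (?, ?, ?)",
                (num, block.get("timestamp"), line),
            )
    return num
```

Piping the exporter into a script that calls `load_blocks(sys.stdin, sqlite3.connect("steem.db"))` would load the whole dump in a single transaction; `INSERT OR REPLACE` means a re-run overwrites rows instead of duplicating them.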

PS

In the process, I found out what kind of load the RPC puts on my server. Here is a screencap of my netdata monitoring while this script runs, pulling blocks from the full node on the same machine.

The first thing that jumped out at me was that the RPC, although it is processing hundreds, maybe thousands, of requests per second, is only using one of the 10 cores. Second, the RPC is only feeding out data at about 1 megabyte per second (that is the IPv4 traffic, which here is just the loopback device, localhost).

So I am not at all concerned about opening this node up to the public. The caveat is that the connection is not encrypted, so I would not advise using it to broadcast transactions if you are concerned about revealing to those outside the Steem network what you are submitting. (This only matters for transactions that carry sensitive data, and not many do; one example is create_account, which is currently broken anyway.) So use it at your own risk:

ws://projectinception.lt:8090

😎


We can't code here! This is Whale country!

Vote #1 l0k1

Go to steemit.com/~witnesses to cast your vote by typing l0k1 into the text entry at the bottom of the leaderboard.

(note, my username is spelled El Zero Kay One or Lima Zero Kilo One, all lower case)


Would it be much work to get a dump of all postings by one user in one file, in JSON or XML? It would be great for generating TOCs and lists of photos sorted by keywords and all that with XSL-T (or JavaScript, if need be).

That's a much less intensive job than the one this server is doing, but yes, the data can be pulled out in JSON format; XML would need a converter. It would require a different RPC query, though, and then you can use your own parser to generate the output you are looking for.
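As a rough alternative to a separate RPC query, the block dump itself can be filtered and then converted to XML. This sketch assumes one JSON block per line and that each operation appears as a `[name, payload]` pair, which as far as I know is how steemd returned them at the time; a top-level post is treated as a `comment` operation with an empty `parent_author`. It keeps only the first version of each post, because later `comment` operations on the same permlink are edits and may carry a patch rather than the full body. The XML layout is a made-up example built with the standard library's `xml.etree.ElementTree`, and `posts_by_author` and `posts_to_xml` are hypothetical names.

```python
import json
import xml.etree.ElementTree as ET


def posts_by_author(lines, author):
    """Collect the top-level posts written by `author` from a block dump.

    Expects one JSON block per line, with each operation stored as a
    [name, payload] pair. Only the first version of each permlink is
    kept; later "comment" operations on it are edits and are skipped.
    """
    posts = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        block = json.loads(line)
        for tx in block.get("transactions", []):
            for op_name, op in tx.get("operations", []):
                if op_name != "comment" or op.get("author") != author:
                    continue
                if op.get("parent_author") != "":
                    continue  # a reply, not a top-level post
                if op["permlink"] in posts:
                    continue  # an edit of a post already collected
                posts[op["permlink"]] = {
                    "permlink": op["permlink"],
                    "title": op.get("title", ""),
                    "body": op.get("body", ""),
                    "timestamp": block.get("timestamp", ""),
                }
    return list(posts.values())


def posts_to_xml(author, posts):
    """Render posts as a simple XML document (the layout is illustrative)."""
    root = ET.Element("posts", author=author)
    for post in posts:
        item = ET.SubElement(
            root, "post", permlink=post["permlink"], timestamp=post["timestamp"]
        )
        ET.SubElement(item, "title").text = post["title"]
        ET.SubElement(item, "body").text = post["body"]
    return ET.tostring(root, encoding="unicode")
```

The resulting XML string could then be fed to an XSL-T stylesheet to build the TOCs mentioned above.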

Once it's done, I think it is getting loaded into a DB of some type. Actually, @l0k1, what's the address of this on GitHub?

I'll look up the RethinkDB Python library.

Further, do you know if your code will stream?

Yes, it streams: it writes to stdout, so the output can be piped into anything.

You should look into indexing it in elasticsearch! Great work!
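Elasticsearch's bulk API takes newline-delimited JSON: an action line followed by the document itself, with the whole body ending in a newline. Below is a minimal sketch, assuming one JSON block per line in block order, that builds such a body from the dump. The index name `steem-blocks` and the function name are hypothetical; the action line omits a document type, which newer Elasticsearch versions no longer use; and the `operations` arrays, which mix strings and objects, may need reshaping before Elasticsearch will accept a mapping for them.

```python
import json


def to_bulk_ndjson(lines, index="steem-blocks"):
    """Turn a JSON-lines block dump into an Elasticsearch _bulk request body.

    Each block becomes an action line plus a source line. The block number
    (counted from 1, in dump order) is used as the document _id so that
    re-running the import overwrites documents instead of duplicating them.
    """
    out = []
    num = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        num += 1
        out.append(json.dumps({"index": {"_index": index, "_id": str(num)}}))
        out.append(line)
    # The bulk API requires the body to end with a newline
    return "\n".join(out) + "\n" if out else ""
```

The result could then be POSTed to the cluster's `_bulk` endpoint with `Content-Type: application/x-ndjson`, preferably in batches of a few thousand blocks rather than as one giant request.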

Cool tutorial. Thanks!

Hello. I mentioned this post of yours in my latest post about my assessments:

https://steemit.com/steemit/@lightingmacsteem/the-45-days-report

I just think you might be interested in that informative post. Regards, man.
