A python script that streams the raw JSON of the Steem blockchain

in #dawn-network7 years ago (edited)

I have now almost finished a script that will allow us to pull the raw JSON data from a steemd RPC node and feed it to a RethinkDB database (our initial database target).

I have not yet written the part that actually feeds the data to the DBMS. The script has configurable parameters that let you start at any block in the chain, including the latest, and then stream new blocks as they come through.

If you have seen some of the pages that show this visually, this is the console version: you can watch people's votes and posts coming in in real time. When it hits the head block it waits 3 seconds and then retries, which will get a new block unless the witness misses it.

#!/usr/bin/python3
###################################
#
# blockstreem.py
#
# Reads complete block log from steemd and outputs
# to stdout so it can be piped into other programs
# and continues to stream new blocks as they appear
#
###################################
import json
import os
import argparse
import piston
import rethinkdb as r
from pprint import pprint
from pistonapi import SteemNodeRPC
from islistening import islistening
from urllib.parse import urlparse
import time

parser = argparse.ArgumentParser ( description = "Query steemd for raw JSON blocks and optionally stream up to a TCP server" )
parser.add_argument ( '-b', '--startblock', type = int, default = 1, help = 'starting block - set to -1 to start with the last irreversible block' )
parser.add_argument ( '-s', '--steemdip', type = str, default = 'projectinception.lt', help="Steem RPC node address" )
parser.add_argument ( '-p', '--steemdport', type = int, default = 8090, help = 'Steem RPC node port' )
parser.add_argument ( '-r', '--rethinkdbip', type = str, default = 'projectinception.lt', help="RethinkDB RPC node address" )
parser.add_argument ( '-P', '--rethinkdbport', type = int, default = 28015, help = 'RethinkDB port' )
parser.add_argument ( '-o', '--stdout', action = 'store_true', help = 'send output to stdout instead of rethinkdb' )
args = parser.parse_args ( )

if not args.stdout:
  if islistening ( args.rethinkdbip, args.rethinkdbport ):
    # connect to rethinkdb server
    r.connect( args.rethinkdbip, args.rethinkdbport, db='test').repl()
    #r.table('everything').run()
  else:
    print ( "unable to connect to rethinkdb" )

# connect to steem RPC
if islistening ( args.steemdip, args.steemdport ):
  rpc = SteemNodeRPC( "ws://" + args.steemdip + ':' + str ( args.steemdport ), "", "")
else:
  print ( "unable to connect to steemd rpc endpoint" )
  quit ( )

props = rpc.get_dynamic_global_properties ( )
current_block = args.startblock
last_confirmed_block = props [ 'last_irreversible_block_num' ]
if args.startblock == -1:
  current_block = last_confirmed_block

while True:
    # progress indicator goes to stderr so stdout stays clean for piping
    os.write ( 2, str.encode( "Block #" + str ( current_block ) + '\r' ) )
    block = rpc.get_block(current_block)
    if block is not None:
      if args.stdout:
        print ( str ( block ).replace("'", '%temp%').replace('"', '\"').replace('%temp%', '"') )
      else:
        # Send JSON output to rethinkdb (not implemented yet)
        out_block = json.loads ( str ( block ).replace("'", '%temp%').replace('"', '\"').replace('%temp%', '"') )
        pass
      current_block += 1
    else:
      # reached the head block: wait, then retry the same block number
      time.sleep ( 3 )
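
The loop above could also be split out as a generator, so that whatever eventually consumes the blocks (stdout now, RethinkDB later) stays separate from the polling. This is only a rough sketch: `stream_blocks` and its parameters are made-up names, and it assumes that `get_block` returns `None` for a block that does not exist yet, which is what the loop above relies on.

```python
import time

def stream_blocks(get_block, start_block, poll_seconds=3, sleep=time.sleep):
    """Yield (block_number, block) pairs, starting at start_block.

    get_block is any callable that returns the block for a given number,
    or None when that block has not been produced yet (assumed behaviour).
    """
    current_block = start_block
    while True:
        block = get_block(current_block)
        if block is None:
            # At the head of the chain: wait, then retry the same number.
            sleep(poll_seconds)
            continue
        yield current_block, block
        current_block += 1
```

With the script's own names this would be driven by something like `for num, block in stream_blocks(rpc.get_block, current_block): ...`, and the body of that `for` loop is where the print or the database insert would go.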

update

Oops, I forgot to also post the script that does the check to ensure the two servers it addresses are listening:

#!/usr/bin/python3
import socket
from urllib.parse import urlparse

def islistening ( host, port ):
  captive_dns_addr = ""
  host_addr = ""
  try:
    captive_dns_addr = socket.gethostbyname("BlahThisDomaynDontExist22.com")
  except socket.error:
    pass
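  try:
    host_addr = socket.gethostbyname(host)
    # a captive DNS resolves every name to the same address,
    # so a match with the bogus lookup means the host did not really resolve
    if (captive_dns_addr == host_addr):
      return False
    # try an actual TCP connection with a short timeout
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.settimeout(1)
    s.connect((host, port))
    s.close()
  except socket.error:
    return False
  return True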
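
For comparison, the connection part of that check can be written more compactly with the standard library's `socket.create_connection`. This is a sketch rather than a drop-in replacement: `port_is_open` is a made-up name, and it leaves out the captive-DNS test that `islistening` performs.

```python
import socket

def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        # create_connection resolves the host name and connects in one call;
        # the with-block closes the socket again straight away
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # covers refused connections, timeouts and failed DNS lookups
        return False
```

Catching `OSError` is enough here because `socket.timeout` and `socket.gaierror` are both subclasses of it in Python 3.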

PPS:

Ah fer cryin' out loud... as I am trying to implement the conversion to Python datatypes, it's giving me this error:

  File "/usr/lib/python3.6/site-packages/simplejson/decoder.py", line 396, in raw_decode
    raise TypeError("Input string must be text, not bytes")

from this code:

    out_block = json.loads ( block )

Sometimes I love Python 3 but right now I am hating it (this is an issue caused by the old version defaulting to ASCII where the new one uses UTF-8, and Python has a fit if you don't write b'string is in here'). And another awesome thing (not) is that pprint prints out all of the JSON using single quotes, which ... sigh. I'm glad this may well be the last serious bit of Python I am writing. I am going to learn to code in Go and no more idiot Python nonsense.

PPPS

Ok, I have found the solution:

        print ( str ( block ).replace("'", '%temp%').replace('"', '\"').replace('%temp%', '"') )
      else:
        # Send JSON output to rethinkdb
        out_block = json.loads ( str ( block ).replace("'", '%temp%').replace('"', '\"').replace('%temp%', '"') )

and I have now put that into the code above for accuracy.
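
A caveat on the quote-swapping: it will trip over any block whose text contains an apostrophe or a double quote. If `rpc.get_block` is handing back an ordinary Python dict (an assumption, but the single-quoted output and the need for `str ( block )` both point that way), the standard `json` module can do the conversion directly. A minimal sketch, with `block_to_json` as a made-up helper name and invented sample data:

```python
import json

def block_to_json(block):
    """Serialise a block (assumed to be a plain dict) to a JSON string."""
    return json.dumps(block)

# invented sample data, not a real Steem block
sample = {"witness": "example", "memo": "it's a \"test\"", "transactions": []}

text = block_to_json(sample)
print(text)

# the JSON text parses back to an equal dict, quotes and all
assert json.loads(text) == sample
```

In that case the RethinkDB branch would presumably not need the `json.loads` round trip at all, since the block would already be a Python data structure.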

😎


We can't code here! This is Whale country!

Vote #1 l0k1

Go to steemit.com/~witnesses to cast your vote by copying and pasting l0k1 into the text entry at the bottom of the leaderboard.

(note, my username is spelled El Zero Kay One or Lima Zero Kilo One, all lower case)


I wonder why this flavour of markdown doesn't support pygmenting source code. I remember steemit pointing to github for markdown help (but I can't find where this alleged link is right now), and github md surely supports source highlighting…

Because it is very nonstandard. Oh, of course I am uploading all my code to the Dawn git repo as well: https://github.com/dawn-network/misc

Busy doesn't highlight the code either.

That's right! Puttin' in work! Thank you for your labors of love. :)

Go is amazing, but I think you will bump your head in a similar manner until you get used to the type conversion in it.

We have done 2 very small projects using Go at work, I wish I could do more. I am doing a personal project using Python and must say that I am not enjoying that part at all.

Thanks for sharing!

I am quite used to type conversion. In fact I have only developed my Python skills this much because, for Steem development, the only other option is the better developed steem.js, in node.js, and I hate JavaScript. Python 3 is far stricter about types than Python 2, as anyone who has dipped their toes in it knows: for example there is the distinct 'bytes' type, which holds raw bytes and has to be explicitly decoded (from UTF-8, say) before it can be used as a 'str'.
