I.T. Spices The LINUX Way

Python In The Shell: The STEEMIT Ecosystem – Post #88

THE PYTHON SCRIPT IN ACTION – PART 1

This post merits more number of words I can see by the lines of codes below, combining both the speed and agility of the python script and the simplicity of the linux cli (bash).

I will just expound by line groups:

We need to start our script with the python interpreter line 1 and then import the needed modules depending on the routines and tasks we want to accomplish. Python has the ability to fetch only the very modules it needs from its huge collection of versatile modules.

1 #!/usr/bin/python3.6
2
3 ###LOAD MODULES
4 import sys
5 import glob, os, fnmatch
6 import shutil
7 import time
8 import json
9 import pymysql
10 import datetime
11 import re
12



We need a time to use as variable to have uniquely named logs:

13 ###GET TIME NOW
14 ddate = (datetime.datetime.now().strftime("%m%d%Y-%H%M%S"))
15



This is to show something what this script does; sort of a reminder:

16 ###INFOS
17 print (' ')
18 print ('This python script will RESUME the insert of JSON queries into MYSQL')
19 print (' ')
20 time.sleep(3)
21



This is to define the files and folders to process from/to; anything inside single-quotes is exactly what the phrase says while the ones not enclosed with quotes means it is a variable and will be replaced by python with its real value (text or number):

22 ###FOLDERS
23 tempdir = '/DEVSHM'
24 ###CREATE THE FOLDERS
25 #TEMP
26 if not os.path.exists(tempdir):
27  os.mkdir(tempdir)
28 insertfile = ('/root/STEEM/steem.blockchain.json')
29 destfile = ('/DEVSHM/steem.blockchain-incomplete_resume.json')
30 logfile = ('/DEVSHM/steem.blockchain-incomplete_resume_' + ddate + '.logs')
31



This is to log every run into a file; very much useful if we wanna see if it really does what it needs to do:

32 ###LOG THIS RUN
33 print('---------', file=open(logfile, 'a'))
34 print('Timestamp for this insert is: ' + ddate, file=open(logfile, 'a'))
35



OK, fun part starts here. As we just resumed, we need to know what is the last record that was inserted into the database. Said record is then used to seek the steemit blockchain file, the line that has this unique block_id record will not be included as we need the next line to it. I am using the os module here as this is but a few queries only, so I do not need that much much speed.

Be aware that we can still improve the part on the linux cli commands (grep, sed, cut, etc):

36 ###WHAT IS THE LAST RECORD ON THE DB
37 ###LAST LINE ID
38 c = ('mysql -h172.17.0.4 -u root -p1234567 -Dsteemit_blockchain -e ' + '"SELECT MAX(id) FROM data01 ORDER BY id DESC LIMIT 1"')
39 maxid = (os.popen(c).read()).splitlines()
40 ###BLOCK_ID OF LAST LINE
41 cc = ('mysql -h172.17.0.4 -u root -p1234567 -Dsteemit_blockchain -e ' + '"SELECT block_id FROM data01 WHERE id = ' + maxid[1] + '"')
42 blockid = (os.popen(cc).read()).splitlines()
43 ###FIND THE LINE NUMBER OF THE BLOCK_ID IN THE INSERTFILE
44 findthis = ('"block_id": "' + blockid[1] + '"')
45 ccc = ('cat ' + insertfile + ' | grep -n \'' + findthis + '\' | sed -n 1p | cut --delimiter=":" -f1')
46 line = (os.popen(ccc).read()).strip()
47 ###COUNT THE NUMBER OF LINES IN THE INSERTFILE
48 ddd = ('cat ' + insertfile + ' | grep [a-z] | grep -c "{"')
49 countline = (os.popen(ddd).read()).strip()
50 print('Line number ' + line + ' will be processed until the end of the file ' + insertfile + '.......')
51 print('Line number ' + line + ' will be processed until the end of the file ' + insertfile + '.......', file=open(logfile, 'a'))
52 print('The blockchain file ' + insertfile + ' have ' + countline + ' number of lines.......')
53 print('The blockchain file ' + insertfile + ' have ' + countline + ' number of lines.......' + insertfile + '.......', file=open(logfile, 'a'))
54



We extract from the source blockchain file the lines starting from the line “after” the last block_id that was last inserted at the database. Said lines, from such line up to the end will be copied to a new file:

55 ###EXTRACT THE LINE STARTING FROM MAXID UP TO LAST AND SAVE TO DESTFILE
56 ###DELETE DESTFILE FIRST IF EXIST
57 try:
58  os.remove(destfile)
59 except OSError:
60  pass
61 if int(line) < int(countline):
62  new = open(destfile, 'a')
63  with open(insertfile, 'r') as f:
64      for i in range(int(line)):
65          next(f)
66      for addline in f:
67          new.write(addline)
68  new.close()
69  f.close()
70 else:
71  print('LINE is equal of greater than COUNTLINE.......')
72



We then check the newly created file if there are new lines. Of course, if there are, then that only means that new blockchain records have been downloaded that needs to be added to the database.

73 ###READ THE JSON FILE USING BASH CAT
74 #list = open(destfile, 'r', encoding='utf-8')
75
76 dd = ('sed -n "$=" ' + destfile)
77 checkfile = (os.popen(dd).read()).strip()
78 if (str(checkfile) != str(0)) and os.path.isfile(destfile):
79  list = open(destfile, 'r')
80  print('There is/are new blocks, I will update the database with about ' + checkfile + ' new records; please wait.......')
81  print('There is/are new blocks, I will update the database with about ' + checkfile + ' new records; please wait.......', file=open(logfile, 'a'))
82  time.sleep(9)



For every new record (if there is), we need to extract the data in accordance to the column name, block_id, timestamp, witness, etc. Remember the double-quoted titles on the left side of the json text at the previous screenshot? Here they will be parsed, but we only need its contents be aware of this, not including the title. Further, we changed every occurrence of [] (a blank entry) with the letters NA for an easier database processing later.

83 ###LOOP LINE PER LINE HERE USING WHILE
84  for line in list:
85      ###PYTHON
86      jason = json.loads(line)
87      ###JSON
88      jpython = json.dumps(jason)
89      if (jason["block_id"]) == []:
90          block_id = "NA"
91      else:   block_id = (jason["block_id"])
92
93      if (jason["extensions"]) == []:
94          extensions = "NA"
95      else:   extensions = (jason["extensions"])
96
97      if (jason["previous"]) == []:
98                  previous = "NA"
99      else:   previous = (jason["previous"])
100
101     if (jason["signing_key"]) == []:
102                 signing_key = "NA"
103     else:   signing_key = (jason["signing_key"])
104
105     if (jason["timestamp"]) == []:
106                 timestamp = "NA"
107     else:   timestamp = (jason["timestamp"])
108
109     if (jason["transaction_ids"]) == []:
110                 transaction_ids = "NA"
111     else:   transaction_ids = (jason["transaction_ids"])
112
113     if (jason["transaction_merkle_root"]) == []:
114                 transaction_merkle_root = "NA"
115     else:   transaction_merkle_root = (jason["transaction_merkle_root"])
116
117     if (jason["witness"]) == []:
118                 witness = "NA"
119     else:   witness = (jason["witness"])
120
121     if (jason["witness_signature"]) == []:
122                 witness_signature = "NA"
123     else:   witness_signature = (jason["witness_signature"])
124
125     if (jason["transactions"]) == []:
126                 transactions = "NA"
127     else:   transactions = (json.dumps(jason["transactions"]))
128



This is the part where python connects to the database one time, then does the inserts for multiple lines. It will only close the connection to the database if all new records are already processed. Be aware of this type of connection with the one-time connection as explained above in the earlier routine. The previous routine just went to the linux cli (speed not needed), but here, python took care of the speed and robustness of the mysql routines.

129     ###MARIADB CONNECTIONS
130     db = pymysql.connect(host='172.17.0.4', user='root', password='1234567', db='steemit_blockchain')
131     cursor = db.cursor()
132     #print('I am processing block_id---------> ' + block_id)
133     #print(transactions)
134     # INSERT JSON RECORDS
135     sql1 = ("INSERT INTO `data01`(block_id, extensions, previous, signing_key, timestamp, transaction_ids, transaction_merkle_root, witness, witness_signature, transactions) VALUES(%s, %s, %s, %s, %s, %s, %s, %s, %s, %s)")
137     data1 = (block_id, extensions, previous, signing_key, timestamp, transaction_ids, transaction_merkle_root, witness, witness_signature, (str(transactions)))
138     try:
139         # Execute the SQL command
140         cursor.execute(sql1, (data1))
141         db.commit()
142     except:
143         # Rollback in case there is any error
144         db.rollback()
145 
146     ###CLOSE MARIADB CONNECTION AFTER</code>
147     finally:
148         db.close()



This is to log the time when the database inserts are finished; very useful later in lots of purposes.

149 ###GET TIME NOW
150 ddate = (datetime.datetime.now().strftime("%m%d%Y-%H%M%S"))
151 print('Timestamp for FINISH insert is: ' + ddate + '\n\n', file=open(logfile, 'a'))
152
153 else:
154 ddate = (datetime.datetime.now().strftime("%m%d%Y%H%S"))</code>
155 print('Timestamp for this insert is: ' + ddate, file=open(logfile, 'a'))</code>
156 print('No new records found.......')</code>
157 print('No new records found.......\n\n', file=open(logfile, 'a'))</code>
158 time.sleep(9)</code>
159



It is always best to put all the logs on one central server; it just happened that said server is remotely located (on another IP) so I employed SSH in doing this.

160 ###RELAY THE LOGS TO THE CENTRAL LOG SERVER
161 ee = ('cat ' + logfile + ' | ssh [email protected] "cat >> /MARIO/BACKUPS/LOGS/insert-json-into-mysql_resume.logs"')
162 os.system(ee)



All routines as discussed are encapsulated within the python interpreter, meaning, even if we went back to the native linuc cli commands, it is still python that took care of the tos-and-fros.

We will go back to specifics on the above routines don’t worry, learning is what we want here right? We will also discuss some good improvements as part of our next discussions.


“It Is Always Best To Think Of Others’ Welfare First……...”

Sort:  

Hello! Your post has been resteemed and upvoted by @ilovecoding because we love coding! Keep up good work! Consider upvoting this comment to support the @ilovecoding and increase your future rewards! ^_^ Steem On!

Reply !stop to disable the comment. Thanks!

Coin Marketplace

STEEM 0.20
TRX 0.12
JST 0.028
BTC 64277.14
ETH 3492.61
USDT 1.00
SBD 2.51