I.T. Spices The LINUX Way

lightingmacsteem (55) in #blockchain • 7 years ago

Python In The Shell: The STEEMIT Ecosystem – Post #113

SCRAPING ALL BLOGS USING PYTHON – THE BLOG URL PREPS FINAL STEPS

Please refer to Post #110 for the complete python script and the intro of this series, link below:
https://steemit.com/blockchain/@lightingmacsteem/2rydxz-i-t-spices-the-linux-way

In this post we will discuss the final preparations needed to build the complete URL of a blog post. By the end of it, we will finally have the full URL of the blog.

Lines 42 to 64 below are the lines of python code that handle the full processing of the URL:

42      # SHOULD BE MAIN POST AND NOT RESTEEMED POST FROM OTHER ACCOUNTS
43      if post.is_main_post() and post["author"] == account:
44          permlink = post["permlink"]
45          a = (post["tags"])
46          tags = list(a)
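47          # default to an empty tag so line 62 never sees an undefined or stale value
48          tag = ""
49          # a post carries at most five tags; only one is needed for the URL,
50          # so keep the first non-empty tag and stop looking
51          for bb in tags:
52              if bb != "":
53                  tag = bb
54                  break
55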
56
57          #######################PRINT BORDER START OF EVERY RESULT
58          print('\n#############################################################################################')
59          flogs.write('\n#############################################################################################')
60
61          ###PRINT THE WEB URL COMPLETED
62          fullurl = (baseurl + tag + '/@' + account + '/' + permlink)
63          print('\nPOST URL:\n' + '   ' + fullurl)
64          flogs.write('\nPOST URL:' + '\n   ' + fullurl)

Line 43 makes sure that the post being saved is really a main post from the author. It is worth noting that a “resteem” post also appears on the author’s blog home screen, even if he/she is not the one who wrote it. Adding the author check to this line of code ensures that resteemed posts are not included in the results.
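
The filter on line 43 can be tried on its own with a few stand-in posts. This is only a minimal sketch: the dictionaries below are made-up sample data, the `is_main` key is an assumption that stands in for whatever `post.is_main_post()` returns, and `keep_own_main_posts` is a hypothetical helper name that is not part of the script in Post #110.

```python
def keep_own_main_posts(posts, account):
    """Return only the main posts that were written by `account`."""
    kept = []
    for post in posts:
        # same two conditions as line 43: a main post AND our own author
        if post["is_main"] and post["author"] == account:
            kept.append(post)
    return kept


# made-up sample data: one own post, one resteemed post, one own reply
sample_posts = [
    {"author": "lightingmacsteem", "permlink": "my-own-post", "is_main": True},
    {"author": "someone-else", "permlink": "a-resteemed-post", "is_main": True},
    {"author": "lightingmacsteem", "permlink": "re-a-reply", "is_main": False},
]

own = keep_own_main_posts(sample_posts, "lightingmacsteem")
print([p["permlink"] for p in own])
```

Only the first sample post should survive, since the second one fails the author check and the third one is not a main post.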

Lines 44 to 64 are easier to understand once we note that the full URL of a blog is made up of the following:

1. The full steemit web URL (https://steemit.com)
2. The tag (any one of the tags as given by the author when the blog is made)
3. The author
4. The unique permlink of the author’s blog

If we can get these four pieces of data, no blog on the STEEMIT website is un-scrapable. So, just for the purpose of educating you as the reader, the full URL of a blog is formatted like this, following the numbering above (exclude the double quotes):

“1/2/3/4” …………………….. Got it so far?
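
To make the “1/2/3/4” pattern concrete, here is a small sketch that joins the four parts. The helper name `build_post_url` is my own and hypothetical, and stripping a trailing slash from the base URL is an assumption, since the exact value of `baseurl` is set in Post #112. Note that, as on line 62, the author part is written with an `@` in front of it.

```python
def build_post_url(baseurl, tag, account, permlink):
    """Join the four parts as 1/2/3/4, with '@' in front of the author."""
    # strip any trailing slash so we never end up with a double "//"
    return baseurl.rstrip("/") + "/" + tag + "/@" + account + "/" + permlink


url = build_post_url(
    "https://steemit.com",
    "blockchain",
    "lightingmacsteem",
    "2rydxz-i-t-spices-the-linux-way",
)
print(url)
```

With these four values the result is the same link to Post #110 that is given at the top of this post.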

Line 44 will surely get the blog’s unique permlink.

Lines 45 to 55 make sure that one of the given tags is acquired; we do not need all the tags (a maximum of five), we only need one.
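
The idea of “we only need one tag” can be sketched as a small function of its own. This is not the code of the script itself, just an illustration; `pick_tag` is a hypothetical name, and returning an empty string when no usable tag exists is my own assumption.

```python
def pick_tag(tags):
    """Return the first non-empty tag, or "" when there is none."""
    for tag in tags:
        if tag != "":
            return tag
    return ""


print(pick_tag(["blockchain", "python", "linux"]))
print(pick_tag(["", "steemit"]))
```

The first call picks the first tag in the list, while the second call skips the empty entry and picks the next one.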

So at this point of the python program we already have the steemit URL (Post #112), the tag, the author (Post #112) and the blog’s unique permlink.

Lines 62 to 64 just format and print the full URL as per my description above of the “1/2/3/4”. Line 63 displays it on the monitor screen and line 64 writes the same full URL into the log file.
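
The “screen plus log file” habit of lines 58 to 64 can be tried without touching a real file. In this sketch `io.StringIO` is only an in-memory stand-in for the log file, `report_url` is a hypothetical helper name, and the length of the border is arbitrary.

```python
import io


def report_url(fullurl, log):
    """Show the URL on screen and write the same text to the log."""
    border = "\n" + "#" * 60
    message = "\nPOST URL:\n   " + fullurl
    for text in (border, message):
        print(text)      # display on the monitor screen
        log.write(text)  # keep the same text in the log


# io.StringIO is an in-memory stand-in for the real log file
flogs = io.StringIO()
fullurl = "https://steemit.com/blockchain/@lightingmacsteem/2rydxz-i-t-spices-the-linux-way"
report_url(fullurl, flogs)
logged = flogs.getvalue()
```

Because every piece of text goes through both `print` and `write`, the screen and the log can never drift apart.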

A.I. AT ITS VERY BEGINNINGS

I can assure everyone reading this post that at this very point, nothing is impossible anymore as far as getting the data of an account’s blog posts using python scraping is concerned. And yet, I want everyone to note that a blog post is just an example, a form of data (any data) that can be checked and manipulated.

Imagine the possibilities here, as we got all these data automatically without even touching the mouse and keyboard. This is how AI operates, as far as any data is concerned.

That is why Facebook can spy on just anybody. No picture erased, no words deleted, no links gone, no videos unseen. All will be kept once you have posted it, even if you decide to erase your account one day.

Just overstating, my bad.