Learn Python Series (#4) - Round-Up #1

scipio (65) in #utopian-io • 6 years ago (edited)

What Will I Learn?

You will learn how to combine essential Python language mechanisms with the built-in string methods to program your own useful, real-life functions.
In the code examples I'll only be using what I've covered in the previous Learn Python Series episodes.

Requirements

A working modern computer running macOS, Windows or Ubuntu
An installed Python 3(.6) distribution, such as (for example) the Anaconda Distribution
The ambition to learn Python programming

Difficulty

Intermediate

Tutorial Contents

Curriculum (of the `Learn Python Series`):

Learn Python Series (#4) - Round-Up #1

This is the first Round-up episode within the Learn Python Series, in which I will show you how to build interesting things using just the mechanisms that were covered already in the previous Learn Python Series episodes.

Of course, as the series progresses, each tutorial episode adds more tools to our tool belt, so to keep things organized I'll try to use mostly what was covered in the last few episodes.

Getting creative with strings

Programming is a creative task. Depending on the complexity of what you want to build, you first need a fairly clear idea of how to achieve your goal (a working program), and while you're coding you will often run into problems (or "puzzles") that need to be solved. To become a proficient programmer, in Python or in any other programming language, it's very important that you enjoy trying to solve those "puzzles". The more "tools" you have on your "tool belt", the more complex the puzzles you're able to solve. To get better at programming, I think it's also important to keep pushing your limits: get out of your comfort zone and expand your horizons!

Up until now, in the previously released Handling Strings tutorials, we've been discussing the usage of individual string methods. But of course we can combine their individual strengths to create self-defined functions that do exactly what we want them to do! That's actually the beauty of the Python programming language: we can pick (and import!) individual "tools", just the tools we need to use per project / script, use them as "building blocks", and then create even better tools or more advanced "building blocks" for our own purposes.
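
To make that idea a bit more concrete before the mini-projects start, here's a minimal sketch of combining a few string methods into one self-defined function. The function name `make_slug()` is made up for this illustration, and I'm assuming `strip()`, `lower()`, `split()` and `join()` are among the string methods you already know.

```python
def make_slug(title):
    # strip() drops surrounding whitespace, lower() normalizes the case,
    # split() breaks the text on any run of whitespace,
    # and join() glues the words back together with dashes.
    words = title.strip().lower().split()
    return '-'.join(words)

print(make_slug('  Learn Python   Series  '))
# learn-python-series
```

None of the four methods does anything spectacular on its own, but chained together they form a small tool that does exactly one useful job.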

Disclaimer: The following two "mini-projects" cover how to program self-defined, somewhat useful, string handling functions. I'm not stating these are the best, let alone only, ways to program them. The goal is to show, to the reader / aspiring Python programmer, that only understanding what was covered already is enough to program interesting and useful code with!

Mini project `parse_url()`

In case you want to program a web crawler, to fetch unstructured data from web pages when an API providing structured JSON data is missing, or in case you want to build and run a full-fledged search engine, you need to handle URLs. URLs come in many forms, but they still share components that are common to every URL. In order to properly use URLs, you need to "parse" them and "split" them into their components.
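
As a quick warm-up, here's a rough sketch of what "splitting" a URL can look like with nothing but a string method. It uses `str.partition()`, which may not have come up in the earlier episodes (that's an assumption on my side), and it assumes the URL actually contains a scheme and a file path.

```python
url = 'https://staging.utopian.io/index.php?page=321'

# partition() splits on the first occurrence of the separator and
# always returns a 3-tuple: (before, separator, after).
scheme, sep, rest = url.partition('://')
host, slash, path = rest.partition('/')

print(scheme + sep)  # https://
print(host)          # staging.utopian.io
print(path)          # index.php?page=321
```

This only separates three pieces. Telling the subdomain, the domain and the TLD apart inside the host takes more work.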

Let's see how to develop a parse_url() function that splits several URL components and returns them as a tuple. We're looking to return these components:

protocol or scheme (e.g. https://),
host (which could be an IP address or something like www.google.com),
the domain name (e.g. steemit.com),
the Top Level Domain, or TLD (e.g. com),
the subdomain (e.g. staging in https://staging.utopian.io),
and the file path (e.g. index.php?page=321)

PS: The explanations are put inside the code as # comments!

def parse_url(url):
    
    # First we initiate the variables we want 
    # the function to return, set all to an empty string.
    
    scheme = host = subdomain = tld = domain = path = ''    
    
    # ---------------------------------------------------
    # -1- Identify and, if applicable, isolate the scheme
    # ---------------------------------------------------
    
    needle = '://'
    if needle in url:
        
        scheme_index = url.find(needle)
        scheme = url[:scheme_index + len(needle)]        
        
        # Slice the scheme from the url
        
        url = url[len(scheme):]
    
    # ---------------------------------------------------
    # -2- Identify and, if applicable, isolate 
    #     the file path from the host
    # ---------------------------------------------------
    
    needle = '/'
    if needle in url:
        
        # Split the host from the file path.
        
        host, path = url.split(sep=needle, maxsplit=1)
        
    else:
        
        # The remaining url is the host
        
        host = url
        
    # ---------------------------------------------------
    # -3- Check if the host is an IP address or if it
    #     contains a domain
    # ---------------------------------------------------

    # Remove the dots from the host
    
    needle = '.'
    no_dots = host.replace(needle, '')
    if not no_dots.isdigit():
        
        # The host contains a domain, so continue
    
        # ---------------------------------------------------
        # -4- Identify and isolate the tld
        # ---------------------------------------------------    

        num_dots = host.count(needle)

        # --- NB1: ---
        # When num_dots == 0 , the string wasn't a url! ;-)
        # But let's just assume for now the string is a valid url.    

        if num_dots == 1:

            # The host does not contain a subdomain

            domain = host
            tld = host[host.find(needle)+1:]

        elif num_dots > 1:

            # The host might contain a subdomain

            # --- NB2: ---
            # In order to distinguish between a host containing
            # one or more subdomains, and a host containing a 3rd
            # or higher level tld, or both, we need a list 
            # that contains all tlds.
            #
            # That list seems to be here ...
            #
            # https://publicsuffix.org/list/public_suffix_list.dat
            #
            # ... but we haven't covered yet how to fetch 
            # data from the web.
            #
            # So for now, let's just create a list containing
            # some 3rd level tlds, and just assume it is complete.

            all_3rdlevel_tlds = ['co.uk', 'gov.au', 'com.ar']

            for each_tld in all_3rdlevel_tlds:
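                # Match at the end of the host only, so that e.g. 'co.uk'
                # somewhere in the middle of the host isn't mistaken for the tld
                if host.endswith(needle + each_tld):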

                    # Apparently the tld in the url is a 3rd level tld

                    tld = each_tld                
                    break

            # ---------------------------------------------------
            # PS: Notice that this `else` belongs to the `for`
            #     and not the `if` ! It only runs when the `for`
            #     exhausted but did not break.
            # ---------------------------------------------------

            else:            
                tld = host[host.rfind(needle)+1:]            

            # ---------------------------------------------------
            # -5- Identify and, if applicable, isolate 
            #     the subdomain from the domain
            # ---------------------------------------------------  

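            # Cut the tld (plus its leading dot) off the end of the host.
            # Searching with find() would match the first occurrence,
            # which goes wrong for hosts like 'www.comcast.com'.
            host_without_tld = host[:-(len(tld) + 1)]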
            num_dots = host_without_tld.count(needle)

            if num_dots == 0:

                # The host doesn't contain a subdomain

                domain = host_without_tld + needle + tld

            else:

                # The host contains a subdomain

                subdomain_index = host_without_tld.rfind('.')
                subdomain = host_without_tld[:subdomain_index]
                domain = host[subdomain_index+1:]        

    return scheme, host, subdomain, domain, tld, path

# Let's test the function on several test urls!

test_urls = [
    'https://www.steemit.com/@scipio/recent-replies',
    'https://steemit.com/@scipio/recent-replies',
    'http://www.londonlibrary.co.uk/index.html',
    'http://londonlibrary.co.uk/index.html',
    'https://subdomains.on.google.com/',
    'https://81.123.45.2/index.php'
]

# And finally call the parse_url() function,
# and print its returned output!

for url in test_urls:
    print(parse_url(url))

# YES! It works like a charm! ;-)

('https://', 'www.steemit.com', 'www', 'steemit.com', 'com', '@scipio/recent-replies')
('https://', 'steemit.com', '', 'steemit.com', 'com', '@scipio/recent-replies')
('http://', 'www.londonlibrary.co.uk', 'www', 'londonlibrary.co.uk', 'co.uk', 'index.html')
('http://', 'londonlibrary.co.uk', '', 'londonlibrary.co.uk', 'co.uk', 'index.html')
('https://', 'subdomains.on.google.com', 'subdomains.on', 'google.com', 'com', '')
('https://', '81.123.45.2', '', '', '', 'index.php')
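
The `for` ... `else` construct used inside `parse_url()` is easy to miss in such a long function body, so here it is once more in isolation. This is just a small sketch: the helper name `find_suffix()` and its arguments are made up for the illustration, and the suffix list is the same tiny, incomplete one as above.

```python
def find_suffix(host, known_suffixes):
    for suffix in known_suffixes:
        if host.endswith('.' + suffix):
            found = suffix
            break
    else:
        # This block belongs to the `for`: it only runs when the
        # loop finished without ever reaching `break`.
        found = host[host.rfind('.') + 1:]
    return found

suffixes = ['co.uk', 'gov.au', 'com.ar']  # tiny illustrative list
print(find_suffix('www.londonlibrary.co.uk', suffixes))  # co.uk
print(find_suffix('www.steemit.com', suffixes))          # com
```

Without the `else` you would need an extra "did we find anything?" flag variable. The `else` clause expresses the same thing directly.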

Mini project `encode_gibberish()` and `decode_gibberish()`

Remember my hidden message that was contained in the "Gibberish string", covered in Handling Strings Part 1? As a brief reminder: the message was stored backwards inside a bunch of nonsense characters, and slicing the string with a negative stride of -3 revealed it.

This was the code:

gibberish_msg = """!3*oJ6iFupOGiF6cNFSHU 6dmVhoKUrTvfHi 
                    KteBrgHvaIgsX$snTeIgmV0 HvnYGembdJRd*&i$6h&6 &5a*h BGsF@iGv NhsIgiYdh67T"""
print(gibberish_msg[::-3])

This is a hidden message        from Scipio!
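
If the negative stride still feels a bit magical, this tiny sketch shows the same slicing on strings short enough to check by hand. Both strings are made up for this example.

```python
letters = 'abcdefghi'

# A stride of -3 starts at the last character and then
# jumps three positions to the left each time.
print(letters[::-3])  # ifc

# Hiding 'abc': write it backwards and put two filler
# characters between every pair of letters.
hidden = 'c??b??a'
print(hidden[::-3])   # abc
```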

Now as the second "mini-project" for this Round-Up, let's learn how to program a function encode_gibberish() to encode a gibberish string from a message, and another one decode_gibberish() to reveal / decode the hidden message contained inside the gibberish!

PS: The explanations are put inside the code as # comments!

def encode_gibberish(message, stride=1):
    
    # Let's use a mixed-up `chars` list containing lower-case letters,
    # upper-case letters, the integers 1-9, and some other characters,
    # all found on a regular keyboard.
        
    chars = ['x', '-', 'G', 'H', 'l', 'a', '{', 'r', 2, ']', 
         ';', 'F', 'E', 'A', 'V', ')', '$', '?', '/', 
         'i', 'M', 'p', 9, 'C', 'w', 'k', '}', ':', 
         '_', '%', 'D', 'I', 'b', 'z', 'd', 6, 'N', 
         'L', 'c', '.', 1, 'X', 'h', 4, '!', 'S', '~', 
         'u', '+', 'f', 'R', 8, 3, '&', '<', 'y', 'Z', 
         'P', 'n', '^', 'J', 'q', 5, 'o', 'W', '*', 'Q', 
         7, 'B', 'g', 'O', 'K', 'm', ',', 's', '>', 
         'T', '(', '#', 't', 'j', 'e', 
         'Y', '@', '[', 'v', '=', 'U'
    ]
    
    # Initialize an index that tracks our position in the `chars` list
    
    chars_index = 0

    # Convert the message string to a list
    
    message = list(message)
    
    # Negative strides add just as many filler characters
    # as positive ones, so work with the absolute value
    
    abs_stride = abs(stride)
    
    # For all characters from the `message` list,
    # add characters from the `chars` list    
    
    for index in range(len(message)):
        
        # Iterate over the `chars` list, and per
        # `message` character concatenate as many
        # characters as the `stride` argument
        
        salt = ''        
        for i in range(abs_stride):
            salt += str(chars[chars_index])
            if chars_index == len(chars)-1:
                chars_index = 0
            else:
                chars_index += 1
        message[index] = message[index] + salt
    
    # Convert back to string
    message = ''.join(message)
 
    # In case of a negative stride, 
    # reverse the message    
    if stride < 0:
        message = message[::-1] 
    
    return message

def decode_gibberish(encoded_msg, stride=1):
    
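    # Every message character is followed by abs(stride) filler
    # characters, so step over stride + 1 positions (or stride - 1
    # for negative strides, which also reads the string backwards).
    # A stride of 0 means no filler was added, so the step is 1.
    
    step = stride + 1 if stride >= 0 else stride - 1
    return encoded_msg[::step]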

# Let's see if this works!
stride = -5
msg1 = "This is a very secret message that must be encoded at all cost. Because it's secret!"

# Encode, and decode
encoded_msg = encode_gibberish(msg1, stride)
decoded_msg = decode_gibberish(encoded_msg, stride)

# Print the encoded and decoded message strings
print(encoded_msg)
print(decoded_msg)

7Q*Wo!5qJ^ntPZy<&e38Rf+ru~S!4chX1.ceLN6dzsbID%_ :}kwCs9pMi/'?$)VAtEF;]2ir{alH G-xU=ev[@Yesjt#(Tu>s,mKaOgB7Qc*Wo5qeJ^nPZBy<&38 Rf+u~.S!4hXt1.cLNs6dzbIoD%_:}ckwC9p Mi/?$l)VAEFl;]2r{aalHG- xU=v[t@Yejta#(T>s ,mKOgdB7Q*Weo5qJ^dnPZy<o&38Rfc+u~S!n4hX1.ecLN6d zbID%e_:}kwbC9pMi /?$)VtAEF;]s2r{aluHG-xUm=v[@Y ejt#(tT>s,maKOgB7hQ*Wo5tqJ^nP Zy<&3e8Rf+ug~S!4haX1.cLsN6dzbsID%_:e}kwC9mpMi/? $)VAEtF;]2re{alHGr-xU=vc[@Yejet#(T>ss,mKO gB7Q*yWo5qJr^nPZye<&38Rvf+u~S !4hX1a.cLN6 dzbIDs%_:}kiwC9pM i/?$)sVAEF;i]2r{ahlHG-xT
This is a very secret message that must be encoded at all cost. Because it's secret!
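
For the curious, the same encode / decode idea can be sketched much more compactly. Mind you, this is not the version from this episode: it uses `itertools.cycle`, an import we haven't covered yet, and the names `encode_compact()`, `decode_compact()` and the short `filler` string are made up for this illustration.

```python
from itertools import cycle

def encode_compact(message, stride=1, filler='x-GHla{r2]'):
    # cycle() keeps handing out filler characters, starting
    # over at the beginning once it reaches the end.
    salt = cycle(filler)
    pieces = [
        char + ''.join(next(salt) for _ in range(abs(stride)))
        for char in message
    ]
    encoded = ''.join(pieces)
    # A negative stride stores the result backwards
    return encoded[::-1] if stride < 0 else encoded

def decode_compact(encoded, stride=1):
    # Skip the filler: step over abs(stride) + 1 positions,
    # backwards when the stride is negative.
    step = stride + 1 if stride >= 0 else stride - 1
    return encoded[::step]

print(encode_compact('ab', 1))                               # axb-
print(decode_compact(encode_compact('secret', -2), -2))      # secret
```

A quick way to gain some trust in functions like these is to check that decoding an encoded message gives back the original, for several strides including 0 and negative values.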

What did we learn, hopefully?

That, although we have so far covered only a few Python language mechanisms and haven't even used an import statement (which we will cover in the next Learn Python Series episode), we already have "the power" to program useful functions! We only needed 4 tutorial episodes for this, so let's find out just how much more we can learn in the next episodes! See you there!

Thank you for your time!

Posted on Utopian.io - Rewarding Open Source Contributors

#steemdev #steemstem #open-source #python

amosbastian (72) 6 years ago

Thank you for the contribution. It has been approved.

I also never realised for loops can also have an else clause...

You can contact us on Discord.
[utopian-moderator]

scipio (65) 6 years ago

I'll be covering some "uncommon Python constructs" in a special episode later on! Stay tuned for more! And thanks for the approval!

someguy123 (69) 6 years ago

Wait. There's a for, else feature?

I've coded with python for many years and not even I knew this.

Big +1 to this tutorial (though it'd be nice if it were broken up and explained rather than a big code block with comments)

scipio (65) 6 years ago (edited)

Cool huh! for else rocks! ;-)

I do agree with your explanations / wall of code / code formatting remark. But it's kind of hard to do it otherwise when there's a big (long) function body involved. The Condenser (Steemit.com) interface could do with a proper syntax-highlighted code formatter on the front-end; there are plenty of Open Source repos out there to implement that in no time.

But I will try to implement your suggestion on the following episodes where longer code bodies are involved in one scope block. I could for example first add bits of code, then explain them, rinse & repeat, and at the end drop in all the code. However, I'm writing these on a Jupyter Notebook implementation (as iPynb files, when ready I'm exporting them to .md), and since they contain interactive "cells" that actually run / interpret the code it's kind of hard to detach groups of statements that belong to the same scope block into different cells (because of the indentation requirements of Python). Hence I'll get errors running them, and because I'm going to publish them as iPynb files on GitHub as well, I want each cell to work... Decisions, decisions! ;-)

zoef (64) 6 years ago

Again a wonderful tutorial. I needed to keep myself focused to understand it all but I managed. Thanks for sharing!

mdf-365 (57) 6 years ago

Thanks for putting this together. Attempting to get some exposure to Python and do some experiments here on Steemit - appreciate the effort

scipio (65) 6 years ago

If you want to help with more Python exposure on Steemit , then help getting some attention to my Learn Python Series! :-)
I will be covering about anything there is to cover on Python, and (parts of) this series will also be published as an interactive book! All for free of course!
