Matching and Searching Strings with regex in Python

in #utopian-io, 8 years ago (edited)

What Will I Learn?

This tutorial covers the following topics:

  • Regex concept
  • Regex patterns and their matches
  • Searching vs Matching
  • Replacing patterns

Requirements

  • A PC or laptop running any operating system, such as Linux, Mac OS X, or Windows
  • Python installed
  • A code editor such as Atom or Sublime Text, or the PyCharm IDE

Note: This tutorial was carried out in the PyCharm IDE on a laptop running 64-bit Ubuntu 17.1.

Difficulty

Intermediate. I recommend learning the basics of Python programming before starting this tutorial. Links to the previous tutorials are at the bottom of this one.

Tutorial Contents

What is regex?

A regular expression, or regex, is a sequence of characters that defines a search pattern. It is used to search for and find strings that fit that pattern.
In Python, we usually write regular expressions as raw string literals instead of regular Python strings. Raw strings start with the 'r' prefix.

Why use raw string literals?

In a raw string, Python does not process backslash escape sequences; the backslashes are kept in the string unchanged. Regular expressions contain many backslashes, so without raw strings every one of them would have to be doubled, which makes patterns hard to read and easy to get wrong.

For example, "\n" and r"\n" are not the same. "\n" is a single newline character, whereas r"\n" is two characters, a backslash followed by the letter n. The backslash is left for the regex engine to interpret instead of being consumed by the Python interpreter.

print('\t Programming Hub')

Output:

 Programming Hub

Using raw string literals.

print(r'\t Programming Hub')

Output:

\t Programming Hub

In the code above, we can clearly see that '\t' is not interpreted as a tab when the raw string form is used.
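The same point can be checked by measuring the strings. The following is only a small sketch; the variable names and the sample text "Room 12, floor 3" are made up for illustration and are not part of the tutorial's repo.

```python
import re

# A regular string turns the escape sequence \n into one newline character.
normal = "\n"
# A raw string keeps the backslash, so it holds two characters: '\' and 'n'.
raw = r"\n"

print(len(normal))  # 1
print(len(raw))     # 2

# Without the r prefix, each backslash meant for the regex engine must be doubled.
same_pattern = (r"\d+" == "\\d+")
print(same_pattern)  # True

# Both spellings hand the regex engine the same pattern.
digits_raw = re.findall(r"\d+", "Room 12, floor 3")
digits_escaped = re.findall("\\d+", "Room 12, floor 3")
print(digits_raw, digits_escaped)  # ['12', '3'] ['12', '3']
```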

Regular Expression Patterns and their matches

Pattern     Matches
\d          Any digit
\D          Any non-digit character
.           Any character
\.          Period
[abc]       Only a, b, or c
[^abc]      Not a, b, nor c
[a-z]       Characters a to z
[0-9]       Numbers 0 to 9
\w          Any alphanumeric character
\W          Any non-alphanumeric character
\b          Word boundary
\B          Not a word boundary
{m}         m repetitions
*           Zero or more repetitions
+           One or more repetitions
?           Optional character
\s          Any whitespace, tab or new line
\S          Any non-whitespace character
^           Start of a string
$           End of a string
(…)         Capture group
(a(bc))     Capture sub-group
(.*)        Capture all
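To see a few of these patterns in action, here is a short sketch using re.findall and re.search from Python's re module. The sample sentence and variable names are invented for illustration. Two details worth knowing in Python: \w also matches the underscore, and . does not match a newline unless the re.DOTALL flag is given.

```python
import re

# Hypothetical sample text, made up for this illustration.
sample = "Order 66 shipped to flat 4B on 2018-03-07."

# \d+ : one or more digits in a row
numbers = re.findall(r"\d+", sample)
print(numbers)  # ['66', '4', '2018', '03', '07']

# [A-Z] : a single uppercase letter
capitals = re.findall(r"[A-Z]", sample)
print(capitals)  # ['O', 'B']

# \b\w{4}\b : whole words made of exactly four word characters
four_char_words = re.findall(r"\b\w{4}\b", sample)
print(four_char_words)  # ['flat', '2018']

# {m} repeats the preceding item exactly m times
date = re.search(r"\d{4}-\d{2}-\d{2}", sample).group()
print(date)  # 2018-03-07

# \. is a literal period and $ anchors it to the end of the string
ends_with_period = re.search(r"\.$", sample) is not None
print(ends_with_period)  # True

# \w treats the underscore as a word character
identifier = re.findall(r"\w+", "snake_case!")
print(identifier)  # ['snake_case']
```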

re module

re is a module in Python's standard library that provides regular expression matching operations.

Matching vs Searching

Matching a string

The re module provides several functions for matching a string against a pattern. re.match() checks whether the pattern matches at the beginning of the string, while re.search() looks for the first occurrence of the pattern anywhere in the string.

To match a string, we start by importing the re module into our program.

import re

Now we declare the statement (a string) on which we will perform the match operation.

statement = "Learn python programming with us"

Now we define a variable that holds the returned match object; here that variable is mobj. We call re.match to match the pattern against the string. It takes two required arguments: the regex pattern and the string (or a variable that holds the string).

mobj = re.match( r'Learn', statement)

Now we use an if/else statement to print the matched word. group() returns the part of the statement that matched. If there is no match, re.match returns None, so the else branch runs.

if mobj:
   print("Matched word : ", mobj.group())
else:
   print("No match found!")

If we run the code above, the output will be:

Matched word :  Learn
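Besides group(), the match object also reports where the match was found. The sketch below reuses the tutorial's statement; the extra variable missing and the pattern 'Teach' are my own additions to show the no-match case.

```python
import re

statement = "Learn python programming with us"
mobj = re.match(r"Learn", statement)

if mobj:
    print(mobj.group())              # Learn
    print(mobj.start(), mobj.end())  # 0 5
    print(mobj.span())               # (0, 5)

# When the pattern does not match, re.match returns None instead of an object,
# which is why the result is tested with "if mobj:" before calling group().
missing = re.match(r"Teach", statement)
print(missing)  # None
```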

Searching a string
Searching works much like matching. We use re.search() to search for a pattern in a string. We will reuse the previously defined statement for the search.

sobj = re.search( r'python', statement)
if sobj:
   print("Searched word : ", sobj.group())
else:
   print("Nothing found!")

Output:

Searched word :  python
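re.search also works with the capture groups listed in the pattern table. As a rough sketch (the pattern below is my own example, not from the original code), we can capture the word "python" and the word that follows it:

```python
import re

statement = "Learn python programming with us"

# Two capture groups: the word "python" and the word right after it.
sobj = re.search(r"(python)\s(\w+)", statement)

if sobj:
    print(sobj.group(0))  # python programming  (the whole match)
    print(sobj.group(1))  # python
    print(sobj.group(2))  # programming
    print(sobj.groups())  # ('python', 'programming')
    print(sobj.start())   # 6  (index where the match begins)
```

group(0) is the same as group() with no argument; the numbered groups follow the order of the opening parentheses in the pattern.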

Now we will match and search for the same word and compare the output:

import re

statement = "Learn python programming with us"

# re.match only tries the pattern at the start of the string.
mobj = re.match(r'python', statement)
if mobj:
    print("Matched word : ", mobj.group())
else:
    print("No match found!")

# re.search scans the whole string for the first occurrence.
sobj = re.search(r'python', statement)
if sobj:
    print("Searched word : ", sobj.group())
else:
    print("Nothing found!")

Output:

No match found!
Searched word :  python

In the code above, we matched and searched for the same word, python. re.match could not find it, but re.search did. This is the main difference between matching and searching: matching looks for a match only at the beginning of the string, whereas searching scans the whole string and returns the first match.
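One way to picture the difference: for a single-line string like this one, re.match behaves like re.search with the pattern anchored by ^. The sketch below illustrates that, and also shows re.fullmatch, a related function in the re module that requires the whole string to match. The variable names are my own.

```python
import re

statement = "Learn python programming with us"

# match fails because "python" is not at the start of the string.
match_result = re.match(r"python", statement)
print(match_result)  # None

# search finds it further along.
search_result = re.search(r"python", statement)
print(search_result.group())  # python

# Anchoring search with ^ makes it fail in the same way as match.
anchored_result = re.search(r"^python", statement)
print(anchored_result)  # None

# fullmatch succeeds only if the pattern covers the entire string.
partial = re.fullmatch(r"Learn", statement)
whole = re.fullmatch(r"Learn.*", statement)
print(partial)        # None
print(whole.group())  # Learn python programming with us
```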

Replacing patterns

Now we will replace the parts of a string that match a regex pattern. We use the re.sub method for this. It takes three required arguments: pattern, repl, and string, where repl is the replacement text.
We start by importing the re module and defining a string.

import re
address = "Wall street 19, New York"

Now we will find the digits in the string and print the address with them removed, by replacing each digit with an empty string.

add = re.sub(r'\d', "", address)
print("Address without digit: ", add)

The new variable add holds the value of address after the digits have been removed. The original address string is not modified.
Output:

Address without digit:  Wall street , New York
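Notice the stray space left before the comma in the output. A few variations on re.sub are sketched below, assuming the same address string; the patterns and variable names are my own examples. They use the optional count argument, a backreference in the replacement, and re.subn, which also reports how many replacements were made.

```python
import re

address = "Wall street 19, New York"

# Remove the digits together with the whitespace in front of them,
# so no stray space is left before the comma.
no_number = re.sub(r"\s*\d+", "", address)
print(no_number)  # Wall street, New York

# count limits how many occurrences are replaced.
masked_first = re.sub(r"\d", "#", address, count=1)
print(masked_first)  # Wall street #9, New York

# \1 and \2 in the replacement refer to the capture groups in the pattern.
swapped = re.sub(r"(\w+) street (\d+)", r"\2 \1 Street", address)
print(swapped)  # 19 Wall Street, New York

# re.subn returns the new string and the number of replacements.
result, replaced = re.subn(r"\d", "", address)
print(result, replaced)  # Wall street , New York 2
```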

All of the code above, along with the code from the previous tutorials, is available for download in my GitHub repo.

For more details please visit Python Docs.

Curriculum

Python tutorials for beginners : Part - I

Python tutorials for beginners : Part - II

Python tutorials for beginners : Part - III

Object-oriented Python

Reading and writing to files in python



Posted on Utopian.io - Rewarding Open Source Contributors

