RegEx Module (re module) in Python


Python has a built-in module called re, which can be used to work with regular expressions.


To use it, we need to import the module.

import re

The re module offers a set of functions that let us search a string for a match:

  • match Returns a Match object if the pattern matches at the start of the string, otherwise None
  • findall Returns a list containing all matches
  • search Returns a Match object for the first match anywhere in the string, otherwise None
  • split Returns a list where the string has been split at each match
  • sub Replaces one or more matches with a string

match()

This function finds a match only if it occurs at the start of the string. For example, calling match() on the string ‘python programming’ with the pattern ‘python’ will match. However, if we look for ‘programming’ instead, the pattern will not match, because it does not occur at the start of the string.

match() is available both as a function of the re module and as a method of a compiled RE object (regex object). It attempts to match the pattern to the string, starting at the beginning. If the match is successful, a Match object is returned; on failure, None is returned.

Syntax:
re.match(pattern, string)

Example:
import re
# matching python in the given sentence
result = re.match('python', 'python programming and python')
print (result)

result = re.match('programming', 'python programming and python')
print('Result :', result)
Output:
<re.Match object; span=(0, 6), match='python'>
Result : None

The Match object has methods that return specific data about the match, such as span(), group(), start() and end().

Example:
import re
result = re.match('python', 'python programming and python')
print('Starting position of the match :', result.start())
print('Ending position of the match :', result.end())
Output:
Starting position of the match : 0
Ending position of the match : 6
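
As a small complement to start() and end(), the sketch below uses span(), which returns both positions as a tuple, and slices the original string with them to recover the matched text. The variable names are only illustrative.

```python
import re

text = 'python programming and python'
result = re.match('python', text)

if result:
    start, end = result.span()        # same values as start() and end()
    print(result.span())              # (0, 6)
    print(text[start:end])            # python
    print(result.group())             # python
```

Slicing with the span gives the same text as group(), which is a handy check when working with positions.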

The group() method of a Match object returns the text that was matched. Here is an example of how to use match() together with group():

Example:
import re
# matching python in the given sentence
m1 = re.match('python', 'python programming and python')
m2 = re.match('programming', 'python programming and python')
if m1:
	print (m1.group())
else:
	print("No Match")

if m2:
	print (m2.group())
else:
	print("No Match")
Output:
python
No Match
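
The text above notes that match() is also a method of a regex object. A minimal sketch of that form, assuming the pattern is compiled first with re.compile() (the variable names are illustrative):

```python
import re

# Compile the pattern once; the resulting regex object has its own match() method
pattern = re.compile('python')

m1 = pattern.match('python programming and python')
m2 = pattern.match('programming and python')

print(m1.group() if m1 else "No Match")   # python
print(m2.group() if m2 else "No Match")   # No Match
```

Compiling is mostly useful when the same pattern is applied to many strings.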

search()

search() is similar to match(), but it does not restrict the match to the beginning of the string. It scans the string for a match and returns a Match object if one is found.

If there is more than one match, only the first occurrence is returned. If no match is found, None is returned.

Unlike match(), searching for the pattern ‘programming’ here will return a match.

Syntax:
re.search(pattern, string)

Example:
import re
result = re.search('python', 'python programming and python')
print (result)
result = re.search('programming', 'python programming and python')
print (result)
Output:
<re.Match object; span=(0, 6), match='python'>
<re.Match object; span=(7, 18), match='programming'>
Example:
import re
m1 = re.search('python', 'python programming and python')
m2 = re.search('programming', 'python programming and python')
if m1:
	print (m1.group())
else:
	print("No Match")

if m2:
	print (m2.group())
else:
	print("No Match")
Output:
python
programming
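
To make the "first occurrence only" behaviour concrete, the short sketch below checks the span of the match when the pattern occurs twice, and shows the None result for a pattern that is absent. The pattern 'java' is just an illustrative non-matching value.

```python
import re

text = 'python programming and python'

first = re.search('python', text)
print(first.span())                 # (0, 6): the second 'python' at index 23 is ignored

missing = re.search('java', text)
print(missing)                      # None
```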

The main difference between the match() and search() functions is shown below:

Example:
import re
m1 = re.match('programming', 'python programming and python')
m2 = re.search('programming', 'python programming and python')
if m1:
	print (m1.group())
else:
	print("No Match")
if m2:
	print (m2.group())
else:
	print("No Match")
Output:
No Match
programming

Here you can see that search() is able to find the pattern at any position in the string, whereas match() only looks at the start. Note that search() returns only the first occurrence of the pattern.
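
One way to see how the two functions relate: anchoring the pattern with ^ makes search() look only at the start of the string, which gives the same result as match() here. This is a sketch for single-line text without the re.MULTILINE flag.

```python
import re

text = 'python programming and python'

# '^' anchors the pattern to the start of the string
anchored = re.search('^programming', text)
plain = re.search('programming', text)

print(anchored)          # None, the same result as re.match('programming', text)
print(plain.start())     # 7
```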

findall()

findall() returns a list of all non-overlapping matches of a pattern. It is not restricted to the start or end of the string: searching a string with findall() returns every occurrence of the pattern. Use re.findall() when you need all occurrences; because it returns plain strings rather than Match objects, re.search() and re.match() remain the better choice when you need only the first match or its position.

The re.findall() function returns the matches in the order they are found. If there are no matches, an empty list is returned.

Syntax:
re.findall(pattern, string)

Example 1:
import re
ms = re.findall('python', 'python programming and python scripting')
if ms:
	print(ms)
else:
	print("No Match")
Output:
['python', 'python']
Example 2:
import re
text = 'python 23 program 363 script 37'
# \d+ matches one or more digits; the raw string keeps the backslash intact
ptrn = r'\d+'
result = re.findall(ptrn, text)
print(result)
Output:
['23', '363', '37']
Example 3:
import re
Nameage = '''
Kiran is 22 and Tarun is 33
Ganesh is 44 and Jony is 21'''
ages = re.findall(r'\d+', Nameage)
names = re.findall(r'[A-Z][a-z]*', Nameage)
# pair each name with the age found at the same position
for eachname, age in zip(names, ages):
    print(eachname, ":", age)
Output:
Kiran : 22
Tarun : 33
Ganesh : 44
Jony : 21
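
Example 3 relies on names and ages appearing in the same order in two separate lists. An alternative sketch, assuming every entry has the form "Name is NN": with two capture groups in one pattern, findall() returns (name, age) tuples directly.

```python
import re

Nameage = '''
Kiran is 22 and Tarun is 33
Ganesh is 44 and Jony is 21'''

# Two capture groups: findall() then returns a list of (name, age) tuples
pairs = re.findall(r'([A-Z][a-z]*) is (\d+)', Nameage)
for name, age in pairs:
    print(name, ":", age)
```

This keeps each name tied to its own age even if the text contained other capitalised words or numbers.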
Example 4:
import re
# Matching words with a character set; it is case-sensitive, so 'Sat' is not matched
Str = "Sat, hat, mat, pat"
allStr = re.findall("[shmp]at", Str)
for i in allStr:
    print(i)
Output:
hat
mat
pat
Example 5:
import re
# Matching a range of characters
Str = "sat, hat, mat, pat"
someStr = re.findall("[h-m]at", Str)
for i in someStr:
    print(i)
Output:
hat
mat
Example 6:
import re
# Matching characters outside a range using ^
Str = "sat, hat, mat, pat"
someStr = re.findall("[^h-m]at", Str)
for i in someStr:
    print(i)
Output:
sat
pat
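
findall() returns only the matched strings. When the position of each match is needed as well, re.finditer() yields Match objects instead. A brief sketch reusing the sentence from Example 1:

```python
import re

text = 'python programming and python scripting'

# finditer() yields one Match object per occurrence, in order
for m in re.finditer('python', text):
    print(m.group(), m.span())
# python (0, 6)
# python (23, 29)
```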

split()

The split() function splits the string at each occurrence of the given pattern and returns the resulting pieces as a list.
The optional maxsplit argument limits the number of splits; the default value 0 means there is no limit.

Syntax:
re.split(pattern, string, maxsplit=0)

Example 1:
import re
# \s matches a whitespace character
resultls = re.split(r'\s', 'python programming and python scripting')
print(resultls)
# split with maxsplit: at most 2 splits are made
resultls = re.split(r'\s', 'python programming and python scripting', maxsplit=2)
print(resultls)
Output:
['python', 'programming', 'and', 'python', 'scripting']
['python', 'programming', 'and python scripting']
Example 2:
import re
string = 'Madhu:521,Naveen:532,Ganesh:509,Ramesh:569'
pattern = r':\d{3}'
result = re.split(pattern, string) 
print(result)
Output:
['Madhu', ',Naveen', ',Ganesh', ',Ramesh', '']
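
The result of Example 2 still carries the leading commas and a trailing empty string. One possible refinement, sketched here as an assumption about what the caller wants: let the pattern also consume the comma, then drop empty pieces.

```python
import re

string = 'Madhu:521,Naveen:532,Ganesh:509,Ramesh:569'

# ',?' also consumes the comma that follows each number, when present
parts = re.split(r':\d{3},?', string)
print(parts)            # ['Madhu', 'Naveen', 'Ganesh', 'Ramesh', '']

# The last match sits at the end of the string, so drop the empty tail
names = [p for p in parts if p]
print(names)            # ['Madhu', 'Naveen', 'Ganesh', 'Ramesh']
```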

sub()

The sub() function searches for a pattern and replaces each match with a new substring. If the pattern is not found, the string is returned unchanged.
The function returns a new string in which the matched occurrences are replaced with the content of the replace argument.

Syntax:
re.sub(pattern, replace, string)

Example 1:
import re
text = 'python programming and python scripting'
resultls = re.sub('python', 'java', text)
print(resultls)

# replace every whitespace character with a hyphen
resultls = re.sub(r'\s', '-', text)
print(resultls)
Output:
java programming and java scripting
python-programming-and-python-scripting
Example 2:
import re
string = ''' abc \t 123
de 45 \n  f 678'''
# matches all whitespace characters
pattern = r'\s+'
# empty string
replace = ''
new_string = re.sub(pattern, replace, string) 
print(new_string)
Output:
abc123de45f678
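
Two further behaviours of sub() can be sketched briefly: the optional count argument limits how many matches are replaced, and a pattern that never matches leaves the string unchanged. The pattern 'ruby' is just an illustrative non-matching value.

```python
import re

text = 'python programming and python scripting'

# count=1 replaces only the first match
print(re.sub('python', 'java', text, count=1))
# java programming and python scripting

# No match: the original string comes back unchanged
print(re.sub('ruby', 'java', text) == text)   # True
```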