Regular expressions, commonly referred to as RegEx or regex, are a powerful tool for matching and manipulating text. Python’s re module provides support for working with regular expressions, enabling you to search, match, and manipulate strings using complex patterns. This tutorial covers the basics and advanced techniques of Python RegEx, complete with examples, explanations, and practical applications.
Regular expressions (RegEx) are sequences of characters that define a search pattern. In Python, the re module provides a set of functions to work with regex patterns, allowing you to search, replace, and manipulate strings effectively. Regex is widely used in data validation, parsing, and text processing.
Regular expressions offer several advantages:
The re module in Python provides various functions for working with regular expressions. Here’s how to get started:
import re

pattern = r"hello"
text = "hello world"
result = re.search(pattern, text)
if result:
    print("Pattern found!")
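If the same pattern is used many times, it can be convenient to compile it once with re.compile() and call methods on the resulting pattern object. The sketch below assumes a small hypothetical list of strings; pattern objects offer the same search(), match(), findall() and related methods as the module-level functions.

```python
import re

# Compile the pattern once so it can be reused across many strings
pattern = re.compile(r"hello")

texts = ["hello world", "say hello", "goodbye"]
# Pattern objects expose search() just like the re module does
found = [t for t in texts if pattern.search(t)]
print(found)  # ['hello world', 'say hello']
```

The re module also caches recently used patterns internally, so compiling is mainly a matter of readability and reuse rather than a guaranteed speedup.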
re.search()
The re.search() function scans a string for the first location where the pattern matches. It returns a match object if a match is found, otherwise None.
import re

text = "Hello, world!"
result = re.search(r"world", text)
if result:
    print("Found:", result.group())  # Output: Found: world
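Because re.search() returns None when nothing matches, calling .group() on the result without checking it first raises an AttributeError. A short sketch of both outcomes, also showing span(), which reports where the match sits in the string:

```python
import re

text = "Hello, world!"

hit = re.search(r"world", text)
# span() returns the (start, end) indices of the match
print(hit.span())  # (7, 12)

miss = re.search(r"Python", text)
# No match: search() returns None, so check before calling .group()
print(miss)  # None
```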
re.match()
The re.match() function checks for a match only at the beginning of a string. It returns None if the pattern is not found at the start.
import re

text = "Hello, world!"
result = re.match(r"Hello", text)
if result:
    print("Matched:", result.group())  # Output: Matched: Hello
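The difference between re.match() and re.search() is easiest to see side by side. The sketch below also uses re.fullmatch() (not mentioned above, but part of the standard re module), which succeeds only when the pattern covers the entire string.

```python
import re

text = "Hello, world!"

# match() only looks at the start of the string, so "world" is not found
at_start = re.match(r"world", text)
print(at_start)  # None

# search() scans the whole string
anywhere = re.search(r"world", text)
print(anywhere.group())  # world

# fullmatch() succeeds only if the pattern covers the entire string
partial = re.fullmatch(r"Hello", text)
whole = re.fullmatch(r"Hello, world!", text)
print(partial)  # None
print(whole.group())  # Hello, world!
```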
re.findall()
The re.findall() function returns a list of all non-overlapping matches of a pattern in a string.
import re

text = "apple, orange, apple, banana"
matches = re.findall(r"apple", text)
print(matches)  # Output: ['apple', 'apple']
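When the pattern contains capturing groups, re.findall() returns the captured text instead of the whole match: a list of strings for one group, or a list of tuples for several. The input string here is a made-up example.

```python
import re

text = "apple=3, orange=5, banana=12"

# One group: findall() returns a list of that group's text
names = re.findall(r"(\w+)=\d+", text)
print(names)  # ['apple', 'orange', 'banana']

# Several groups: findall() returns a list of tuples
pairs = re.findall(r"(\w+)=(\d+)", text)
print(pairs)  # [('apple', '3'), ('orange', '5'), ('banana', '12')]
```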
re.finditer()
The re.finditer() function returns an iterator yielding a match object for each match in the string.
import re

text = "apple, orange, apple, banana"
matches = re.finditer(r"apple", text)
for match in matches:
    print("Found at:", match.start())  # Prints "Found at: 0", then "Found at: 15"
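Since re.finditer() yields full match objects, it is handy when you need both the matched text and its position. A minimal sketch collecting the text, start and end index of each match:

```python
import re

text = "apple, orange, apple, banana"

# Each match object carries the matched text and its position
positions = [(m.group(), m.start(), m.end()) for m in re.finditer(r"apple", text)]
print(positions)  # [('apple', 0, 5), ('apple', 15, 20)]
```

Because the matches are produced lazily, this approach also avoids building a full list of results up front when scanning long texts.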
Regular expressions use metacharacters to create complex patterns. Here are some common ones:
. : Matches any character except a newline.
^ : Matches the start of the string.
$ : Matches the end of the string.
* : Matches zero or more repetitions of the preceding element.
+ : Matches one or more repetitions of the preceding element.
? : Matches zero or one repetition of the preceding element.
[] : Matches any single character inside the brackets.
\d : Matches any digit (equivalent to [0-9] for ASCII text; in Python 3 it also matches other Unicode digits unless the re.ASCII flag is used).
\w : Matches any word character (letters, digits, underscore).

import re

text = "Hello, world! 123"
result = re.findall(r"\d+", text)
print(result)  # Output: ['123']
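To see how several of these metacharacters behave, the sketch below tests a few patterns against a small made-up word list. The helper matching() is hypothetical; it uses re.fullmatch() so that each pattern must cover the whole word, much like wrapping it in ^ and $.

```python
import re

samples = ["cat", "cut", "ct", "caat", "coat"]

def matching(pattern):
    # Hypothetical helper: keep samples the pattern matches from start to end
    return [s for s in samples if re.fullmatch(pattern, s)]

dot = matching(r"c.t")          # any single character between c and t
star = matching(r"ca*t")        # zero or more a's
plus = matching(r"ca+t")        # one or more a's
optional = matching(r"ca?t")    # zero or one a
bracket = matching(r"c[aou]t")  # exactly one of a, o, u

print(dot)       # ['cat', 'cut']
print(star)      # ['cat', 'ct', 'caat']
print(plus)      # ['cat', 'caat']
print(optional)  # ['cat', 'ct']
print(bracket)   # ['cat', 'cut']
```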
Parentheses () are used to create groups in regex, allowing you to capture parts of the matched text separately.
import re

text = "John Doe, 25"
pattern = r"(\w+) (\w+), (\d+)"
match = re.search(pattern, text)
if match:
    print("First Name:", match.group(1))  # Output: John
    print("Last Name:", match.group(2))   # Output: Doe
    print("Age:", match.group(3))         # Output: 25
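In this pattern, the first (\w+) captures the first name, the second (\w+) captures the last name, and (\d+) captures the age. Calling group(0), or group() with no argument, returns the entire match.
Regex flags modify the behavior of regex functions. Common flags include: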
re.IGNORECASE (re.I): Makes the pattern case-insensitive.
re.MULTILINE (re.M): Allows ^ and $ to match at the start and end of each line.
re.DOTALL (re.S): Allows . to match newline characters as well.

import re

text = "Hello\nWorld"
result = re.search(r"hello", text, re.IGNORECASE)
print(result.group())  # Output: Hello
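The MULTILINE and DOTALL flags are easier to understand by comparing results with and without them. This sketch uses a made-up two-line string and also shows that flags can be combined with the | operator.

```python
import re

text = "first line\nsecond line"

# Without MULTILINE, ^ only matches at the very start of the string
plain_start = re.findall(r"^\w+", text)
print(plain_start)  # ['first']

# With MULTILINE, ^ also matches right after each newline
multi_start = re.findall(r"^\w+", text, re.MULTILINE)
print(multi_start)  # ['first', 'second']

# Without DOTALL, . does not match the newline, so nothing spans both lines
no_dotall = re.search(r"first.*second", text)
print(no_dotall)  # None

# With DOTALL (combined here with IGNORECASE), . matches the newline too
m = re.search(r"FIRST.*SECOND", text, re.DOTALL | re.IGNORECASE)
print(repr(m.group()))  # 'first line\nsecond'
```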
The re module provides methods for replacing and splitting text based on patterns.
re.sub()
The re.sub() function replaces matches of a pattern with a specified replacement string.
import re

text = "Hello, world!"
result = re.sub(r"world", "Python", text)
print(result)  # Output: Hello, Python!
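The replacement argument of re.sub() can do more than insert fixed text. The sketch below, using made-up input strings, shows backreferences to captured groups (\1, \2, ...), the count parameter that limits the number of replacements, and a function that computes each replacement from the match object.

```python
import re

text = "2023-05-17 and 2024-06-18"

# Backreferences reuse captured groups in the replacement string
us_style = re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\2/\3/\1", text)
print(us_style)  # 05/17/2023 and 06/18/2024

# count limits how many replacements are made
first_only = re.sub(r"\d{4}", "YYYY", text, count=1)
print(first_only)  # YYYY-05-17 and 2024-06-18

# A function receives each match object and returns the replacement text
doubled = re.sub(r"\d+", lambda m: str(int(m.group()) * 2), "3 apples, 10 pears")
print(doubled)  # 6 apples, 20 pears
```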
re.split()
The re.split() function splits a string at each occurrence of a pattern.
import re

text = "apple, orange, banana"
result = re.split(r",\s*", text)
print(result)  # Output: ['apple', 'orange', 'banana']
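Where str.split() accepts only a single fixed separator, re.split() can handle several separators at once. This sketch, with made-up input, also shows the maxsplit parameter and how a capturing group keeps the separators in the result.

```python
import re

text = "apple, orange;banana  cherry"

# A character class lets one pattern handle commas, semicolons and whitespace
parts = re.split(r"[,;\s]+", text)
print(parts)  # ['apple', 'orange', 'banana', 'cherry']

# maxsplit stops after the given number of splits
limited = re.split(r"[,;\s]+", text, maxsplit=1)
print(limited)  # ['apple', 'orange;banana  cherry']

# A capturing group keeps the separators in the result list
with_seps = re.split(r"([,;])", "a,b;c")
print(with_seps)  # ['a', ',', 'b', ';', 'c']
```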
Validating an email address:

import re

pattern = r"^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$"
email = "example@mail.com"
if re.match(pattern, email):
    print("Valid email")
else:
    print("Invalid email")
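The same check can be wrapped in a reusable function. In this sketch, is_valid_email() is a hypothetical helper name; it compiles the pattern once and uses re.fullmatch(), which makes the ^ and $ anchors unnecessary. The pattern is a simplified illustration and does not implement the full email address grammar.

```python
import re

# Simplified pattern: local part, "@", domain, ".", then the rest of the domain.
# This is an illustrative check, not a complete email validator.
EMAIL_RE = re.compile(r"[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9.-]+")

def is_valid_email(address: str) -> bool:
    # fullmatch() requires the whole string to match the pattern
    return EMAIL_RE.fullmatch(address) is not None

candidates = ["example@mail.com", "first.last+tag@sub.example.org", "no-at-sign.com", "user@domain"]
results = {c: is_valid_email(c) for c in candidates}
for address, ok in results.items():
    print(address, ok)
```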
Extracting dates from text:

import re

text = "The event is on 2023-05-17 and another on 2024-06-18."
dates = re.findall(r"\d{4}-\d{2}-\d{2}", text)
print(dates)  # Output: ['2023-05-17', '2024-06-18']
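\d{4}-\d{2}-\d{2} matches dates in YYYY-MM-DD format: four digits, a hyphen, two digits, a hyphen, and two digits. It checks only this digit layout, so an impossible date such as 2023-99-99 would also match.
Extracting hashtags from a post: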
import re

text = "Loving the weather! #sunny #happy #spring"
hashtags = re.findall(r"#\w+", text)
print(hashtags)  # Output: ['#sunny', '#happy', '#spring']
#\w+ matches a # followed by one or more word characters, the usual form of a hashtag.
Some characters, like . and *, have special meanings. Escape them with \ when you need to match them literally.
import re

text = "Price is $5.99"
match = re.search(r"\$5\.99", text)
if match:
    print("Price found")
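When the literal text comes from a variable or user input, escaping by hand is error-prone. The standard re.escape() function adds the backslashes for you. The sketch below, with a made-up price string, also shows what goes wrong without escaping.

```python
import re

price = "$5.99"
# re.escape() backslash-escapes the special characters in a literal string
pattern = re.escape(price)
print(pattern)  # \$5\.99

text = "Price is $5.99, was $5899"
escaped_hits = re.findall(pattern, text)
print(escaped_hits)  # ['$5.99']

# Unescaped, "$" is an end-of-string anchor, so nothing can follow it
anchor_hits = re.findall(r"$5.99", text)
print(anchor_hits)  # []

# Unescaped, "." matches any character, so "5899" also matches
loose_hits = re.findall(r"5.99", text)
print(loose_hits)  # ['5.99', '5899']
```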
By default, ^ and $ match only at the start and end of the whole string. To match at the start or end of each line, pass the re.MULTILINE flag.
import re

text = "Hello\nworld"
match = re.search(r"^world", text, re.MULTILINE)
if match:
    print("Found world at start of a line")
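The sketch below compares ^ and $ with and without re.MULTILINE on the same string. It also uses \A and \Z, which in Python always refer to the start and end of the whole string, regardless of flags.

```python
import re

text = "Hello\nworld"

# Without MULTILINE, ^ only matches at the start of the whole string
start_hit = re.search(r"^world", text)
print(start_hit)  # None

# With MULTILINE, $ matches before each newline as well as at the end
line_ends = re.findall(r"\w+$", text, re.MULTILINE)
print(line_ends)  # ['Hello', 'world']

# \Z always means the end of the whole string, even with MULTILINE
string_end = re.findall(r"\w+\Z", text, re.MULTILINE)
print(string_end)  # ['world']
```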
Quantifiers such as * and + are greedy by default: they match as much text as possible. Add ? after a quantifier (for example, *? or +?) to make it non-greedy, so it matches as little as possible.
import re

text = "<tag>content</tag>"
match = re.search(r"<.*?>", text)  # Non-greedy match
print(match.group())  # Output: <tag>
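Comparing the greedy and non-greedy versions of the same pattern on the same string makes the difference concrete:

```python
import re

text = "<tag>content</tag>"

# Greedy: .* extends as far as possible, up to the last ">"
greedy = re.search(r"<.*>", text).group()
print(greedy)  # <tag>content</tag>

# Non-greedy: .*? stops at the first ">"
lazy = re.search(r"<.*?>", text).group()
print(lazy)  # <tag>

# findall() with the non-greedy pattern returns each tag separately
tags = re.findall(r"<.*?>", text)
print(tags)  # ['<tag>', '</tag>']
```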
search, match, findall, and sub are essential functions for regex operations.
Metacharacters such as ., ^, $, *, and [] allow for complex patterns.
Regular expressions are a powerful tool for text processing, allowing you to define complex search patterns with a concise syntax. Python’s re module provides various functions for matching, searching, replacing, and splitting text based on regular expressions. Whether you’re validating data formats, parsing text, or extracting specific information, mastering regular expressions will greatly enhance your text manipulation skills in Python.
With Python’s re module, you can:
Ready to start using regular expressions in Python? Try building and testing different patterns to match and manipulate text in your projects. Happy coding!