A regular expression is a sequence of characters that describes a search pattern: it lets you match or find strings, or sets of strings, using a specialized syntax. Regular expressions are widely used in the UNIX world.
The module re provides full support for Perl-like regular expressions in Python. The re module raises the exception re.error if an error occurs while compiling or using a regular expression.
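As a minimal sketch of the error handling mentioned above, the snippet below wraps re.compile in a helper that catches re.error. The helper name try_compile and the sample patterns are hypothetical and chosen only for illustration.

```python
import re

def try_compile(pattern):
    """Return a compiled pattern, or None if the pattern is invalid.

    try_compile is a hypothetical helper, not part of the re module.
    """
    try:
        return re.compile(pattern)
    except re.error as exc:
        # The exception text describes what is wrong with the pattern.
        print("Invalid pattern", repr(pattern), "->", exc)
        return None

ok = try_compile(r"(\d+)-(\d+)")   # valid pattern: a compiled object is returned
bad = try_compile(r"(\d+")         # unbalanced parenthesis: re.error is caught
```

Catching the exception at compile time is useful when the pattern comes from user input or a configuration file rather than from the source code.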
The match Function
Match syntax
re.match(pattern, string, flags=0)
pattern - This is the regular expression to be matched.
string - This is the string to search; re.match only tries to match the pattern at the beginning of this string.
flags - You can specify different flags using bitwise OR (|).
The re.match function returns a match object on success and None on failure. Use the group(num) or groups() method of the match object to retrieve the matched text.
group(num=0) - This method returns entire match (or specific subgroup num)
groups() - This method returns all matching subgroups in a tuple (empty if there weren't any)
Example:
>>> import re
>>>
>>> line = "Cats are smarter than dogs"
>>> matchObj = re.match( r'(.*) are (.*?) .*', line, re.M|re.I)
>>> matchObj
<re.Match object; span=(0, 26), match='Cats are smarter than dogs'>
>>> if matchObj:
...     print("matchObj.group() : ", matchObj.group())
...     print("matchObj.group(1) : ", matchObj.group(1))
...     print("matchObj.group(2) : ", matchObj.group(2))
... else:
...     print("No match")
...
matchObj.group() :  Cats are smarter than dogs
matchObj.group(1) :  Cats
matchObj.group(2) :  smarter
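The example above only calls group(). As a small sketch of the groups() method described earlier, the snippet below reuses the same pattern and sentence; the variable names are arbitrary.

```python
import re

line = "Cats are smarter than dogs"
m = re.match(r"(.*) are (.*?) .*", line, re.I)

whole = m.group(0)        # the entire match, same as m.group()
all_groups = m.groups()   # every captured subgroup, as a tuple
pair = m.group(1, 2)      # several group numbers at once also give a tuple

print(whole)
print(all_groups)
print(pair)
```

Since re.match returns None when nothing matches, real code should check the result before calling group() or groups() on it.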
The search Function
The re.search function searches for the first occurrence of the RE pattern within string, with optional flags.
re.search(pattern, string, flags=0)
The re.search function returns a match object on success and None on failure. Use the group(num) or groups() method of the match object to retrieve the matched text.
Example:
>>> import re
>>>
>>> line = "Cats are smarter than dogs"
>>>
>>> searchObj = re.search(r'(.*) are (.*?) .*', line, re.M|re.I)
>>>
>>> if searchObj:
...     print("searchObj.group() : ", searchObj.group())
...     print("searchObj.group(1) : ", searchObj.group(1))
...     print("searchObj.group(2) : ", searchObj.group(2))
... else:
...     print("Nothing found")
...
searchObj.group() :  Cats are smarter than dogs
searchObj.group(1) :  Cats
searchObj.group(2) :  smarter
Matching Versus Searching
Python offers two different primitive operations based on regular expressions: re.match checks for a match only at the beginning of the string, while re.search checks for a match anywhere in the string (this is what Perl does by default).
Example:
>>> import re
>>>
>>> line = "Cats are smarter than dogs"
>>>
>>> matchObj = re.match(r'dogs', line, re.M|re.I)
>>>
>>> if matchObj:
...     print("match --> matchObj.group() : ", matchObj.group())
... else:
...     print("No match")
...
No match
>>> searchObj = re.search(r'dogs', line, re.M|re.I)
>>> if searchObj:
...     print("search --> searchObj.group() : ", searchObj.group())
... else:
...     print("Nothing found")
...
search --> searchObj.group() :  dogs
Search and Replace
One of the most important functions in the re module is sub.
re.sub(pattern, repl, string, count=0, flags=0)
This function replaces the occurrences of the RE pattern in string with repl. All occurrences are replaced unless count is given, in which case at most count replacements are made. It returns the modified string.
>>> import re
>>>
>>> phone = "2004-959-559 # This is Phone Number"
>>> num = re.sub(r'#.*$', "", phone)
>>> print("Phone Num : ", num)
Phone Num :  2004-959-559
>>> num = re.sub(r'\D', "", phone)
>>> print("Phone Num : ", num)
Phone Num :  2004959559
The first call strips the trailing Python-style comment, while the second removes everything that is not a digit.
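The example above always replaces every occurrence. A short sketch of limiting the number of replacements with the count argument of re.sub follows; the phone string is a shortened version of the one used above.

```python
import re

phone = "2004-959-559"

# count=1 replaces only the first dash; later dashes are left alone.
first_only = re.sub(r"-", "", phone, count=1)

# With the default count=0, every occurrence is replaced.
all_dashes = re.sub(r"-", "", phone)

print(first_only)
print(all_dashes)
```

Passing count by keyword keeps the call readable and avoids confusing it with the flags argument.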
Regular Expression Modifiers, Option Flags
re.I - Performs case-insensitive matching.
re.M - Makes ^ match at the start of every line (not just the start of the string) and $ match at the end of every line (not just the end of the string).
re.S - Makes a period (dot) match any character, including a newline.
re.U - Makes \w, \W, \b and \B follow the Unicode character set. In Python 3 this is already the default for str patterns, so the flag is kept only for backward compatibility.
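To make the effect of these flags concrete, here is a small sketch on a two-line string. It uses re.findall, which is not covered above, only to list all matches; the sample text is made up.

```python
import re

text = "first line\nSecond line"

# Without re.M, ^ matches only at the very start of the string.
no_flag = re.findall(r"^\w+", text)
# With re.M, ^ also matches right after each newline.
multiline = re.findall(r"^\w+", text, re.M)

# Without re.S, the dot stops at the newline.
dot_default = re.match(r".*", text).group()
# With re.S, the dot matches the newline too.
dot_all = re.match(r".*", text, re.S).group()

# Flags are combined with bitwise OR; re.I ignores letter case.
combined = re.findall(r"^second", text, re.M | re.I)

print(no_flag, multiline)
print(repr(dot_default), repr(dot_all))
print(combined)
```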
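Regular Expression Patterns
^ - Matches the beginning of the string; with re.M, also the beginning of each line.
$ - Matches the end of the string (or just before a trailing newline); with re.M, also the end of each line.
. - Matches any single character except newline. The re.S flag allows it to match newline as well.
\w - Matches word characters (letters, digits and the underscore).
\W - Matches nonword characters.
\s - Matches whitespace. For ASCII text, equivalent to [ \t\n\r\f\v].
\S - Matches nonwhitespace.
\d - Matches digits. Equivalent to [0-9] for ASCII text; with str patterns it also matches other Unicode decimal digits unless re.ASCII is set.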
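The character classes above can be tried out on a short sample string. This is only an illustrative sketch: the sample text is made up, and re.findall and re.split are used for convenience although they are not covered above.

```python
import re

sample = "Room 42, floor 3\tEast wing"

digits = re.findall(r"\d+", sample)     # runs of digits
words = re.findall(r"\w+", sample)      # runs of word characters
non_word = re.findall(r"\W+", sample)   # runs of everything else
pieces = re.split(r"\s+", sample)       # split on any whitespace

# \d on a str pattern also accepts non-ASCII decimal digits,
# here ARABIC-INDIC DIGIT THREE (U+0663), unless re.ASCII is set.
mixed = "\u0663" + "4"
unicode_digits = re.findall(r"\d", mixed)
ascii_digits = re.findall(r"\d", mixed, re.ASCII)

print(digits, words, non_word, pieces)
print(len(unicode_digits), ascii_digits)
```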