Regular expressions

A regular expression is a special sequence of characters that helps you match or find other strings or sets of strings, using a specialized syntax held in a pattern. Regular expressions are widely used in the UNIX world.

The module re provides full support for Perl-like regular expressions in Python. The re module raises the exception re.error if an error occurs while compiling or using a regular expression.
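As a minimal sketch of the error handling described above, the snippet below tries to compile two patterns and catches re.error for the invalid one. The helper name try_compile and both sample patterns are made up for this illustration.

```python
import re

def try_compile(pattern):
    """Return a compiled pattern, or None if the pattern is invalid."""
    try:
        return re.compile(pattern)
    except re.error as exc:
        # re.error carries a human-readable description of the problem.
        print("Invalid pattern:", exc)
        return None

good = try_compile(r"(\d+)-(\d+)")  # well-formed pattern
bad = try_compile(r"(\d+")          # unbalanced parenthesis, raises re.error
```

Catching the exception this way is mostly useful when the pattern comes from user input rather than from your own source code.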

The match Function

Match syntax

re.match(pattern, string, flags=0)

pattern - This is the regular expression to be matched.

string - This is the string that is checked for a match of the pattern at its beginning.

flags - You can specify different flags using bitwise OR (|).

The re.match function returns a match object on success, None on failure. We use the group(num) or groups() method of the match object to get the matched expression.

group(num=0) - This method returns the entire match (or the specific subgroup num).

groups() - This method returns all matching subgroups in a tuple (empty if there weren't any)

Example:

>>> import re
>>>
>>> line = "Cats are smarter than dogs"
>>> matchObj = re.match( r'(.*) are (.*?) .*', line, re.M|re.I)
>>> matchObj
<re.Match object; span=(0, 26), match='Cats are smarter than dogs'>
>>> if matchObj:
...  print("matchObj.group() : ", matchObj.group())
...  print("matchObj.group(1) : ", matchObj.group(1))
...  print("matchObj.group(2) : ", matchObj.group(2))
... else:
...  print("No match")
...
matchObj.group() :  Cats are smarter than dogs
matchObj.group(1) :  Cats
matchObj.group(2) :  smarter
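
The example above only calls group(). As a small complementary sketch, the same pattern can be used to see what groups() returns, both when the pattern has subgroups and when it has none. The variable names here are illustrative only.

```python
import re

line = "Cats are smarter than dogs"

# Pattern with two subgroups: groups() returns them as a tuple.
match_obj = re.match(r"(.*) are (.*?) .*", line, re.I)
all_groups = match_obj.groups() if match_obj else ()
print(all_groups)

# Pattern without subgroups: groups() returns an empty tuple.
plain_match = re.match(r"Cats", line)
no_groups = plain_match.groups() if plain_match else ()
print(no_groups)
```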

The search Function

This function searches for the first occurrence of the RE pattern within string, with optional flags.

re.search(pattern, string, flags=0)

The re.search function returns a match object on success, None on failure. We use the group(num) or groups() method of the match object to get the matched expression.

Example:

>>> import re
>>>
>>> line = "Cats are smarter than dogs";
>>>
>>> searchObj = re.search( r'(.*) are (.*?) .*', line, re.M|re.I)
>>>
>>> if searchObj:
...  print("searchObj.group() : ", searchObj.group())
...  print("searchObj.group(1) : ", searchObj.group(1))
...  print("searchObj.group(2) : ", searchObj.group(2))
... else:
...  print("Nothing found")
...
searchObj.group() :  Cats are smarter than dogs
searchObj.group(1) :  Cats
searchObj.group(2) :  smarter

Matching Versus Searching

Python offers two different primitive operations based on regular expressions: match checks for a match only at the beginning of the string, while search checks for a match anywhere in the string (this is what Perl does by default).

Example:

>>> import re
>>>
>>> line = "Cats are smarter than dogs";
>>>
>>> matchObj = re.match( r'dogs', line, re.M|re.I)
>>>
>>> if matchObj:
...  print("match --> matchObj.group() : ", matchObj.group())
... else:
...  print("No match")
...
No match
>>> searchObj = re.search( r'dogs', line, re.M|re.I)
>>> if searchObj:
...  print("search --> searchObj.group() : ", searchObj.group())
... else:
...  print("Nothing found")
...
search --> searchObj.group() :  dogs

Search and Replace

One of the most important functions in the re module is sub.

re.sub(pattern, repl, string, count=0, flags=0)

This function replaces occurrences of the RE pattern in string with repl, substituting all occurrences unless count is provided. It returns the modified string.

>>> import re
>>>
>>> phone = "2004-959-559 # This is Phone Number"
>>> num = re.sub(r'#.*$', "", phone)
>>> print("Phone Num : ", num)
Phone Num :  2004-959-559
>>> num = re.sub(r'\D', "", phone)
>>> print("Phone Num : ", num)
Phone Num :  2004959559

The first replacement deletes the Python-style comment, while the second removes anything other than digits.
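
To illustrate limiting the number of substitutions, here is a short sketch that reuses the phone string from above and passes count to re.sub. The variable names are chosen for this example only.

```python
import re

phone = "2004-959-559 # This is Phone Number"

# count=1 replaces only the first dash.
first_only = re.sub(r"-", "", phone, count=1)
print(first_only)

# Without count (default 0), every dash is replaced.
all_dashes = re.sub(r"-", "", phone)
print(all_dashes)
```

Passing count as a keyword argument keeps the call readable and avoids confusing it with flags.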

Regular Expression Modifiers, Option Flags

re.I - Performs case-insensitive matching.

re.M - Makes $ match the end of a line (not just the end of the string) and makes ^ match the start of any line (not just the start of the string).

re.S - Makes a period (dot) match any character, including a newline.

re.U - Interprets letters according to the Unicode character set. This flag affects the behavior of \w, \W, \b, \B. In Python 3 this is already the default for str patterns, so the flag is redundant there.
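
The following sketch shows how the flags above change matching and how they are combined with bitwise OR. The sample text and variable names are assumptions made for this illustration.

```python
import re

text = "first line\nSecond line"

# re.I alone: ^ still matches only at the start of the string, so no match.
without_multiline = re.search(r"^second", text, re.I)

# re.I | re.M: ^ also matches at the start of the second line.
with_multiline = re.search(r"^second", text, re.I | re.M)

# By default the dot does not match the newline between the two lines.
dot_default = re.search(r"line.Second", text)

# re.S lets the dot match the newline as well.
dot_all = re.search(r"line.Second", text, re.S)

print(without_multiline, with_multiline, dot_default, dot_all)
```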

Regular Expression Patterns

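^ - Matches the beginning of the string; with re.M, also matches the beginning of each line.

$ - Matches the end of the string (or just before a trailing newline); with re.M, also matches the end of each line.

. - Matches any single character except newline. Using the re.S flag allows it to match newline as well.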

\w - Matches word characters.

\W - Matches nonword characters.

\s - Matches whitespace. Equivalent to [ \t\n\r\f\v] for ASCII text.

\S - Matches nonwhitespace.

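\d - Matches digits. Equivalent to [0-9] for ASCII text; with Python 3 str patterns it also matches other Unicode decimal digits unless the re.ASCII (re.A) flag is given.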
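
As a rough sketch of the patterns listed above, the snippet below applies a few of them to one sample string with re.sub and re.search. The sample string and variable names are invented for this example.

```python
import re

sample = "Room 42, floor 7"

# \d matches each digit.
digits_masked = re.sub(r"\d", "#", sample)

# \s matches each whitespace character.
spaces_replaced = re.sub(r"\s", "_", sample)

# \W matches everything that is not a word character (here: spaces and the comma).
nonword_removed = re.sub(r"\W", "", sample)

# ^ anchors at the start of the string, $ at the end.
starts_with_room = re.search(r"^Room", sample) is not None
ends_with_digit = re.search(r"\d$", sample) is not None

print(digits_masked, spaces_replaced, nonword_removed)
```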
