Python Regex Cheat Sheet: re Module, Methods, and Patterns

The re module: methods

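Python regex lives in the standard library re module. import re and you're done. No pip install. search, match, and fullmatch return a Match object or None; findall returns a list, finditer an iterator of Match objects, sub a string, subn a (string, count) tuple, split a list, compile a Pattern object, and escape a string.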

re.search(pattern, string, flags=0)

Scan through the string and return the first match as a Match object, or None if nothing matches. The match can start anywhere in the string; it doesn't have to be at the beginning.

re.search(r'\d+', 'abc 42 xyz 7')

→ <Match object; span=(4, 6), match='42'>

re.match(pattern, string, flags=0)

Like search but only matches at the start of the string. Returns None if the pattern doesn't match at position 0. Rarely what you want; prefer search unless you explicitly need the anchor behavior.

re.match(r'\d+', 'abc 42')

→ None (no digits at position 0)

re.fullmatch(pattern, string, flags=0)

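Match the pattern against the entire string. Close to search with ^...$ anchors, but stricter: $ also matches just before a trailing newline, while fullmatch requires the pattern to consume every character. Added in Python 3.4.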

re.fullmatch(r'\d+', '42abc')

→ None (doesn't match the whole string)
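
The difference between fullmatch and anchored search shows up with a trailing newline. A minimal sketch, using a made-up input string:

```python
import re

text = "42\n"  # illustrative input: digits followed by a trailing newline

# '$' also matches just before a trailing newline, so the anchored search succeeds.
anchored = re.search(r'^\d+$', text)

# fullmatch needs the pattern to consume the whole string, newline included.
full = re.fullmatch(r'\d+', text)

print(anchored is not None)  # True
print(full is None)          # True
```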

re.findall(pattern, string, flags=0)

Return all non-overlapping matches as a list of strings. If the pattern has one group, the list holds that group's text instead; with several groups, it holds tuples of the groups. Gotcha: once the pattern has groups, the full match is not included.

re.findall(r'\d+', 'a 1 b 22 c 333')

→ ['1', '22', '333']
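
How the return shape of findall changes with the number of groups, as a small sketch on an invented input string:

```python
import re

text = 'x=1, y=22'  # illustrative input

# No groups: list of full matches.
no_groups = re.findall(r'\w=\d+', text)       # ['x=1', 'y=22']

# One group: list of that group's text only.
one_group = re.findall(r'\w=(\d+)', text)     # ['1', '22']

# Two or more groups: list of tuples, one tuple per match.
two_groups = re.findall(r'(\w)=(\d+)', text)  # [('x', '1'), ('y', '22')]
```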

re.finditer(pattern, string, flags=0)

Like findall but returns an iterator of Match objects. Useful when you need positions, spans, or named groups for every match.

[m.span() for m in re.finditer(r'\d+', 'a1 b22')]

→ [(1, 2), (4, 6)]

re.sub(pattern, repl, string, count=0, flags=0)

Replace every match of pattern in string with repl. repl can be a string with \1, \2, \g<name> backreferences, or a callable that returns the replacement for each match.

re.sub(r'(\w+) (\w+)', r'\2 \1', 'John Smith')

→ 'Smith John'
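
When repl is a callable, it receives each Match and returns the replacement string. A minimal sketch; the helper name double and the input are invented for illustration:

```python
import re

def double(match):
    # Hypothetical helper: takes a Match object, must return a string.
    return str(int(match.group()) * 2)

result = re.sub(r'\d+', double, '3 apples, 10 pears')
# '6 apples, 20 pears'

# count limits how many matches are replaced (0 means all).
first_only = re.sub(r'\d+', double, '3 apples, 10 pears', count=1)
# '6 apples, 10 pears'
```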

re.subn(pattern, repl, string, count=0, flags=0)

Same as sub but also returns the count of replacements made. Useful when you need to know if anything changed.

re.subn(r'\s+', ' ', 'a  b   c')

→ ('a b c', 2)

re.split(pattern, string, maxsplit=0, flags=0)

Split the string at every pattern match. If the pattern has groups, the captured text is included in the result list.

re.split(r'[,;]\s*', 'a, b; c,d')

→ ['a', 'b', 'c', 'd']
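
A capture group in the split pattern keeps the separators in the result, and maxsplit caps the number of splits. A short sketch reusing the input above:

```python
import re

text = 'a, b; c,d'

# Capturing the separator keeps it in the output list.
with_seps = re.split(r'([,;])\s*', text)
# ['a', ',', 'b', ';', 'c', ',', 'd']

# maxsplit=1 stops after the first split; the rest stays intact.
first_only = re.split(r'[,;]\s*', text, maxsplit=1)
# ['a', 'b; c,d']
```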

re.compile(pattern, flags=0)

Compile a pattern into a reusable regex object. Slightly faster for repeated use and lets you store flags with the pattern. All the module-level functions are also methods on compiled patterns.

p = re.compile(r'\d+')
p.findall('1 22 333')

→ ['1', '22', '333']
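
A compiled pattern carries its flags, so every method call on it uses them without repeating the argument. A small sketch with an invented pattern named WORD:

```python
import re

# Hypothetical pattern: the flag is stored once, at compile time.
WORD = re.compile(r'hello', re.IGNORECASE)

found = WORD.search('Say HELLO')          # matches 'HELLO'
replaced = WORD.sub('bye', 'Hello hello') # 'bye bye'
```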

re.escape(string)

Escape all regex metacharacters in string so it can be used as a literal pattern. Essential when interpolating user input.

re.escape('1.2.3 (what?)')

→ '1\\.2\\.3\\ \\(what\\?\\)'
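
Why escaping matters when a search term comes from outside the program: a rough sketch, where user_input stands in for untrusted text and both strings are invented:

```python
import re

user_input = '1.2.3'                # hypothetical untrusted search term
text = 'versions: 1x2y3, 1.2.3'     # illustrative text to search

# Unescaped, each '.' matches any character.
unescaped = re.findall(user_input, text)
# ['1x2y3', '1.2.3']

# Escaped, the term is matched literally.
escaped = re.findall(re.escape(user_input), text)
# ['1.2.3']
```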

Flags

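Pass flags through the flags argument of any re function that accepts one, and combine them with |: re.search(r'^end$', text, re.M | re.I). With sub, subn, and split, use the keyword form (flags=re.I), because the fourth positional argument is count or maxsplit, not flags.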

Flag	Short	Description
re.IGNORECASE	re.I	Match letters regardless of case. Same as the i flag in JavaScript.
re.MULTILINE	re.M	^ and $ match at the start/end of each line. Same as the m flag in JavaScript.
re.DOTALL	re.S	The . metacharacter also matches newlines. Equivalent to the s flag in JavaScript.
re.UNICODE	re.U	Make \w, \W, \b, \B, \d, \D, \s, \S match Unicode. Default in Python 3.
re.VERBOSE	re.X	Allow whitespace and comments in the pattern for readability. No JavaScript equivalent; use a string concat hack.
re.ASCII	re.A	Make \w, \W, \b, \B, \d, \D, \s, \S ASCII-only (opposite of UNICODE).
re.DEBUG	n/a	Print debug info about compiled pattern. For learning; don't ship with it on.
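
Two of the flags in action, as a sketch: re.VERBOSE for a commented pattern and re.M | re.I combined. The DATE pattern and the input strings are invented for illustration:

```python
import re

# Hypothetical pattern: a simple ISO-style date, laid out with re.VERBOSE.
DATE = re.compile(r"""
    (?P<year>\d{4})  -   # four-digit year
    (?P<month>\d{2}) -   # two-digit month
    (?P<day>\d{2})       # two-digit day
""", re.VERBOSE)

date_parts = DATE.search('released 2024-03-15').groupdict()
# {'year': '2024', 'month': '03', 'day': '15'}

# MULTILINE lets ^ and $ match per line; IGNORECASE ignores letter case.
hit = re.search(r'^end$', 'start\nEND\nmore', re.M | re.I)
# matches 'END' on the second line
```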

Match object methods

If a method returns a Match object (search, match, fullmatch, each element of finditer), you can pull data out of it:

m.group() / m.group(0): the whole match as a string.
m.group(n): the nth capture group. 1-indexed.
m.group('name'): a named capture group.
m.groups(): a tuple of all capture groups, named ones included.
m.groupdict(): a dict of all named groups.
m.span() / m.start() / m.end(): character positions of the match.
m.string: the original input string.
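
The accessors above, applied to one match. A small sketch with an invented pattern that mixes a named and an unnamed group:

```python
import re

m = re.search(r'(?P<key>\w+)=(\d+)', 'set timeout=30 now')

whole = m.group()        # 'timeout=30'
first = m.group(1)       # 'timeout'
by_name = m.group('key') # 'timeout'
second = m.group(2)      # '30'
all_groups = m.groups()  # ('timeout', '30')
named = m.groupdict()    # {'key': 'timeout'}
position = m.span()      # (4, 14)
```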

JavaScript vs. Python: syntax differences

The regex syntax is 95% the same across flavors, but Python has a handful of quirks worth memorizing. If you copy a regex from Stack Overflow and it works in JavaScript but not Python (or vice versa), the differences below are usually why.

Topic	JavaScript	Python
Named capture groups	(?<name>...)	(?P<name>...). The P is the Python quirk
Named backreference in pattern	\k<name>	(?P=name)
Named reference in substitution	$<name>	\g<name>
Numbered backreference in substitution	$1	\1
Raw strings for patterns	Use /.../ literal or 'escape\\twice'	Use r'...' raw strings to avoid \\ escape hell
Verbose mode (whitespace + comments)	Not supported; concatenate strings manually	re.VERBOSE flag (re.X). Whitespace ignored, # starts comments
Unicode by default	Requires u flag	Default in Python 3 (re.UNICODE is on)
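
The Python column of the table, sketched in code: a named group, a named backreference inside the pattern, and a named reference in a substitution. The inputs are invented:

```python
import re

# (?P<word>...) names the group; (?P=word) refers back to it in the pattern.
doubled = re.search(r'\b(?P<word>\w+) (?P=word)\b', 'it was the the best')
repeated = doubled.group('word')   # 'the'

# \g<name> refers to a named group in the replacement string.
swapped = re.sub(r'(?P<k>\w+)=(?P<v>\w+)', r'\g<v>=\g<k>', 'a=1 b=2')
# '1=a 2=b'
```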

Always use raw strings

Python string literals interpret backslash escapes. '\n' is a newline, not the two characters backslash-n. Regex also uses backslashes. Without raw strings, you end up doubling every backslash:

# Without raw string (painful):
re.search('\\d+\\.\\d+', text)

# With raw string (clear):
re.search(r'\d+\.\d+', text)

Default to r'...' for every regex pattern in Python. Only drop the r when you explicitly want Python string escapes, which is rare.
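
A quick check of what goes wrong without the r prefix, as a sketch: '\b' in a normal string literal is a backspace character, so the word-boundary pattern silently stops matching.

```python
import re

# Both spellings produce the same pattern string.
same = '\\d+\\.\\d+' == r'\d+\.\d+'   # True

# Without r, '\b' becomes a backspace character, not a word boundary.
plain = re.search('\bcat\b', 'a cat sat')   # None
raw = re.search(r'\bcat\b', 'a cat sat')    # matches 'cat'
```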

Compile once, use many times

If you're running the same regex in a loop, compile it once outside the loop and reuse the pattern object:

EMAIL = re.compile(r'\b[\w.+-]+@[\w-]+\.[\w.-]+\b')

for line in lines:
    for match in EMAIL.finditer(line):
        process(match.group())

Python caches recently compiled patterns internally, so the module-level functions do not recompile on every call and the speed gain from compiling yourself is modest. Binding the compiled pattern to a named constant also makes the code self-documenting.

Common gotchas

findall returns strings, not Match objects. If you need positions or named groups, use finditer instead.
findall with groups returns tuples, not the full match. If your pattern has groups and you want the whole match, either use finditer or wrap the whole thing in an outer group.
match only tries at position 0. It's not the one you want for "find this anywhere in the string." That's search.
re.sub's repl can be a callable. When the replacement needs logic (uppercase only if match starts with a capital, etc.), pass a function that takes a Match and returns a string.
Don't parse HTML with regex. Use BeautifulSoup or lxml. This is true in every language; Python just has particularly good alternatives.

· Main regex cheat sheet: metacharacters, quantifiers, groups, look-around. The cross-language basics.
· Live regex tester: paste and run patterns. ECMAScript flavor.
· Python re docs (python.org): the official reference. Comprehensive but dense.
