Python regex cheat sheet.

Every method of Python's re module, every flag, and the small syntax differences from JavaScript regex. For the cross-language basics (metacharacters, quantifiers, groups), see the main regex cheat sheet.

The re module: methods

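Python regex lives in the standard library re module. import re and you're done, no pip install needed. search, match, and fullmatch return a Match object or None; findall returns a list, finditer an iterator of Match objects, sub a string, subn a (string, count) tuple, split a list, compile a Pattern object, and escape a string.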

re.search(pattern, string, flags=0)

Scan through the string and return the first match as a Match object, or None if nothing matches. The match can start anywhere in the string — doesn't have to be at the beginning.

re.search(r'\d+', 'abc 42 xyz 7')

→ <Match object; span=(4, 6), match='42'>

re.match(pattern, string, flags=0)

Like search but only matches at the start of the string. Returns None if the pattern doesn't match at position 0. Rarely what you want — prefer search unless you explicitly need the anchor behavior.

re.match(r'\d+', 'abc 42')

→ None (no digits at position 0)

re.fullmatch(pattern, string, flags=0)

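Match the pattern against the entire string. Roughly the same as search with \A(?:...)\Z anchors; ^...$ is close but not identical, because $ also matches just before a trailing newline. Added in Python 3.4.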

re.fullmatch(r'\d+', '42abc')

→ None (doesn't match the whole string)
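To make the difference between the three matching functions concrete, here is a small side-by-side sketch. The sample strings are made up for illustration.

```python
import re

pattern = r'\d+'

# search finds the first match anywhere in the string.
found_anywhere = re.search(pattern, 'abc 42')    # matches '42'

# match only tries at position 0, but may stop before the end.
at_start = re.match(pattern, '42abc')            # matches '42'
not_at_start = re.match(pattern, 'abc 42')       # None

# fullmatch requires the pattern to consume the whole string.
whole = re.fullmatch(pattern, '42')              # matches '42'
partial = re.fullmatch(pattern, '42abc')         # None

# A trailing $ still accepts one final newline; fullmatch does not.
dollar = re.search(r'^\d+$', '42\n')             # matches '42'
strict = re.fullmatch(r'\d+', '42\n')            # None
```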

re.findall(pattern, string, flags=0)

Return all non-overlapping matches as a list of strings. Gotcha: if the pattern has one capture group, the list contains that group's text instead of the full match; with two or more groups, it contains tuples of the groups, again without the full match.

re.findall(r'\d+', 'a 1 b 22 c 333')

→ ['1', '22', '333']
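The way groups change findall's return value is easiest to see on one input. A minimal sketch, using a made-up sample string:

```python
import re

text = 'x=1, y=22'

# No groups: list of full matches.
no_groups = re.findall(r'\w=\d+', text)           # ['x=1', 'y=22']

# One group: list of that group's text only.
one_group = re.findall(r'\w=(\d+)', text)         # ['1', '22']

# Two groups: list of tuples, one item per group.
two_groups = re.findall(r'(\w)=(\d+)', text)      # [('x', '1'), ('y', '22')]

# A non-capturing group (?:...) keeps the full-match behavior.
non_capturing = re.findall(r'(?:\w)=\d+', text)   # ['x=1', 'y=22']
```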

re.finditer(pattern, string, flags=0)

Like findall but returns an iterator of Match objects — useful when you need positions, spans, or named groups for every match.

[m.span() for m in re.finditer(r'\d+', 'a1 b22')]

→ [(1, 2), (4, 6)]

re.sub(pattern, repl, string, count=0, flags=0)

Replace every match of pattern in string with repl. repl can be a string with \1, \2, \g<name> backreferences, or a callable that returns the replacement for each match.

re.sub(r'(\w+) (\w+)', r'\2 \1', 'John Smith')

→ 'Smith John'
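A few more replacement forms, sketched with made-up inputs: a named reference, the unambiguous \g<1> form, and the count argument.

```python
import re

# \g<name> refers to a named group in the replacement string.
swapped = re.sub(r'(?P<first>\w+) (?P<last>\w+)', r'\g<last>, \g<first>', 'John Smith')
# 'Smith, John'

# \g<1> avoids ambiguity when a digit follows the reference:
# r'\10' would mean group 10, r'\g<1>0' means group 1 then a literal 0.
padded = re.sub(r'(\d)', r'\g<1>0', 'a1b2')
# 'a10b20'

# count limits how many replacements are made (0 means no limit).
first_only = re.sub(r'\d', '#', 'a1b2c3', count=1)
# 'a#b2c3'
```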

re.subn(pattern, repl, string, count=0, flags=0)

Same as sub but also returns the count of replacements made. Useful when you need to know if anything changed.

re.subn(r'\s+', ' ', 'a  b   c')

→ ('a b c', 2)

re.split(pattern, string, maxsplit=0, flags=0)

Split the string at every pattern match. If the pattern has groups, the captured text is included in the result list.

re.split(r'[,;]\s*', 'a, b; c,d')

→ ['a', 'b', 'c', 'd']
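Two variations worth knowing, shown here as a short sketch: a capturing group that keeps the delimiters, and maxsplit to cap the number of splits.

```python
import re

# A capturing group keeps the delimiters in the result.
with_delims = re.split(r'([,;])\s*', 'a, b; c')
# ['a', ',', 'b', ';', 'c']

# maxsplit caps the number of splits; the remainder stays in one piece.
limited = re.split(r'[,;]\s*', 'a, b; c,d', maxsplit=1)
# ['a', 'b; c,d']
```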

re.compile(pattern, flags=0)

Compile a pattern into a reusable regex object. Slightly faster for repeated use and lets you store flags with the pattern. All the module-level functions are also methods on compiled patterns.

p = re.compile(r'\d+')
p.findall('1 22 333')

→ ['1', '22', '333']

re.escape(string)

Escape all regex metacharacters in string so it can be used as a literal pattern. Essential when interpolating user input.

re.escape('1.2.3 (what?)')

→ '1\\.2\\.3\\ \\(what\\?\\)'
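Here is a sketch of why escaping matters when a search term comes from outside your code. The count_literal helper and the sample values are hypothetical, not part of the re module.

```python
import re

def count_literal(needle: str, haystack: str) -> int:
    """Count occurrences of needle as plain text, not as a regex."""
    return len(re.findall(re.escape(needle), haystack))

user_input = '1.5'            # hypothetical untrusted search term
log_line = 'v1.5 vs 105'

# Unescaped, the dot matches any character, so '105' also counts.
unescaped_hits = len(re.findall(user_input, log_line))   # 2

# Escaped, only the literal text '1.5' counts.
escaped_hits = count_literal(user_input, log_line)       # 1
```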

Flags

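Pass flags through the flags argument of any re function that accepts one, and combine them with |: re.search(r'^end$', text, re.M | re.I). With sub, subn, and split, use the keyword form (flags=re.I), because the positional slot right after the string is count or maxsplit.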

| Flag | Short | Description |
|---|---|---|
| re.IGNORECASE | re.I | Match letters regardless of case. Same as the i flag in JavaScript. |
| re.MULTILINE | re.M | ^ and $ match at the start/end of each line. Same as the m flag in JavaScript. |
| re.DOTALL | re.S | The . metacharacter also matches newlines. Equivalent to the s flag in JavaScript. |
| re.UNICODE | re.U | Make \w, \W, \b, \B, \d, \D, \s, \S match Unicode. Default in Python 3. |
| re.VERBOSE | re.X | Allow whitespace and comments in the pattern for readability. No JavaScript equivalent; use a string concat hack. |
| re.ASCII | re.A | Make \w, \W, \b, \B, \d, \D, \s, \S ASCII-only (opposite of UNICODE). |
| re.DEBUG | (none) | Print debug info about compiled pattern. For learning; don't ship with it on. |
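A short sketch of the flags you are most likely to reach for. The DATE pattern and sample strings are made up for illustration.

```python
import re

# re.VERBOSE ignores pattern whitespace and allows # comments.
# Flags combine with |.
DATE = re.compile(r'''
    (?P<year>\d{4}) -    # four-digit year
    (?P<month>\d{2}) -   # two-digit month
    (?P<day>\d{2})       # two-digit day
''', re.VERBOSE | re.IGNORECASE)

m = DATE.search('Released 2024-03-15')
parts = m.groupdict()   # {'year': '2024', 'month': '03', 'day': '15'}

# re.MULTILINE makes ^ and $ work per line.
firsts = re.findall(r'^\w+', 'one two\nthree four', flags=re.MULTILINE)
# ['one', 'three']

# With sub, pass flags by keyword: the fourth positional slot is count.
shouted = re.sub(r'hello', 'HI', 'Hello hello', flags=re.IGNORECASE)
# 'HI HI'
```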

Match object methods

If a method returns a Match object (search, match, fullmatch, each element of finditer), you can pull data out of it:

  • m.group() / m.group(0) — the whole match as a string.
  • m.group(n) — the nth capture group. 1-indexed.
  • m.group('name') — a named capture group.
  • m.groups() — a tuple of all numbered groups.
  • m.groupdict() — a dict of all named groups.
  • m.span() / m.start() / m.end() — character positions of the match.
  • m.string — the original input string.
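The accessors above, applied to one match. The pattern and input are a made-up example, not a production-grade email regex.

```python
import re

m = re.search(r'(?P<user>\w+)@(?P<host>[\w.]+)', 'mail bob@example.com now')

whole = m.group()         # 'bob@example.com'
first = m.group(1)        # 'bob'
host = m.group('host')    # 'example.com'
groups = m.groups()       # ('bob', 'example.com')
named = m.groupdict()     # {'user': 'bob', 'host': 'example.com'}
span = m.span()           # (5, 20)

# The span indexes into the original input string.
sliced = m.string[m.start():m.end()]   # same text as whole
```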

JavaScript vs. Python — syntax differences

The regex syntax is 95% the same across flavors, but Python has a handful of quirks worth memorizing. If you copy a regex from Stack Overflow and it works in JavaScript but not Python (or vice versa), the differences below are usually why.

| Topic | JavaScript | Python |
|---|---|---|
| Named capture groups | (?<name>...) | (?P<name>...) (the P is the Python quirk) |
| Named backreference in pattern | \k<name> | (?P=name) |
| Named reference in substitution | $<name> | \g<name> |
| Numbered backreference in substitution | $1 | \1 |
| Raw strings for patterns | Use /.../ literal or 'escape\\twice' | Use r'...' raw strings to avoid \\ escape hell |
| Verbose mode (whitespace + comments) | Not supported; concatenate strings manually | re.VERBOSE flag (re.X): whitespace ignored, # starts comments |
| Unicode by default | Requires u flag | Default in Python 3 (re.UNICODE is on) |
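The Python column in practice: a small sketch that finds and removes a doubled word using a named group, a named backreference, and a named substitution reference. The sample sentence is made up.

```python
import re

# (?P<name>...) defines a named group; (?P=name) refers back to it in the pattern.
DOUBLED = r'\b(?P<word>\w+) (?P=word)\b'

found = re.search(DOUBLED, 'it was the the best')
repeated_word = found.group('word')     # 'the'

# In the replacement string, use \g<name> (JavaScript would use $<name>).
fixed = re.sub(DOUBLED, r'\g<word>', 'it was the the best')
# 'it was the best'
```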

Always use raw strings

Python string literals interpret backslash escapes — '\n' is a newline, not the two characters backslash-n. Regex also uses backslashes. Without raw strings, you end up doubling every backslash:

# Without raw string — painful:
re.search('\\d+\\.\\d+', text)

# With raw string — clear:
re.search(r'\d+\.\d+', text)

Default to r'...' for every regex pattern in Python. Only drop the r when you explicitly want Python string escapes, which is rare.
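One case where forgetting the r fails silently rather than loudly: \b. A minimal sketch with a made-up input string.

```python
import re

# In a normal string, '\n' is one character; in a raw string it is two.
normal_len = len('\n')    # 1
raw_len = len(r'\n')      # 2

# '\b' in a normal string is a backspace character, not a word boundary,
# so the non-raw pattern quietly fails to match.
without_raw = re.search('\bcat\b', 'a cat sat')    # None
with_raw = re.search(r'\bcat\b', 'a cat sat')      # matches 'cat'
```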

Compile once, use many times

If you're running the same regex in a loop, compile it once outside the loop. This skips the pattern-cache lookup that the module-level functions perform on every call:

EMAIL = re.compile(r'\b[\w.+-]+@[\w-]+\.[\w.-]+\b')

for line in lines:
    for match in EMAIL.finditer(line):
        process(match.group())

The re module caches recently compiled patterns internally, so the speedup is often modest. The bigger win is readability: a compiled pattern stored in a named constant documents itself.
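If you want to measure the difference on your own machine, a rough timeit sketch like the following can help. The WORD pattern, the sample line, and the iteration count are arbitrary choices; timings vary by machine and Python version, so no expected numbers are given.

```python
import re
import timeit

WORD = re.compile(r'\w+')
line = 'compile once, use many times'

def with_module_function():
    # Looks the pattern up in re's internal cache on every call.
    return re.findall(r'\w+', line)

def with_compiled_pattern():
    # Calls the method on the already-compiled pattern directly.
    return WORD.findall(line)

# Both approaches return the same result; only the call overhead differs.
same_result = with_module_function() == with_compiled_pattern()

module_time = timeit.timeit(with_module_function, number=10_000)
compiled_time = timeit.timeit(with_compiled_pattern, number=10_000)
print(f'module-level: {module_time:.4f}s  compiled: {compiled_time:.4f}s')
```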

Common gotchas

  • findall returns strings, not Match objects. If you need positions or named groups, use finditer instead.
  • findall with groups returns the groups, not the full match: strings for a single group, tuples for several. If you want the whole match, use finditer, make the groups non-capturing with (?:...), or wrap the whole pattern in an outer group.
  • match only tries at position 0. It's not the one you want for "find this anywhere in the string" — that's search.
  • re.sub's repl can be a callable. When the replacement needs logic (uppercase only if match starts with a capital, etc.), pass a function that takes a Match and returns a string.
  • Don't parse HTML with regex. Use BeautifulSoup or lxml. This is true in every language; Python just has particularly good alternatives.
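The callable form of repl from the list above can be sketched like this. The shout_if_capitalized function is a hypothetical example of "replacement with logic", using a made-up sentence.

```python
import re

def shout_if_capitalized(m: re.Match) -> str:
    """Uppercase the word only when it starts with a capital letter."""
    word = m.group()
    return word.upper() if word[0].isupper() else word

# re.sub calls the function once per match and inserts its return value.
result = re.sub(r'\w+', shout_if_capitalized, 'Alice met bob in Paris')
# 'ALICE met bob in PARIS'
```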
