RegEx Cheat Sheet

Give a man a RegEx and he’ll parse strings for a function. Teach a man to regex and he’ll be stuck in debugging hell for the rest of his life.

RegEx is one of those “beautiful” things that I have to relearn from scratch every time I use it. To speed things up, I keep this quick reference guide for regular expressions (regex), covering symbols, ranges, grouping, assertions and pattern modifiers.

Basic Syntax
Position Matching
Character Classes
Special Characters
Groups and Ranges
Quantifiers
Escape Sequences
String Replacement
Assertions
POSIX
Pattern Modifiers

Basic Syntax

/.../: Start and end regex delimiters
|: Alternation
(): Grouping
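
As a minimal sketch of alternation and grouping, here is how the two combine in Python's `re` module (the choice of Python and the sample strings are assumptions for illustration; Python passes the pattern as a string, so the /.../ delimiters are not used):

```python
import re

# Without a group, | splits the whole pattern into two alternatives.
ungrouped = re.compile(r"gray|grey")
# With a group, | only applies to the part inside the parentheses.
grouped = re.compile(r"gr(a|e)y")

ungrouped_hits = ungrouped.findall("gray grey griy")  # ['gray', 'grey']
grouped_match = grouped.search("a grey cat")

print(ungrouped_hits)
print(grouped_match.group(0), grouped_match.group(1))  # grey e
```

The group also captures the letter that matched, which is why `group(1)` is available on the second pattern.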

Position Matching

^: Start of string or start of line in multi-line mode
\A: Start of string
$: End of string or end of line in multi-line mode
\Z: End of string (in Perl and PCRE also just before a final newline; \z is the absolute end)
\b: Word boundary
\B: Not word boundary
\<: Start of word (GNU grep and Vim; elsewhere use \b)
\>: End of word (GNU grep and Vim; elsewhere use \b)
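
A short sketch of how the anchors behave, assuming Python's `re` module and a made-up two-line string. Python does not support \< and \>, so \b stands in for both here:

```python
import re

text = "cat\nconcat cat"

# ^ matches only at the start of the string by default.
start_only = re.findall(r"^c\w+", text)                   # ['cat']
# re.MULTILINE lets ^ match right after every newline as well.
each_line = re.findall(r"^c\w+", text, re.MULTILINE)      # ['cat', 'concat']
# \A ignores multi-line mode and still matches only at the very start.
string_start = re.findall(r"\Ac\w+", text, re.MULTILINE)  # ['cat']
# \b on both sides requires a whole word, so "concat" is skipped.
whole_words = re.findall(r"\bcat\b", text)                # ['cat', 'cat']
# \B requires that "cat" does not start at a boundary: only "concat" hits.
inside_word = re.findall(r"\Bcat", text)                  # ['cat']

print(start_only, each_line, string_start, whole_words, inside_word)
```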

Character Classes

\s: Whitespace
\S: Not whitespace
\w: Word
\W: Not word
\d: Digit
\D: Not digit
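\x: Hexadecimal digit (Vim; most other engines treat \x as the start of a hex escape, so use [0-9A-Fa-f])
\o: Octal digit (Vim; elsewhere use [0-7])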
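
The shorthand classes can be tried out with a sketch like the following, assuming Python's `re` module and an invented sample string. Python has no shorthand for hexadecimal or octal digits, so explicit ranges are used instead:

```python
import re

sample = "Room 101, floor 3\tWest_Wing"

digits = re.findall(r"\d+", sample)      # ['101', '3']
words = re.findall(r"\w+", sample)       # \w also covers digits and underscore
spaces = re.findall(r"\s", sample)       # three spaces and one tab
non_words = re.findall(r"\W+", sample)   # runs of everything that is not \w

# Explicit ranges replace the hex and octal shorthands.
is_hex = re.fullmatch(r"[0-9A-Fa-f]+", "ff00aa") is not None
is_octal = re.fullmatch(r"[0-7]+", "789") is not None

print(digits, words, spaces, non_words, is_hex, is_octal)
```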

Special Characters

\n: Newline
\r: Carriage return
\t: Tab
\v: Vertical tab
\f: Form feed
\nnn: Octal character nnn (for example, \101 is A)
\xhh: Hex character hh (for example, \x41 is A)

Groups and Ranges

.: Any character except newline (\n)
(a|b): a or b
(…): Capturing group
(?:…): Passive (non-capturing) group
[abc]: a, b or c
[^abc]: Not a, b or c
[a-z]: Letters from a to z
[A-Z]: Uppercase letters from A to Z
[0-9]: Digits from 0 to 9

Note: Ranges are inclusive.
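
The sketch below exercises sets, negated sets, the dot and a non-capturing group, assuming Python's `re` module; the sample strings are made up for illustration:

```python
import re

# . matches any character except a newline, so "a\nc" is skipped.
dot_hits = re.findall(r"a.c", "abc a\nc a-c")        # ['abc', 'a-c']

# [aeiou] matches one character from the set; here each one is removed.
no_vowels = re.sub(r"[aeiou]", "", "regex")          # 'rgx'

# [^0-9 ] matches any character that is neither a digit nor a space.
letters = re.findall(r"[^0-9 ]+", "a1 b22 c")        # ['a', 'b', 'c']

# (?:...) groups the alternation without capturing it,
# so the surname is the only captured group.
m = re.search(r"(?:Mr|Ms)\. ([A-Z][a-z]+)", "Dear Ms. Lovelace,")

print(dot_hits, no_vowels, letters, m.groups())      # ... ('Lovelace',)
```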

Quantifiers

*: 0 or more
+: 1 or more
?: 0 or 1
{3}: Exactly 3
{3,}: 3 or more
{3,5}: 3, 4 or 5

Note: Quantifiers are greedy - they match as many times as possible. Add a ? after the quantifier to make it lazy (ungreedy), so that it matches as few times as possible.
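
The difference between greedy and lazy matching is easiest to see side by side. A minimal sketch, assuming Python's `re` module and an invented HTML-like string:

```python
import re

html = "<b>bold</b> and <i>italic</i>"

# Greedy: .+ runs to the last > in the string.
greedy = re.findall(r"<.+>", html)
# Lazy: .+? stops at the first > it can.
lazy = re.findall(r"<.+?>", html)

# {3,5} accepts runs of 3 to 5 digits; \b keeps longer runs from matching partially.
counted = re.findall(r"\b\d{3,5}\b", "12 123 12345 1234567")

print(greedy)   # ['<b>bold</b> and <i>italic</i>']
print(lazy)     # ['<b>', '</b>', '<i>', '</i>']
print(counted)  # ['123', '12345']
```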

Escape Sequences

\: Escape the following character. Used to escape any of these metacharacters: [ ] ( ) { } ^ $ . | * + ? \
\Q: Begin literal sequence
\E: End literal sequence
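
Not every engine supports \Q and \E. As a hedged sketch, Python's `re` module does not, but `re.escape` gives a similar result by escaping a whole literal for you (the price string below is an invented example):

```python
import re

price = "Total: $4.99 (approx.)"

# Unescaped, $ is an end-of-string anchor, so this finds nothing.
unescaped = re.search(r"$4.99", price)        # None
# Escaping $ and . makes both characters literal.
escaped = re.search(r"\$4\.99", price)

# re.escape plays the role of \Q...\E for a whole literal string.
literal = "(approx.)"
found = re.search(re.escape(literal), price)

print(unescaped, escaped.group(0), found.group(0))
```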

String Replacement

$1: 1st group
$2: 2nd group
$n: nth group
$`: Before matched string
$': After matched string
$+: Last matched group
$&: Entire matched string

Note: Some regex implementations use \ instead of $ (for example, \1 instead of $1).
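
Python is one of the implementations that uses \ in replacements. The sketch below assumes Python's `re` module, where \1 refers to the 1st group and \g<0> to the entire match; the text before and after a match is read from the match object rather than from $` and $':

```python
import re

date = "2023-05-16"

# \1, \2, \3 refer to the captured groups in the replacement string.
swapped = re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\3/\2/\1", date)  # '16/05/2023'

# \g<0> stands for the entire matched string.
wrapped = re.sub(r"\d+", r"[\g<0>]", "a1b22")                    # 'a[1]b[22]'

# Text before and after the match, taken from the match object.
m = re.search(r"-05-", date)
before = m.string[:m.start()]   # '2023'
after = m.string[m.end():]      # '16'

print(swapped, wrapped, before, after)
```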

Assertions

?=: Lookahead assertion
?!: Negative lookahead
?<=: Lookbehind assertion
?<!: Negative lookbehind
?>: Once-only subexpression (atomic group)
?(): Condition if-then
?()|: Condition if-then-else
?#: Comment
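
Each assertion is written inside parentheses, for example (?=…). The following is a sketch assuming Python's `re` module and an invented price string; it covers the four lookaround forms, an if-then condition and an inline comment:

```python
import re

prices = "USD 100, EUR 200, 300 USD"

# Lookahead: digits that are followed by " USD".
followed = re.findall(r"\d+(?= USD)", prices)             # ['300']
# Negative lookahead: whole numbers not followed by " USD".
not_followed = re.findall(r"\b\d+\b(?! USD)", prices)     # ['100', '200']
# Lookbehind: digits that are preceded by "USD ".
preceded = re.findall(r"(?<=USD )\d+", prices)            # ['100']
# Negative lookbehind: numbers not preceded by "USD ".
not_preceded = re.findall(r"(?<!USD )\b\d+", prices)      # ['200', '300']

# Condition: require a closing > only if group 1 matched an opening <.
# (?#...) is an inline comment and does not affect matching.
tag = re.compile(r"(<)?\w+(?(1)>)(?#closing bracket is conditional)")
results = [tag.fullmatch(s) is not None for s in ("<tag>", "tag", "<tag")]

print(followed, not_followed, preceded, not_preceded, results)
```

The \b anchors in the negative forms matter: without them the engine can backtrack to a shorter run of digits that still satisfies the assertion.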

POSIX

[:upper:]: Uppercase letters
[:lower:]: Lowercase letters
[:alpha:]: All letters
[:alnum:]: Digits and letters
[:digit:]: Digits
[:xdigit:]: Hexadecimal digits
[:punct:]: Punctuation
[:blank:]: Space and tab
[:space:]: Whitespace characters, including line breaks
[:cntrl:]: Control characters
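[:graph:]: Visible characters (no spaces)
[:print:]: Visible characters and spaces
[:word:]: Digits, letters and underscore (an extension, not part of the POSIX standard)

Note: POSIX classes are only valid inside a bracket expression, e.g. [[:digit:]].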

Pattern Modifiers

g: Global match
i: Case-insensitive
m: Multi-line mode. Causes ^ and $ to also match the start/end of lines.
s: Single-line mode. Causes . to match all, including line breaks.
x: Allow comments and whitespace in pattern
e: Evaluate replacement as code (Perl)
U: Ungreedy mode (PCRE). Quantifiers become lazy by default.
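
How modifiers are spelled depends on the engine. As a sketch, assuming Python's `re` module: i, m, s and x map to flags, there is no g modifier because `findall` and `sub` already process every match, and a replacement function plays the role of e. The sample strings are invented:

```python
import re

text = "Error: disk\nerror: net"

# i + m: ignore case and let ^ match at the start of every line.
starts = re.findall(r"^error", text, re.IGNORECASE | re.MULTILINE)

# s: without re.DOTALL the dot cannot cross the line break.
no_dotall = re.search(r"disk.error", text)             # None
dotall = re.search(r"disk.error", text, re.DOTALL)

# x: re.VERBOSE ignores whitespace in the pattern and allows # comments.
year_month = re.compile(r"""
    (\d{4})   # year
    -
    (\d{2})   # month
""", re.VERBOSE)
parts = year_month.search("on 2023-05-16").groups()    # ('2023', '05')

# e: a function computes the replacement for each match.
doubled = re.sub(r"\d+", lambda m: str(int(m.group()) * 2), "3 apples, 10 pears")

print(starts, no_dotall, dotall.group(0) == "disk\nerror", parts, doubled)
```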

Written on May 16, 2023
