Give a man a RegEx and he’ll parse strings for a function. Teach a man to regex and he’ll be stuck in debugging hell for the rest of his life.
RegEx is one of the “beautiful” things that I have to relearn from scratch every time I use it. To speed things up, I keep this quick reference guide for regular expressions (regex), covering symbols, ranges, grouping, assertions and some sample patterns.
- Basic Syntax
- Position Matching
- Character Classes
- Special Characters
- Groups and Ranges
- Quantifiers
- Escape Sequences
- String Replacement
- Assertions
- POSIX
- Pattern Modifiers
Basic Syntax
- /.../: Start and end regex delimiters
- |: Alternation
- (): Grouping
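As a minimal sketch of alternation and grouping, here is how they might look in Python's re module. Python is my choice for the examples, not something this guide prescribes, and it has no /.../ delimiters: patterns are ordinary (raw) strings. The variable names are made up for illustration.

```python
import re

# Alternation: match either alternative in full.
pet = re.compile(r"cat|dog")
pets = pet.findall("cat, dog, bird")
print(pets)  # ['cat', 'dog']

# Grouping limits the alternation to one part of the pattern.
colour = re.compile(r"gr(a|e)y")
print(bool(colour.fullmatch("gray")))  # True
print(bool(colour.fullmatch("grey")))  # True
print(bool(colour.fullmatch("gruy")))  # False
```

Without the parentheses, gra|ey would mean "gra" or "ey", which is rarely what you want.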
Position Matching
- ^: Start of string, or start of line in multi-line mode
- \A: Start of string
- $: End of string, or end of line in multi-line mode
- \Z: End of string
- \b: Word boundary
- \B: Not word boundary
- \<: Start of word
- \>: End of word
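The difference between ^ and \A, and between \b and \B, is easier to see in action. A small sketch in Python's re module (which, as far as I know, does not support \< and \>, so \b has to cover both):

```python
import re

text = "first line\nsecond line"

# Without multi-line mode, ^ only matches at the very start of the string.
starts_default = re.findall(r"^\w+", text)
# With re.MULTILINE, ^ also matches right after each newline.
starts_multiline = re.findall(r"^\w+", text, re.MULTILINE)
# \A always means start of string, whatever the flags say.
starts_absolute = re.findall(r"\A\w+", text, re.MULTILINE)

print(starts_default)    # ['first']
print(starts_multiline)  # ['first', 'second']
print(starts_absolute)   # ['first']

sample = "cat concatenate cat."
# \b needs a word boundary on both sides, so the "cat" inside "concatenate" is skipped.
whole_words = re.findall(r"\bcat\b", sample)
# \B needs the absence of a boundary, so only the embedded "cat" matches.
inside_words = re.findall(r"\Bcat\B", sample)

print(whole_words)   # ['cat', 'cat']
print(inside_words)  # ['cat']
```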
Character Classes
- \s: Whitespace
- \S: Not whitespace
- \w: Word
- \W: Not word
- \d: Digit
- \D: Not digit
- \x: Hexadecimal digit (supported in only a few flavors)
- \O: Octal digit (supported in only a few flavors)
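A quick sketch of the common shorthand classes in Python's re module. Python has no shorthand for hexadecimal digits, so the example spells that one out as an explicit range; the sample strings are invented.

```python
import re

line = "Order 66 shipped\tto Room_9"

digits = re.findall(r"\d+", line)     # runs of digits
words = re.findall(r"\w+", line)      # runs of letters, digits and underscore
parts = re.split(r"\s+", line)        # split on any whitespace, tabs included
only_digits = re.sub(r"\D", "", line) # drop everything that is not a digit

print(digits)       # ['66', '9']
print(words)        # ['Order', '66', 'shipped', 'to', 'Room_9']
print(parts)        # ['Order', '66', 'shipped', 'to', 'Room_9']
print(only_digits)  # 669

# No hex-digit shorthand in Python, so use an explicit class instead.
hex_runs = re.findall(r"[0-9A-Fa-f]+", "ff 1A zz")
print(hex_runs)     # ['ff', '1A']
```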
Special Characters
- \n: Newline
- \r: Carriage return
- \t: Tab
- \v: Vertical tab
- \f: Form feed
- \xxx: Octal character xxx
- \xhh: Hex character hh
Groups and Ranges
- .: Any character except newline (\n)
- (a|b): a or b
- (…): Group
- (?:…): Passive (non-capturing) group
- [abc]: a, b or c
- [^abc]: Not a, b or c
- [a-z]: Letters from a to z
- [A-Z]: Uppercase letters from A to Z
- [0-9]: Digits from 0 to 9
Note: Ranges are inclusive.
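To make the difference between capturing and passive groups concrete, here is a small sketch in Python's re module. The date string and variable names are just for illustration.

```python
import re

# (…) captures; (?:…) groups without capturing, so the month is not returned.
m = re.match(r"(\d{4})-(?:\d{2})-(\d{2})", "2024-03-15")
captured = m.groups()
print(captured)  # ('2024', '15')

# Ranges are inclusive: [a-c] matches a, b and c.
in_range = re.findall(r"[a-c]", "abcd")
# A leading ^ inside the brackets negates the class.
out_of_range = re.findall(r"[^a-c]", "abcd")
# By default, . matches everything except the newline.
dots = re.findall(r".", "a\nb")

print(in_range)      # ['a', 'b', 'c']
print(out_of_range)  # ['d']
print(dots)          # ['a', 'b']
```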
Quantifiers
- *: 0 or more
- +: 1 or more
- ?: 0 or 1
- {3}: Exactly 3
- {3,}: 3 or more
- {3,5}: 3, 4 or 5
Note: Quantifiers are greedy - they match as many times as possible. Add a ? after the quantifier to make it ungreedy.
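Greedy versus ungreedy matching is the classic source of the "debugging hell" mentioned at the top, so here is a minimal sketch in Python's re module with an invented HTML snippet:

```python
import re

html = "<b>one</b> and <b>two</b>"

# Greedy: .* grabs as much as it can, so both tags end up in a single match.
greedy = re.findall(r"<b>.*</b>", html)
# Ungreedy: .*? stops at the first closing tag it can reach.
lazy = re.findall(r"<b>.*?</b>", html)

print(greedy)  # ['<b>one</b> and <b>two</b>']
print(lazy)    # ['<b>one</b>', '<b>two</b>']

# {3,5} accepts 3, 4 or 5 repetitions; \b keeps longer runs from matching partially.
counted = re.findall(r"\b\d{3,5}\b", "12 123 12345 123456")
print(counted)  # ['123', '12345']
```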
Escape Sequences
- \: Escape the following character. Used to escape any of the metacharacters [ ] ( ) { } ^ $ . | * + ? \
- \Q: Begin literal sequence
- \E: End literal sequence
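A small sketch of escaping in Python's re module. As far as I know, Python does not support \Q…\E, so this example assumes re.escape as the closest equivalent for treating a whole string literally. The sample text is made up.

```python
import re

price = "$9.99 (approx.)"

# Manual escaping: \$ and \. match the literal characters.
manual = re.search(r"\$9\.99", price)
print(manual.group())  # $9.99

# re.escape backslash-escapes every metacharacter in the string for you.
literal = re.search(re.escape("(approx.)"), price)
print(literal.group())  # (approx.)

# Forgetting the escape: an unescaped . happily matches any character.
sloppy = re.search(r"9.99", "9x99")
print(sloppy is not None)  # True
```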
String Replacement
- $1: 1st group
- $2: 2nd group
- $n: nth group
- $`: Before matched string
- $': After matched string
- $+: Last matched group
- $&: Entire matched string
Note: Some regex implementations use \ instead of $.
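Python's re module is one of the implementations that use \ instead of $ in replacements. A minimal sketch, where \g<0> plays the role of $& (the entire match):

```python
import re

date = "2024-03-15"

# \1, \2 and \3 refer to the first, second and third captured group.
swapped = re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\3/\2/\1", date)
print(swapped)  # 15/03/2024

# \g<0> inserts the entire matched string.
bracketed = re.sub(r"\w+", r"[\g<0>]", "a bc")
print(bracketed)  # [a] [bc]
```

The raw-string prefix (r"...") matters here: without it, Python would interpret \1 as a character escape before the regex engine ever sees it.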
Assertions
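- ?=: Lookahead assertion
- ?!: Negative lookahead
- ?<=: Lookbehind assertion
- ?<!: Negative lookbehind
- ?>: Once-only subexpression (atomic group)
- ?(): Condition if-then
- ?()|: Condition if-then-else
- ?#: Comment
Note: Each of these goes immediately after an opening parenthesis, e.g. (?=…).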
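Assertions check the surrounding text without consuming it. Here is a sketch of the four lookaround forms in Python's re module, using an invented price list. The \b in the negative cases is my own addition: without it the engine can backtrack to a shorter run of digits and still report a match.

```python
import re

prices = "USD 100, EUR 200, 300 USD"

# Lookahead: digits that are followed by " USD".
followed_by_usd = re.findall(r"\d+(?= USD)", prices)
# Negative lookahead: whole numbers that are not followed by " USD".
not_followed_by_usd = re.findall(r"\b\d+\b(?! USD)", prices)
# Lookbehind: digits that are preceded by "EUR ".
after_eur = re.findall(r"(?<=EUR )\d+", prices)
# Negative lookbehind: whole numbers that are not preceded by "USD ".
not_after_usd = re.findall(r"(?<!USD )\b\d+", prices)

print(followed_by_usd)      # ['300']
print(not_followed_by_usd)  # ['100', '200']
print(after_eur)            # ['200']
print(not_after_usd)        # ['200', '300']
```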
POSIX
- [:upper:]: Uppercase letters
- [:lower:]: Lowercase letters
- [:alpha:]: All letters
- [:alnum:]: Digits and letters
- [:digit:]: Digits
- [:xdigit:]: Hexadecimal digits
- [:punct:]: Punctuation
- [:blank:]: Space and tab
- [:space:]: Whitespace characters
- [:cntrl:]: Control characters
- [:graph:]: Visible (printed) characters, excluding space
- [:print:]: Printed characters and spaces
- [:word:]: Digits, letters and underscore (a non-standard extension)
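Not every engine understands POSIX classes. Python's re module, for one, does not. As a rough sketch, the hypothetical helper below (POSIX_CLASSES and expand_posix are names I made up, not part of any library) rewrites a few of the classes into explicit ASCII ranges before compiling. It assumes ASCII-only input and covers only part of the list above.

```python
import re

# Hypothetical lookup table: ASCII-only equivalents for some POSIX classes.
POSIX_CLASSES = {
    "upper": "A-Z",
    "lower": "a-z",
    "alpha": "A-Za-z",
    "alnum": "A-Za-z0-9",
    "digit": "0-9",
    "xdigit": "0-9A-Fa-f",
    "blank": r" \t",
    "space": r" \t\n\r\f\v",
}

def expand_posix(pattern: str) -> str:
    """Replace each [:name:] with an explicit range (raises KeyError if unknown)."""
    return re.sub(r"\[:(\w+):\]", lambda m: POSIX_CLASSES[m.group(1)], pattern)

expanded = expand_posix(r"[[:xdigit:]]+")
print(expanded)  # [0-9A-Fa-f]+

hex_runs = re.findall(expanded, "ff 1A zz")
print(hex_runs)  # ['ff', '1A']
```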
Pattern Modifiers
- g: Global match
- i: Case-insensitive
- m: Multi-line mode. Causes ^ and $ to also match the start/end of lines.
- s: Single-line mode. Causes . to match all, including line breaks.
- x: Allow comments and whitespace in pattern
- e: Evaluate replacement
- U: Ungreedy mode
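How modifiers are written depends on the engine. As a sketch, Python's re module passes them as flags: re.IGNORECASE for i, re.MULTILINE for m, re.DOTALL for s and re.VERBOSE for x. There is no g flag; re.findall and re.sub already work on every match. To my knowledge Python has no counterpart for e or U.

```python
import re

text = "Hello\nWORLD"

# i: case-insensitive matching.
any_case = re.findall(r"hello", text, re.IGNORECASE)
# m: ^ and $ also match at the start and end of each line.
lines = re.findall(r"^\w+$", text, re.MULTILINE)
# s: . also matches the line break.
no_dotall = re.search(r"Hello.WORLD", text)
with_dotall = re.search(r"Hello.WORLD", text, re.DOTALL)

print(any_case)                 # ['Hello']
print(lines)                    # ['Hello', 'WORLD']
print(no_dotall is None)        # True
print(with_dotall is not None)  # True

# x: whitespace and # comments inside the pattern are ignored.
phone = re.compile(r"""
    (\d{3})   # first three digits
    [-\s]?    # optional separator
    (\d{4})   # last four digits
""", re.VERBOSE)
phone_parts = phone.match("555-1234").groups()
print(phone_parts)  # ('555', '1234')
```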