Regex

Regular expressions for pattern matching and text processing in various programming languages and tools.

Overview

Regular expressions (regex) are patterns used to match character combinations in strings. They’re essential for text processing, validation, searching, and data extraction across virtually all programming languages and tools.

Basic Syntax

Literal Characters

hello       # Matches "hello" exactly
123         # Matches "123" exactly

Meta Characters

.           # Any character except newline
^           # Start of string/line
$           # End of string/line
\           # Escape character
|           # OR operator
()          # Group
[]          # Character class
{}          # Quantifier

Character Classes

Predefined Classes

.           # Any character except newline
\d          # Any digit (0-9)
\D          # Any non-digit
\w          # Any word character (a-z, A-Z, 0-9, _)
\W          # Any non-word character
\s          # Any whitespace character
\S          # Any non-whitespace character

Custom Classes

[abc]       # Any of a, b, or c
[a-z]       # Any lowercase letter
[A-Z]       # Any uppercase letter
[0-9]       # Any digit
[^abc]      # Any character except a, b, or c
[a-zA-Z0-9] # Any alphanumeric character
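
The classes above can be tried out with Python's re module. This is a minimal sketch; the sample text is made up for illustration. Note that for str patterns Python's \w also matches non-ASCII letters unless re.ASCII is set, so the "(a-z, A-Z, 0-9, _)" description applies to ASCII mode.

```python
import re

text = "Room 42, Floor 3_B!"  # illustrative sample text

digits = re.findall(r"\d", text)              # individual digits
words = re.findall(r"\w+", text)              # runs of word characters
capitals = re.findall(r"[A-Z]", text)         # custom class: uppercase ASCII letters
others = re.findall(r"[^a-zA-Z0-9\s]", text)  # negated class: not alphanumeric, not whitespace

# \w is Unicode-aware for str patterns; re.ASCII restricts it to [a-zA-Z0-9_].
unicode_words = re.findall(r"\w+", "caf\u00e9")
ascii_words = re.findall(r"\w+", "caf\u00e9", flags=re.ASCII)

print(digits)    # ['4', '2', '3']
print(words)     # ['Room', '42', 'Floor', '3_B']
print(capitals)  # ['R', 'F', 'B']
print(others)    # [',', '_', '!']
```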

Quantifiers

Basic Quantifiers

*           # 0 or more
+           # 1 or more
?           # 0 or 1 (optional)
{n}         # Exactly n times
{n,}        # n or more times
{n,m}       # Between n and m times

Examples

a*          # "", "a", "aa", "aaa", ...
a+          # "a", "aa", "aaa", ... (not empty)
a?          # "" or "a"
a{3}        # "aaa" exactly
a{2,4}      # "aa", "aaa", or "aaaa"
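
One way to check the quantifier examples above is re.fullmatch, which succeeds only if the whole string matches. The helper name fully_matches is hypothetical, introduced here for illustration.

```python
import re

def fully_matches(pattern, text):
    """Return True if the whole of `text` matches `pattern`."""
    return re.fullmatch(pattern, text) is not None

print(fully_matches(r"a*", ""))           # True: zero repetitions are allowed
print(fully_matches(r"a+", ""))           # False: at least one "a" is required
print(fully_matches(r"a?", "aa"))         # False: at most one "a"
print(fully_matches(r"a{3}", "aaa"))      # True: exactly three
print(fully_matches(r"a{2,4}", "aaaaa"))  # False: five is more than four
```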

Anchors

Position Anchors

^           # Start of string/line
$           # End of string/line
\b          # Word boundary
\B          # Non-word boundary

Examples

^hello      # "hello" at start of line
world$      # "world" at end of line
\bcat\b     # "cat" as whole word
\Bcat\B     # "cat" inside a word, e.g. in "concatenate"
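
A short Python sketch of the boundary anchors above, using a made-up sample string. \b requires a word/non-word transition at that position, while \B requires that there is none.

```python
import re

text = "cat concatenate bobcat cats"  # illustrative sample text

# \bcat\b: "cat" with a word boundary on both sides.
whole_words = re.findall(r"\bcat\b", text)

# \Bcat\B: "cat" with word characters on both sides; record where it starts.
inside_starts = [m.start() for m in re.finditer(r"\Bcat\B", text)]

print(whole_words)    # ['cat']  (only the standalone word)
print(inside_starts)  # [7]      (the "cat" inside "concatenate")
```

"bobcat" and "cats" match neither pattern, because each has a boundary on exactly one side of "cat".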

Groups and Capturing

Groups

(abc)       # Capture group
(?:abc)     # Non-capturing group
(?P<name>abc) # Named group (Python, PCRE)
(?<name>abc)  # Named group (.NET, Java, JavaScript, PCRE)

Backreferences

(cat)\1          # Matches "catcat"
\b(\w+)\s+\1\b   # Matches a repeated word such as "the the" (without \b it would also match "the the" inside "the then")
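
The group types and backreferences above might be used from Python as follows. This is a sketch with made-up sample strings; the repeated-word pattern adds \b anchors so that only whole words count as repeats.

```python
import re

# Named groups: Python uses the (?P<name>...) syntax.
m = re.search(r"(?P<year>\d{4})-(?P<month>\d{2})", "released 2023-01")
year, month = m.group("year"), m.group("month")

# Non-capturing group: groups "ab" for repetition without creating a capture.
runs = re.findall(r"(?:ab)+", "ababab ab")

# Backreference: \1 must repeat exactly what group 1 captured.
repeated = re.findall(r"\b(\w+)\s+\1\b", "this is is a test of the the regex")

print(year, month)  # 2023 01
print(runs)         # ['ababab', 'ab']
print(repeated)     # ['is', 'the']
```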

Common Patterns

Email Validation

^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$
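
A minimal sketch of applying the email pattern above in Python. The function name looks_like_email is hypothetical. The pattern is a shape check only: it is deliberately permissive and does not confirm that an address exists or is fully standards-compliant.

```python
import re

EMAIL = re.compile(r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$")

def looks_like_email(text):
    """Return True if `text` has the general shape local@domain.tld."""
    return EMAIL.match(text) is not None

print(looks_like_email("user.name+tag@example.co.uk"))  # True
print(looks_like_email("user@localhost"))               # False: no dot and TLD after "@"
print(looks_like_email("user@@example.com"))            # False: "@" is not allowed in either part
```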

Phone Numbers

^\+?1?[-.\s]?\(?[0-9]{3}\)?[-.\s]?[0-9]{3}[-.\s]?[0-9]{4}$

URLs

^https?:\/\/(www\.)?[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b([-a-zA-Z0-9()@:%_\+.~#?&//=]*)$

IPv4 Address

^(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$
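
The IPv4 pattern above can be wrapped in a small helper, sketched here in Python. The name is_ipv4 is hypothetical. Each octet alternative limits the value to 0-255, though the pattern as written also accepts leading zeros such as "01".

```python
import re

OCTET = r"(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"
IPV4 = re.compile(r"^(?:" + OCTET + r"\.){3}" + OCTET + r"$")

def is_ipv4(text):
    """Return True if `text` is four dot-separated octets in the range 0-255."""
    return IPV4.match(text) is not None

print(is_ipv4("192.168.0.1"))  # True
print(is_ipv4("256.1.1.1"))    # False: 256 is out of range
print(is_ipv4("1.2.3"))        # False: only three octets
print(is_ipv4("01.2.3.4"))     # True: leading zeros are accepted by this pattern
```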

Date Formats

# MM/DD/YYYY
^(0[1-9]|1[0-2])\/(0[1-9]|[12][0-9]|3[01])\/\d{4}$

# YYYY-MM-DD
^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01])$
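
The date patterns above check the shape of a date but not the calendar, so "2023-02-30" passes. One possible approach, sketched below, is to let the regex capture the parts and let datetime reject impossible dates. The function name parse_iso_date is hypothetical, and the year is wrapped in an extra capture group compared with the pattern above.

```python
import re
from datetime import datetime

# Same YYYY-MM-DD pattern as above, with the year also captured.
ISO_DATE = re.compile(r"^(\d{4})-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01])$")

def parse_iso_date(text):
    """Return a date for a well-formed, real calendar date, else None."""
    m = ISO_DATE.match(text)
    if m is None:
        return None  # wrong shape
    year, month, day = map(int, m.groups())
    try:
        return datetime(year, month, day).date()
    except ValueError:
        return None  # right shape, but not a real date

print(parse_iso_date("2024-02-29"))  # 2024-02-29 (leap day)
print(parse_iso_date("2023-02-30"))  # None: matches the regex, fails the calendar check
print(parse_iso_date("2023-13-01"))  # None: rejected by the regex
```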

Credit Card Numbers

# Visa
^4[0-9]{12}(?:[0-9]{3})?$

# MasterCard (51-55 prefixes only; does not cover the newer 2221-2720 range)
^5[1-5][0-9]{14}$

# American Express
^3[47][0-9]{13}$
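
As a sketch of how the card patterns above might be combined, the following Python helper guesses the brand from the number's prefix and length. The name card_brand and the stripping of spaces and hyphens are assumptions added for illustration. The patterns check format only; they do not verify a checksum or that the card exists.

```python
import re

CARD_PATTERNS = {
    "Visa": re.compile(r"^4[0-9]{12}(?:[0-9]{3})?$"),
    "MasterCard": re.compile(r"^5[1-5][0-9]{14}$"),
    "American Express": re.compile(r"^3[47][0-9]{13}$"),
}

def card_brand(number):
    """Return the first brand whose pattern matches, or None."""
    digits = re.sub(r"[\s-]", "", number)  # drop spaces and hyphens (assumed input format)
    for brand, pattern in CARD_PATTERNS.items():
        if pattern.match(digits):
            return brand
    return None

print(card_brand("4111 1111 1111 1111"))  # Visa
print(card_brand("5500-0000-0000-0004"))  # MasterCard
print(card_brand("378282246310005"))      # American Express
print(card_brand("1234"))                 # None
```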

Language-Specific Usage

JavaScript

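// Create regex (literal or constructor form)
const regex = /pattern/flags;
const sameRegex = new RegExp('pattern', 'flags');

// Test match
regex.test(string);                  // true or false

// Find matches
string.match(regex);                 // Match array, or null if none
string.search(regex);                // Index of first match, or -1
string.replace(regex, replacement);  // New string with matches replaced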

Python

import re

# Compile regex
pattern = re.compile(r'regex_pattern')

# Match functions
re.match(pattern, string)    # Match at beginning
re.search(pattern, string)   # Find first match
re.findall(pattern, string)  # Find all matches
re.sub(pattern, replacement, string)  # Replace
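
A concrete run of the four functions above, called as methods on a compiled pattern. The sample text is made up for illustration.

```python
import re

pattern = re.compile(r"\d+")
text = "order 66, aisle 7"  # illustrative sample text

at_start = pattern.match(text)         # None: the text does not begin with a digit
first = pattern.search(text).group()   # first match anywhere in the text
everything = pattern.findall(text)     # all non-overlapping matches
masked = pattern.sub("#", text)        # replace every match

print(at_start)    # None
print(first)       # 66
print(everything)  # ['66', '7']
print(masked)      # order #, aisle #
```

The difference between match and search is the most common source of surprises: match only succeeds if the pattern matches at position 0.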

Bash/grep

# Basic grep
grep 'pattern' file.txt

# Extended regex
grep -E 'pattern' file.txt
egrep 'pattern' file.txt      # Obsolescent alias for grep -E; prefer grep -E

# Perl-compatible regex (GNU grep; not part of POSIX)
grep -P 'pattern' file.txt

sed

# Replace with regex
sed 's/pattern/replacement/g' file.txt

# Extended regex
sed -E 's/pattern/replacement/g' file.txt

Flags/Modifiers

Common Flags

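i           # Case insensitive
g           # Global (find all matches; used by JavaScript, Perl, and sed)
m           # Multiline (^ and $ match at the start and end of each line)
s           # Dot matches newline
x           # Extended (ignore whitespace and allow comments; not supported by JavaScript)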

Examples

/hello/i        # Case insensitive
/hello/g        # Global search
/hello/gi       # Case insensitive + global
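
The /pattern/flags notation above is JavaScript/Perl style. In Python the same options are passed as constants from the re module, and there is no g flag because findall, finditer, and sub already process every match. A minimal sketch with made-up sample strings:

```python
import re

# i -> re.IGNORECASE
case_insensitive = re.findall(r"hello", "Hello HELLO hello", flags=re.IGNORECASE)

# m -> re.MULTILINE: ^ matches at the start of every line, not just the string.
first_words_default = re.findall(r"^\w+", "one\ntwo\nthree")
first_words_multiline = re.findall(r"^\w+", "one\ntwo\nthree", flags=re.MULTILINE)

# s -> re.DOTALL: "." also matches a newline.
dot_default = re.search(r"a.b", "a\nb")
dot_all = re.search(r"a.b", "a\nb", flags=re.DOTALL)

# x -> re.VERBOSE: whitespace and # comments inside the pattern are ignored.
phone = re.compile(r"""
    \d{3}   # first three digits
    -       # separator
    \d{4}   # last four digits
""", re.VERBOSE)

print(case_insensitive)       # ['Hello', 'HELLO', 'hello']
print(first_words_default)    # ['one']
print(first_words_multiline)  # ['one', 'two', 'three']
print(dot_default is None, dot_all is not None)  # True True
print(phone.search("call 555-1234").group())     # 555-1234
```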

Advanced Features

Lookahead/Lookbehind

(?=pattern)     # Positive lookahead
(?!pattern)     # Negative lookahead
(?<=pattern)    # Positive lookbehind
(?<!pattern)    # Negative lookbehind

Examples

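\d+(?=px)       # Numbers followed by "px"
\d+(?!\d|px)    # Numbers not followed by "px" (plain \d+(?!px) still matches "10" in "100px")
(?<=\$)\d+      # Numbers preceded by "$"
(?<![\$\d])\d+  # Numbers not preceded by "$" (plain (?<!\$)\d+ still matches "00" in "$100")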
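
Negative lookarounds interact with backtracking in a way that is easy to miss: \d+ can give up digits until the assertion passes. The Python sketch below, using made-up sample strings, shows the partial matches this produces and one way to prevent them by also excluding a digit in the assertion.

```python
import re

sizes = "100px 50em"
prices = "$100 200"

followed_by_px = re.findall(r"\d+(?=px)", sizes)        # positive lookahead
naive_not_px = re.findall(r"\d+(?!px)", sizes)          # backtracks to "10" inside "100px"
strict_not_px = re.findall(r"\d+(?!\d|px)", sizes)      # must end at the end of the number

after_dollar = re.findall(r"(?<=\$)\d+", prices)        # positive lookbehind
naive_no_dollar = re.findall(r"(?<!\$)\d+", prices)     # starts late, at "00" inside "$100"
strict_no_dollar = re.findall(r"(?<![\$\d])\d+", prices)  # must start at the start of the number

print(followed_by_px)    # ['100']
print(naive_not_px)      # ['10', '50']
print(strict_not_px)     # ['50']
print(after_dollar)      # ['100']
print(naive_no_dollar)   # ['00', '200']
print(strict_no_dollar)  # ['200']
```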

Practical Examples

Extract Domain from Email

@([a-zA-Z0-9.-]+\.[a-zA-Z]{2,})

Find HTML Tags

<\/?[a-zA-Z][^>]*>

Match Quoted Strings

"([^"\\]|\\.)*"

Find CSS Colors

#(?:[0-9a-fA-F]{6}|[0-9a-fA-F]{3})\b

Extract URLs from Text

https?:\/\/[^\s]+
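
The domain, quoted-string, and URL patterns above can be applied to free text as sketched below in Python. The sample strings are made up. Two details are worth noting: re.findall returns only the captured group when a pattern contains one, so the quoted-string example uses finditer to get the whole match, and the URL pattern keeps trailing punctuation because [^\s]+ stops only at whitespace.

```python
import re

# Extract domains: findall returns the contents of the single capture group.
contacts = "Contact ann@example.com or bob@mail.example.org."
domains = re.findall(r"@([a-zA-Z0-9.-]+\.[a-zA-Z]{2,})", contacts)

# Quoted strings, allowing backslash-escaped characters inside the quotes.
line = r'say "hi" and "a \"quoted\" word"'
quoted = [m.group(0) for m in re.finditer(r'"([^"\\]|\\.)*"', line)]

# URLs: everything from the scheme up to the next whitespace.
note = "See https://example.com/docs and http://test.org."
urls = re.findall(r"https?:\/\/[^\s]+", note)

print(domains)  # ['example.com', 'mail.example.org']
print(quoted)   # two matches: "hi" and the string containing escaped quotes
print(urls)     # ['https://example.com/docs', 'http://test.org.']
```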

Match JSON Values

"[^"]*":\s*("[^"]*"|\d+|true|false|null)

Testing and Debugging

Command Line Testing

# Test with grep
echo "test string" | grep -E 'pattern'

# Test with sed
echo "test string" | sed 's/pattern/replacement/'

# Test with Python
python3 -c "import re; print(re.search(r'pattern', 'test string'))"

Performance Tips

Best Practices

  • Prefer specific character classes (e.g. \d or [^>]) over . when possible
  • Avoid nested quantifiers like (a+)+, which can backtrack exponentially on input that fails to match
  • Use non-capturing groups (?:...) when you don’t need the captured text
  • Anchor patterns with ^ and $ when appropriate
  • Use word boundaries \b for word matching

Common Pitfalls

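  • Greedy vs non-greedy: .* matches as much as possible, .*? as little as possible
  • Backtracking: Nested or overlapping quantifiers can make failed matches very slow
  • Case sensitivity: Matching is case sensitive by default; use the i flag when needed
  • Escaping: Escape special characters (. * + ? \ ( ) [ ] { } | ^ $) to match them literally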
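
The greedy versus non-greedy pitfall is easiest to see on markup. In this Python sketch (sample string made up), the greedy pattern swallows everything between the first "<" and the last ">", while the non-greedy form and a specific negated class both stop at the nearest ">".

```python
import re

html = "<b>bold</b> and <i>italic</i>"  # illustrative sample text

greedy = re.findall(r"<.*>", html)      # .* runs to the last ">"
lazy = re.findall(r"<.*?>", html)       # .*? stops at the first ">"
specific = re.findall(r"<[^>]*>", html) # a negated class avoids "." entirely

print(greedy)    # ['<b>bold</b> and <i>italic</i>']
print(lazy)      # ['<b>', '</b>', '<i>', '</i>']
print(specific)  # ['<b>', '</b>', '<i>', '</i>']
```

The negated-class version follows the first best practice above: it gives the same result as the non-greedy form here without relying on backtracking.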

Quick Reference

Most Common Patterns

\d+             # One or more digits
\w+             # One or more word characters
\s+             # One or more whitespace
[a-zA-Z]+       # One or more letters
\b\w+\b         # Whole words
^.+$            # Entire line
.*              # Any characters (greedy)
.*?             # Any characters (non-greedy)

Escape Sequences

\.              # Literal dot
\*              # Literal asterisk
\+              # Literal plus
\?              # Literal question mark
\\              # Literal backslash
\(              # Literal opening parenthesis (likewise \) for closing)
\[              # Literal opening bracket
\{              # Literal opening brace
\|              # Literal pipe
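
When the text to match comes from a variable, escaping by hand is error-prone. In Python, re.escape adds the backslashes for you. A short sketch with a made-up literal:

```python
import re

literal = "1+1=2 (approx.)"  # contains +, (, ), and . which are all special
text = "we know 1+1=2 (approx.) holds"

# Unescaped, "+" is a quantifier and "(...)" is a group, so the literal is not found.
unescaped_hit = re.search(literal, text)

# re.escape backslash-escapes the special characters so they match literally.
escaped_hit = re.search(re.escape(literal), text)

print(unescaped_hit)        # None
print(escaped_hit.group())  # 1+1=2 (approx.)
```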

Last updated: January 1, 2023