🧵Programming Languages and Techniques I

Regular Expression Patterns

Why This Matters

Regular expressions (regex) are one of the most powerful tools in your programming toolkit, and they show up everywhere—from form validation and data parsing to search-and-replace operations and log file analysis. When you're tested on regex, you're really being tested on pattern recognition, string manipulation logic, and your ability to translate human-readable requirements into precise symbolic notation. These skills transfer directly to real-world tasks like input validation, text processing, web scraping, and data cleaning.

The key to mastering regex isn't memorizing every symbol—it's understanding what problem each pattern element solves. Are you trying to match a specific character or any character? Do you need exactly three occurrences or "at least one"? Should the match appear at the start of a string or anywhere within it? Don't just memorize the syntax; know what category of matching problem each pattern addresses and when to reach for it.

Matching Specific Characters

These patterns answer the most basic question: what exact characters am I looking for? They form the foundation of every regex you'll write.

Basic Characters and Literals

Exact matching: literals match the precise characters you type, so hello matches the sequence "hello" wherever it appears in the text, including inside a longer word such as "othello"
Case sensitivity applies by default, so A and a are treated as completely different characters
Foundation for all patterns—every complex regex builds on literal character matching as its core
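The points above can be tried out in a few lines. This is a minimal sketch that assumes Python's standard `re` module (the guide itself is language-agnostic); the sample text is made up for illustration.

```python
import re

# Hypothetical sample text, chosen only for illustration.
text = "Say hello to Hello Kitty"

# A literal pattern finds its exact character sequence anywhere in the text.
first = re.search(r"hello", text)
print(first.start(), first.group())

# Matching is case-sensitive by default, so "Hello" is skipped...
case_sensitive_hits = re.findall(r"hello", text)

# ...unless the IGNORECASE flag is passed.
case_insensitive_hits = re.findall(r"hello", text, flags=re.IGNORECASE)

print(case_sensitive_hits)
print(case_insensitive_hits)
```

Other languages expose case-insensitive matching differently, so treat the flag name as specific to Python.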

Escaping Special Characters (\)

Backslash neutralizes special meaning—use \. to match an actual period instead of "any character"
Required for metacharacters including ., *, +, ?, (, ), [, ], {, }, ^, $, |, and \ itself
Common source of bugs—forgetting to escape special characters is one of the most frequent regex errors

Compare: Literals vs. Escaped Characters—both match exact characters, but escaping is required when that character has special regex meaning. If an exam question asks you to match a URL with periods and question marks, you'll need escaping: https://example\.com/page\?id=1.
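To see why the escaping matters, the sketch below compares the escaped URL pattern with an unescaped copy. It assumes Python's `re` module; the `lookalike` string is a hypothetical input invented to show what the unescaped pattern wrongly accepts.

```python
import re

url = "https://example.com/page?id=1"
lookalike = "https://exampleXcom/pagid=1"   # hypothetical string that is not the real URL

# Unescaped: "." means "any character" and "?" makes the preceding "e" optional.
unescaped_pattern = r"https://example.com/page?id=1"

# Escaped: "\." and "\?" match a literal period and a literal question mark.
escaped_pattern = r"https://example\.com/page\?id=1"

print(re.fullmatch(unescaped_pattern, url))        # no match: the literal "?" is never matched
print(re.fullmatch(unescaped_pattern, lookalike))  # matches the wrong string
print(re.fullmatch(escaped_pattern, url))          # matches the real URL

# re.escape builds an escaped pattern from arbitrary literal text.
auto_escaped = re.escape(url)
print(re.fullmatch(auto_escaped, url))
```

When the literal text comes from user input rather than from you, a helper such as `re.escape` is usually safer than escaping by hand.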

Matching Character Categories

When you don't know the exact character but know its type, these patterns let you match by category rather than specific value.

Wildcards (.)

Matches any single character except newline—the most flexible single-character matcher
Use sparingly—wildcards can over-match and produce unexpected results
Combine with quantifiers for powerful patterns like .* (zero or more of any character, matched greedily, so it takes as much of the line as it can)
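A short sketch of the wildcard's behavior, assuming Python's `re` module. The sample strings are invented, and the `DOTALL` flag is Python's name for the option that lets `.` match a newline.

```python
import re

# "c.t" means: "c", then any one character, then "t".
three_letter = re.findall(r"c.t", "cat cot c-t ct")
print(three_letter)

# By default the dot does not match a newline...
across_newline = re.search(r"a.b", "a\nb")
# ...unless DOTALL is set.
with_dotall = re.search(r"a.b", "a\nb", flags=re.DOTALL)
print(across_newline, with_dotall)

# Over-matching: ".*" runs to the LAST ">" it can reach, not the first.
html = "<b>bold</b>"
greedy = re.search(r"<.*>", html).group()
print(greedy)
```

The last result is why the wildcard is best used sparingly: the pattern looks like it should match one tag but it swallows the whole string.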

Character Classes []

Define custom character sets—[aeiou] matches any vowel, [0-9] matches any digit
Ranges use hyphens—[a-z] matches lowercase letters, [A-Za-z] matches all letters
Order doesn't matter inside brackets—[abc] and [cba] are functionally identical
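The three bullets above can be checked directly. A minimal sketch, assuming Python's `re` module and made-up sample strings:

```python
import re

# A custom set: any single vowel.
vowels = re.findall(r"[aeiou]", "regular expression")
print(vowels)

# Ranges can be combined inside one class, here for hexadecimal digits.
is_hex = re.fullmatch(r"[A-Fa-f0-9]+", "1aF3") is not None
not_hex = re.fullmatch(r"[A-Fa-f0-9]+", "1aG3") is not None
print(is_hex, not_hex)

# Order inside the brackets does not change what is matched.
sample = "abcabc cab"
same_result = re.findall(r"[abc]", sample) == re.findall(r"[cba]", sample)
print(same_result)
```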

Negated Character Classes [^]

Caret inside brackets means NOT—[^0-9] matches any character that isn't a digit
Useful for exclusion patterns—match "anything except these specific characters"
Don't confuse with anchor—^ means "start of string" outside brackets but "not" inside them
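The two meanings of the caret are easiest to tell apart side by side. A small sketch, assuming Python's `re` module; the phone number and other inputs are invented examples.

```python
import re

# Inside brackets: [^0-9] matches one character that is NOT a digit.
non_digits = re.findall(r"[^0-9]", "a1b2-c3")
print(non_digits)

# A common exclusion use: delete everything except digits.
digits_only = re.sub(r"[^0-9]", "", "(555) 123-4567")
print(digits_only)

# Outside brackets: ^ anchors the match to the start of the string.
starts_with_digit = re.search(r"^[0-9]", "7 days") is not None
starts_with_non_digit = re.search(r"^[^0-9]", "7 days") is not None
print(starts_with_digit, starts_with_non_digit)
```

The last pattern uses both carets at once: the first is the anchor, the second negates the class.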

Shorthand Character Classes (\d, \w, \s)

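\d matches digits: equivalent to [0-9] for ASCII text, though some engines (Python 3 on text strings, for example) also accept other Unicode digits by default; commonly used for phone numbers, IDs, and numeric data
\w matches word characters: letters, digits, and underscores; equivalent to [A-Za-z0-9_] for ASCII text
\s matches whitespace: spaces, tabs, and newlines; essential for parsing formatted text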

Compare: [0-9] vs. \d. For ASCII input the two are interchangeable, and the shorthand is shorter to read and type. Use character classes when you need custom ranges like [a-f0-9] for hexadecimal or must rule out non-ASCII digits; use shorthand for standard categories.
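Here is a rough illustration of the three shorthands on one line of text, assuming Python's `re` module. The log-style line is hypothetical. The last two checks show a Python-specific detail: on text strings `\d` also accepts non-ASCII digits unless the `re.ASCII` flag is given.

```python
import re

# Hypothetical log-style line with a space-separated and a tab-separated field.
line = "user_42 logged in\tat 09:15"

numbers = re.findall(r"\d+", line)   # runs of digits
words = re.findall(r"\w+", line)     # runs of letters, digits, underscores
fields = re.split(r"\s+", line)      # split on any run of whitespace

print(numbers)
print(words)
print(fields)

# Python-specific: "\u0663" is the Arabic-Indic digit three.
unicode_digit_matches = re.fullmatch(r"\d", "\u0663") is not None
ascii_only_matches = re.fullmatch(r"\d", "\u0663", flags=re.ASCII) is not None
print(unicode_digit_matches, ascii_only_matches)
```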

Controlling Repetition

These quantifiers answer: how many times should this pattern occur? They transform single-character matches into flexible length patterns.

Quantifiers (*, +, ?, {n}, {n,}, {n,m})

* means zero or more—ab*c matches "ac", "abc", "abbc", etc.
+ means one or more—ab+c matches "abc", "abbc", but NOT "ac"
? means zero or one: colou?r matches "color" and "colour"
Curly braces for precision: {3} means exactly 3, {2,5} means 2 to 5, {3,} means 3 or more

Compare: * vs. +. The critical difference is whether zero occurrences is valid. Use + when at least one match is required (like digits in a phone number); use * when zero occurrences are acceptable (like optional spaces between fields). For an element that appears at most once, such as a middle initial, ? is the better fit.
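The quantifiers can be compared by running each pattern over the same candidate strings. This sketch assumes Python's `re` module; `full_matches` is a hypothetical helper written for this example, not a library function.

```python
import re

def full_matches(pattern, samples):
    """Return the samples that the pattern matches in their entirety."""
    return [s for s in samples if re.fullmatch(pattern, s)]

candidates = ["ac", "abc", "abbc"]
star = full_matches(r"ab*c", candidates)   # zero or more "b"
plus = full_matches(r"ab+c", candidates)   # one or more "b"

# "?" allows zero or one of the preceding element.
optional = full_matches(r"colou?r", ["color", "colour", "colouur"])

# Curly braces give exact counts and ranges.
exactly_three = full_matches(r"\d{3}", ["12", "123", "1234"])
two_to_five = full_matches(r"\d{2,5}", ["1", "12", "12345", "123456"])
three_or_more = full_matches(r"\d{3,}", ["12", "123", "123456"])

print(star, plus, optional)
print(exactly_three, two_to_five, three_or_more)
```

`fullmatch` is used so that each string is judged as a whole; with a plain search, a short pattern could succeed on part of a longer string and hide the difference between quantifiers.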

Controlling Position

Anchors don't match characters—they match positions in the string. This is a conceptual shift that trips up many students.

Anchors (^ and $)

^ anchors to start—^Hello only matches "Hello" at the beginning of a string
$ anchors to end—world$ only matches "world" at the end of a string
Combine for exact matching—^exact$ matches only the string "exact" with nothing before or after

Compare: hello vs. ^hello$—the unanchored pattern matches "hello" anywhere (including in "say hello there"), while the anchored version only matches if the entire string is exactly "hello". Anchors are essential for input validation.
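The comparison above can be reproduced as follows. This is a sketch assuming Python's `re` module. One Python-specific wrinkle is included: there, `$` also matches just before a single trailing newline, so `re.fullmatch` is the stricter choice for validation.

```python
import re

sentence = "say hello there"

# Unanchored: found anywhere in the string.
unanchored = re.search(r"hello", sentence) is not None

# Anchored at both ends: the whole string must be "hello".
anchored_on_sentence = re.search(r"^hello$", sentence) is not None
anchored_on_exact = re.search(r"^hello$", "hello") is not None
print(unanchored, anchored_on_sentence, anchored_on_exact)

# Python detail: "$" tolerates one trailing newline; fullmatch does not.
dollar_with_newline = re.search(r"^hello$", "hello\n") is not None
fullmatch_with_newline = re.fullmatch(r"hello", "hello\n") is not None
print(dollar_with_newline, fullmatch_with_newline)
```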

Building Complex Patterns

These constructs let you combine simpler patterns into sophisticated matching logic.

Grouping and Capturing ()

Parentheses create units—apply quantifiers to entire groups, so (ab)+ matches "ab", "abab", "ababab"
Captures store matches—the matched content can be referenced later for extraction or backreferences
Essential for extraction—use groups to pull specific parts from a larger match, like area codes from phone numbers
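A brief sketch of grouping, capturing, and a backreference, assuming Python's `re` module. The phone number and sample words are invented, and the phone format shown is only one possible layout.

```python
import re

# Grouping: the "+" applies to the whole unit "ab".
repeated_ok = re.fullmatch(r"(ab)+", "ababab") is not None
repeated_bad = re.fullmatch(r"(ab)+", "aba") is not None
print(repeated_ok, repeated_bad)

# Capturing: each pair of parentheses stores the text it matched.
phone = re.search(r"\((\d{3})\) (\d{3})-(\d{4})", "Call (215) 555-0199 today")
area_code = phone.group(1)   # first capturing group
all_parts = phone.groups()   # every group, in order
print(area_code, all_parts)

# Backreference: \1 means "the same text group 1 just matched".
doubled = re.search(r"(\w)\1", "bookkeeper").group()
print(doubled)
```

Note that the outer `\(` and `\)` are escaped literal parentheses, while the unescaped pairs create the groups.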

Alternation (|)

Pipe means OR—cat|dog matches either "cat" or "dog"
Combine with grouping—(cat|dog)s? matches "cat", "cats", "dog", or "dogs"
Left-to-right evaluation: most engines try alternatives in order and stop at the first one that succeeds, so cat|catalog matches only "cat" in "catalog"

Compare: [aeiou] vs. (a|e|i|o|u)—both match a single vowel, but character classes are more efficient for single characters. Use alternation when matching multi-character alternatives like (Monday|Tuesday|Wednesday).
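The ordering rule is the part of alternation that most often surprises people, so here is a small sketch of it. It assumes Python's `re` module, which tries alternatives left to right; engines with different matching rules may behave differently.

```python
import re

# Plain alternation: either word, wherever it occurs.
either = re.findall(r"cat|dog", "cat, dog, catalog")
print(either)

# With grouping, the optional "s" applies to both alternatives.
# finditer is used because findall would return only the group contents.
pets = [m.group() for m in re.finditer(r"(cat|dog)s?", "cats and a dog")]
print(pets)

# Order matters: the first alternative that succeeds wins.
short_first = re.search(r"cat|catalog", "catalog").group()
long_first = re.search(r"catalog|cat", "catalog").group()
print(short_first, long_first)
```

Putting the longer alternative first is a simple way to keep a shorter prefix from shadowing it.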

Quick Reference Table

Concept	Pattern elements
Exact character matching	Literals, escaped characters (`\.`, `\?`)
Any single character	Wildcard (`.`)
Character categories	`[a-z]`, `[^0-9]`, `\d`, `\w`, `\s`
Zero or more repetition	`*`, `{0,}`
One or more repetition	`+`, `{1,}`
Optional elements	`?`, `{0,1}`
Exact or bounded count	`{n}`, `{n,m}`
Position matching	`^` (start), `$` (end)
Logical OR	`|` (alternation)
Grouping/extraction	`()` (capturing groups)

Self-Check Questions

What's the difference between [^abc] and ^abc, and when would you use each?
You need to match a phone number that may or may not have an area code in parentheses. Which quantifier would make the area code optional, and how would you structure the pattern?
Compare \d+ and \d*—give an example input where one matches but the other doesn't.
If you're validating that a username contains only letters, numbers, and underscores, which shorthand character class would you use, and how would you anchor it to ensure the entire input is valid?
FRQ-style: Write a regex pattern that matches email addresses and explain which pattern elements handle each part (username, @ symbol, domain, period, extension). Identify where you'd use character classes, quantifiers, and escaping.
