Regex for Non-Developers: A Practical Guide
A friendly, jargon-light regex tutorial for writers, analysts, marketers, and anyone who lives in Find & Replace. Learn the syntax that matters and steal ten ready-to-use recipes.
Why a non-developer should care about regex
If you have ever opened the Find and Replace box in Word, Google Docs, Notion, a spreadsheet, or your favorite text editor and thought "there has to be a faster way to do this," you are exactly the right reader for this guide. Regex, short for regular expression, is a tiny pattern language that turns Find and Replace from a blunt instrument into a scalpel. Instead of searching for a single fixed word, you describe the shape of what you are looking for: "any phone number," "every line that starts with a date," "all email addresses in this messy export."
Regex has a reputation for being intimidating, mostly because tutorials written for programmers throw twenty symbols at you in the first paragraph. The truth is that you can be wildly productive with about ten characters and a handful of patterns. This article walks you through that core, then hands you ten copy-paste recipes you can drop into the Find and Replace box of almost any modern app today.
Quick note on where regex works: it is supported in Microsoft Word (with "Use wildcards" or via the newer regex search), Google Docs ("Match using regular expressions"), Notion (limited), VS Code, Sublime Text, BBEdit, Obsidian, Google Sheets through REGEXMATCH and REGEXREPLACE, Excel via Power Query and the new REGEX functions, and pretty much every code editor and online testing tool. Some apps use slightly different dialects, but ninety percent of what you learn here works everywhere.
What exactly is a regex?
A regular expression is a pattern. You hand the pattern to a search tool, and it returns every chunk of text that matches that pattern. The simplest possible regex is a literal word. The pattern cat finds the letters c, a, t in that order. So far, so boring: that is just normal Find and Replace.
The magic starts when you mix in special characters called metacharacters. These do not match themselves; they describe categories or rules. A dot, for example, means "any single character." The pattern c.t now matches cat, cot, cut, c5t, and even c#t. Suddenly one pattern does the work of dozens of searches.
That is the whole mental model. A regex is a literal string of text where some characters are upgraded to mean something more general. Learn what those upgraded characters do and you can read and write nearly any pattern you encounter.
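To see the dot at work outside a Find box, here is a minimal sketch in Python using its standard re module. The sample sentence is made up for illustration.

```python
import re

# Made-up sample text; any string would do.
text = "The cat sat on a cot, cut c5t and c#t, but not ct or cart."

# The dot stands for exactly one character between c and t.
matches = re.findall(r"c.t", text)
print(matches)  # ['cat', 'cot', 'cut', 'c5t', 'c#t']
```

Note that ct and cart are skipped: the dot insists on exactly one character, no more and no fewer.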
The building blocks worth memorizing
- . (dot) matches any single character except a newline.
- * means "zero or more of the previous thing." So a* matches an empty string, a, aa, aaa, and so on.
- + means "one or more of the previous thing." a+ matches a, aa, aaa, but not an empty string.
- ? means "zero or one," i.e. optional. colou?r matches both color and colour.
- ^ anchors the pattern to the start of a line. ^Hello finds Hello only when it begins a line.
- $ anchors to the end of a line. \.$ finds every line that ends with a period.
- [abc] is a character class. It matches one character that is a, b, or c. [a-z] matches any lowercase letter, [0-9] any digit, [A-Za-z0-9] any letter or digit.
- [^abc] with a caret inside the brackets means "any character except a, b, or c."
- \d is shorthand for any digit, \w for any word character (letters, digits, underscore), \s for any whitespace (space, tab, newline). Their uppercase counterparts \D, \W, \S mean the opposite.
- Parentheses ( ) create a group, both for applying quantifiers to several characters at once and for capturing text you want to reuse in a replacement.
- The pipe | means OR. cat|dog matches cat or dog.
- A backslash \ escapes a special character so it matches literally. To find an actual dot, write \. To find a literal parenthesis, write \(.
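A few of the building blocks above can be tried in a handful of lines. This is a small sketch using Python's standard re module; the sample strings are invented for illustration.

```python
import re

# Optional character: the u may or may not be present.
spellings = re.findall(r"colou?r", "color and colour")

# Shorthand plus quantifier: runs of one or more digits.
numbers = re.findall(r"\d+", "Order 66 shipped 3 items on day 120")

# Alternation: either word counts as a match.
pets = re.findall(r"cat|dog", "one cat, two dogs")

# Escaped dot: a literal period followed by word characters.
extensions = re.findall(r"\.\w+", "report.docx and notes.txt")

# Negated class: every character that is not a, b, or c.
others = re.findall(r"[^abc]", "aXbYc")

print(spellings)   # ['color', 'colour']
print(numbers)     # ['66', '3', '120']
print(pets)        # ['cat', 'dog']
print(extensions)  # ['.docx', '.txt']
print(others)      # ['X', 'Y']
```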
Anchors, quantifiers, and the greedy trap
Anchors (^ and $) are the secret to writing precise patterns. Without them, your regex floats anywhere in the text and tends to match more than you wanted. With them, you pin the pattern to the start or end of a line, which is usually exactly what you mean when you say "every line that starts with a number."
Quantifiers (*, +, ?, and the curly brace form like {2,4}) control how many times the previous thing repeats. The thing to know is that they are greedy by default: they grab as much as they can. If you write <.+> and run it on the text <b>hi</b>, you might expect it to match <b> and </b> separately. It will actually match the entire <b>hi</b> in one go because + is greedy. Add a question mark, <.+?>, and the quantifier becomes lazy, matching as little as possible. This single trick saves an enormous amount of debugging time.
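The greedy-versus-lazy difference is easy to check for yourself. A minimal Python sketch, using the same <b>hi</b> text as above:

```python
import re

html = "<b>hi</b>"

# Greedy: .+ grabs as much as it can, so the match runs to the last >.
greedy = re.findall(r"<.+>", html)

# Lazy: .+? stops at the first > it can reach.
lazy = re.findall(r"<.+?>", html)

print(greedy)  # ['<b>hi</b>']
print(lazy)    # ['<b>', '</b>']
```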
Groups and backreferences for find-and-replace
Parentheses do double duty. First, they let you quantify a chunk: (ab)+ matches ab, abab, ababab. Second, and far more useful, they capture whatever they match into a numbered slot you can refer to in the replacement field. The first set of parentheses becomes $1 in most editors (or \1 in some), the second becomes $2, and so on.
This is how you reorder text. Suppose your spreadsheet has names written as Last, First and you want First Last. Search for ([A-Za-z]+), ([A-Za-z]+) and replace with $2 $1. Done, in one shot, across thousands of rows. Almost every recipe later in this article uses the same trick, so it is worth getting comfortable with the idea: parentheses on the left, dollar-numbers on the right.
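The same swap can be sketched in Python. One dialect difference to be aware of: Python's re module writes backreferences as \1 and \2 in the replacement rather than $1 and $2. The names in the list are sample data.

```python
import re

rows = ["Lovelace, Ada", "Hopper, Grace"]  # sample data

# Two capture groups: last name, then first name.
pattern = r"([A-Za-z]+), ([A-Za-z]+)"

# \2 \1 puts the second capture first, separated by a space.
swapped = [re.sub(pattern, r"\2 \1", row) for row in rows]
print(swapped)  # ['Ada Lovelace', 'Grace Hopper']
```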
Recipe 1: extract every email address from messy text
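Email addresses are ubiquitous in exports, signature blocks, and CRM dumps. This pattern is a sensible default. It is not RFC-perfect (no regex really is), but it catches the addresses you are likely to meet in normal documents. One caveat: because the last part of the pattern allows dots, an address that ends a sentence can pick up the trailing period.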
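If you would rather test the recipe outside an editor, here is a minimal Python sketch that runs this recipe's pattern over a made-up sentence. The addresses are invented examples.

```python
import re

EMAIL = r"[\w.+-]+@[\w-]+\.[\w.-]+"

text = "Contact jane.doe@example.com or sales+eu@shop.example.org for details"

# findall returns every non-overlapping match, left to right.
emails = re.findall(EMAIL, text)
print(emails)  # ['jane.doe@example.com', 'sales+eu@shop.example.org']
```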
[\w.+-]+@[\w-]+\.[\w.-]+
Recipe 2: find phone numbers in any common format
Phone numbers are a wonderful example of how regex tames messy input. The pattern below catches numbers written as 555-123-4567, (555) 123 4567, +1 555.123.4567, and most variants in between. Adjust the leading + and country code if you only need local numbers.
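A quick way to check the three formats mentioned above is a short Python sketch. It uses a version of the pattern in which the country code sits inside an optional group, (?:...)?, so that numbers without a country code are matched from their very first character, opening parenthesis included. The sample numbers are the ones from the paragraph above.

```python
import re

# Optional country code, optional parentheses around the area code,
# and an optional space, dot, or hyphen between the parts.
PHONE = r"(?:\+?\d{1,3}[\s.-]?)?\(?\d{2,4}\)?[\s.-]?\d{3,4}[\s.-]?\d{3,4}"

samples = ["555-123-4567", "(555) 123 4567", "+1 555.123.4567"]

# group(0) is the full text of the first match in each sample.
found = [re.search(PHONE, s).group(0) for s in samples]
print(found)  # ['555-123-4567', '(555) 123 4567', '+1 555.123.4567']
```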
(?:\+?\d{1,3}[\s.-]?)?\(?\d{2,4}\)?[\s.-]?\d{3,4}[\s.-]?\d{3,4}
Recipe 3: change date format from YYYY-MM-DD to DD/MM/YYYY
This is the classic groups-and-backreferences move. Capture each date part, then reorder it in the replacement field. In the Find box paste the pattern below; in the Replace box paste $3/$2/$1.
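Here is the same reordering as a minimal Python sketch. Python writes the replacement as \3/\2/\1 instead of $3/$2/$1; the invoice text is made up.

```python
import re

line = "Invoice dated 2024-03-15, due 2024-04-01"

# Three captures: year, month, day. The replacement reverses their order.
converted = re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\3/\2/\1", line)
print(converted)  # Invoice dated 15/03/2024, due 01/04/2024
```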
(\d{4})-(\d{2})-(\d{2})
Recipe 4: remove all blank lines from a document
Long pasted text often comes with extra empty lines that double or triple the page count. The pattern below matches a line that contains nothing but optional whitespace and a newline. Replace with nothing (leave the Replace box empty) and your document collapses to single-spaced lines.
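As a sketch of how this behaves, here is the recipe in Python. The re.MULTILINE flag is what makes ^ match at the start of every line rather than only at the start of the text; editors usually switch this on for you. The sample text is invented.

```python
import re

text = "First line\n\n   \nSecond line\n\n\nThird line\n"

# ^ anchors at each line start; \s*\n swallows whitespace-only lines.
cleaned = re.sub(r"^\s*\n", "", text, flags=re.MULTILINE)
print(repr(cleaned))  # 'First line\nSecond line\nThird line\n'
```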
^\s*\n
Recipe 5: collapse double, triple, or runaway spaces into one
OCR exports, copied PDFs, and old Word documents are notorious for sprinkling double spaces between words. Find two or more consecutive spaces and replace with a single space. The pattern is one literal space followed by {2,}; the leading space is easy to lose when copying.
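A minimal Python sketch of the same cleanup, with the leading space in the pattern written out explicitly. The messy string is a made-up example.

```python
import re

messy = "Too   many    spaces  here"

# A literal space repeated two or more times becomes one space.
tidy = re.sub(r" {2,}", " ", messy)
print(tidy)  # Too many spaces here
```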
 {2,}
Recipe 6: pull every URL out of a block of text
Need to grab every link from an article, an email export, or a chat log? This pattern catches http and https URLs, including paths and query strings. Combine it with your editor's "copy all matches" feature, or paste the captured matches into a fresh document.
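Where an editor has no "copy all matches" feature, a few lines of Python can stand in for it. This is a sketch using this recipe's pattern on an invented sentence.

```python
import re

URL = r"https?://[\w.-]+(?:/[\w./?=&%#-]*)?"

text = "See https://example.com/docs?page=2 and http://test.example.org for more"

# The group is non-capturing, so findall returns the full URLs.
urls = re.findall(URL, text)
print(urls)  # ['https://example.com/docs?page=2', 'http://test.example.org']
```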
https?://[\w.-]+(?:/[\w./?=&%#-]*)?
Recipe 7: capitalize the first letter of every name in a list
If you have a column full of names typed in lowercase (jane doe, mark twain) you can fix the casing with one find-and-replace. Find the pattern below, which captures the first letter of each word. Many editors then let you replace with \u$1 (uppercase the first capture). If yours does not, run the list through a dedicated case converter and bring it back. Multilities ships a small case-converter that handles this in one click.
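Python's re module has no \u replacement token, so a sketch of this recipe there passes a small function to re.sub instead. The function receives each match and returns the text to put in its place.

```python
import re

names = ["jane doe", "mark twain"]

def upper_first(match):
    # group(1) is the captured first letter of a word.
    return match.group(1).upper()

# \b marks a word boundary, so only word-initial letters are captured.
fixed = [re.sub(r"\b([a-z])", upper_first, name) for name in names]
print(fixed)  # ['Jane Doe', 'Mark Twain']
```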
\b([a-z])
Recipe 8: keep only lines that contain a number
Some editors let you delete every line that does not match a regex. Combined with the pattern below, you can strip a long log file or notes document down to only the rows containing a number, which is wonderful for cleaning up bank statements pasted from a PDF.
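If your editor lacks a "delete non-matching lines" command, the same filter can be sketched in Python: split the text into lines and keep those where this recipe's pattern finds a match. The statement text is invented.

```python
import re

log = "opening balance\n12 Jan coffee 3.50\nthank you\nTotal 42\n"

# Keep a line only if it contains at least one digit somewhere.
kept = [line for line in log.splitlines() if re.search(r"^.*\d+.*$", line)]
print(kept)  # ['12 Jan coffee 3.50', 'Total 42']
```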
^.*\d+.*$
Recipe 9: standardize whitespace at the end of every line
Trailing spaces are invisible but they break diff tools, email signatures, and code comments. Find them and replace with nothing. The pattern matches one or more spaces or tabs that sit immediately before the end of a line.
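A minimal Python sketch of the trailing-whitespace cleanup. As with the blank-line recipe, re.MULTILINE makes $ match at the end of every line instead of only at the end of the text.

```python
import re

text = "alpha  \nbeta\t\ngamma\n"

# Spaces or tabs directly before a line end are removed.
trimmed = re.sub(r"[ \t]+$", "", text, flags=re.MULTILINE)
print(repr(trimmed))  # 'alpha\nbeta\ngamma\n'
```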
[ \t]+$
Recipe 10: extract hashtags from a social-media export
Marketing teams export tweets, LinkedIn posts, and Instagram captions and want to count which hashtags appeared. The pattern below finds a # immediately followed by one or more word characters, which catches every realistic hashtag without dragging in the surrounding punctuation.
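Counting is the step a Find box cannot do for you, so here is a small Python sketch that extracts the hashtags and tallies them. The posts are invented, and lowercasing the tags before counting is an assumption about how you want to group them.

```python
import re
from collections import Counter

posts = [
    "Loving the #launch! #Marketing",
    "#launch day recap, #b2b wins",
]

# Collect every hashtag, lowercased so #Launch and #launch count together.
tags = [tag.lower() for post in posts for tag in re.findall(r"#\w+", post)]
counts = Counter(tags)
print(counts.most_common(1))  # [('#launch', 2)]
```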
#\w+
How to actually try these in your tool
In Google Docs, open Edit, then Find and replace, and tick "Match using regular expressions." In Microsoft Word, open the Find and Replace dialog, expand it, and tick "Use wildcards" (Word's wildcard dialect differs from standard regex in several places, for example it has no \d shorthand, so the patterns above will need adapting there). In Notion, regex is supported in some database properties and via integrations, but the find bar itself is literal.
In Google Sheets, wrap the pattern in REGEXMATCH, REGEXEXTRACT, or REGEXREPLACE: =REGEXEXTRACT(A2, "[\w.+-]+@[\w-]+\.[\w.-]+") pulls the email out of cell A2. Excel's newer REGEXEXTRACT, REGEXREPLACE, and REGEXTEST functions behave similarly. In VS Code, Sublime, and most code editors, click the .* icon in the search bar to switch the search into regex mode.
When a pattern misbehaves, do not stare at it. Drop it into a regex tester that highlights matches as you type. Multilities offers a free /tools/regex-tester for exactly this, and a /tools/find-replace tool that runs your patterns over a block of pasted text without touching the original document. Iterating in a tester takes seconds; iterating inside Word with Undo takes minutes.
Common mistakes and how to avoid them
- Forgetting to escape the dot. If you want a literal period (in a domain name, file extension, or sentence), write \. The bare dot matches anything.
- Using .* when you mean \S+ or [^,]+. The dot-star pattern is greedy and often eats more than you wanted, especially across commas, quotes, or HTML tags.
- Anchoring with ^ and $ but forgetting that they work per-line in some tools and per-document in others. If you get unexpected matches, check whether your editor has a "multiline" toggle. (A separate "dotall" toggle, where present, controls whether the dot matches newlines.)
- Mixing dialects. JavaScript, Python, PCRE, Word wildcards, and POSIX regex all share most syntax but differ on edge cases like lookarounds and named groups. When something stops working in one app, the dialect is usually the cause.
- Not using parentheses for replacement. If you want to keep part of what you matched, you have to capture it. Without parentheses you cannot reference anything with $1 or \1.
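Two of the mistakes above are easy to reproduce. The Python sketch below shows how ^ behaves with and without multiline mode, and how .* overshoots where [^,]+ stops at the first comma. The sample strings are made up.

```python
import re

text = "first line\nsecond line"

# Without MULTILINE, ^ matches only at the very start of the text.
whole_text = re.findall(r"^\w+", text)
# With MULTILINE, ^ matches at the start of every line.
per_line = re.findall(r"^\w+", text, flags=re.MULTILINE)

row = "Ada,Lovelace,1815"

# Greedy .* runs to the last comma it can find.
greedy_field = re.match(r".*,", row).group(0)
# [^,]+ stops before the first comma.
first_field = re.match(r"[^,]+", row).group(0)

print(whole_text)    # ['first']
print(per_line)      # ['first', 'second']
print(greedy_field)  # Ada,Lovelace,
print(first_field)   # Ada
```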
A short workflow that will make you fast
When you need to clean a document, do not start by writing the perfect regex. Start by selecting two or three example rows of the messy text and pasting them into a regex tester. Type a literal version of one example, then upgrade one chunk at a time: replace the digits with \d+, replace the variable name with \w+, replace the punctuation that is allowed to vary with a character class. After each upgrade, check that the test text still matches and that you have not started catching things you did not want.
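The upgrade-one-chunk-at-a-time habit can be sketched in Python as well. The rows and the three pattern versions below are invented for illustration; the point is to re-check which rows match after every change.

```python
import re

rows = ["Order 1042 shipped", "Order 77 delayed", "Refund 12 issued"]

steps = [
    r"Order 1042 shipped",  # 1. a literal copy of one example row
    r"Order \d+ shipped",   # 2. the digits upgraded to \d+
    r"Order \d+ \w+",       # 3. the status word upgraded to \w+
]

# For each version of the pattern, record the rows it matches in full.
hits_by_step = {}
for pattern in steps:
    hits_by_step[pattern] = [r for r in rows if re.fullmatch(pattern, r)]
    print(pattern, "->", hits_by_step[pattern])
```

The first two versions match only the shipped order; the third also picks up the delayed one, while the refund row stays out because the word Order is still literal.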
Once the pattern is right, copy it into the real document's Find box and run a single replacement to see if it behaves the same way at scale. If the document is important, work on a copy and use Undo aggressively. Within a week of doing this two or three times you will start writing patterns from memory for the cleanups you do most often, and the first thing you reach for in any messy export will no longer be a manual click-click-click but a tiny pattern that does it all at once.
Where to go from here
The patterns and recipes above will cover the vast majority of everyday text-cleaning, extraction, and reformatting tasks. When you are ready to go deeper, look up lookaheads ((?=...)), lookbehinds ((?<=...)), non-capturing groups ((?:...)), and named captures ((?<year>\d{4})). They are not necessary for daily work, but they unlock more elegant patterns when you encounter genuinely tricky text.
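For a first taste of these, here is a short Python sketch. One dialect note: Python's re module spells named captures as (?P<year>...) rather than (?<year>...). The sample strings are made up.

```python
import re

# Lookahead: digits only when " USD" follows, without matching " USD".
amounts = re.findall(r"\d+(?= USD)", "Paid 40 USD and 30 EUR")

# Lookbehind: word characters only when a # comes right before them.
tags = re.findall(r"(?<=#)\w+", "#regex is #fun")

# Named captures: refer to parts of the match by name instead of number.
m = re.search(r"(?P<year>\d{4})-(?P<month>\d{2})", "Released 2024-03")
year, month = m.group("year"), m.group("month")

print(amounts)      # ['40']
print(tags)         # ['regex', 'fun']
print(year, month)  # 2024 03
```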
Regex rewards short, frequent practice more than most technical skills. Keep a notes file with the five recipes you reach for most, paste new ones in as you discover them, and within a few months you will be the person at the office whom everyone asks when their spreadsheet is a mess. That is a surprisingly nice place to be.