{"id":5439,"date":"2022-04-14T10:49:33","date_gmt":"2022-04-14T17:49:33","guid":{"rendered":"https:\/\/coderpad.io\/?p=5439"},"modified":"2025-10-08T08:53:51","modified_gmt":"2025-10-08T15:53:51","slug":"the-complete-guide-to-regular-expressions-regex","status":"publish","type":"post","link":"https:\/\/coderpad.io\/blog\/development\/the-complete-guide-to-regular-expressions-regex\/","title":{"rendered":"The Complete Guide to Regular Expressions (Regex)"},"content":{"rendered":"\n<p>A regular expression \u2013 or regex for short \u2013 is a syntax that allows you to match strings with specific patterns. Think of it as a souped-up text search: a regular expression adds the ability to use quantifiers, pattern collections, special characters, and capture groups to create extremely advanced search patterns.<br>Regex can be used any time you need to query string-based data, such as:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Analyzing command line output<\/li>\n\n\n\n<li>Parsing user input<\/li>\n\n\n\n<li>Examining server or program logs<\/li>\n\n\n\n<li>Handling text files with a consistent syntax, like a CSV<\/li>\n\n\n\n<li>Reading configuration files<\/li>\n\n\n\n<li>Searching and refactoring code<\/li>\n<\/ul>\n\n\n\n<p>While all of these are <em>theoretically<\/em> possible without regex, regexes act as a superpower for each of these tasks.<\/p>\n\n\n\n<p>In this guide we&#8217;ll cover:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><a href=\"#what-does-a-regex-look-like\">What does a regex look like?<\/a><\/li>\n\n\n\n<li><a href=\"#how-to-read-and-write-regexes\">How to read and write a regex<\/a>\n<ul class=\"wp-block-list\">\n<li><a href=\"#quantifiers\">What&#8217;s a &#8220;quantifier&#8221;?<\/a><\/li>\n\n\n\n<li><a href=\"#pattern-collections\">What&#8217;s a &#8220;pattern collection&#8221;?<\/a><\/li>\n\n\n\n<li><a href=\"#general-tokens\">What&#8217;s a &#8220;regex 
token&#8221;?<\/a><\/li>\n<\/ul>\n<\/li>\n\n\n\n<li><a href=\"#how-to-use-a-regex\">How to use a regex<\/a><\/li>\n\n\n\n<li><a href=\"#flags\">What&#8217;s a &#8220;regex flag&#8221;?<\/a><\/li>\n\n\n\n<li><a href=\"#groups\">What&#8217;s a &#8220;regex group&#8221;?<\/a><\/li>\n<\/ul>\n\n\n<aside class=\"\n    cta-banner\n        \"\ndata-block-name=\"cta-banner\">\n    <div class=\"inner\">\n        <div class=\"content\">\n                            <h2 class=\"headline\">Download Our Regex Cheat Sheet<\/h2>\n            \n                            <div class=\"cta-buttons\">\n                                    <a href=\"\/regular-expression-cheat-sheet\/\" class=\"button  js-cta--download\"  data-ga-category=\"CTA\" data-ga-label=\"Download Our Regex Cheat Sheet|Download\">Download<\/a>\n                                <\/div>\n                    <\/div>\n            <\/div>\n<\/aside>\n\n\n\n<div style=\"height:32px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"what-does-a-regex-look-like\">What does a regex look like?<\/h2>\n\n\n\n<p>In its simplest form, a regex in use might look something like this:<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/d2h1bfu6zrdxog.cloudfront.net\/wp-content\/uploads\/2022\/04\/img_625491e7a5b7c.png\" alt=\"\"\/><figcaption class=\"wp-element-caption\">We&#8217;re using a regular expression \/Test\/ to look for the word &#8220;Test&#8221;<\/figcaption><\/figure>\n\n\n\n<blockquote class=\"wp-block-quote\">\n<p>This screenshot is of the <a href=\"https:\/\/regex101.com\/\" target=\"_blank\" rel=\"noopener\">regex101 website<\/a>. 
All future screenshots will use this website for visual reference.<\/p>\n<\/blockquote>\n\n\n\n<p>In the &#8220;Test&#8221; example, the letters &#8220;Test&#8221; formed the search pattern, just as in a simple text search.<br>Regexes are not always so simple, however. Here&#8217;s a regex that matches 3 digits, followed by a &#8220;-&#8221;, followed by 3 digits, followed by another &#8220;-&#8221;, and finally 4 digits.<\/p>\n\n\n\n<p>You know, like a phone number:<\/p>\n\n\n<pre class=\"wp-block-code\"><span><code class=\"hljs shcb-wrap-lines\">^(?:\\d{3}-){2}\\d{4}$<\/code><\/span><\/pre>\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/d2h1bfu6zrdxog.cloudfront.net\/wp-content\/uploads\/2022\/04\/img_625491e9ce092.png\" alt=\"\"\/><figcaption class=\"wp-element-caption\">The phone number &#8220;555-555-5555&#8221; will match with the regex above, but &#8220;555-abc-5555&#8221; will not<\/figcaption><\/figure>\n\n\n\n<p>This regex may look complicated, but two things to keep in mind:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>We&#8217;ll teach you how to read and write these in this article<\/li>\n\n\n\n<li>This is a fairly complex way of writing this regex.<\/li>\n<\/ol>\n\n\n\n<p>In fact, most regexes can be written in multiple ways, just like other forms of programming. For example, the above can be rewritten into a longer but slightly more readable version:<\/p>\n\n\n<pre class=\"wp-block-code\"><span><code class=\"hljs shcb-wrap-lines\">^&#91;0-9]{3}-&#91;0-9]{3}-&#91;0-9]{4}$<\/code><\/span><\/pre>\n\n\n<blockquote class=\"wp-block-quote\">\n<p>Most languages provide a built-in method for searching and replacing strings using regex. 
However, each language may support a slightly different regex syntax.<\/p>\n\n\n\n<p>In this article, we&#8217;ll focus on the ECMAScript variant of regex, which is used in JavaScript and has a lot in common with other languages&#8217; implementations of regex as well.<\/p>\n<\/blockquote>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"how-to-read-and-write-regexes\">How to read (and write) regexes<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"quantifiers\">Quantifiers<\/h3>\n\n\n\n<p>Regex quantifiers specify how many times the preceding character or group should be matched.<\/p>\n\n\n\n<p>Here is a list of the most common quantifiers, plus the alternation operator:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><code>a|b<\/code> &#8211; Match either &#8220;a&#8221; or &#8220;b&#8221; (alternation, listed here for convenience)<\/li>\n\n\n\n<li><code>?<\/code> &#8211; Zero or one<\/li>\n\n\n\n<li><code>+<\/code> &#8211; One or more<\/li>\n\n\n\n<li><code>*<\/code> &#8211; Zero or more<\/li>\n\n\n\n<li><code>{N}<\/code> &#8211; Exactly N number of times (where N is a number)<\/li>\n\n\n\n<li><code>{N,}<\/code> &#8211; N or more number of times (where N is a number)<\/li>\n\n\n\n<li><code>{N,M}<\/code> &#8211; Between N and M number of times (where N and M are numbers and N &lt; M)<\/li>\n\n\n\n<li><code>*?<\/code> &#8211; Zero or more, but as few as possible (lazy)<\/li>\n<\/ul>\n\n\n\n<p>For example, the following regex:<\/p>\n\n\n<pre class=\"wp-block-code\"><span><code class=\"hljs shcb-wrap-lines\">Hello|Goodbye<\/code><\/span><\/pre>\n\n\n<p>Matches either the string &#8220;Hello&#8221; or the string &#8220;Goodbye&#8221;.<\/p>\n\n\n\n<p>Meanwhile:<\/p>\n\n\n<pre class=\"wp-block-code\"><span><code class=\"hljs shcb-wrap-lines\">Hey?<\/code><\/span><\/pre>\n\n\n<p>Will match &#8220;y&#8221; zero or one time, so it matches both &#8220;He&#8221; and &#8220;Hey&#8221;.<\/p>\n\n\n\n<p>Alternatively:<\/p>\n\n\n<pre class=\"wp-block-code\"><span><code class=\"hljs shcb-wrap-lines\">Hello{1,3}<\/code><\/span><\/pre>\n\n\n<p>Will match 
&#8220;Hello&#8221;, &#8220;Helloo&#8221;, and &#8220;Hellooo&#8221;, but not all of &#8220;Helloooo&#8221;, as it is looking for the letter &#8220;o&#8221; between 1 and 3 times.<\/p>\n\n\n\n<p>These can even be combined with one another:<\/p>\n\n\n<pre class=\"wp-block-code\"><span><code class=\"hljs shcb-wrap-lines\">He?llo{2}<\/code><\/span><\/pre>\n\n\n<p>Here we&#8217;re looking for strings with zero or one instances of &#8220;e&#8221; and exactly 2 instances of the letter &#8220;o&#8221;, so this will match &#8220;Helloo&#8221; and &#8220;Hlloo&#8221;.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Greedy matching<\/h4>\n\n\n\n<p>One of the regex quantifiers we touched on in the previous list was the <code>+<\/code> symbol. This symbol matches one or more of the preceding character. This means that:<\/p>\n\n\n<pre class=\"wp-block-code\"><span><code class=\"hljs shcb-wrap-lines\">Hi+<\/code><\/span><\/pre>\n\n\n<p>Will match everything from &#8220;Hi&#8221; to &#8220;Hiiiiiiiiiiiiiiii&#8221;. This is because all quantifiers are &#8220;greedy&#8221; by default: they match as many characters as they can.<\/p>\n\n\n\n<p>However, if you make the quantifier &#8220;lazy&#8221; by adding a question mark (<code>?<\/code>) after it, the behavior changes:<\/p>\n\n\n<pre class=\"wp-block-code\"><span><code class=\"hljs shcb-wrap-lines\">Hi+?<\/code><\/span><\/pre>\n\n\n<p>Now, the <code>i<\/code> matcher will try to match as few times as possible. Since the <code>+<\/code> symbol means &#8220;one or more&#8221;, it will only match one &#8220;i&#8221;. This means that if we input the string &#8220;Hiiiiiiiiiii&#8221;, only &#8220;Hi&#8221; will be matched.<\/p>\n\n\n\n<p>While this isn&#8217;t particularly useful on its own, when combined with broader matches like the <code>.<\/code> symbol, it becomes extremely important, as we&#8217;ll see shortly. 
The <code>.<\/code> symbol is used in regex to find &#8220;any character&#8221;.<\/p>\n\n\n\n<p>Now if you use:<\/p>\n\n\n<pre class=\"wp-block-code\"><span><code class=\"hljs shcb-wrap-lines\">H.*llo<\/code><\/span><\/pre>\n\n\n<p>You can match everything from &#8220;Hillo&#8221; to &#8220;Hello&#8221; to &#8220;Helloollo&#8221;.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/d2h1bfu6zrdxog.cloudfront.net\/wp-content\/uploads\/2022\/04\/img_625491eaa8f12.png\" alt=\"\"\/><figcaption class=\"wp-element-caption\">We&#8217;re using a regex \/H.*llo\/ to look for the words &#8220;Hillo&#8221;, &#8220;Hello&#8221;, and &#8220;Helloollo&#8221;<\/figcaption><\/figure>\n\n\n\n<p>However, what if you want to only match &#8220;Hello&#8221; from the final example?<\/p>\n\n\n\n<p>Well, simply make the search lazy with a <code>?<\/code> and it&#8217;ll work as we want:<\/p>\n\n\n<pre class=\"wp-block-code\"><span><code class=\"hljs shcb-wrap-lines\">H.*?llo<\/code><\/span><\/pre>\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/d2h1bfu6zrdxog.cloudfront.net\/wp-content\/uploads\/2022\/04\/img_625491eb64cd2.png\" alt=\"\"\/><figcaption class=\"wp-element-caption\">We&#8217;re using a regex \/H.*?llo\/ to look for the words &#8220;Hillo&#8221;, &#8220;Hello&#8221;, and partially match the &#8220;Hello&#8221; in &#8220;Helloollo&#8221;<\/figcaption><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"pattern-collections\">Pattern collections<\/h3>\n\n\n\n<p>Pattern collections let you match any single character out of a set of characters. 
For example, using the following regex:<\/p>\n\n\n<pre class=\"wp-block-code\"><span><code class=\"hljs shcb-wrap-lines\">My favorite vowel is &#91;aeiou]<\/code><\/span><\/pre>\n\n\n<p>You could match the following strings:<\/p>\n\n\n<pre class=\"wp-block-code\"><span><code class=\"hljs shcb-wrap-lines\">My favorite vowel is a\nMy favorite vowel is e\nMy favorite vowel is i\nMy favorite vowel is o\nMy favorite vowel is u<\/code><\/span><\/pre>\n\n\n<p>But nothing else.<\/p>\n\n\n\n<p>Here&#8217;s a list of the most common pattern collections:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><code>[A-Z]<\/code> &#8211; Match any uppercase character from &#8220;A&#8221; to &#8220;Z&#8221;<\/li>\n\n\n\n<li><code>[a-z]<\/code> &#8211; Match any lowercase character from &#8220;a&#8221; to &#8220;z&#8221;<\/li>\n\n\n\n<li><code>[0-9]<\/code> &#8211; Match any digit<\/li>\n\n\n\n<li><code>[asdf]<\/code> &#8211; Match any character that&#8217;s either &#8220;a&#8221;, &#8220;s&#8221;, &#8220;d&#8221;, or &#8220;f&#8221;<\/li>\n\n\n\n<li><code>[^asdf]<\/code> &#8211; Match any character that&#8217;s not any of the following: &#8220;a&#8221;, &#8220;s&#8221;, &#8220;d&#8221;, or &#8220;f&#8221;<\/li>\n<\/ul>\n\n\n\n<p>You can even combine these together:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><code>[0-9A-Z]<\/code> &#8211; Match any character 
that&#8217;s either a digit or an uppercase letter from &#8220;A&#8221; to &#8220;Z&#8221;<\/li>\n\n\n\n<li><code>[^a-z]<\/code> &#8211; Match any character that isn&#8217;t a lowercase letter<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"general-tokens\">General tokens<\/h3>\n\n\n\n<p>Not every character is so easily identifiable. While keys like &#8220;a&#8221; to &#8220;z&#8221; make sense to match using regex, what about the newline character?<\/p>\n\n\n\n<blockquote class=\"wp-block-quote\">\n<p>The &#8220;newline&#8221; character is the character that you input whenever you press &#8220;Enter&#8221; to add a new line.<\/p>\n<\/blockquote>\n\n\n\n<ul class=\"wp-block-list\">\n<li><code>.<\/code> &#8211; Any character (except a newline, by default)<\/li>\n\n\n\n<li><code>\\n<\/code> &#8211; Newline character<\/li>\n\n\n\n<li><code>\\t<\/code> &#8211; Tab character<\/li>\n\n\n\n<li><code>\\s<\/code> &#8211; Any whitespace character (including <code>\\t<\/code>, <code>\\n<\/code> and a few others)<\/li>\n\n\n\n<li><code>\\S<\/code> &#8211; Any non-whitespace character<\/li>\n\n\n\n<li><code>\\w<\/code> &#8211; Any word character (uppercase and lowercase Latin alphabet, numbers 0-9, and <code>_<\/code>)<\/li>\n\n\n\n<li><code>\\W<\/code> &#8211; Any non-word character (the inverse of the <code>\\w<\/code> token)<\/li>\n\n\n\n<li><code>\\b<\/code> &#8211; Word boundary: the position between a <code>\\w<\/code> character and a <code>\\W<\/code> character; it matches the position itself rather than a character<\/li>\n\n\n\n<li><code>\\B<\/code> &#8211; Non-word boundary: the inverse of <code>\\b<\/code><\/li>\n\n\n\n<li><code>^<\/code> &#8211; The start of a line<\/li>\n\n\n\n<li><code>$<\/code> &#8211; The end of a line<\/li>\n\n\n\n<li><code>\\\\<\/code> &#8211; The literal character &#8220;\\&#8221;<\/li>\n<\/ul>\n\n\n\n<p>So if you wanted to remove every whitespace character along with the character that starts the next word, you could use something like the following regex:<\/p>\n\n\n<pre class=\"wp-block-code\"><span><code class=\"hljs 
shcb-wrap-lines\">\\s.<\/code><\/span><\/pre>\n\n\n<p>And replace the results with an empty string. Doing this, the following:<\/p>\n\n\n<pre class=\"wp-block-code\"><span><code class=\"hljs shcb-wrap-lines\">Hello world how are you<\/code><\/span><\/pre>\n\n\n<p>Becomes:<\/p>\n\n\n<pre class=\"wp-block-code\"><span><code class=\"hljs shcb-wrap-lines\">Helloorldowreou<\/code><\/span><\/pre>\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/d2h1bfu6zrdxog.cloudfront.net\/wp-content\/uploads\/2022\/04\/img_625491ec1bae5.png\" alt=\"\"\/><figcaption class=\"wp-element-caption\">We&#8217;re using a regex \/\\s.\/ to look for the whitespaces alongside the following character in the string &#8220;Hello world how are you&#8221;<\/figcaption><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\">Combining with collections<\/h3>\n\n\n\n<p>These tokens aren&#8217;t just useful on their own, though! Let&#8217;s say that we want to remove any uppercase letter or whitespace character. 
Sure, we could write:<\/p>\n\n\n<pre class=\"wp-block-code\"><span><code class=\"hljs shcb-wrap-lines\">&#91;A-Z]|\\s<\/code><\/span><\/pre>\n\n\n<p>But we can actually merge these together and place our <code>\\s<\/code> token into the collection:<\/p>\n\n\n<pre class=\"wp-block-code\"><span><code class=\"hljs shcb-wrap-lines\">&#91;A-Z\\s]<\/code><\/span><\/pre>\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/d2h1bfu6zrdxog.cloudfront.net\/wp-content\/uploads\/2022\/04\/img_625491ecd9758.png\" alt=\"\"\/><figcaption class=\"wp-element-caption\">We&#8217;re using a regex \/[A-Z\\s]\/ to look for uppercase letters and whitespaces in the string &#8220;Hello World how are you&#8221;<\/figcaption><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\">Word boundaries<\/h3>\n\n\n\n<p>In our list of tokens, we mentioned <code>\\b<\/code> to match word boundaries. I thought I&#8217;d take a second to explain how it acts a bit differently from others.<br><br>Given a string like &#8220;This is a string&#8221;, you might expect the whitespace characters to be matched \u2013 however, this isn&#8217;t the case. 
Instead, it matches between the letters and the whitespace:<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/d2h1bfu6zrdxog.cloudfront.net\/wp-content\/uploads\/2022\/04\/img_625491ed5c439.png\" alt=\"\"\/><figcaption class=\"wp-element-caption\">We&#8217;re using a word boundary regex \/\\b\/ to look for the in-between spaces in characters<\/figcaption><\/figure>\n\n\n\n<p>This can be tricky to get your head around, but it&#8217;s unusual to simply match against a word boundary. Instead, you might have something like the following to match full words:<\/p>\n\n\n<pre class=\"wp-block-code\"><span><code class=\"hljs shcb-wrap-lines\">\\b\\w+\\b<\/code><\/span><\/pre>\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/d2h1bfu6zrdxog.cloudfront.net\/wp-content\/uploads\/2022\/04\/img_625491edd0c2c.png\" alt=\"\"\/><figcaption class=\"wp-element-caption\">We&#8217;re using a regex \/\\b\\w+\\b\/ to look for full words. In the string &#8220;This is a string&#8221; we match &#8220;This&#8221;, &#8220;is&#8221;, &#8220;a&#8221;, and &#8220;string&#8221;<\/figcaption><\/figure>\n\n\n\n<p>You can interpret that regex statement like this:<\/p>\n\n\n\n<p>&#8220;A word boundary. Then, one or more &#8216;word&#8217; characters. Finally, another word boundary&#8221;.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Start and end line<\/h3>\n\n\n\n<p>Two more tokens that we touched on are <code>^<\/code> and <code>$<\/code>. These mark off the start of a line and end of a line, respectively.<\/p>\n\n\n\n<p>So, if you want to find the first word, you might do something like this:<\/p>\n\n\n<pre class=\"wp-block-code\"><span><code class=\"hljs shcb-wrap-lines\">^\\w+<\/code><\/span><\/pre>\n\n\n<p>This matches one or more &#8220;word&#8221; characters, but only immediately after the line starts. 
Remember, a &#8220;word&#8221; character is any uppercase or lowercase Latin letter, any digit 0-9, or <code>_<\/code>.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/d2h1bfu6zrdxog.cloudfront.net\/wp-content\/uploads\/2022\/04\/img_625491ee79f3a.png\" alt=\"\"\/><figcaption class=\"wp-element-caption\">The regex \/^\\w+\/ matches the first word in the string. In &#8220;This is a string&#8221; we match &#8220;This&#8221;<\/figcaption><\/figure>\n\n\n\n<p>Likewise, if you want to find the last word, your regex might look something like this:<\/p>\n\n\n<pre class=\"wp-block-code\"><span><code class=\"hljs shcb-wrap-lines\">\\w+$<\/code><\/span><\/pre>\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/d2h1bfu6zrdxog.cloudfront.net\/wp-content\/uploads\/2022\/04\/img_625491ef21dfb.png\" alt=\"\"\/><figcaption class=\"wp-element-caption\">You can use \/\\w+$\/ to match the last word in the string. 
In &#8220;This is a string&#8221; we match &#8220;string&#8221;<\/figcaption><\/figure>\n\n\n\n<p>However, just because the <code>$<\/code> token <strong>typically<\/strong> ends a regex doesn&#8217;t mean that it can&#8217;t have other tokens after it.<\/p>\n\n\n\n<p>For example, what if we wanted to find the whitespace between lines, to act as a basic <a href=\"https:\/\/en.wikipedia.org\/wiki\/Minification_(programming)\" target=\"_blank\" rel=\"noopener\">JavaScript minifier<\/a>?<\/p>\n\n\n\n<p>Well, we can say &#8220;Find all whitespace characters after the end of a line&#8221; using the following regex (together with the multiline <code>m<\/code> flag covered later in this guide, so that <code>$<\/code> matches at the end of every line):<\/p>\n\n\n<pre class=\"wp-block-code\"><span><code class=\"hljs shcb-wrap-lines\">$\\s+<\/code><\/span><\/pre>\n\n\n<figure class=\"wp-block-image is-resized\"><img decoding=\"async\" src=\"https:\/\/d2h1bfu6zrdxog.cloudfront.net\/wp-content\/uploads\/2022\/04\/img_625491ef9f25f.png\" alt=\"\" style=\"width:472px;height:548px\"\/><figcaption class=\"wp-element-caption\">We can use \/$\\s+\/ to find all whitespace between the end of one line and the start of the next.<\/figcaption><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\">Character escaping<\/h3>\n\n\n\n<p>While tokens are super helpful, they can introduce some complexity when trying to match strings that actually contain tokens. 
For example, say you have the following string in a blog post:<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-3\" data-shcb-language-name=\"JSON \/ JSON with Comments\" data-shcb-language-slug=\"json\"><span><code class=\"hljs language-json shcb-wrap-lines\"><span class=\"hljs-string\">\"The newline character is '\\n'\"<\/span><\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-3\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">JSON \/ JSON with Comments<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">json<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<p>And you want to find every literal &#8220;\\n&#8221; in that post. Because <code>\\n<\/code> on its own matches a newline, you need to escape the backslash using another <code>\\<\/code>. This means that your regex might look something like this:<\/p>\n\n\n<pre class=\"wp-block-code\"><span><code class=\"hljs shcb-wrap-lines\">\\\\n<\/code><\/span><\/pre>\n\n\n<h2 class=\"wp-block-heading\" id=\"how-to-use-a-regex\">How to use a regex<\/h2>\n\n\n\n<p>Regular expressions aren&#8217;t simply useful for <em>finding<\/em> strings, however. You&#8217;re also able to use them in other methods to help modify or otherwise work with strings.<\/p>\n\n\n\n<p>While many languages have similar methods, let&#8217;s use JavaScript as an example.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Creating and searching using regex<\/h3>\n\n\n\n<p>First, let&#8217;s look at how regexes are constructed.<\/p>\n\n\n\n<p>In JavaScript (along with many other languages), we place our regex inside of <code>\/\/<\/code> blocks. 
The regex searching for a lowercase letter looks like this:<\/p>\n\n\n<pre class=\"wp-block-code\"><span><code class=\"hljs shcb-wrap-lines\">\/&#91;a-z]\/<\/code><\/span><\/pre>\n\n\n<p>This syntax then generates <a href=\"https:\/\/developer.mozilla.org\/en-US\/docs\/Web\/JavaScript\/Reference\/Global_Objects\/RegExp\" target=\"_blank\" rel=\"noopener\">a RegExp object<\/a> which we can use with <a href=\"https:\/\/developer.mozilla.org\/en-US\/docs\/Web\/JavaScript\/Reference\/Global_Objects\/RegExp\/exec\" target=\"_blank\" rel=\"noopener\">built-in methods, like <code>exec<\/code><\/a>, to match against strings.<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-4\" data-shcb-language-name=\"JavaScript\" data-shcb-language-slug=\"javascript\"><span><code class=\"hljs language-javascript shcb-wrap-lines\">\/&#91;a-z]\/.exec(<span class=\"hljs-string\">\"a\"<\/span>); <span class=\"hljs-comment\">\/\/ Returns &#91;\"a\"]<\/span>\n<span class=\"hljs-regexp\">\/&#91;a-z]\/<\/span>.exec(<span class=\"hljs-string\">\"0\"<\/span>); <span class=\"hljs-comment\">\/\/ Returns null<\/span><\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-4\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">JavaScript<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">javascript<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<p>We can then use this <a href=\"https:\/\/developer.mozilla.org\/en-US\/docs\/Glossary\/Truthy\" target=\"_blank\" rel=\"noopener\">truthiness<\/a> to determine if a regex matched, like we&#8217;re doing in line #3 of this example:<\/p>\n\n\n<div\n\tclass=\"sandbox-embed responsive-embed  sandbox-embed--full-width\"\n\tstyle=\"padding-top: 125%\"\ndata-block-name=\"coderpad-sandbox-embed\">\n\t<iframe src=\"https:\/\/embed.coderpad.io\/sandbox?question_id=211635&#038;use_question_button\" width=\"640\" height=\"800\" 
loading=\"lazy\" aria-label=\"Try out the CoderPad sandbox\"><\/iframe>\n<\/div>\n\n\n\n<p>Alternatively, we can call the <code>RegExp<\/code> constructor with the string we want to convert into a regex:<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-5\" data-shcb-language-name=\"JavaScript\" data-shcb-language-slug=\"javascript\"><span><code class=\"hljs language-javascript shcb-wrap-lines\"><span class=\"hljs-keyword\">const<\/span> regex = <span class=\"hljs-keyword\">new<\/span> <span class=\"hljs-built_in\">RegExp<\/span>(<span class=\"hljs-string\">\"&#91;a-z]\"<\/span>); <span class=\"hljs-comment\">\/\/ Same as \/&#91;a-z]\/<\/span><\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-5\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">JavaScript<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">javascript<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<h3 class=\"wp-block-heading\">Replacing strings with regex<\/h3>\n\n\n\n<p>You can also use a regex to search and replace the contents of a string. Say you wanted to replace any greeting with a message of &#8220;goodbye&#8221;. 
While you could do something like this:<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-6\" data-shcb-language-name=\"JavaScript\" data-shcb-language-slug=\"javascript\"><span><code class=\"hljs language-javascript shcb-wrap-lines\"><span class=\"hljs-function\"><span class=\"hljs-keyword\">function<\/span> <span class=\"hljs-title\">youSayHelloISayGoodbye<\/span>(<span class=\"hljs-params\">str<\/span>) <\/span>{\n\u00a0 str = str.replace(<span class=\"hljs-string\">\"Hello\"<\/span>, <span class=\"hljs-string\">\"Goodbye\"<\/span>);\n\u00a0 str = str.replace(<span class=\"hljs-string\">\"Hi\"<\/span>, <span class=\"hljs-string\">\"Goodbye\"<\/span>);\n\u00a0 str = str.replace(<span class=\"hljs-string\">\"Hey\"<\/span>, <span class=\"hljs-string\">\"Goodbye\"<\/span>);\n\u00a0 str = str.replace(<span class=\"hljs-string\">\"hello\"<\/span>, <span class=\"hljs-string\">\"Goodbye\"<\/span>);\n\u00a0 str = str.replace(<span class=\"hljs-string\">\"hi\"<\/span>, <span class=\"hljs-string\">\"Goodbye\"<\/span>);\n\u00a0 str = str.replace(<span class=\"hljs-string\">\"hey\"<\/span>, <span class=\"hljs-string\">\"Goodbye\"<\/span>);\n\u00a0 <span class=\"hljs-keyword\">return<\/span> str;\n}<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-6\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">JavaScript<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">javascript<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<p>There&#8217;s an easier alternative, using a regex:<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-7\" data-shcb-language-name=\"JavaScript\" data-shcb-language-slug=\"javascript\"><span><code class=\"hljs language-javascript shcb-wrap-lines\"><span class=\"hljs-function\"><span class=\"hljs-keyword\">function<\/span> <span class=\"hljs-title\">youSayHelloISayGoodbye<\/span>(<span 
class=\"hljs-params\">str<\/span>) <\/span>{\n\u00a0 str = str.replace(<span class=\"hljs-regexp\">\/&#91;Hh]ello|&#91;Hh]i|&#91;Hh]ey\/<\/span>, <span class=\"hljs-string\">\"Goodbye\"<\/span>);\n\u00a0 <span class=\"hljs-keyword\">return<\/span> str;\n}<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-7\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">JavaScript<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">javascript<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n<div\n\tclass=\"sandbox-embed responsive-embed \"\n\tstyle=\"padding-top: 125%\"\ndata-block-name=\"coderpad-sandbox-embed\">\n\t<iframe src=\"https:\/\/embed.coderpad.io\/sandbox?question_id=211638&#038;use_question_button\" width=\"640\" height=\"800\" loading=\"lazy\" aria-label=\"Try out the CoderPad sandbox\"><\/iframe>\n<\/div>\n\n\n\n<p>However, something you might notice is that if you run <code>youSayHelloISayGoodbye<\/code> with &#8220;Hello, Hi there&#8221;, it only replaces the first greeting it finds:<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/d2h1bfu6zrdxog.cloudfront.net\/wp-content\/uploads\/2022\/04\/img_625491f048067.png\" alt=\"\"\/><figcaption class=\"wp-element-caption\">If the regex \/[Hh]ello|[Hh]i|[Hh]ey\/ is used on the string &#8220;Hello, Hi there&#8221;, it will only match &#8220;Hello&#8221; by default.<\/figcaption><\/figure>\n\n\n\n<p>Here, we might expect to see both &#8220;Hello&#8221; and &#8220;Hi&#8221; matched, but we don&#8217;t.<\/p>\n\n\n\n<p>This is because we need to use a regex &#8220;flag&#8221; to match more than once.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"flags\">Flags<\/h2>\n\n\n\n<p>A regex flag is a modifier to an existing regex. 
These flags are always appended after the last forward slash in a regex definition.<\/p>\n\n\n\n<p>Here&#8217;s a short list of some of the flags available to you.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><code>g<\/code> &#8211; Global, match more than once<\/li>\n\n\n\n<li><code>m<\/code> &#8211; Multiline, make ^ and $ match at the start and end of each line rather than of the whole string<\/li>\n\n\n\n<li><code>i<\/code> &#8211; Make the regex case insensitive<\/li>\n<\/ul>\n\n\n\n<p>This means that we could rewrite the following regex:<\/p>\n\n\n<pre class=\"wp-block-code\"><span><code class=\"hljs shcb-wrap-lines\">\/&#91;Hh]ello|&#91;Hh]i|&#91;Hh]ey\/<\/code><\/span><\/pre>\n\n\n<p>To use the case insensitive flag instead:<\/p>\n\n\n<pre class=\"wp-block-code\"><span><code class=\"hljs shcb-wrap-lines\">\/Hello|Hi|Hey\/i<\/code><\/span><\/pre>\n\n\n<p>With this flag, this regex will now match:<\/p>\n\n\n<pre class=\"wp-block-code\"><span><code class=\"hljs shcb-wrap-lines\">Hello\nHEY\nHi\nHeLLo<\/code><\/span><\/pre>\n\n\n<p>Or any other case-modified variant.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Global regex flag with string replacing<\/h3>\n\n\n\n<p>As we mentioned before, if you do a regex replace without any flags it will only replace the first result:<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-8\" data-shcb-language-name=\"JavaScript\" data-shcb-language-slug=\"javascript\"><span><code class=\"hljs language-javascript shcb-wrap-lines\"><span class=\"hljs-keyword\">let<\/span> str = <span class=\"hljs-string\">\"Hello, hi there!\"<\/span>;\nstr = str.replace(<span class=\"hljs-regexp\">\/&#91;Hh]ello|&#91;Hh]i|&#91;Hh]ey\/<\/span>, <span class=\"hljs-string\">\"Goodbye\"<\/span>);\n<span class=\"hljs-built_in\">console<\/span>.log(str); <span class=\"hljs-comment\">\/\/ Will output \"Goodbye, hi there!\"<\/span><\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-8\"><span class=\"shcb-language__label\">Code language:<\/span> <span 
class=\"shcb-language__name\">JavaScript<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">javascript<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<p>However, if you pass the <code>global<\/code> flag, you&#8217;ll match every instance of the greetings matched by the regex:<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-9\" data-shcb-language-name=\"JavaScript\" data-shcb-language-slug=\"javascript\"><span><code class=\"hljs language-javascript shcb-wrap-lines\"><span class=\"hljs-keyword\">let<\/span> str = <span class=\"hljs-string\">\"Hello, hi there!\"<\/span>;\nstr = str.replace(<span class=\"hljs-regexp\">\/&#91;Hh]ello|&#91;Hh]i|&#91;Hh]ey\/g<\/span>, <span class=\"hljs-string\">\"Goodbye\"<\/span>);\n<span class=\"hljs-built_in\">console<\/span>.log(str); <span class=\"hljs-comment\">\/\/ Will output \"Goodbye, Goodbye there!\"<\/span><\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-9\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">JavaScript<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">javascript<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<h3 class=\"wp-block-heading\">A note about JavaScript&#8217;s global flag<\/h3>\n\n\n\n<p>When using a global JavaScript regex, you might run into some strange behavior when running the <code>exec<\/code> command more than once.<\/p>\n\n\n\n<p>In particular, if you run <code>exec<\/code> with a global regex, it will return <code>null<\/code> every other time:<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/d2h1bfu6zrdxog.cloudfront.net\/wp-content\/uploads\/2022\/04\/img_625491f141259.png\" alt=\"\"\/><figcaption class=\"wp-element-caption\">If we assign a regex to a variable then run `exec` on said variable, it will find the results properly the 
first and third time, but return `null` the second time<\/figcaption><\/figure>\n\n\n\n<p>This is because, as <a href=\"https:\/\/developer.mozilla.org\/en-US\/docs\/Web\/JavaScript\/Reference\/Global_Objects\/RegExp\/exec\" target=\"_blank\" rel=\"noopener\">MDN explains<\/a>:<\/p>\n\n\n\n<blockquote class=\"wp-block-quote\">\n<p>JavaScript<a href=\"https:\/\/developer.mozilla.org\/en-US\/docs\/Web\/JavaScript\/Reference\/Global_Objects\/RegExp\" target=\"_blank\" rel=\"noopener\"> RegExp<\/a> objects are <strong>stateful<\/strong> when they have the<a href=\"https:\/\/developer.mozilla.org\/en-US\/docs\/Web\/JavaScript\/Reference\/Global_Objects\/RegExp\/global\" target=\"_blank\" rel=\"noopener\"> global<\/a> or<a href=\"https:\/\/developer.mozilla.org\/en-US\/docs\/Web\/JavaScript\/Reference\/Global_Objects\/RegExp\/sticky\" target=\"_blank\" rel=\"noopener\"> sticky<\/a> flags set\u2026 They store a<a href=\"https:\/\/developer.mozilla.org\/en-US\/docs\/Web\/JavaScript\/Reference\/Global_Objects\/RegExp\/lastIndex\" target=\"_blank\" rel=\"noopener\"> lastIndex<\/a> from the previous match. Using this internally, exec() can be used to iterate over multiple matches in a string of text\u2026<\/p>\n<\/blockquote>\n\n\n\n<p>The <code>exec<\/code> method starts searching at <code>lastIndex<\/code> rather than at the start of the string. After a successful match, <code>lastIndex<\/code> points just past that match, so the next call only searches the remainder of the string. If nothing matches there, <code>exec<\/code> returns <code>null<\/code> and resets <code>lastIndex<\/code> to 0, which is why the call after that succeeds again. 
While this feature can be useful in specific niche circumstances, it&#8217;s often confusing for new users.<\/p>\n\n\n\n<p>To solve this problem, we can simply set <code>lastIndex<\/code> to 0 before each <code>exec<\/code> call:<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/d2h1bfu6zrdxog.cloudfront.net\/wp-content\/uploads\/2022\/04\/img_625491f274abf.png\" alt=\"\"\/><figcaption class=\"wp-element-caption\">If we run `regex.lastIndex = 0` in between each `regex.exec`, then every single `exec` runs as intended<\/figcaption><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"groups\">Groups<\/h2>\n\n\n\n<p>When searching with a regex, it can be helpful to search for more than one matched item at a time. This is where &#8220;groups&#8221; come into play: they let you treat part of a pattern as a single unit, for example to list alternatives for just that part.<\/p>\n\n\n\n<p>Here, we can see matching against both <code>Testing 123<\/code> and <code>Tests 123<\/code> without duplicating the &#8220;123&#8221; matcher in the regex.<\/p>\n\n\n<pre class=\"wp-block-code\"><span><code class=\"hljs shcb-wrap-lines\">\/(Testing|tests) 123\/ig<\/code><\/span><\/pre>\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/d2h1bfu6zrdxog.cloudfront.net\/wp-content\/uploads\/2022\/04\/img_625491f390248.png\" alt=\"\"\/><figcaption class=\"wp-element-caption\">With the regex \/(Testing|tests) 123\/ig we can match &#8220;Testing 123&#8221; and &#8220;Tests 123&#8221;<\/figcaption><\/figure>\n\n\n\n<p>Groups are defined by parentheses, and there are two different types, capture groups and non-capturing groups:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><code>(...)<\/code> &#8211; Capture group matching any three characters<\/li>\n\n\n\n<li><code>(?:...)<\/code> &#8211; Non-capturing group matching any three characters<\/li>\n<\/ul>\n\n\n\n<p>The difference between these two typically comes up in the conversation when 
&#8220;replace&#8221; is part of the equation.<\/p>\n\n\n\n<p>For example, using the regex above, we can use the following JavaScript to replace the text with &#8220;Testing 234&#8221; and &#8220;Tests 234&#8221;:<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-10\" data-shcb-language-name=\"JavaScript\" data-shcb-language-slug=\"javascript\"><span><code class=\"hljs language-javascript shcb-wrap-lines\"><span class=\"hljs-keyword\">const<\/span> regex = <span class=\"hljs-regexp\">\/(Testing|tests) 123\/ig<\/span>;\n\n<span class=\"hljs-keyword\">let<\/span> str = <span class=\"hljs-string\">`\nTesting 123\nTests 123\n`<\/span>;\n\nstr = str.replace(regex, <span class=\"hljs-string\">'$1 234'<\/span>);\n<span class=\"hljs-built_in\">console<\/span>.log(str); <span class=\"hljs-comment\">\/\/ \"\\nTesting 234\\nTests 234\\n\"<\/span><\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-10\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">JavaScript<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">javascript<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<p>We&#8217;re using <code>$1<\/code> to refer to the first capture group, <code>(Testing|tests)<\/code>. 
We can also capture more than a single group, like both <code>(Testing|tests)<\/code> and <code>(123)<\/code>:<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-11\" data-shcb-language-name=\"JavaScript\" data-shcb-language-slug=\"javascript\"><span><code class=\"hljs language-javascript shcb-wrap-lines\"><span class=\"hljs-keyword\">const<\/span> regex = <span class=\"hljs-regexp\">\/(Testing|tests) (123)\/ig<\/span>;\n\n<span class=\"hljs-keyword\">let<\/span> str = <span class=\"hljs-string\">`\nTesting 123\nTests 123\n`<\/span>;\n\nstr = str.replace(regex, <span class=\"hljs-string\">'$1 #$2'<\/span>);\n<span class=\"hljs-built_in\">console<\/span>.log(str); <span class=\"hljs-comment\">\/\/ \"\\nTesting #123\\nTests #123\\n\"<\/span><\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-11\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">JavaScript<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">javascript<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<p>However, this is only true for capture groups. 
If we change:<\/p>\n\n\n<pre class=\"wp-block-code\"><span><code class=\"hljs shcb-wrap-lines\">\/(Testing|tests) (123)\/ig<\/code><\/span><\/pre>\n\n\n<p>To become:<\/p>\n\n\n<pre class=\"wp-block-code\"><span><code class=\"hljs shcb-wrap-lines\">\/(?:Testing|tests) (123)\/ig<\/code><\/span><\/pre>\n\n\n<p>Then there is only one capture group, <code>(123)<\/code>, so <code>$1<\/code> now refers to it and a similar replacement outputs something different:<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-12\" data-shcb-language-name=\"JavaScript\" data-shcb-language-slug=\"javascript\"><span><code class=\"hljs language-javascript shcb-wrap-lines\"><span class=\"hljs-keyword\">const<\/span> regex = <span class=\"hljs-regexp\">\/(?:Testing|tests) (123)\/ig<\/span>;\n\n<span class=\"hljs-keyword\">let<\/span> str = <span class=\"hljs-string\">`\nTesting 123\nTests 123\n`<\/span>;\n\nstr = str.replace(regex, <span class=\"hljs-string\">'$1'<\/span>);\n<span class=\"hljs-built_in\">console<\/span>.log(str); <span class=\"hljs-comment\">\/\/ \"\\n123\\n123\\n\"<\/span><\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-12\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">JavaScript<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">javascript<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n<div\n\tclass=\"sandbox-embed responsive-embed  sandbox-embed--full-width\"\n\tstyle=\"padding-top: 125%\"\ndata-block-name=\"coderpad-sandbox-embed\">\n\t<iframe src=\"https:\/\/embed.coderpad.io\/sandbox?question_id=211705&#038;use_question_button\" width=\"640\" height=\"800\" loading=\"lazy\" aria-label=\"Try out the CoderPad sandbox\"><\/iframe>\n<\/div>\n\n\n\n<h3 class=\"wp-block-heading\">Named capture groups<\/h3>\n\n\n\n<p>While capture groups are awesome, it can easily get confusing when there are more than a few capture groups. 
The difference between <code>$3<\/code> and <code>$5<\/code> isn&#8217;t always obvious at a glance.<\/p>\n\n\n\n<p>To help solve this problem, regexes have a concept called &#8220;named capture groups&#8221;:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><code>(?&lt;name&gt;...)<\/code> &#8211; Named capture group called &#8220;name&#8221; matching any three characters<\/li>\n<\/ul>\n\n\n\n<p>You can use them in a regex like so to create a group called &#8220;num&#8221; that matches three digits:<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-13\" data-shcb-language-name=\"HTML, XML\" data-shcb-language-slug=\"xml\"><span><code class=\"hljs language-xml shcb-wrap-lines\">\/Testing (?<span class=\"hljs-tag\">&lt;<span class=\"hljs-name\">num<\/span>&gt;<\/span>\\d{3})\/<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-13\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">HTML, XML<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">xml<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<p>Then, you can use it in a replacement like so:<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-14\" data-shcb-language-name=\"JavaScript\" data-shcb-language-slug=\"javascript\"><span><code class=\"hljs language-javascript shcb-wrap-lines\"><span class=\"hljs-keyword\">const<\/span> regex = <span class=\"hljs-regexp\">\/Testing (?&lt;num&gt;\\d{3})\/<\/span>;\n<span class=\"hljs-keyword\">let<\/span> str = <span class=\"hljs-string\">\"Testing 123\"<\/span>;\nstr = str.replace(regex, <span class=\"hljs-string\">\"Hello $&lt;num&gt;\"<\/span>);\n<span class=\"hljs-built_in\">console<\/span>.log(str); <span class=\"hljs-comment\">\/\/ \"Hello 123\"<\/span><\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-14\"><span class=\"shcb-language__label\">Code language:<\/span> <span 
class=\"shcb-language__name\">JavaScript<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">javascript<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<h3 class=\"wp-block-heading\">Named back reference<\/h3>\n\n\n\n<p>Sometimes it can be useful to reference a named capture group inside the regex itself. This is where &#8220;back references&#8221; can come into play.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><code>\\k&lt;name&gt;<\/code> &#8211; Reference named capture group &#8220;name&#8221; in a search query<\/li>\n<\/ul>\n\n\n\n<p>Say you want to match:<\/p>\n\n\n<pre class=\"wp-block-code\"><span><code class=\"hljs shcb-wrap-lines\">Hello there James. James, how are you doing?<\/code><\/span><\/pre>\n\n\n<p>But not:<\/p>\n\n\n<pre class=\"wp-block-code\"><span><code class=\"hljs shcb-wrap-lines\">Hello there James. Frank, how are you doing?<\/code><\/span><\/pre>\n\n\n<p>You could write a regex that repeats the word &#8220;James&#8221; like the following:<\/p>\n\n\n<pre class=\"wp-block-code\"><span><code class=\"hljs shcb-wrap-lines\">\/.*James. James,.*\/<\/code><\/span><\/pre>\n\n\n<p>A better alternative might look something like this:<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-15\" data-shcb-language-name=\"HTML, XML\" data-shcb-language-slug=\"xml\"><span><code class=\"hljs language-xml shcb-wrap-lines\">\/.*(?<span class=\"hljs-tag\">&lt;<span class=\"hljs-name\">name<\/span>&gt;<\/span>James). 
\\k<span class=\"hljs-tag\">&lt;<span class=\"hljs-name\">name<\/span>&gt;<\/span>,.*\/<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-15\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">HTML, XML<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">xml<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<p>Now, instead of having two names hardcoded, you only have one.<\/p>\n\n\n<div\n\tclass=\"sandbox-embed responsive-embed  sandbox-embed--full-width\"\n\tstyle=\"padding-top: 125%\"\ndata-block-name=\"coderpad-sandbox-embed\">\n\t<iframe src=\"https:\/\/embed.coderpad.io\/sandbox?question_id=211711&#038;use_question_button\" width=\"640\" height=\"800\" loading=\"lazy\" aria-label=\"Try out the CoderPad sandbox\"><\/iframe>\n<\/div>\n\n\n\n<h3 class=\"wp-block-heading\">Lookahead and lookbehind groups<\/h3>\n\n\n\n<p>Lookahead and lookbehind groups are extremely powerful and often misunderstood.<\/p>\n\n\n\n<p>There are four different types of lookaheads and lookbehinds:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><code>(?!)<\/code> &#8211; negative lookahead<\/li>\n\n\n\n<li><code>(?=)<\/code> &#8211; positive lookahead<\/li>\n\n\n\n<li><code>(?&lt;=)<\/code> &#8211; positive lookbehind<\/li>\n\n\n\n<li><code>(?&lt;!)<\/code> &#8211; negative lookbehind<\/li>\n<\/ul>\n\n\n\n<p>A lookahead works the way it sounds: it checks that something either <em>is<\/em> or <em>is not<\/em> immediately after the lookahead group, depending on whether it&#8217;s positive or negative. A lookbehind does the same for the text immediately before the group.<\/p>\n\n\n\n<p>As such, using the negative lookahead like so:<\/p>\n\n\n<pre class=\"wp-block-code\"><span><code class=\"hljs shcb-wrap-lines\">\/B(?!A)\/<\/code><\/span><\/pre>\n\n\n<p>Will allow you to match the <code>B<\/code> in <code>BC<\/code> but not the one in <code>BA<\/code>.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" 
src=\"https:\/\/d2h1bfu6zrdxog.cloudfront.net\/wp-content\/uploads\/2022\/04\/img_625491f478fa1.png\" alt=\"\"\/><figcaption class=\"wp-element-caption\">With the regex \/B(?!A)\/ we can match &#8220;B&#8221; in &#8220;BC&#8221; but not in &#8220;BA&#8221;<\/figcaption><\/figure>\n\n\n\n<p>You can even combine these with <code>^<\/code> and <code>$<\/code> tokens to try to match full lines. For example, the following regex will match any line that <strong>does not<\/strong> start with &#8220;Test&#8221;:<\/p>\n\n\n<pre class=\"wp-block-code\"><span><code class=\"hljs shcb-wrap-lines\">\/^(?!Test).*$\/gm<\/code><\/span><\/pre>\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/d2h1bfu6zrdxog.cloudfront.net\/wp-content\/uploads\/2022\/04\/img_625491f5599e9.png\" alt=\"\"\/><figcaption class=\"wp-element-caption\">\/^(?!Test).*$\/gm lets us match &#8220;Hello&#8221; and &#8220;Other&#8221;, but not &#8220;Testing 123&#8221; and &#8220;Tests 123&#8221;<\/figcaption><\/figure>\n\n\n\n<p>Likewise, we can switch this to a positive lookahead to enforce that our line <strong>must<\/strong> start with &#8220;Test&#8221;:<\/p>\n\n\n<pre class=\"wp-block-code\"><span><code class=\"hljs shcb-wrap-lines\">\/^(?=Test).*$\/gm<\/code><\/span><\/pre>\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/d2h1bfu6zrdxog.cloudfront.net\/wp-content\/uploads\/2022\/04\/img_625491f7125a2.png\" alt=\"\"\/><figcaption class=\"wp-element-caption\">Inverting our previous item &#8211; \/^(?=Test).*$\/gm lets us match &#8220;Testing 123&#8221; and &#8220;Tests 123&#8221;, but not &#8220;Hello&#8221; and &#8220;Other&#8221;<\/figcaption><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">Putting it all together<\/h2>\n\n\n\n<p>Regexes are extremely powerful and can be used in a myriad of string manipulations. 
Knowing them can help you refactor codebases, script quick language changes, and more!<\/p>\n\n\n\n<p>Let&#8217;s go back to our initial phone number regex and try to understand it again:<\/p>\n\n\n<pre class=\"wp-block-code\"><span><code class=\"hljs shcb-wrap-lines\">^(?:\\d{3}-){2}\\d{4}$<\/code><\/span><\/pre>\n\n\n<p>Remember that this regex is looking to match phone numbers such as:<\/p>\n\n\n<pre class=\"wp-block-code\"><span><code class=\"hljs shcb-wrap-lines\">555-555-5555<\/code><\/span><\/pre>\n\n\n<p>Here this regex is:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Using <code>^<\/code> and <code>$<\/code> to anchor the match to the start and end of the line.<\/li>\n\n\n\n<li>Using a non-capturing group to find three digits then a dash\n<ul class=\"wp-block-list\">\n<li>Repeating this group twice, to match <code>555-555-<\/code><\/li>\n<\/ul>\n<\/li>\n\n\n\n<li>Finding the last 4 digits of the phone number<\/li>\n<\/ul>\n\n\n\n<p>Hopefully, this article has been a helpful introduction to regexes for you. If you&#8217;d like to see quick definitions of useful regexes, check out our cheat sheet.<\/p>\n\n\n<aside class=\"\n    cta-banner\n        \"\ndata-block-name=\"cta-banner\">\n    <div class=\"inner\">\n        <div class=\"content\">\n                            <h2 class=\"headline\">Download Our Regex Cheat Sheet<\/h2>\n            \n                            <div class=\"cta-buttons\">\n                                    <a href=\"\/regular-expression-cheat-sheet\/\" class=\"button  js-cta--download\"  data-ga-category=\"CTA\" data-ga-label=\"Download Our Regex Cheat Sheet|Download\">Download<\/a>\n                                <\/div>\n                    <\/div>\n            <\/div>\n<\/aside>\n","protected":false},"excerpt":{"rendered":"<p>A Regular Expression \u2013 or regex for short\u2013 is a syntax that allows you to match strings with specific patterns. 
Think of it as a suped-up text search shortcut, but a regular expression adds the ability to use quantifiers, pattern collections, special characters, and capture groups to create extremely advanced search patterns.Regex can be used [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":5485,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[9],"tags":[],"persona":[29],"blog-programming-language":[33],"keyword-cluster":[],"class_list":["post-5439","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-development"],"acf":[],"_links":{"self":[{"href":"https:\/\/coderpad.io\/wp-json\/wp\/v2\/posts\/5439","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/coderpad.io\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/coderpad.io\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/coderpad.io\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/coderpad.io\/wp-json\/wp\/v2\/comments?post=5439"}],"version-history":[{"count":49,"href":"https:\/\/coderpad.io\/wp-json\/wp\/v2\/posts\/5439\/revisions"}],"predecessor-version":[{"id":43368,"href":"https:\/\/coderpad.io\/wp-json\/wp\/v2\/posts\/5439\/revisions\/43368"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/coderpad.io\/wp-json\/wp\/v2\/media\/5485"}],"wp:attachment":[{"href":"https:\/\/coderpad.io\/wp-json\/wp\/v2\/media?parent=5439"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/coderpad.io\/wp-json\/wp\/v2\/categories?post=5439"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/coderpad.io\/wp-json\/wp\/v2\/tags?post=5439"},{"taxonomy":"persona","embeddable":true,"href":"https:\/\/coderpad.io\/wp-json\/wp\/v2\/persona?post=5439"},{"taxonomy":"blog-programming-language","embeddable":true,"href":"https:\/\/coderpad.io\/wp-json\/wp\/v2\/blog-programming-
language?post=5439"},{"taxonomy":"keyword-cluster","embeddable":true,"href":"https:\/\/coderpad.io\/wp-json\/wp\/v2\/keyword-cluster?post=5439"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}