Created: 10/12/2022
string.replace in JavaScript
Finding and replacing with capture groups is super useful. I do it all the time in vim using sed syntax. Here's an example!
Oh no, we're migrating to a new email provider, and they don't support the +custom thing (more on the email + trick).
Input
[email protected]
[email protected]
Vim / Sed Regex; horribly cryptic, as usual:
%s/\(\w\+\)\(+\w\+\)\?@\(.*\)/\1@\3/
Output:
[email protected]
[email protected]
This is all very fine and good. Using capture groups allowed us to separately capture the base email from the +custom bit, then to lift that out of the replace expression so that we can do our whole transformation in one go.
How can you embrace this magic in JavaScript, though? It seems like something that would be useful for string manipulation in our applications!
The gist of the API in JavaScript is that you can use $1, $2, etc. to refer to captured groups, just like you can use \1, \2, etc. in vim or sed.
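As a rough sketch, here is how the vim substitution from earlier might translate to JavaScript. The addresses are made-up placeholders, and stripPlusTag is a hypothetical helper name chosen for this example, not something from a library.

```javascript
// Hypothetical helper: drop the "+custom" part of an email address.
// Group 1 is the base name, group 2 is the optional "+custom" bit,
// and group 3 is everything after the "@".
function stripPlusTag(email) {
  return email.replace(/(\w+)(\+\w+)?@(.*)/, '$1@$3')
}

// Placeholder addresses, made up for this example
const emails = ['jane+newsletter@example.com', 'bob@example.com']
const migrated = emails.map(stripPlusTag)

console.log(migrated)
// => [ 'jane@example.com', 'bob@example.com' ]
```

Note that the + has to be escaped in the JavaScript version, since an unescaped + is a quantifier there, which is the opposite of the vim convention.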
Although this makes sense by the end, I was really thrown for a loop by this behavior.
'input string'.replace(/(input)/, '$1') // 'input string'
JavaScript, what is happening???
I expected this usage to cause some change to the output, but the output was unaffected.
It turns out that with a partial match, finding and replacing only affects the text that the regex matched. Here the match is input and $1 is also input, so the match gets swapped for an identical copy of itself. We cannot touch the ' string' substring at all with the pattern shown above, since it wasn't matched by the regular expression. In practice, that means we need to match everything we want to transform to do anything useful.
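One way to convince yourself that a replacement really did happen is to wrap the capture in something visible. A small sketch, where the square brackets are just markers picked for this example:

```javascript
// The regex matches only "input", so only that span is replaced
const unchanged = 'input string'.replace(/(input)/, '$1')
const marked = 'input string'.replace(/(input)/, '[$1]')

console.log(unchanged) // 'input string'
console.log(marked) // '[input] string'
```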
'input string'.replace(/(input) (string)/, '$1') // 'input'
We were able to remove ' string' from the output because the regex now matches the whole input, and the replacement, $1, contains only input and nothing else.
We can use this pattern to do something sort of practical, actually.
const names = ["John Smith", "Mary Jane", "Tim Peters"]
names.map((name) => name.replace(/(\w+) (\w+)/, '$2, $1'))
// => [ 'Smith, John', 'Jane, Mary', 'Peters, Tim' ]
The key here is that the whole string matches the whole regex. If the string doesn't match the regex at all, it doesn't matter what we put in the replace expression; the original input is passed through unaffected.
"can't touch this".replace(/no match/, '$1')
// => "can't touch this"
"can't touch this".replace(/no match/, '$2')
// => "can't touch this"
"can't touch this".replace(/no match/, 'hello??')
// => you guessed it; still "can't touch this"
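Because a non-match silently hands back the input, it can be handy to check whether anything matched before trusting the result. A minimal sketch, where replaceOrNull is a hypothetical helper invented for this example:

```javascript
// Hypothetical helper: return the replaced string, or null when the
// pattern does not match anywhere in the input.
function replaceOrNull(input, pattern, replacement) {
  // String.prototype.search returns -1 when there is no match
  if (input.search(pattern) === -1) return null
  return input.replace(pattern, replacement)
}

console.log(replaceOrNull("can't touch this", /no match/, 'hello??')) // null
console.log(replaceOrNull("can't touch this", /(touch)/, 'stop')) // "can't stop this"
```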
The weird thing is that for a partial match, the bit that doesn't match gets passed through, while you can transform the part that does match. The next example highlights this. First, let's break down the regex I'm going to use:
/(\w+)@(\w+)\.(\w{3})/
In plain language, this is:
- one or more word characters ($1)
- a literal "@"
- one or more word characters ($2)
- a literal "."
- exactly three word characters ($3)
As you might have started to recognize, this is an email regex. But there's a problem. It supports top-level domains with 3 characters (".com"), but not ones with two characters (".co"), which do exist! Therefore, we see the following:
// this is the same regex as before; don't bother reading it
const reg = /(\w+)@(\w+)\.(\w{3})/
reg.test('email@domain.com') // true
reg.test('email@domain.co') // false
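If two-letter top-level domains matter for your use case, one possible tweak is to loosen the last group from exactly three word characters to two or more. This is an assumption for illustration, not the regex used in the rest of this post, and the addresses are placeholders.

```javascript
// Variation for illustration only: \w{2,} accepts two or more characters
const looserReg = /(\w+)@(\w+)\.(\w{2,})/

console.log(looserReg.test('email@domain.com')) // true
console.log(looserReg.test('email@domain.co')) // true
console.log('email@domain.co'.replace(looserReg, '$3')) // 'co'
```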
With that in mind, let's look at examples that combine everything we've learned!
const reg = /(\w+)@(\w+)\.(\w{3})/g // same pattern, now with the g flag
// this is expected behavior; the regex matches exactly
'email@domain.com'.replace(reg, '$1') // 'email' (as we'd expect)
'email@domain.com'.replace(reg, '$2') // 'domain' (as we'd expect)
'email@domain.com'.replace(reg, '$3') // 'com' (as we'd expect)
// when the regex doesn't match the input is passed through! Just like the
// "can't touch this" example from before
'email@domain.co'.replace(reg, '$1') // 'email@domain.co'
'email@domain.co'.replace(reg, '$2') // 'email@domain.co'
'email@domain.co'.replace(reg, 'ahhhh') // 'email@domain.co'
Let's look at some partial matches now!
const reg = /(\w+)@(\w+)\.(\w{3})/g // same pattern, with the g flag
const falseInformation = 'My email domain is email@domain.com'
// uh, no, the domain is just "domain.com"; let's fix that with our regex!
const fixed = falseInformation.replace(reg, '$2.$3')
fixed === 'My email domain is domain.com' // true
let challenge = `
For each of these emails, change the top-level domain from ".com" to ".gov"
- [email protected]
- [email protected]
- [email protected]
- [email protected]
`;
// reg has the g (global) flag, so a single call replaces every match
challenge = challenge.replace(reg, '$1@$2.gov')
So, basically, if we have an exact match, the behavior is pretty unsurprising. If we don't have a match, the whole string just gets passed through. If we have a partial match, JavaScript performs the transformation on the bit that matched, but doesn't touch the rest.
// halloween substitution 👻
'hello world'.replace(/(hell)(o)/, '$1') // 'hell world'
And $2 captures the o:
'hello world'.replace(/(hell)(o)/, '$2') // 'o world'
Another important topic for understanding regex in JavaScript is regex modifier flags. Apart from the g flag in a few examples, I've ignored them to keep it simple here, but MDN documents them very well and I encourage you to experiment with them!
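As a starting point for that experimentation, here is a small sketch of the g (global) flag. Without it, replace only touches the first match; with it, every match is replaced. The addresses are made-up placeholders.

```javascript
const text = 'a@one.com, b@two.com'

// No g flag: only the first match is replaced
const first = text.replace(/(\w+)@(\w+)\.(\w{3})/, '$1@$2.gov')

// g flag: every match is replaced
const all = text.replace(/(\w+)@(\w+)\.(\w{3})/g, '$1@$2.gov')

console.log(first) // 'a@one.gov, b@two.com'
console.log(all) // 'a@one.gov, b@two.gov'
```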