Regular Expression Crash Course
Regular expressions (regex) are one of those things that are language agnostic.
You can learn regex in JavaScript and apply it in almost every other programming language with little or no change.
Some programming languages wrap regex in their own easy-to-use methods, but the pattern syntax stays largely the same.
Regex is used in all kinds of applications: mobile, web and desktop apps, on the front end and the back end.
Here are a few places where I personally use it.
The main one is user input validation, especially text input validation.
You can also validate media file inputs by their extension name. For example, you may allow only PNG or JPEG files and reject every other format.
It's important to validate user input, especially in cloud-backed apps. You don't want to make network requests that end up failing.
Failed requests increase the developer's cloud costs and make the user wait for extra asynchronous round trips.
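Here is a minimal sketch of the PNG/JPEG extension check mentioned above. It assumes the file name is available as a plain string, and the isAllowedImage helper is a hypothetical name. Keep in mind that checking the name alone does not prove the file contents are really an image.

```javascript
// Hypothetical helper: accepts names ending in .png, .jpg or .jpeg.
// \. is a literal period, jpe?g makes the "e" optional,
// $ anchors the match to the end of the name and the i flag ignores case.
const allowedImage = /\.(png|jpe?g)$/i

function isAllowedImage(fileName) {
  return allowedImage.test(fileName)
}

console.log(isAllowedImage('avatar.PNG')) // true
console.log(isAllowedImage('photo.jpeg')) // true
console.log(isAllowedImage('animation.gif')) // false
console.log(isAllowedImage('png.txt')) // false
```

Most of the syntax used here (escaping, optional characters, anchors and flags) is explained later in this article.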
Another place I use regex is filtering data. There may be a search box involved, but I wouldn't call this searching.
Google search is real search. What we do with regex is filter data, not search it.
You can filter data with plain code in any programming language, but it gets messy very quickly.
Regex lets you express the same filter as a short, precise pattern.
With regex you can create simple filters as well as really complex ones.
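As a rough illustration of filtering, here is a sketch that keeps only the matching entries of an array with filter() and test(). The fruit list and the "starts with b, ends with berry" rule are made up for the example.

```javascript
// Hypothetical list of items, for example the data behind a search box.
const fruits = ['Apple', 'Banana', 'Blueberry', 'Cherry', 'Blackberry']

// Keep names that start with "b" and end with "berry", ignoring case.
// No g flag here: a global regex remembers lastIndex between test() calls,
// which can make filter() skip matches.
const berryPattern = /^b.*berry$/i

const berries = fruits.filter((name) => berryPattern.test(name))
console.log(berries) // [ 'Blueberry', 'Blackberry' ]
```

Leaving out the g flag is deliberate. When you reuse one regex object for many test() calls, a non-global pattern always starts matching from the beginning of each string.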
The remaining uses are assorted, e.g. writing my own custom snippets for VSCode, validating a Cognito client ID on the backend, etc.
You will find many more uses for regex as you build more apps.
Why learn regex?
Regex flags
View regex cheat sheet on GitHub
Regex preview extension for VSCode
Regex character
Regex character set
Lazy and greedy pattern
Lookahead and lookbehind
You don't have to write regex from scratch, which is a good thing, because regex is genuinely hard.
There are lots of edge cases you may not think of, but other people have already solved those problems.
As I said earlier, regex is used in nearly every programming language, so you can usually find a suitable pattern with a quick Google search.
But this is where people make a mistake: don't blindly copy and paste a regex pattern you found on the internet.
Learn enough regex to read and understand the pattern.
Then verify your understanding with rigorous testing.
Then modify the pattern to fit your specific use case.
This is why it's important to know the basics of regex.
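One lightweight way to do that testing in plain Node.js is a small table of inputs paired with the results you expect. The username rule below (3 to 16 lowercase letters or digits) is made up and only demonstrates the approach. Swap in the pattern you actually want to check.

```javascript
// Made-up rule under test: 3 to 16 lowercase letters or digits, nothing else.
const usernamePattern = /^[a-z0-9]{3,16}$/

// Each case pairs an input with the result we expect from the pattern.
const cases = [
  ['user01', true],
  ['ab', false], // too short
  ['User01', false], // uppercase is not allowed
  ['user_01', false], // underscore is not allowed
  ['a'.repeat(17), false], // too long
]

let failures = 0
for (const [input, expected] of cases) {
  const actual = usernamePattern.test(input)
  if (actual !== expected) {
    failures++
    console.log(`FAIL ${JSON.stringify(input)}: expected ${expected}, got ${actual}`)
  }
}
console.log(failures === 0 ? 'all cases pass' : `${failures} case(s) failed`)
```

When you adapt a copied pattern, add an example of every edge case you can think of to the table first. Then change the pattern until all cases pass.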
Before we get started, I recommend developing inside a development Docker container.
That way we no longer have the "it works on my computer" issue, and it won't mess with your existing workflow.
Check out my blog post on it for a detailed video explanation.
In any case, make sure you have Node.js installed on your system. Verify by running node -v in the terminal.
View regex cheat sheet on GitHub
Keep the cheat sheet handy as a reference.
It's also useful for refreshing your memory when you've forgotten regex over time.
Basics
Now create an index.js file for the code and add the following:
const regex = /something/ // the pattern to look for
const input = 'The greatest glory in living lies not in never falling, but in rising every time we fall.'
console.log(input.match(regex)) // the match(es), or null if nothing matched
console.log(regex.test(input) ? 'is valid' : 'is not valid') // test() returns true or false
A regex literal in JavaScript is always enclosed in forward slashes.
For the input string, I googled famous quotes and picked the first one I found.
Feel free to replace it with a string of your choice.
Generally there are two big use cases:
- Validate that a string conforms to certain rules
- Match parts of a string for further processing
That's why I'm printing both the matches and whether the input is valid.
That way you can test input strings for your specific use case.
Now run node index.js from the terminal.
The first test checks for the string something inside the input. It doesn't exist, so you get the following output:
null
is not valid
If you change the regex to /in/, it finds a match and prints the following:
[ 'in', index: 19, input: 'The greatest glory in living lies not in never falling, but in rising every time we fall.', groups: undefined ]
is valid
Global flag
It found only a single instance of in, but in is repeated many times in the input.
If you want every instance, add the global flag to the regex, as in /in/g, and you get the following output:
[ 'in', 'in', 'in', 'in', 'in', 'in' ]
is valid
Now it picks out in from living, falling, rising and so on.
You can see how this is useful for search and replace.
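To show the search-and-replace case, here is a small sketch using the built-in String replace() method on the same quote. The uppercase IN replacement is arbitrary. It only makes the replaced spots easy to see.

```javascript
const input = 'The greatest glory in living lies not in never falling, but in rising every time we fall.'

// Without g, replace() changes only the first match.
console.log(input.replace(/in/, 'IN'))
// The greatest glory IN living lies not in never falling, but in rising every time we fall.

// With g, every match is replaced.
console.log(input.replace(/in/g, 'IN'))
// The greatest glory IN livINg lies not IN never fallINg, but IN risINg every time we fall.
```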
Case insensitive flag
Now if you change the regex to find the string the, using /the/, you get the following output:
null
is not valid
The input starts with the string The, but it didn't match.
That's because the T is uppercase. You can make the match case insensitive with the i flag.
So change the regex to /the/i and you get the following output:
[ 'The', index: 0, input: 'The greatest glory in living lies not in never falling, but in rising every time we fall.', groups: undefined ]
is valid
Or condition
If you want an or condition, regex has the standard pipe (|) character.
Many programming languages use the pipe character for or as well, for example || in JavaScript.
You can implement it as /fall|falling/g.
Here I am making a deliberate mistake to show you what not to do.
We are looking for either fall or falling. This gives the following output:
[ 'fall', 'fall' ]
is valid
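We do have falling in the input string, but it didn't get matched.
Only fall got matched. At each position, regex tries the alternatives from left to right and only moves on to the second one if the first fails. Since fall already matches at the start of falling, falling is never tried there.
There are better ways to implement the same condition that we haven't learned yet.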
If you really must use an or condition, switch the order to get the correct result:
/falling|fall/g
[ 'falling', 'fall' ]
is valid
Wild card character
If you want to match any character, use the period (.) character.
It matches any character except the newline character (\n).
So keep this in mind if you are working with multi-line input strings.
/i.ing/g
The . matches any single character, as you can see in the output:
[ 'iving', 'ising' ]
is valid
To test the newline exception, I'm going to update the input string to two lines as follows:
The greatest glory in living lies not in never fall-
ing, but in rising every time we fall.
/fall./g
[ 'fall-', 'fall.' ]
is valid
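With a single wildcard you get two matches, but with two consecutive wildcards you get nothing. The first fall is followed by - and then a newline, and the last fall is followed by . and then the end of the string: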
/fall../g
null
is not valid
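If you do want the wildcard to cross line breaks, JavaScript (ES2018 and later) has the s flag, also called dotAll, which lets . match newline characters too. Here is a sketch on the same two-line input, written with \n for the line break:

```javascript
// Same two-line input as above; \n is the line break after "fall-".
const input = 'The greatest glory in living lies not in never fall-\ning, but in rising every time we fall.'

console.log(input.match(/fall../g)) // null: . refuses to match \n
console.log(input.match(/fall../gs)) // [ 'fall-\n' ]: with s, . matches \n too
```

The last fall is still not matched, because only one character (the period) follows it before the end of the string.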
Match one or more times
If you want to match the character or group to the left at least once and at most an unlimited number of times, use the plus (+) character.
/l+/g
Here we are matching l at least once, with no upper limit.
[ 'l', 'l', 'l', 'll', 'll' ]
is valid
Match zero to infinity times
With the previous pattern, the character had to match at least once for the input to be valid.
The asterisk (*) works the same way, except the character may also match zero times.
/al*/g
Here we are matching an a followed by zero or more l characters, with no upper limit.
[ 'a', 'all', 'all' ]
is valid
The word "greatest" contains an a followed by zero l characters, so it matches as well.
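To see the difference between * and + side by side, here is a sketch that runs both patterns on the original quote:

```javascript
const input = 'The greatest glory in living lies not in never falling, but in rising every time we fall.'

// * allows zero l's, so the lone "a" in "greatest" matches.
console.log(input.match(/al*/g)) // [ 'a', 'all', 'all' ]
// + needs at least one l, so that "a" is skipped.
console.log(input.match(/al+/g)) // [ 'all', 'all' ]
```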
Match exactly the given length
This lets you specify exactly how many times the character or condition to the left must repeat.
/l{2}/g
Here we are matching exactly two consecutive l characters.
[ 'll', 'll' ]
is valid
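One detail worth checking yourself: {2} does not require the l to appear only twice in a row. In a longer run, the pattern just matches two l characters at a time, as this small sketch shows:

```javascript
// Five consecutive l's: {2} consumes two, then two more; the fifth l is left over.
console.log('lllll'.match(/l{2}/g)) // [ 'll', 'll' ]

// A single l is not enough for {2}, so there is no match at all.
console.log('l'.match(/l{2}/g)) // null
```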
Exact min length to infinite max length
This sets an exact minimum count for the condition or character to the left, with no maximum.
For this I'm going to update the input string as follows:
const regex = /l{2,}/g
const input = 'The greatest glory in living lies not in never falling, but in rising every time we fallllllll.'
[ 'll', 'llllllll' ]
is valid
Exact min and max length
This sets both a minimum and a maximum count for the condition or character to the left.
/l{2,4}/g
[ 'll', 'llll', 'llll' ]
is valid
Optional condition
Making a variable optional is common in many programming languages.
The same concept applies here, but to a pattern condition.
/ing?/g
Here I am making the g character optional:
[ 'in', 'ing', 'in', 'ing', 'in', 'ing' ]
is valid
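Combined with a group (parentheses, which this section doesn't cover), ? gives one order-independent alternative to the earlier /fall|falling/g example: make the whole ing suffix optional. Here is a sketch, assuming the original single-line quote as the input:

```javascript
const input = 'The greatest glory in living lies not in never falling, but in rising every time we fall.'

// (ing)? makes the whole "ing" group optional. Because ? is greedy,
// the longer "falling" is tried first, and plain "fall" still matches.
console.log(input.match(/fall(ing)?/g)) // [ 'falling', 'fall' ]
```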
Escape syntax
If you want a character to be taken literally instead of as a pattern condition, escape it with a backslash (\).
/./g
/\./g
The first pattern matches any character. The escaped second pattern matches only a literal period, which gives the following output:
[ '.' ]
is valid
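Escaping matters most when you build a pattern from text a user typed, for example in the filtering use case from the start of the article. Below is a sketch of a common helper, modeled on the escapeRegExp function in MDN's regular expressions guide. It puts a backslash in front of every special character before the text goes to the RegExp constructor.

```javascript
// Put a backslash in front of every character that has a special meaning in regex.
// $& in the replacement stands for the matched character itself.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')
}

const userText = 'a.c' // imagine this came from a search box

console.log(new RegExp(userText).test('abc')) // true: . matched "b"
console.log(new RegExp(escapeRegExp(userText)).test('abc')) // false: \. only matches "."
console.log(escapeRegExp(userText)) // a\.c
```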
Start of the line
If you want to check for the start of the line, use the caret (^):
/^the/gi
[ 'The' ]
is valid
Multi line flag
If you want to check the start of every line in a multi-line input, use the multiline flag (m).
const regex = /^but/gm
const input = `The greatest glory in living lies not in never falling,
but in rising every time we fall.`
[ 'but' ]
is valid
If you remove the multiline flag, you won't find a match.
/^but/g
null
is not valid
End of the line
If you want to match the end of the line, use the dollar sign ($). Using the previous multi-line example:
/,$/gm
If you don't get the following output, there may be an extra space at the end of the first line.
[ ',' ]
is valid
Without the multiline flag this would be invalid, because $ would then only match at the very end of the input, which ends with a period.
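For validation (the first of the two use cases from the Basics section), ^ and $ are usually used together so the rule applies to the whole string rather than to some part of it. Here is a sketch with a hypothetical "exactly five digits" rule, where \d matches one digit from 0 to 9:

```javascript
// Unanchored: passes as long as five digits appear somewhere in the string.
const loose = /\d{5}/
// Anchored: the whole string must be exactly five digits.
const strict = /^\d{5}$/

for (const value of ['12345', '123456', 'abc12345']) {
  console.log(value, loose.test(value), strict.test(value))
}
// 12345 true true
// 123456 true false
// abc12345 true false
```

Only the anchored version rejects inputs that merely contain a valid part, which is usually what you want for form fields.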
This is just a short intro; you can see the rest of it in the video at the top.
Free users cannot comment below, so if you have questions, tweet me @apoorvmote. I would really like to hear your brutally honest feedback.