Regular Expressions in the Unix Shell - Extended Regular Expressions
As I pointed out, we are talking about pattern matching in text. A pattern is a sequence of consecutive characters, but it does not have to spell out every character literally: it can describe the text it should match in a general form. For example, a regular expression can say that we want to find three "a"s followed by a digit, or any three characters followed by a digit.
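The two patterns just described can be tried directly with `grep -E`, which selects extended regular expressions. This is a minimal sketch that assumes a POSIX shell and a `grep` supporting `-E`; the sample lines are made up for illustration.

```shell
# Made-up sample text, one line per record.
sample='aaa7 is here
xyz9 also matches the second pattern
aa5 has only two a characters'

# Three literal "a" characters followed by a digit.
printf '%s\n' "$sample" | grep -E 'aaa[0-9]'

# Any three characters followed by a digit ("." stands for any single character).
printf '%s\n' "$sample" | grep -E '...[0-9]'
```

The first command prints only the first line; the second also prints "xyz9 ...", but not "aa5 ...", because only two characters precede the digit there.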
We say we have found a match when the pattern description fits a section of the text. I say "description" because the pattern can be built from both literal characters and meta-characters; together they form the regular expression. Meta-characters are characters with a meaning other than their literal one. For example, the ^ character does not stand for a caret; it marks the start of a line of text.
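To see the difference between a meta-character and its literal meaning, here is a small sketch, assuming a `grep` with `-E`. Preceding a meta-character with a backslash turns it back into an ordinary character; the input lines are invented for the example.

```shell
# Made-up sample: one line starting with "The", one containing a literal caret.
sample='The caret is a meta-character
x^2 uses a literal caret'

# As a meta-character, ^ anchors the pattern to the start of the line.
printf '%s\n' "$sample" | grep -E '^The'

# Escaped with a backslash, \^ stands for the caret character itself.
printf '%s\n' "$sample" | grep -E '\^'
```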
The regex engine scans the text from left to right, looking for the pattern. While matching, it distinguishes between the positions occupied by characters and the empty positions between them. Put another way, imagine an invisible empty character inserted between every pair of adjacent characters, as well as before the first one and after the last one. This gives us a "character" for the start of a line, one for its end, and so on.
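The "empty character" idea can be observed directly: ^ and $ match positions rather than characters. A rough sketch, assuming standard `grep` and `sed`: `^$` matches a line that has nothing between its start and its end, and substituting at the zero-width start position inserts text without replacing any character.

```shell
# Three made-up lines; the middle one is empty.
sample='first

third'

# ^$ matches a line whose start and end positions are adjacent, i.e. an empty line.
printf '%s\n' "$sample" | grep -c -E '^$'

# Replacing the empty start-of-line position adds a prefix to every line
# without consuming any of the line's characters.
printf '%s\n' "$sample" | sed 's/^/> /'
```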
For obvious reasons, no key on the keyboard stands for the start of a line, the end of a line, and similar positions. Instead, we redefine a few characters that are already on the keyboard, turning them into meta-characters; for some of them we need a sequence of more than one character. By combining them, we can form general patterns. The table below summarizes them. On the following page, I will explain each of them and give you a couple of examples.
Name                 Meta-character   What it means
Any character        .                Matches any single character
Counting             *                The preceding item, any number of times (including zero)
                     ?                The preceding item, once or not at all
                     +                The preceding item, at least once (more than once is also fine)
Ranges               {n}              The preceding item, exactly n times
                     {n,}             The preceding item, at least n times (more is also fine)
                     {n,m}            The preceding item, at least n and at most m times
Anchors              ^                The start of the line
                     $                The end of the line
Grouping             (...)            Groups part of the pattern into a single unit
Alternating          |                Separates alternative options; either one may match
Character classes    [...]            Matches any one character from the listed group/class
Backward pointing    \n               n is a digit from 1 to 9; refers back to the text matched by the n-th group
Backslash sequences  \b               Word boundary: the empty position between a word character and a non-word character
                     \B               Not a word boundary, e.g. the empty position between two word characters
                     \<               The empty string at the start of a word
                     \>               The empty string at the end of a word
                     \w               A word character: a-z, A-Z, the digits 0-9, or _