Server Administration

Regular Expressions in the Unix Shell
By: Gabor Bernat
    2009-06-10

    Table of Contents:
  • Regular Expressions in the Unix Shell
  • Extended Regular Expressions
  • Explaining and Examples
  • The grep command

    Regular Expressions in the Unix Shell - Explaining and Examples


    (Page 3 of 4)

    The table shown on the previous page covered only the meta-characters. Any other character represents itself. For example, if you are searching for the text Yeti09, then the pattern is Yeti09. You need no extra lesson for explicit patterns. The meta-characters help you define matches when you are not sure exactly what the text you are searching for looks like; they give a general description of the pattern.

    The any character is the ".". It represents any single character. For example, the pattern ".." matches both the Ye and the TI texts. Combined with other characters, it can stand in for a character of a word that you are not sure about. The "Y.ti" pattern will match the texts Yeti, Yxti, Y$ti and so on.
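
    One way to try the "Y.ti" pattern is to pipe a few sample lines through grep -E, which reads its pattern in the extended syntax. This is only a minimal sketch: the sample lines and the variable name dot_matches are made up for illustration.

```shell
# Feed four sample lines to grep; -E selects extended regular expressions.
# The last line has nothing between the Y and the ti, so it is left out.
dot_matches=$(printf '%s\n' 'Yeti' 'Yxti' 'Y$ti' 'Yti' | grep -E 'Y.ti')
printf '%s\n' "$dot_matches"
```

    The first three lines are printed; Yti is not, because the dot insists on exactly one character in that position.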

    The counting meta-characters save you from writing out repeated characters or character groups, as you will see later on. Say you are not sure whether a character is present in the text. Assume that you are not sure whether you wrote, for example in a file name, Yti or Yeti. If you are searching through the list of file names, the pattern that finds both of these texts is "Ye?ti". The question mark refers to the character preceding it and makes that character optional in the pattern.

    Now suppose you are sure you wrote down the first e, but maybe you were bored and left your finger a little longer on the e key. Therefore, the text has at least one e, but more are also plausible. In this case, use the plus: "Ye+ti". This matches:

    asdsadYetiss

    asdsadYeeeetiss

    The star stands for any number of occurrences, including zero. Using it is risky, as it also matches when the character does not appear at all. The pattern "Ye*ti" matches:

    sadsaYtisadsa

    kl;kYetil;'l;

    ouioYeeeetiljkhl
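
    The three counting meta-characters described above (?, + and *) can be compared side by side on the same input. The sketch below assumes three sample names and a helper called matching_names; both are hypothetical and exist only for this illustration.

```shell
# Hypothetical helper: print, on one line, the sample names that a pattern matches.
matching_names() {
    printf '%s\n' 'Yti' 'Yeti' 'Yeeeeti' | grep -E "$1" | paste -s -d ' ' -
}

echo "Ye?ti: $(matching_names 'Ye?ti')"   # the e zero times or once
echo "Ye+ti: $(matching_names 'Ye+ti')"   # the e once or more
echo "Ye*ti: $(matching_names 'Ye*ti')"   # the e any number of times, including none
```

    With these names, "Ye?ti" keeps Yti and Yeti, "Ye+ti" keeps Yeti and Yeeeeti, and "Ye*ti" keeps all three, which shows why the star is the most permissive of the three.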

    Now we can mix this one with the any character to form matches that are even more general. To find any character sequence that starts with the letter a and ends in the digit 9, use "a.*9". Again, the counting meta-characters refer to the character (or group, as you will see later) prior to them.
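
    To see how much text "a.*9" actually takes, it helps to print only the matched part of the line. The sketch below relies on the -o option, which GNU and BSD grep provide (an assumption about your system, since the option is not required by POSIX); the sample line is invented.

```shell
# -o prints only the part of the line that matched, not the whole line.
# ".*" takes as much as it can, so the match runs up to the last 9.
longest=$(printf '%s\n' 'xx a1 9 b9 yy' | grep -oE 'a.*9')
echo "$longest"
```

    The output is "a1 9 b9": the match starts at the first a and stretches to the last 9 on the line, not to the nearest one.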

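    The range meta-character construction gives more precise control over the number of appearances. If you want to find a sequence of ten a letters, use the "a{10}" pattern. If it is at least ten but could be more, use "a{10,}". Filter the occurrences between ten and twelve with the "a{10,12}" pattern. Keep in mind that a longer run of a letters also contains ten of them, so these patterns limit the count exactly only when the text around the run is pinned down as well.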
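
    The range construction can be checked against lines of known length. In this sketch, repeat_a is a hypothetical helper that builds the test lines, and the -x option (match the whole line) is used so that the counts are exact.

```shell
# Hypothetical helper: print one line made of COUNT letters "a".
repeat_a() {
    count=$1
    line=''
    while [ "$count" -gt 0 ]; do
        line="${line}a"
        count=$((count - 1))
    done
    printf '%s\n' "$line"
}

# Five lines holding 9, 10, 11, 12 and 13 letters.
sample=$(for len in 9 10 11 12 13; do repeat_a "$len"; done)

# -x makes the pattern cover the whole line; -c counts the matching lines.
exactly_ten=$(printf '%s\n' "$sample" | grep -cxE 'a{10}')
ten_or_more=$(printf '%s\n' "$sample" | grep -cxE 'a{10,}')
ten_to_twelve=$(printf '%s\n' "$sample" | grep -cxE 'a{10,12}')
# Without -x, any line that contains ten letters somewhere is accepted.
anywhere=$(printf '%s\n' "$sample" | grep -cE 'a{10}')

echo "$exactly_ten $ten_or_more $ten_to_twelve $anywhere"
```

    This prints "1 4 3 4". The last number shows that an unanchored "a{10}" also accepts the longer lines, since each of them contains a run of ten.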

    Most of the time we will look in files containing multiple lines of text. Each line has a start and an end. The same holds for a single sequence of characters: the start is before the first character and the end is after the last one. We can refer to the start with the ^ character and to the end with the $. To return to file searching, assume you are searching for file names that are exactly Yeti. With the pattern "^Yeti$", only the last of the following names is a match:

    aYeti

    Yetib

    Yeti
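
    The effect of the two anchors is easy to verify on the three names listed above. A minimal sketch, with hypothetical variable names:

```shell
names=$(printf '%s\n' 'aYeti' 'Yetib' 'Yeti')

# Without anchors the pattern may sit anywhere in the line.
unanchored=$(printf '%s\n' "$names" | grep -cE 'Yeti')
# ^ and $ pin the pattern to the start and the end of the line.
anchored=$(printf '%s\n' "$names" | grep -E '^Yeti$')

echo "without anchors: $unanchored names"
echo "with anchors: $anchored"
```

    The unanchored pattern accepts all three names, while "^Yeti$" leaves only Yeti.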

    Sometimes you are not sure which character you are missing, but you can restrict it to a set of candidates. In these cases, you can form a class of characters by enclosing them in brackets []. Inside the brackets, you enumerate the characters you want to allow, like this: [123Ab].

    It would be tedious to write down all the digits or letters if you want to allow all of them, so you can refer to the lowercase letters with a-z, the capital letters with A-Z and the digits with 0-9 inside the brackets, as in [a-z] or [a-zA-Z0-9]. A class matches any one of the characters inside it. Therefore, you can treat it as a single character and follow it with the counting and range tools.
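
    Classes combine naturally with the counting and range tools. The sketch below describes "one capital letter, then lowercase letters, then exactly two digits"; the sample names are invented, and LC_ALL=C is set on the assumption that you want the ranges to cover plain ASCII letters only.

```shell
# LC_ALL=C keeps ranges such as a-z limited to plain ASCII letters;
# in other locales a range may cover more characters.
class_matches=$(printf '%s\n' 'Yeti09' 'yeti09' 'Yeti9' 'YETI09' |
    LC_ALL=C grep -E '^[A-Z][a-z]+[0-9]{2}$')
echo "$class_matches"
```

    Only Yeti09 is printed: the other names fail on the first letter, on the number of digits, or on the case of the middle letters.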

    We can also group a part of the pattern with parentheses, for two reasons. The first is to point back to it later; the second is to alternate patterns. Pointing backward comes in handy when you search for something that occurs twice in a character sequence. For example, you search for the pattern Ye+ti where the exact same form comes up twice in the text.

    The pattern "Ye+ti.*Ye+ti" is not good, as the first time the e may be present once, while the second time it is present twice. Instead, if we put a part of the pattern inside parentheses, we can later refer back to the text it matched with the \n construction, where n stands for the number of the parenthesized group, counted from the left. The solution for the issue above is the pattern "(Ye+ti).*\1".
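
    The difference between repeating the pattern and pointing back to the first match shows up on two sample lines. This sketch assumes GNU grep, which accepts back-references such as \1 together with -E (POSIX itself defines them only for basic regular expressions); the lines and variable names are invented.

```shell
lines=$(printf '%s\n' 'Yeti and Yeeti' 'Yeeti and Yeeti')

# Repeating the pattern accepts two different spellings on one line.
plain=$(printf '%s\n' "$lines" | grep -cE 'Ye+ti.*Ye+ti')
# \1 must repeat exactly what the first parenthesized group matched.
backref=$(printf '%s\n' "$lines" | grep -E '(Ye+ti).*\1')

echo "plain pattern: $plain lines"
echo "back-reference: $backref"
```

    The plain pattern accepts both lines, while the back-reference keeps only "Yeeti and Yeeti", where the two spellings are identical.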

    Finding a palindrome is easy with this. Consider one that has five characters, all of them letters: ([a-zA-Z])([a-zA-Z])[a-zA-Z]\2\1. You can even negate a character class by adding the ^ character right after the opening bracket. If you add it anywhere else in the class, the ^ represents the character itself. So [^a-zA-Z0-9] means all the characters that are not letters or digits. In addition, inside a class the meta-characters lose their meta meaning and are explicit. You make a class composed of the star and the ^ like this: [*^].
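
    Both ideas from this paragraph, the five-letter palindrome and the negated class, can be tried on a handful of made-up words. As before, this is a sketch that assumes GNU grep for the back-references; the variable names are hypothetical.

```shell
# -x requires the whole line to match, so only five-letter palindromes pass.
palindromes=$(printf '%s\n' 'level' 'radar' 'hello' 'kayak' |
    LC_ALL=C grep -xE '([a-zA-Z])([a-zA-Z])[a-zA-Z]\2\1' | paste -s -d ' ' -)
echo "$palindromes"

# A leading ^ negates the class: count lines holding anything but letters and digits.
odd_lines=$(printf '%s\n' 'abc123' 'abc 123' 'a*b' |
    LC_ALL=C grep -cE '[^a-zA-Z0-9]')
# Inside brackets the star and a non-leading ^ are ordinary characters.
literal=$(printf '%s\n' 'abc123' 'a^b' 'a*b' | grep -cE '[*^]')
echo "$odd_lines $literal"
```

    The first line of output is "level radar kayak"; the second is "2 2", since two lines contain a character other than a letter or digit and two lines contain a literal star or caret.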

    You can alternate options with the | character. Consider that you want to find files starting with the letters Hi or Av. The alternation itself is written like this: (Hi)|(Av). Here we use the parentheses to group a sequence of characters. On its own this pattern matches Hi or Av anywhere in the name; to accept them only at the start, anchor the whole alternation: ^((Hi)|(Av)). Of course, you can mix all the previously-learned ones to create complex regular expressions. If you want to use one of the meta-characters in its explicit form inside a regular expression, add a \ character before it.
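
    The sketch below tries the alternation on a few invented file names, once as written and once anchored with ^ so that it applies only to the start of the name, and then shows what escaping a dot with a backslash changes. All names and variables are hypothetical.

```shell
files=$(printf '%s\n' 'History.txt' 'Avatar.png' 'MyAvatar.png' 'archive.txt' 'notes_txt')

# Unanchored, the alternation may match anywhere in the name.
anywhere=$(printf '%s\n' "$files" | grep -cE '(Hi)|(Av)')
# Anchored with ^, it accepts only names that start with Hi or Av.
starts=$(printf '%s\n' "$files" | grep -E '^(Hi|Av)' | paste -s -d ' ' -)

# An unescaped dot matches any character, so "_txt" passes as well.
loose=$(printf '%s\n' "$files" | grep -cE '.txt$')
# "\." matches only a literal dot.
strict=$(printf '%s\n' "$files" | grep -cE '\.txt$')

echo "$anywhere names contain Hi or Av; these start with them: $starts"
echo "$loose names end in .txt loosely, $strict strictly"
```

    Here three names contain Hi or Av, but only History.txt and Avatar.png start with them; three names pass the loose ".txt$" test and two pass the escaped one.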

    What is left to learn is some more advanced use of the \ character. Some special characters after it have specific meanings. Consider the text below:

    In the Pestaring, the main star is staring rich men.

    For the first star, the matching pattern is \Bstar\B, for the second it is \bstar\b and for the last \bstar\B. The \b stands for the empty character sequence at the edge of a word, and \B for the empty sequence anywhere else. In a similar way, there is the empty character sequence at the start of a word (\<) and at the end (\>). The \w means the class [a-zA-Z0-9_], while \W means the opposite.
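
    The word-edge constructions can be checked by splitting the sample sentence into one word per line, so that each result shows which word was hit. This sketch assumes GNU grep, where \b, \B, \< and \> are available as extensions; the variable names are made up.

```shell
# One word per line, so every result shows which word the pattern hit.
words=$(printf '%s\n' 'In the Pestaring, the main star is staring rich men.' | tr ' ' '\n')

inside=$(printf '%s\n' "$words" | grep -E '\Bstar\B')   # star inside a word
whole=$(printf '%s\n' "$words" | grep -E '\bstar\b')    # star as a whole word
prefix=$(printf '%s\n' "$words" | grep -E '\bstar\B')   # star opening a longer word
echo "$inside / $whole / $prefix"

# \< and \> tie the match to the start and the end of a word.
word_start=$(printf '%s\n' "$words" | grep -cE '\<star')
word_end=$(printf '%s\n' "$words" | grep -cE 'star\>')
echo "$word_start $word_end"
```

    The first line of output is "Pestaring, / star / staring". The second is "2 1": two words begin with star (star and staring), and only one ends with it.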

    © 2003-2012 by Developer Shed. All rights reserved.