Using PCREs
By: Codewalkers
    2002-11-16

    Table of Contents:
  • Using PCREs
  • Section 1
  • Section 2
  • Section 3
  • Wrapping Things Up


    Using PCREs - Section 1



    OK, so hang on to your hats. Take, for example, trying to find a hyperlink on a web page (an <a> tag with an href attribute). Here is a link in its simplest form:

    <a href="http://compasspointmedia.com">click here</a>

    And here is the minimum regular expression that would find this using PCREs:

    /<a href="[^"]+">.*<\/a>/i

    You'll notice the "wrapper" of slashes and the 'i' on the end, that is, /....../i. This is how it's done in Perl. The i stands for case-insensitive, by the way. You aren't actually constrained to use a '/' as your delimiter, but I usually do. Since I use a forward slash as my wrapper, I must "escape" any forward slash character inside the delimiters with a backslash, like this: \/, so the regex engine doesn't think the pattern has ended.
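    As a rough illustration of the delimiter and the 'i' modifier, here is the same pattern in Python. This is an assumption made for the sake of a runnable sketch: Python's standard re module is not PCRE, although it accepts this particular syntax. Python has no delimiters, so the slash needs no escaping, and the 'i' becomes a flag. The names PATTERN, link_re, case_sensitive_re and upper are made up for this example.

```python
import re

# No /.../ wrapper in Python, so the forward slash in </a> is not escaped,
# and the trailing 'i' modifier is written as the re.IGNORECASE flag.
PATTERN = r'<a href="[^"]+">.*</a>'

link_re = re.compile(PATTERN, re.IGNORECASE)   # like /.../i
case_sensitive_re = re.compile(PATTERN)        # like /.../ without the i

# Hypothetical sample: the simple link written in upper-case markup.
upper = '<A HREF="http://compasspointmedia.com">click here</A>'

print(bool(link_re.search(upper)))            # True
print(bool(case_sensitive_re.search(upper)))  # False
```

    Without the flag, the lower-case "a href" in the pattern cannot match the upper-case markup, which is why the 'i' is there.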

    Now, the brackets [] enclose a set of characters, and a ^ at the start of the set means NOT or EXCEPT FOR, so this part [^"]+ means "any character except a double quote, at least once." This covers the URL inside the href attribute of the opening tag. Then we specify any character (.), which by default means any character except a newline, and the * means "zero or more times". Finally we want the closing tag (<\/a>).
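    To see what each piece matches, here is a small sketch, again using Python's re module as a stand-in for PCRE (an assumption, not the article's own setup). The parentheses are not part of the regex above; they are added here only to pull out what [^"]+ and .* matched.

```python
import re

html = '<a href="http://compasspointmedia.com">click here</a>'

# ([^"]+) : one or more characters that are not a double quote (the URL).
# (.*)    : zero or more of any character except a newline (the link text).
m = re.search(r'<a href="([^"]+)">(.*)</a>', html)
print(m.group(1))  # http://compasspointmedia.com
print(m.group(2))  # click here

plain = r'<a href="[^"]+">.*</a>'

# * allows zero repetitions, so an empty link text still matches...
empty_text = re.search(plain, '<a href="x"></a>')
print(bool(empty_text))  # True

# ...but + needs at least one character, so an empty href does not.
empty_href = re.search(plain, '<a href=""></a>')
print(bool(empty_href))  # False
```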

    Again, the "i" at the end means a case-insensitive search. In plain English, this regex is saying the following: "Find an open bracket, an 'a', then a space, then href=, then a double quote, then ANYTHING EXCEPT A DOUBLE QUOTE, ONE OR MORE TIMES. Then find another double quote, then a close bracket. Then find any characters you want, but you have to end with the closing tag </a>." (Whew!)

    Regex is cool! The reason: you don't have to know what the href is, OR what the link text is (click here), for that matter. However, if I were looking for links on a web page, I would NEVER use the above regex to find them. Here's why:

    Case 1
    <a href = "http://compasspointmedia.com">click here</a>

    Case 2
    <a name="link" href="http://compasspointmedia.com">Click here</a>

    Case 3
    <a href='http://compasspointmedia.com'>click here</a>

    Case 4
    <a href="http://compasspointmedia.com">
    click
    here
    or
    anywhere in
    this paragraph
    </a>

    Case 5
    <A HREF=http://compasspointmedia.com>Click here</a><a href="http://amazon.com">go to amazon</a>

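    These are five examples of VALID links (try them in a web page and see) that your browser would recognize, but that the regex above would not handle: it matches nothing in Cases 1 through 4, and in Case 5 it skips the first link and finds only the second. So that brings me to Sam's Rule #1 of Regular Expressions: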

    ALWAYS DESIGN YOUR REGULAR EXPRESSION TO BE AS SMART AS YOUR BROWSER!
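    The five cases can be checked against the simple regex with a short script. This is a sketch in Python's re module, assumed here as a stand-in for PCRE; the names simple_re, cases and results are invented for the example.

```python
import re

# The minimal pattern from this page, with IGNORECASE standing in for /i.
simple_re = re.compile(r'<a href="[^"]+">.*</a>', re.IGNORECASE)

cases = {
    1: '<a href = "http://compasspointmedia.com">click here</a>',
    2: '<a name="link" href="http://compasspointmedia.com">Click here</a>',
    3: "<a href='http://compasspointmedia.com'>click here</a>",
    4: '<a href="http://compasspointmedia.com">\n'
       'click\nhere\nor\nanywhere in\nthis paragraph\n</a>',
    5: '<A HREF=http://compasspointmedia.com>Click here</a>'
       '<a href="http://amazon.com">go to amazon</a>',
}

# Collect every match the simple pattern finds in each case.
results = {n: simple_re.findall(html) for n, html in cases.items()}

for n, found in results.items():
    print(n, found)
```

    Cases 1 through 4 come back empty: the spaces around =, the extra name attribute, the single quotes and the line breaks each defeat the pattern. Case 5 returns only the amazon.com link, because the first link has no quotes around its href.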


    © 2003-2012 by Developer Shed. All rights reserved.