Using PCREs
By: Codewalkers
    2002-11-16

    Table of Contents:
  • Using PCREs
  • Section 1
  • Section 2
  • Section 3
  • Wrapping Things Up


    Using PCREs - Section 1



    OK, so hang on to your hats. Take, for example, trying to find a hyperlink on a web page (the <a href> tag). Here is a link in its simplest form:

    &lt;a href="http://compasspointmedia.com"&gt;click here&lt;/a&gt;

    And here is the bare-minimum regular expression that would find it using PCREs:

    /<a href="[^"]+">.*<\/a>/i

    You'll notice the "wrapper" of slashes and the 'i' on the end, that is, /....../i. This is how it's done in Perl; the i stands for case-insensitive, by the way. You actually aren't constrained to use '/' as your delimiter, but I usually do. Since I use a forward slash as my wrapper, I must "escape" any forward slash inside the pattern with a backslash, like this: \/, so the parser doesn't mistake it for the closing delimiter.
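For readers working outside Perl or PHP, here is a minimal sketch of the same pattern in Python's built-in `re` module, whose syntax is close to (but not identical with) PCRE. Python has no delimiter wrapper, so the forward slash needs no escaping, and the trailing `i` becomes the `re.IGNORECASE` flag. The name `LINK_RE` is just an illustrative choice.

```python
import re

# Perl/PHP form: /<a href="[^"]+">.*<\/a>/i
# Python uses no delimiters, so "/" is not escaped; the "i" modifier
# becomes the re.IGNORECASE flag.
LINK_RE = re.compile(r'<a href="[^"]+">.*</a>', re.IGNORECASE)

simple = '<a href="http://compasspointmedia.com">click here</a>'
shouting = '<A HREF="http://compasspointmedia.com">click here</A>'

print(bool(LINK_RE.search(simple)))    # True
print(bool(LINK_RE.search(shouting)))  # True, thanks to re.IGNORECASE
```

Without the flag, the upper-case version would not match at all.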

    Now, the square brackets [] enclose a character class (a set of characters to choose from), and a ^ placed right after the opening bracket means NOT or EXCEPT FOR. So this part, [^"]+, means "any character except a double quote, at least once." This covers the URL inside the opening tag. Then we specify any character (.), which matches anything except a newline, and the * means "zero to infinity times." Finally we want the closing tag (<\/a>).
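To see the two building blocks on their own, this sketch checks `[^"]+` and `.*` against a few sample strings using Python's `re.fullmatch`, which succeeds only when the whole string matches. Python's `re` is assumed to behave like PCRE for these simple constructs.

```python
import re

# [^"]+ : one or more characters, none of which is a double quote
print(bool(re.fullmatch(r'[^"]+', 'http://compasspointmedia.com')))  # True
print(bool(re.fullmatch(r'[^"]+', 'say "hi"')))  # False: contains quotes
print(bool(re.fullmatch(r'[^"]+', '')))          # False: needs at least one character

# .* : any character except a newline, zero or more times
print(bool(re.fullmatch(r'.*', '')))             # True: zero times is fine
print(bool(re.fullmatch(r'.*', 'click here')))   # True
print(bool(re.fullmatch(r'.*', 'click\nhere')))  # False: "." stops at the newline
```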

    Again, the "i" at the end means a case-insensitive search. Basically, in English, this regex is saying: "Find an opening bracket (<), then an "a", then a space, then href=, then a double quote, then ANYTHING EXCEPT A DOUBLE QUOTE, ONE OR MORE TIMES. Then find another double quote, then a closing bracket (>). Then find any characters you want, but it has to end with a closing </a> tag." (Whew!)
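The English reading above maps almost token for token onto the pattern. As a sketch in Python's `re`, verbose mode (`re.VERBOSE`) lets each piece carry its own comment. The capturing group around `[^"]+` is an addition for illustration only; it pulls out the URL that part matched. In verbose mode whitespace is ignored, so the literal space after `<a` is escaped.

```python
import re

LINK_RE = re.compile(
    r'''
    <a\ href=     # an opening bracket, an "a", a space, then href=
    "             # a double quote
    ([^"]+)       # anything except a double quote, one or more times (captured)
    ">            # another double quote, then a closing bracket
    .*            # any characters you want...
    </a>          # ...but it has to end with </a>
    ''',
    re.IGNORECASE | re.VERBOSE,
)

match = LINK_RE.search('<a href="http://compasspointmedia.com">click here</a>')
print(match.group(1))  # http://compasspointmedia.com
```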

    Regex is cool! Why? You don't have to know what the href is, OR what the link text ("click here") is, for that matter. However, if I were looking for links on a web page, I would NEVER use the above pattern to find them. Here's why:

    Case 1
    <a href = "http://compasspointmedia.com">click here</a>

    Case 2
    <a name="link" href="http://compasspointmedia.com">Click here</a>

    Case 3
    <a href='http://compasspointmedia.com'>click here</a>

    Case 4
    <a href="http://compasspointmedia.com">
    click
    here
    or
    anywhere in
    this paragraph
    </a>

    Case 5
    <A HREF=http://compasspointmedia.com>Click here</a><a href="http://amazon.com">go to amazon</a>

    These are five examples of VALID links (try them in a web page and see) that your browser would recognize, but that the regex above would not (in Case 5, it finds only the second link). So that brings me to Sam's Rule #1 of Regular Expressions:
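A quick way to confirm this is to run the naive pattern over all five cases. This sketch uses Python's `re` (assumed close enough to PCRE here); `NAIVE`, `cases`, and `results` are illustrative names, and the strings are transcriptions of the cases above, with Case 4's line breaks written as `\n`.

```python
import re

# The naive pattern from the start of this section, in Python form
NAIVE = re.compile(r'<a href="[^"]+">.*</a>', re.IGNORECASE)

# Cases 1-5, transcribed (Case 4's line breaks written as \n)
cases = {
    1: '<a href = "http://compasspointmedia.com">click here</a>',
    2: '<a name="link" href="http://compasspointmedia.com">Click here</a>',
    3: "<a href='http://compasspointmedia.com'>click here</a>",
    4: '<a href="http://compasspointmedia.com">\nclick\nhere\nor\nanywhere in\nthis paragraph\n</a>',
    5: ('<A HREF=http://compasspointmedia.com>Click here</a>'
        '<a href="http://amazon.com">go to amazon</a>'),
}

# findall returns every whole match, since the pattern has no groups
results = {n: NAIVE.findall(html) for n, html in cases.items()}
for n, found in results.items():
    print(f"Case {n}: {len(found)} link(s) found")
```

Case 1 fails on the spaces around the equals sign, Case 2 on the extra attribute before href, Case 3 on the single quotes, Case 4 because . does not cross line breaks, and the first link in Case 5 on its missing quotes.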

    ALWAYS DESIGN YOUR REGULAR EXPRESSION TO BE AS SMART AS YOUR BROWSER!
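To put Rule #1 into practice, here is one way a smarter pattern might look, again sketched in Python's `re`. It is an illustration, not necessarily the approach the author takes later; `SMART` and `find_links` are hypothetical names, and the verbose-mode comments tie each piece back to one of the five cases.

```python
import re

# A "browser-smarter" link pattern: an illustrative sketch, not the
# article's own solution. Each piece targets one of the cases above.
SMART = re.compile(
    r'''
    <a\b [^>]*? \bhref       # <a, any other attributes first (Case 2), then href
    \s* = \s*                # optional spaces around the equals sign (Case 1)
    (?: "([^"]*)"            # double-quoted URL
      | '([^']*)'            # single-quoted URL (Case 3)
      | ([^\s>]+)            # unquoted URL (Case 5)
    )
    [^>]* >                  # the rest of the opening tag
    (.*?)                    # link text: lazy, so back-to-back links stay separate
    </a>
    ''',
    re.IGNORECASE | re.DOTALL | re.VERBOSE,  # DOTALL lets . cross line breaks (Case 4)
)


def find_links(html):
    """Return (url, link_text) pairs for each <a ...>...</a> found in html."""
    links = []
    for m in SMART.finditer(html):
        # Exactly one of the three URL groups takes part in each match.
        url = next(g for g in m.group(1, 2, 3) if g is not None)
        links.append((url, m.group(4).strip()))
    return links


print(find_links('<A HREF=http://compasspointmedia.com>Click here</a>'
                 '<a href="http://amazon.com">go to amazon</a>'))
# [('http://compasspointmedia.com', 'Click here'), ('http://amazon.com', 'go to amazon')]
```

Even this is not as smart as a browser: a > inside a quoted attribute that comes before href (say, title="a>b") stops [^>]*? short, so that link is missed.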

    © 2003-2009 by Developer Shed. All rights reserved.