MISCELLANEOUS

Using PCREs
By: Codewalkers
    2002-11-16

    Table of Contents:
  • Using PCREs
  • Section 1
  • Section 2
  • Section 3
  • Wrapping Things Up



    Using PCREs - Section 2


    (Page 3 of 5)

    The five examples above are the best way to introduce the rules I'm about to give. As a reminder, I have set up a regular expression tester at: http://samuelfullman.com/team/php/tools/regular_expression_tester_p.php

    This tester is useful because you can build a pattern, then either paste in text to search or specify a URL on the Web. Here's Sam's Rule #2 of Regular Expressions:

    BUILD YOUR REGULAR EXPRESSIONS UP STEP BY STEP, TESTING VARIATIONS OF SEARCHED STRINGS AT EVERY STEP.

    The tester I designed will allow you to do that.
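
    Rule #2 can also be followed from a script. The following is a minimal sketch in Python, not the tester itself; the helper name run_steps, the sample strings, and the intermediate patterns are assumptions made for illustration. Each step adds one piece to the pattern and is checked against every sample variation.

```python
import re

# Hypothetical sample strings: one compact anchor, one with extra spaces.
SAMPLES = [
    '<a href="http://compasspointmedia.com">Click here</a>',
    '<A HREF = "http://compasspointmedia.com">Click here</a>',
]

# Each step extends the previous pattern by one piece.
STEPS = [
    r'<a href=',
    r'<a\s+href\s*=',
    r'<a\s+href\s*=\s*"[^"]+"',
    r'<a\s+href\s*=\s*"[^"]+"\s*>.*</a\s*>',
]

def run_steps(steps, samples):
    """Return, for each step, the list of samples in which the pattern matches."""
    results = []
    for pattern in steps:
        regex = re.compile(pattern, re.IGNORECASE)
        results.append([s for s in samples if regex.search(s)])
    return results

for pattern, matched in zip(STEPS, run_steps(STEPS, SAMPLES)):
    print(f"{pattern!r}: matched {len(matched)} of {len(SAMPLES)} samples")
```

    The first step matches only the compact sample, which shows at once that the variation with spaces needs attention before the pattern grows any further.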

    Let's go back to the simple href examples above:

    Case 1
    <A HREF = http://compasspointmedia.com>Click here</a>

    Again, this is perfectly valid in any browser. The problem is that we have spaces. We could also have tabs or newline characters.

    Enter Sam's Rule #3 of Regular Expressions:

    ALWAYS COMPENSATE AND ACCOUNT FOR WHITESPACE!

    As you may know, browsers collapse whitespace when rendering: a run of several whitespace characters is displayed as a single space. In Perl regexes, whitespace characters (the tab chr(9), newline chr(10), carriage return chr(13), and the space, among others) are matched by \s. So let's rewrite our regex to handle this:

    /<a(\s)+href(\s)*=(\s)*"[^"]+"(\s)*>.*<\/a(\s)*>/i

    I've added (\s) wherever whitespace could conceivably appear in the string. Notice that after the opening <a there must be at least one whitespace character, hence the + sign after it. Whitespace in the </a> tag is unlikely, but it's legal for browsers and we want to account for its possible presence. Note that the pattern still expects the URL to be wrapped in double quotes, so Case 1 matches as shown only once its URL is quoted.
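
    To see the whitespace handling in action, here is a small sketch that runs the pattern above through Python's re module, which accepts it unchanged once the surrounding / delimiters are dropped and the /i flag becomes re.IGNORECASE. The helper name matches and the sample strings are assumptions for illustration.

```python
import re

# The whitespace-aware pattern from the text, minus the / delimiters.
WHITESPACE_AWARE = re.compile(
    r'<a(\s)+href(\s)*=(\s)*"[^"]+"(\s)*>.*<\/a(\s)*>',
    re.IGNORECASE,
)

def matches(html):
    """True if the pattern is found anywhere in html."""
    return WHITESPACE_AWARE.search(html) is not None

# Hypothetical variations of the same link.
samples = {
    "compact": '<a href="http://compasspointmedia.com">Click here</a>',
    "spaces": '<A HREF = "http://compasspointmedia.com" >Click here</a >',
    "tab+newline": '<a\thref\n=\n"http://compasspointmedia.com">Click here</a>',
    "unquoted": '<A HREF = http://compasspointmedia.com>Click here</a>',
    "no space after <a": '<ahref="http://compasspointmedia.com">Click here</a>',
}

for label, html in samples.items():
    print(f"{label}: {matches(html)}")
```

    The first three variations match. The unquoted URL and the missing space after <a do not, because the pattern requires double quotes and at least one whitespace character in those positions.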

    Case 2
    <a name="link" href="http://compasspointmedia.com">Click here</a>

    This is pretty obvious; attributes don't have to be in any order. Great for writing HTML, hard for regexes. You have to think strategically on this one. Here's how we allow for this:

    /<a(\s)+[^>]*href(\s)*=(\s)*"[^"]+"[^>]*(\s)*>.*<\/a(\s)*>/i

    Basically, I've added the string [^>]*, which means, in English, "any character except a close bracket (>), zero or more times." The first > closes the <a> tag, so once we reach it we're no longer inside the tag. Since we don't know where in the tag the href attribute will be declared, this works.
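
    The effect of [^>]* can be checked with a short sketch, again using Python's re module as a stand-in for PCRE. The helper name find_href_anchor and the sample strings are assumptions for illustration.

```python
import re

# The Case 2 pattern from the text, minus the / delimiters.
ANY_ORDER = re.compile(
    r'<a(\s)+[^>]*href(\s)*=(\s)*"[^"]+"[^>]*(\s)*>.*<\/a(\s)*>',
    re.IGNORECASE,
)

def find_href_anchor(html):
    """Return the matched anchor text, or None if there is no match."""
    m = ANY_ORDER.search(html)
    return m.group(0) if m else None

name_first = '<a name="link" href="http://compasspointmedia.com">Click here</a>'
href_first = '<a href="http://compasspointmedia.com" name="link">Click here</a>'
no_href = '<a name="link">No link here</a>'
# href appears only in the link text, after the > that closes the tag.
href_in_text = '<a name="top">see href="x" here</a>'

for html in (name_first, href_first, no_href, href_in_text):
    print(find_href_anchor(html))
```

    The first two samples match in full. The last two return None: because [^>]* cannot cross the closing >, an href that appears outside the opening tag is never mistaken for an attribute.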

    A little thought will tell you that if we were requiring TWO attributes like href AND name, it might get a little ugly. The regex for this is fallible but will get most cases where we need both. This is WAY complex. You can skip this next one if you want but here it is:

    /<a(\s)+[^>]*((href(\s)*=(\s)*"[^"]+")|(name(\s)*=(\s)*"[^"]+")|([^>]*)){2,}(\s)*>.*<\/a(\s)*>/i
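
    The sketch below tries the two-attribute pattern in Python, written on a single line since a line break inside the pattern would otherwise be treated as a literal character. It also illustrates the "fallible" part: as far as I can tell, the catch-all ([^>]*) alternative lets the pattern accept a tag that has only one of the two attributes. For comparison, the sketch includes a variant built on lookahead assertions. That is a different technique, not the one used in this article, and the names BOTH_LOOKAHEAD and has_both are assumptions for illustration.

```python
import re

# The two-attribute pattern from the text, joined onto one line.
BOTH_ORIGINAL = re.compile(
    r'<a(\s)+[^>]*((href(\s)*=(\s)*"[^"]+")|(name(\s)*=(\s)*"[^"]+")|([^>]*)){2,}'
    r'(\s)*>.*<\/a(\s)*>',
    re.IGNORECASE,
)

# Alternative (not from the article): one lookahead per required attribute.
# Each lookahead scans the tag, stopping at the first >, for its attribute.
BOTH_LOOKAHEAD = re.compile(
    r'<a(?=[^>]*\shref\s*=\s*"[^"]+")'
    r'(?=[^>]*\sname\s*=\s*"[^"]+")'
    r'\s+[^>]*>.*<\/a\s*>',
    re.IGNORECASE,
)

def has_both(regex, html):
    """True if regex finds an anchor in html."""
    return regex.search(html) is not None

name_first = '<a name="link" href="http://compasspointmedia.com">Click here</a>'
href_first = '<a href="http://compasspointmedia.com" name="link">Click here</a>'
href_only = '<a href="http://compasspointmedia.com">Click here</a>'
name_only = '<a name="link">Click here</a>'

for html in (name_first, href_first, href_only, name_only):
    print(has_both(BOTH_ORIGINAL, html), has_both(BOTH_LOOKAHEAD, html))
```

    Both patterns accept the two tags that carry href and name in either order. Only the lookahead variant rejects the tags that are missing one of them, so the original is best treated as a loose filter rather than a strict check.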

    © 2003-2010 by Developer Shed. All rights reserved.