Using PCREs - Wrapping Things Up
Here's the final version of the "catch-all" href regular expression:
/<a(\s)+[^>]*href(\s)*=(\s)*('|")*[^"'>]+('|")*[^>]*(\s)*>(.|\s)*?<\/a(\s)*>/i
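As a quick sanity check, here is a minimal sketch that runs the expression against a small sample. It uses Python's re module rather than a PCRE function, so the /.../i delimiters and flag are replaced by re.IGNORECASE. The sample HTML and the names HREF_RE, sample and matches are assumptions made up for illustration.

```python
import re

# The article's pattern without the / delimiters; the trailing "i" flag
# becomes re.IGNORECASE. A raw triple-quoted string avoids any escaping.
HREF_RE = re.compile(
    r"""<a(\s)+[^>]*href(\s)*=(\s)*('|")*[^"'>]+('|")*[^>]*(\s)*>(.|\s)*?<\/a(\s)*>""",
    re.IGNORECASE,
)

# Hypothetical sample: a quoted, upper-case, multi-line link, an unquoted
# link, and a named anchor that has no href attribute at all.
sample = """<p>See <A HREF = 'one.html' class="x">first
link</A > and <a href=two.html>second</a>, but not <a name="top">this anchor</a>.</p>"""

# Collect the full text of every anchor the expression accepts.
matches = [m.group(0) for m in HREF_RE.finditer(sample)]

for text in matches:
    print(repr(text))
```

The two anchors that carry an href should be picked up, while the named anchor is skipped because [^>]* cannot run past the end of its opening tag to find an href.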
Finally, there's one more thing I haven't told you about that has caused me grief as well: escaping the ' and \ characters.
When you specify a regular expression, you're writing a string, so of course you put it in either single or double quotes. I recommend you always use SINGLE quotes, because then the only character you routinely need to escape is the single quote itself (a backslash needs doubling only when it comes directly before a quote or another backslash, or at the very end of the string). So my final href regex:
/<a(\s)+[^>]*href(\s)*=(\s)*('|")*[^"'>]+('|")*[^>]*(\s)*>(.|\s)*?<\/a(\s)*>/i
becomes this:
'/<a(\s)+[^>]*href(\s)*=(\s)*(\'|")*[^"\'>]+(\'|")*[^>]*(\s)*>(.|\s)*?<\/a(\s)*>/i'
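even though the regular expression engine is still going to "see" the first string, because the escaping backslashes are removed when the quoted string is read.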
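To make that point concrete, here is a rough Python sketch that imitates how a single-quoted string is decoded and confirms that the escaped version collapses back to the original expression. It assumes PHP-style single-quote rules (only \' and \\ are treated as escapes). The helper php_single_quoted_value and the names typed, seen and decoded are hypothetical and exist only for this illustration.

```python
import re

def php_single_quoted_value(body: str) -> str:
    # Hypothetical helper: decode the text typed between the outer single
    # quotes, assuming PHP-style rules where only \' and \\ are escapes and
    # every other backslash (such as the one in \s) is kept literally.
    return re.sub(r"\\(['\\])", r"\1", body)

# What gets typed between the outer single quotes (escaped form).
typed = r"""/<a(\s)+[^>]*href(\s)*=(\s)*(\'|")*[^"\'>]+(\'|")*[^>]*(\s)*>(.|\s)*?<\/a(\s)*>/i"""

# What the regular expression engine is meant to receive.
seen = r"""/<a(\s)+[^>]*href(\s)*=(\s)*('|")*[^"'>]+('|")*[^>]*(\s)*>(.|\s)*?<\/a(\s)*>/i"""

decoded = php_single_quoted_value(typed)
print(decoded == seen)
```

Only the three \' sequences change during decoding. The \s and \/ sequences pass through untouched, which is why single quotes keep the amount of extra escaping so small.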
Well, I promised you that this would be complex, but hopefully, if you've taken some time to think this tutorial through, you'll be less likely to be surprised when developing these expressions. Oh, and refer a friend to my regular expression tester!
About the Author
Sam Fullman is a web designer, database consultant and programmer. He is currently working on the RelateBase.com project, a new venture which will allow end users, IT people, and webmasters to set up fully relational databases. For more information go to www.relatebase.com or the parent website, www.compasspointmedia.com.