PHP Strings Primer - Dealing with HTML Tags and Entities
Properly handling HTML tags and entities in user input is crucial. If you do not handle these special characters properly, your web pages may end up looking far different than you had planned, and an attacker may be able to inject script that runs in your visitors' browsers as though it came from your site. This vulnerability is called cross-site scripting, or XSS. It can allow a person to cause some action to occur from your web site that you did not intend. A common XSS exploit is to steal the cookies your site issues to users. It is, therefore, very important that we properly handle user input.
In PHP, we have a couple of different options for dealing with these situations. First, we can simply strip the tags out of the data. Or, rather than removing the tags, we can change the characters in the tags to their HTML entity equivalents so that we can display them.
Removing the tags
In some situations, any type of HTML or PHP tag is simply unacceptable. If you plan to display one user's input to other users on your web site, it is advisable that you remove HTML tags from the input. With the 'strip_tags()' function we can easily remove any and all tags from a string. This function also has an optional second parameter to specify tags that should be allowed. First, let's take a look at an example where we will strip all tags from a string.
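A minimal sketch of such an example follows. The input string and the variable names are made up for illustration; only the 'strip_tags()' call itself comes from the text above.

```php
<?php
// Hypothetical user input containing three different tags: <b>, <i> and <a>.
$input = '<b>Bold</b> and <i>italic</i> text with a <a href="http://example.com">link</a>.';

// With no second argument, strip_tags() removes every tag it finds.
$clean = strip_tags($input);

echo $clean, PHP_EOL;
// Bold and italic text with a link.
```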
The three different tags we used within the string have all been stripped out. When we don't specify the second parameter for 'strip_tags()', it throws caution to the wind and removes anything that resembles a tag.
There are cases where certain tags might be acceptable. In the case of the example above, we might allow the '<b>' and '<i>' tags. To do that, we would simply pass the 'strip_tags()' function the second parameter as a string containing the acceptable tags.
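As a sketch of the second parameter in use, the snippet below keeps '<b>' and '<i>' and strips everything else. The input string is again a made-up illustration.

```php
<?php
// Hypothetical user input containing <b>, <i> and <a> tags.
$input = '<b>Bold</b> and <i>italic</i> text with a <a href="http://example.com">link</a>.';

// The second argument lists the tags to keep, written one after another.
$clean = strip_tags($input, '<b><i>');

echo $clean, PHP_EOL;
// <b>Bold</b> and <i>italic</i> text with a link.
```

Keep in mind that 'strip_tags()' leaves the attributes of allowed tags untouched, so an allowed tag can still carry something like an 'onmouseover' attribute. Allowing tags this way is a convenience and should not be treated as a complete XSS defense.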
Converting the tags to HTML entities
PHP provides a couple of different methods for changing characters to their HTML entity equivalents. This allows us to change the characters used in HTML and PHP tags into a form that we can display without the tags being interpreted. In some cases, such as a forum where users share code, this is preferable to stripping the tags out.
There are two different functions we can use to translate characters into their HTML entity equivalents. The first, 'htmlentities()', will translate all characters that have an HTML entity equivalent. For most applications, this is overkill. The only characters we normally need to worry about are the ones that the second function, 'htmlspecialchars()', translates.
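The difference between the two functions can be sketched as follows. The sample string is hypothetical, and the explicit 'ENT_QUOTES' and 'UTF-8' arguments are an assumption made here so that the result does not depend on the defaults of a particular PHP version. The "\u{00E9}" escape (an accented e) requires PHP 7 or later.

```php
<?php
// Hypothetical input: an accented character plus a pair of tags.
$input = "Caf\u{00E9} <b>menu</b>";

// htmlentities() converts every character that has a named entity,
// including the accented letter.
$all = htmlentities($input, ENT_QUOTES, 'UTF-8');

// htmlspecialchars() converts only the characters that are special in HTML.
$special = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');

echo $all, PHP_EOL;
// Caf&eacute; &lt;b&gt;menu&lt;/b&gt;

echo $special, PHP_EOL;
// Café &lt;b&gt;menu&lt;/b&gt;
```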
The 'htmlspecialchars' function will translate the following characters:
& (ampersand) into &amp;
" (double quote) into &quot;
< (less than) into &lt;
> (greater than) into &gt;
Let's take a look at an example and see what it will translate the tags into.
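A minimal sketch of such an example follows. The input string is invented for illustration, and passing 'ENT_QUOTES' and 'UTF-8' explicitly is an assumption made here for predictable results across PHP versions. With 'ENT_QUOTES', single quotes are converted as well, although this input contains none.

```php
<?php
// Hypothetical user input: a link whose URL and text both contain ampersands.
$input = '<a href="page.php?a=1&b=2">Tom & Jerry</a>';

// Convert the special characters so the browser displays the markup
// as text instead of interpreting it.
$escaped = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');

echo $escaped, PHP_EOL;
// &lt;a href=&quot;page.php?a=1&amp;b=2&quot;&gt;Tom &amp; Jerry&lt;/a&gt;
```

When this output is sent to a browser, the visitor sees the original markup as literal text, which is what a code-sharing forum would want.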