We've already briefly hinted at a couple of examples where validation is important: once while discussing secure connections in the "Privacy" section, and again while discussing php.ini (register_globals) in the previous section. The following examples show different situations in which your applications might be vulnerable, and hopefully they will help you realize why it is so important to validate information.
SQL Injection
If you don't know what SQL injection is, consider the following login function:
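<?php
function login($user, $pass)
{
    if (isset($user, $pass)) {
        // Vulnerable on purpose: $user and $pass go into the SQL text unchanged.
        $result = mysql_query("SELECT * FROM my_users WHERE username = '$user' AND password = '$pass'");
        // A query that runs but matches no rows is a failed login too.
        if (!$result || mysql_num_rows($result) == 0) {
            die('Login failed.');
        }
        $_SESSION['login'] = 'successful';
        return TRUE;
    }
    return FALSE;
}
?>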
Looks pretty straightforward, right? (Assuming, of course, that we handle the actual database connection elsewhere.) The problem here is that we're passing the variables $user and $pass directly into an SQL statement without performing any type of validation on them. Suppose somebody entered the following:
username: somebody_elses_username
password: ' OR 'x' = 'x
Now, since those variables aren't being validated, the SQL suddenly becomes:
SELECT * FROM my_users WHERE username = 'somebody_elses_username' AND password = '' OR 'x' = 'x'
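There you have it: because AND binds more tightly than OR, the trailing OR 'x' = 'x' makes the WHERE clause true for every row, so the query returns results and the login succeeds without the real password. That account has now been compromised, and if it has administrative privileges then the problem for you is even worse. There are many different ways to alter a query using injections similar to the above, and even other variations that use MySQL wildcards instead of quotes. To avoid these types of attacks, always validate and escape values being sent to SQL queries. For the mysql_* functions used here, the manual recommends mysql_real_escape_string(). Note that this extension was removed in PHP 7; with mysqli or PDO, prepared statements with bound parameters are the usual defense.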
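As a rough illustration of bound parameters, here is a minimal sketch using PDO. It is not the code from the login function above: check_login() is a hypothetical helper, an in-memory SQLite database stands in for the real MySQL one (so the pdo_sqlite extension is assumed), and the table simply mirrors the my_users query.

```php
<?php
// Sketch only: an in-memory SQLite database stands in for the real MySQL one.
function check_login(PDO $db, $user, $pass)
{
    // The SQL text is fixed; the submitted values travel separately as bound parameters.
    $stmt = $db->prepare(
        'SELECT COUNT(*) FROM my_users WHERE username = :user AND password = :pass'
    );
    $stmt->execute(array(':user' => $user, ':pass' => $pass));
    return (int) $stmt->fetchColumn() === 1;
}

$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE my_users (username TEXT PRIMARY KEY, password TEXT)');
$db->exec("INSERT INTO my_users VALUES ('alice', 's3cret')");

var_dump(check_login($db, 'alice', 's3cret'));        // the real password
var_dump(check_login($db, 'alice', "' OR 'x' = 'x")); // the injection attempt, now just a wrong password
```

Because the quote characters never become part of the SQL text, the injected value is compared as an ordinary (and wrong) password.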
Cross-Site Scripting
Cross-Site Scripting (or XSS) is possible when your application allows users to insert client-side code such as HTML or JavaScript, deliberately or even inadvertently, without any type of encoding. The simplest example would probably be someone submitting the text "</td>" into your forum, guestbook, comments or what have you. If your page is made up of tables and this text is not encoded (and is therefore interpreted by the browser along with the rest of your HTML), your page will now appear broken to anyone who visits.
That may sound pretty trivial, but it can actually be extremely dangerous. JavaScript in particular can be used to obtain cookie/session information or to redirect users to a malicious third-party website (which might redirect back to the original website so quickly that the user never even notices). An attacker might insert a large chunk of code into your page, perhaps a div which overlays your entire template with a page of their own (possibly matching your template exactly), with a form that requests sensitive data and submits it to the attacker's server. That would qualify as a pretty darn clever phishing attack, because even a web-savvy visitor who pays close attention to what URL they're on never actually leaves your domain until it's too late.
IMG tags are a prime target for inserting malicious data, because every IMG tag encountered performs a separate request from the page being loaded. In other words, the IMG tag will make a valid HTTP request to pretty much any file regardless of whether it's an image or not. For example, suppose you're calling images dynamically as follows:
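A minimal sketch of this kind of pattern might look like the following. The helper name unsafe_image_tag() and the sample value are assumptions made for illustration, not code taken from a real application.

```php
<?php
// Hypothetical vulnerable pattern: $url is placed into the page untouched.
function unsafe_image_tag($url)
{
    return '<img src="' . $url . '">';
}

// A submitted value that closes the src attribute and adds a script handler of its own.
echo unsafe_image_tag('x" onerror="alert(document.cookie)'), "\n";
```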
If $url is not validated, then an attacker might be able to insert JavaScript, or even the URL of a PHP file on their own server, which would be executed in exactly the same way as if a user had typed the JavaScript or the attacker's URL directly into their browser. Fortunately, XSS attacks are pretty easy to avoid. PHP's htmlentities() function encodes the characters that have special meaning in HTML, so user input is displayed as text instead of being interpreted as markup; pass ENT_QUOTES so that single quotes are encoded as well as double quotes. If you want to allow users to use certain markup or formatting, using BBcode tags such as [url][/url] is much safer than allowing them to use straight HTML.
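One way to combine both ideas, validating the URL and encoding it on output, is sketched here. The helper safe_image_tag() is hypothetical, and restricting images to http and https URLs is an assumption about what a given application wants to allow.

```php
<?php
// Hypothetical helper: returns an IMG tag, or null if the URL is not acceptable.
function safe_image_tag($url)
{
    // Only absolute http(s) URLs pass; javascript:, data: and scheme-less values are rejected.
    $scheme = is_string($url) ? parse_url($url, PHP_URL_SCHEME) : null;
    if (!is_string($scheme) || !in_array(strtolower($scheme), array('http', 'https'), true)) {
        return null;
    }
    // ENT_QUOTES encodes both quote styles, so the value cannot break out of the attribute.
    return '<img src="' . htmlentities($url, ENT_QUOTES, 'UTF-8') . '">';
}

// Plain text submitted by a user is encoded the same way before it is printed.
echo htmlentities('</td><script>alert(1)</script>', ENT_QUOTES, 'UTF-8'), "\n";
var_dump(safe_image_tag('javascript:alert(1)'));
echo safe_image_tag('https://example.com/a.png'), "\n";
```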
Hidden fields & forms
This should be obvious, but when creating hidden form fields please keep in mind that whatever values you assign to these fields really aren't hidden. Anyone with basic HTML knowledge knows how to view the source code of your page, and they also know that they can create their own version of your form on any computer with altered fields/values.
One popular mistake is to put the e-mail address that a form should be sent to in a hidden field. It's a mistake because I could create my own form pointing at your form processing script, add a little code to cycle through a list of collected e-mail addresses, and spam to my heart's desire using your mail server.
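A common alternative is to keep the addresses on the server and let the form submit only a short key. The sketch below assumes a hypothetical recipient map and helper name; the addresses are placeholders.

```php
<?php
// Hypothetical recipient map: the form submits a key such as "sales", never an address.
const RECIPIENTS = array(
    'sales'   => 'sales@example.com',
    'support' => 'support@example.com',
);

function resolve_recipient($key)
{
    // Unknown or non-string keys are rejected instead of being handed to mail().
    if (!is_string($key) || !array_key_exists($key, RECIPIENTS)) {
        return null;
    }
    return RECIPIENTS[$key];
}

var_dump(resolve_recipient('sales'));
var_dump(resolve_recipient('victim@example.org'));
```

With this approach a forged form can only ever reach addresses you have listed yourself.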
Another popular mistake is when a developer simply removes a form because they no longer want people submitting anything (for example, to disable comments). Now, just because the HTML form isn't there, does that mean I can't write my own form, or a script to send the POST data through a socket connection to your server? Your processing script is still there, so I don't see why not.
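The fix is to switch the feature off in the processing script itself. A minimal sketch, assuming a hypothetical handle_comment() function and a flag that would normally come from your configuration:

```php
<?php
// Hypothetical processing function; $commentsEnabled would come from configuration.
function handle_comment(array $post, $commentsEnabled)
{
    // The check lives in the processing script, so hand-built forms and raw POSTs hit it too.
    if (!$commentsEnabled) {
        return 'Comments are closed.';
    }
    $text = (isset($post['comment']) && is_string($post['comment'])) ? trim($post['comment']) : '';
    return $text === '' ? 'Empty comment.' : 'Comment accepted.';
}

echo handle_comment(array('comment' => 'Hello'), false), "\n";
```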
Cookies
Another aspect that many developers tend to overlook is cookies. It is extremely easy to view and edit the cookies stored by your browser. Obviously, storing sensitive information in a cookie, or treating a cookie as a trusted resource and therefore not validating its contents, is bad practice. Need I say more?
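In practice that means treating a cookie like any other user input. The sketch below assumes a hypothetical "theme" cookie with two permitted values; anything else falls back to a default.

```php
<?php
// Hypothetical example: a "theme" cookie that may only hold one of two known values.
function read_theme_cookie(array $cookies)
{
    $allowed = array('light', 'dark');
    $value = isset($cookies['theme']) ? $cookies['theme'] : '';
    // Anything edited, missing or malformed falls back to the default.
    return (is_string($value) && in_array($value, $allowed, true)) ? $value : 'light';
}

// Typical call: read_theme_cookie($_COOKIE)
var_dump(read_theme_cookie(array('theme' => '<script>alert(1)</script>')));
```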
Query Strings
As mentioned back in the privacy section, using periods and slashes in a URL (e.g., ../../) is one way an attacker might try to snoop around your folder system. If any variables passed in the URL contain folder or file names, then you should take extra care to watch out for this trick.
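One simple defense is to never build a path from the submitted value unless it matches a name you already know. This is only a sketch: resolve_page(), the page names and the base directory are all assumptions.

```php
<?php
// Hypothetical helper: maps a requested page name to a file, or null if it is not known.
function resolve_page($requested, $baseDir, array $allowed = array('home', 'about', 'contact'))
{
    // Only exact, known names pass, so "../../etc/passwd" never reaches the filesystem.
    if (!is_string($requested) || !in_array($requested, $allowed, true)) {
        return null;
    }
    return $baseDir . '/' . $requested . '.php';
}

var_dump(resolve_page('about', '/var/www/pages'));
var_dump(resolve_page('../../etc/passwd', '/var/www/pages'));
```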
Normally when reading a variable from the URL, you would use either the $_GET or the $_REQUEST array. If you have register_globals enabled, then you wouldn't need to use either, as the variable is automatically assigned. It is still possible to write secure code with register_globals enabled; however, I won't bother detailing how, as the setting has been deprecated and is therefore not recommended (deprecated meaning: don't be surprised if it disappears altogether in future versions). In fact, it was deprecated in PHP 5.3 and removed in PHP 5.4.
Either way, with or without register_globals, if you're not validating your variables then anyone altering values in the URL may be able to do something you never even thought of. I find the best approach is, rather than writing code to watch out for 100 different things you don't want, to simply write code that only allows what you do want (comparable to a white-list). It may be a slightly more restrictive and less user-friendly approach, but it is better to be safe than sorry.
For example, if you’re expecting an alpha-numeric value with possible dashes and/or periods, try the following:
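<?php
$allowed = array('-', '.');
// Require a string, so a missing or array-valued parameter is rejected outright.
$string = (isset($_GET['string']) && is_string($_GET['string'])) ? $_GET['string'] : '';
if (ctype_alnum(str_replace($allowed, '', $string))) {
    // the submitted string is valid
}
?>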
In the above code, we first define an array of the extra characters we want to allow. Then we remove those characters from the submitted string using str_replace(), and check whether what remains is purely alpha-numeric by using the ctype_alnum() function. Note that ctype_alnum() returns FALSE for an empty string, so an empty value is rejected as well.
It is also important that you use $_GET strictly for variables being passed in the URL, and $_POST when grabbing data from forms. Some people find it convenient to simply use $_REQUEST, as it is (depending on configuration) a combination of $_GET, $_POST and $_COOKIE; however, this could leave your application vulnerable to CSRF attacks, which will be described later in the Sociology section. For now, keep in mind that each of these arrays has a specific purpose, and you should always use them correctly.
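As a small sketch of reading form data from the right place, the hypothetical helper below only returns a field when the request really was a POST and the value came from the POST data. It does not by itself stop CSRF; it only removes the ambiguity of $_REQUEST.

```php
<?php
// Hypothetical helper: returns a submitted form field, or null if it did not arrive via POST.
function posted_field($method, array $post, $name)
{
    if ($method !== 'POST' || !isset($post[$name]) || !is_string($post[$name])) {
        return null;
    }
    return $post[$name];
}

// Typical call: posted_field($_SERVER['REQUEST_METHOD'], $_POST, 'comment')
var_dump(posted_field('GET', array('comment' => 'hi'), 'comment'));
```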
Spoofing E-Mail Headers
PHP's built-in mail() function is worth mentioning because it, like a number of other functions, is easy to overlook. The purpose of this function is to send mail, and it accepts certain values in order to do so. What you might not realize is that if those values are not validated properly, an attacker might be able to slip in extra headers (completely legal as far as the function is concerned), perhaps to spam from your mail server or otherwise perform tasks that they shouldn't be allowed to do. You might for example use the following code to submit your mail form:
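A hypothetical sketch of such code follows. The helper build_headers(), the crafted field value and every address are assumptions made for illustration, and the headers are only built and printed here, never sent.

```php
<?php
// Hypothetical vulnerable pattern: the submitted address goes straight into the headers.
function build_headers($from)
{
    return "From: $from";
}

// A crafted "e-mail address" that carries extra header lines after CRLF sequences.
$from = "me@example.com\r\nCc: a@example.org\r\nBcc: b@example.org, c@example.org";
echo build_headers($from), "\n";
// Passing this result as the additional headers of mail() could add the Cc and Bcc recipients.
```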
Whoops, you just ended up spamming 3 people. Of course there are many similar attacks, and I couldn't possibly give examples for all of them. What is important is that you understand how functions like these work: that they accept special characters, what those characters are, and how to filter them out of user-submitted input.
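For mail headers, the special characters to watch for are carriage returns and line feeds, since they are what separates one header from the next. A minimal sketch of a filter, with a hypothetical helper name, could be:

```php
<?php
// Hypothetical helper: returns a usable e-mail address, or null if the value looks unsafe.
function clean_email($value)
{
    // Reject raw or URL-encoded CR/LF outright instead of trying to repair the value.
    if (!is_string($value) || preg_match('/[\r\n]|%0a|%0d/i', $value)) {
        return null;
    }
    $email = filter_var($value, FILTER_VALIDATE_EMAIL);
    return $email === false ? null : $email;
}

var_dump(clean_email('me@example.com'));
var_dump(clean_email("me@example.com\r\nBcc: b@example.org"));
```

Rejecting the whole value, as opposed to stripping the offending characters, keeps the rule easy to reason about.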
Type casting (which can be looked up in the manual) could also be helpful in validating data.
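As a brief sketch of how casting fits in: a cast alone never fails, so it is usually paired with a check. The helper page_number() and its default value are assumptions for illustration.

```php
<?php
// A bare cast never fails: it keeps any leading digits and drops the rest.
var_dump((int) '42abc');

// Hypothetical helper: accept the value only if it consists of digits, then cast it.
function page_number($raw, $default = 1)
{
    if (!is_string($raw) || !ctype_digit($raw)) {
        return $default;
    }
    return (int) $raw;
}

var_dump(page_number('7'));
var_dump(page_number('7; DROP TABLE my_users'));
```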