In some cases, a query might return duplicate rows of data. For example, if you were to retrieve the ID’s of all customers that have accounts, you would see the following:
Since some customers have more than one account, you will see the same customer ID once for each account owned by that customer. What you probably want in this case is the distinct set of customers that have accounts, instead of seeing the customer ID for each row in theaccounttable. You can achieve this by adding the keyworddistinctdirectly after theselectkeyword, as the following shows:
The result set now contains 13 rows, one for each distinct customer, rather than 24 rows, one for each account.
If you do not want the server to remove duplicate data, or you are sure there will be no duplicates in your result set, you can specify the ALL keyword instead of specifying DISTINCT. However, the ALL keyword is the default and never needs to be explicitly named, so most programmers do not include ALL in their queries.
Remember that generating a distinct set of results requires the data to be sorted, which can be time consuming for large result sets. Don’t fall into the trap of using DISTINCT just to be sure there are no duplicates; instead, take the time to understand the data you are working with so you will know whether duplicates are possible.