The From Clause and SQL Queries (Page 1 of 4 )
In this second part of a three-part series aimed at helping you learn SQL, you'll learn how to remove duplicate rows of data from your query answers, and more. This article is excerpted from chapter three of the book
Learning SQL, written by Alan Beaulieu (O'Reilly; ISBN: 0596007272). Copyright © 2006 O'Reilly Media, Inc. All rights reserved. Used with permission from the publisher. Available from booksellers or direct from O'Reilly Media.
Removing Duplicates
In some cases, a query might return duplicate rows of data. For example, if you were to retrieve the ID’s of all customers that have accounts, you would see the following:
mysql> SELECT cust_id
-> FROM account;
+---------+
| cust_id |
+---------+
| 1 |
| 1 |
| 1 |
| 2 |
| 2 |
| 3 |
| 3 |
| 4 |
| 4 |
| 4 |
| 5 |
| 6 |
| 6 |
| 7 |
| 8 |
| 8 |
| 9 |
| 9 |
| 9 |
| 10 |
| 10 |
| 11 |
| 12 |
| 13 |
+---------+
24 rows in set (0.00 sec)
Since some customers have more than one account, you will see the same customer ID once for each account owned by that customer. What you probably want in this case is the distinct set of customers that have accounts, instead of seeing the customer ID for each row in theaccounttable. You can achieve this by adding the keyworddistinctdirectly after theselectkeyword, as the following shows:
mysql> SELECT DISTINCT cust_id
-> FROM account;
+---------+
| cust_id |
+---------+
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
| 6 |
| 7 |
| 8 |
| 9 |
| 10 |
| 11 |
| 12 |
| 13 |
+---------+
13 rows in set (0.01 sec)
The result set now contains 13 rows, one for each distinct customer, rather than 24 rows, one for each account.
If you do not want the server to remove duplicate data, or you are sure there will be no duplicates in your result set, you can specify the ALL keyword instead of specifying DISTINCT. However, the ALL keyword is the default and never needs to be explicitly named, so most programmers do not include ALL in their queries.
Remember that generating a distinct set of results requires the data to be sorted, which can be time consuming for large result sets. Don’t fall into the trap of using DISTINCT just to be sure there are no duplicates; instead, take the time to understand the data you are working with so you will know whether duplicates are possible.
Next: The from Clause >>
More Database Articles Articles
More By O'Reilly Media
|
This article is excerpted from chapter three of the book Learning SQL, written by Alan Beaulieu (O'Reilly; ISBN: 0596007272). Check it out today at your favorite bookstore. Buy this book now.
|
|