Working with Strings - String Properties continued
(Page 3 of 4 )
The following example illustrates how collation affects sort order. Suppose that a table contains a latin1 string column and has the following rows:
mysql> CREATE TABLE t (c CHAR(3) CHARACTER SET latin1);
mysql> INSERT INTO t (c) VALUES('AAA'),('bbb'),('aaa'),('BBB');
mysql> SELECT c FROM t;
+------+
| c |
+-------+
| AAA |
| bbb |
| aaa |
| BBB |
+-------+
By applying the COLLATE operator to the column, you can choose which collation to use for sorting and thus affect the order of the result:
A case-insensitive collation sorts a and A together, placing them before b and B. However, for a given letter, it does not necessarily order one lettercase before another, as shown by the following result:
mysql> SELECT c FROM t ORDER BY c COLLATE latin1_swedish_ci;
+------+
| c |
+-------+
| AAA |
| aaa |
| bbb |
| BBB |
+-------+
A case-sensitive collation puts A and a before B and b, and sorts uppercase before lowercase:
mysql> SELECT c FROM t ORDER BY c COLLATE latin1_general_cs;
+------+
| c |
+-------+
| AAA |
| aaa |
| BBB |
| bbb |
+-------+
A binary collation sorts characters using their numeric values. Assuming that uppercase letters have numeric values less than those of lowercase letters, a binary collation results in the following ordering:
mysql> SELECT c FROM t ORDER BY c COLLATE latin1_bin;
+------+
| c |
+-------+
| AAA |
| BBB |
| aaa |
| bbb |
+-------+
Note that, because characters in different lettercases have different numeric values, a binary collation produces a case-sensitive ordering. However, the order is different than that for the case-sensitive collation.
You can choose a language-specific collation if you require that comparison and sorting operations use the sorting rules of a particular language. For example, if you store strings using the utf8 character set, the default collation (utf8_general_ci) treats ch and ll as two-character strings. If you need the traditional Spanish ordering that treats ch and ll as single characters that follow c and l, respectively, use the utf8_spanish2_ci collation. The two collations produce different results, as shown here:
mysql> CREATE TABLE t (c CHAR(2) CHARACTER SET utf8);
mysql> INSERT INTO t (c) VALUES('cg'),('ch'),('ci'),('lk'),('ll'),('lm');
mysql> SELECT c FROM t ORDER BY c COLLATE utf8_general_ci;
+------+
| c |
+-------+
| cg |
| ch |
| ci |
| lk |
| ll |
| lm |
+-------+
mysql> SELECT c FROM t ORDER BY c COLLATE utf8_spanish2_ci;
+------+
| c |
+-------+
| cg |
| ci |
| ch |
| lk |
| lm |
| ll |
+-------+
Next: 5.2 Choosing a String Data Type >>
More Database Articles Articles
More By O'Reilly Media
|
This article is excerpted from chapter five of the MySQL Cookbook, Second Edition, written by Paul DuBois (O'Reilly; ISBN: 059652708X). Check it out today at your favorite bookstore. Buy this book now.
|
|