One property of a string is whether it is binary or nonbinary:
A binary string is a sequence of bytes. It can contain any type of information, such as images, MP3 files, or compressed or encrypted data. A binary string is not associated with a character set, even if you store a value such as abc that looks like ordinary text. Binary strings are compared byte by byte using numeric byte values.
A nonbinary string is a sequence of characters. It stores text that has a particular character set and collation. The character set defines which characters can be stored in the string. The collation defines the comparison and sorting properties of the characters.
A characteristic of nonbinary strings is that they have a character set. To see which character sets are available, use this statement:
mysql> SHOW CHARACTER SET;
Charset
Description
Default collation
Maxlen
big5
Big5 Traditional Chinese
big5_chinese_ci
2
dec8
DEC West European
dec8_swedish_ci
1
cp850
DOS West European
cp850_general_ci
1
hp8
HP West European
hp8_english_ci
1
koi8r
KOI8-R Relcom Russian
koi8r_general_ci
1
latin1
cp1252 West European
latin1_swedish_ci
1
latin2
ISO 8859-2 Central European
latin2_general_ci
1
...
utf8
UTF-8 Unicode
utf8_general_ci
3
ucs2
UCS-2 Unicode
ucs2_general_ci
2
...
The default character set in MySQL is latin1. If you need to store characters from several languages in a single column, consider using one of the Unicode character sets (utf8 or ucs2) because they can represent characters from multiple languages.
Some character sets contain only single-byte characters, whereas others allow multibyte characters. For some multibyte character sets, all characters have a fixed length. Others contain characters of varying lengths. For example, Unicode data can be stored using the ucs2 character set in which all characters take two bytes or the utf8 character set in which characters take from one to three bytes.
You can determine whether a given string contains multibyte characters using the LENGTH() and CHAR_LENGTH() functions, which return the length of a string in bytes and characters, respectively. If LENGTH() is greater than CHAR_LENGTH() for a given string, multibyte characters are present.
For the ucs2 Unicode character set, all characters are encoded using two bytes, even if they might be single-byte characters in another character set such as latin1. Thus, every ucs2 string contains multibyte characters:
The utf8 Unicode character set has multibyte characters, but a given utf8 string might contain only single-byte characters, as in the following example:
Another characteristic of nonbinary strings is collation, which determines the sort order of characters in the character set. Use SHOW COLLATION to see which collations are available; add a LIKE clause to see the collations for a particular character set:
mysql> SHOW COLLATION LIKE 'latin1%';
Collation
Charset
Id
Default
Compiled
Sortlen
latin1_german1_ci
latin1
5
Yes
1
latin1_swedish_ci
latin1
8
Yes
Yes
1
latin1_danish_ci
latin1
15
Yes
1
latin1_german2_ci
latin1
31
Yes
2
latin1_bin
latin1
47
Yes
1
latin1_general_ci
latin1
48
Yes
1
latin1_general_cs
latin1
49
Yes
1
latin1_spanish_ci
latin1
94
Yes
1
In contexts where no collation is indicated, the collation with Yes in the Default column is the default collation used for strings in the given character set. As shown, the default collation for latin1 is latin1_swedish_ci. (Default collations are also displayed by SHOW CHARACTER SET.)
A collation can be case-sensitive (a and A are different), case-insensitive (a and A are the same), or binary (two characters are the same or different based on whether their numeric values are equal). A collation name ending in ci, cs, or bin is case-insensitive, case-sensitive, or binary, respectively.
A binary collation provides a sort order for nonbinary strings that is something like the order for binary strings, in the sense that comparisons for binary strings and binary collations both use numeric values. However, there is a difference: binary string comparisons are always based on single-byte units, whereas a binary collation compares nonbinary strings using character numeric values; depending on the character set, some of these might be multibyte values.