Database Articles
  Home arrow Database Articles arrow Page 2 - Working with Strings
Codewalker Forums 
  Tutorials  
Database Articles  
Miscellaneous  
Navigation Usability  
PEAR Articles  
Programming Basics  
Server Administration  
XML Tutorials  
  Reviews  
Database Book Reviews  
Linux Book Reviews  
Miscellaneous Reviews  
PHP Book Reviews  
PHP Software Reviews  
Server Admin Reviews  
SQL Tool Reviews  
  Code Gallery  
Content Management Code  
Contest Code  
Counters Code  
Database Code  
Date Time Code  
Discussion Board Code  
Email Code  
File Manipulation Code  
GUI Code  
Link Farm Code  
Miscellaneous Code  
Search Code  
Site Navigation Code  
User Management Code  
Mobile Linux 
App Generation ROI 
IBM® developerWorks 
Download TestComplete 
Forums Sitemap 
Weekly Newsletter 
 
Developer Updates  
Free Website Content 
 RSS  Articles
 RSS  Forums
 RSS  All Feeds
Write For Us Get Paid 
Request Media Kit
Contact Us 
Site Map 
Privacy Policy 
Support 
 USERNAME
 
 PASSWORD
 
 
  >>> SIGN UP!  
  Lost Password? 
DATABASE ARTICLES

Working with Strings
By: O'Reilly Media
  • Search For More Articles!
  • Disclaimer
  • Author Terms
  • Rating: 3 stars3 stars3 stars3 stars3 stars / 2
    2007-08-24

    Table of Contents:
  • Working with Strings
  • 5.1 String Properties
  • String Properties continued
  • 5.2 Choosing a String Data Type

  • Rate this Article: Poor Best 
      ADD THIS ARTICLE TO:
      Del.ici.ous Digg
      Blink Simpy
      Google Spurl
      Y! MyWeb Furl
    Email Me Similar Content When Posted
    Add Developer Shed Article Feed To Your Site
    Email Article To Friend
    Print Version Of Article
    PDF Version Of Article
     
     
    ADVERTISEMENT


    Working with Strings - 5.1 String Properties


    (Page 2 of 4 )

    One property of a string is whether it is binary or nonbinary:

    1. A binary string is a sequence of bytes. It can contain any type of information, such as images, MP3 files, or compressed or encrypted data. A binary string is not associated with a character set, even if you store a value such as abc that looks like ordinary text. Binary strings are compared byte by byte using numeric byte values.
    2. A nonbinary string is a sequence of characters. It stores text that has a particular character set and collation. The character set defines which characters can be stored in the string. The collation defines the comparison and sorting properties of the characters.

    A characteristic of nonbinary strings is that they have a character set. To see which character sets are available, use this statement:

      mysql> SHOW CHARACTER SET;

    Charset

    Description

    Default collation

    Maxlen 

    big5

    Big5 Traditional Chinese

    big5_chinese_ci

    2

    dec8

    DEC West European

    dec8_swedish_ci

    1

    cp850

    DOS West European

    cp850_general_ci

    1

    hp8

    HP West European

    hp8_english_ci

    1

    koi8r

    KOI8-R Relcom Russian

    koi8r_general_ci

    1

    latin1

    cp1252 West European

    latin1_swedish_ci

    1

    latin2

    ISO 8859-2
    Central
    European

    latin2_general_ci

    1

    ...

    utf8

    UTF-8 Unicode

    utf8_general_ci

    3

    ucs2

    UCS-2 Unicode

    ucs2_general_ci

    2

    ...

    The default character set in MySQL is latin1. If you need to store characters from several languages in a single column, consider using one of the Unicode character sets (utf8 or ucs2) because they can represent characters from multiple languages.

    Some character sets contain only single-byte characters, whereas others allow multibyte characters. For some multibyte character sets, all characters have a fixed length. Others contain characters of varying lengths. For example, Unicode data can be stored using the ucs2 character set in which all characters take two bytes or the utf8 character set in which characters take from one to three bytes.

    You can determine whether a given string contains multibyte characters using the LENGTH() and CHAR_LENGTH() functions, which return the length of a string in bytes and characters, respectively. If LENGTH() is greater than CHAR_LENGTH() for a given string, multibyte characters are present.

    • For the ucs2 Unicode character set, all characters are encoded using two bytes, even if they might be single-byte characters in another character set such as latin1. Thus, every ucs2 string contains multibyte characters:

        mysql> SET @s = CONVERT('abc' USING ucs2);
        mysql> SELECT LENGTH(@s), CHAR_LENGTH(@s);
        +------------+-----------------
      +
        | LENGTH(@s)   | CHAR_LENGTH(@s)   |
        +--------------+-------------------+
        |            6 |                 3 |
        +--------------+-------------------+
       
    • The utf8 Unicode character set has multibyte characters, but a given utf8 string might contain only single-byte characters, as in the following example:

        mysql> SET @s = CONVERT('abc' USING utf8);
       mysql> SELECT LENGTH(@s), CHAR_LENGTH(@s);
      +------------+-----------------
      +
      | LENGTH(@s)   | CHAR_LENGTH(@s)   |
      +--------------+-------------------+
      |            3 |                 3 |
      +--------------+-------------------+

    Another characteristic of nonbinary strings is collation, which determines the sort order of characters in the character set. Use SHOW COLLATION to see which collations are available; add a LIKE clause to see the collations for a particular character set:

      mysql> SHOW COLLATION LIKE 'latin1%';

    Collation

    Charset

    Id

    Default 

    Compiled  

     Sortlen

    latin1_german1_ci 

    latin1 

    5

     

     Yes

    1 

    latin1_swedish_ci 

    latin1 

    8  

    Yes

    Yes

    1

    latin1_danish_ci

    latin1

    15

     

    Yes

    latin1_german2_ci

    latin1

    31

     

    Yes

    2 

    latin1_bin

    latin1

    47

     

    Yes

    latin1_general_ci

    latin1

    48

     

    Yes

    latin1_general_cs

    latin1

    49

     

    Yes 

    1

    latin1_spanish_ci

    latin1

    94

     

    Yes

    1

    In contexts where no collation is indicated, the collation with Yes in the Default column is the default collation used for strings in the given character set. As shown, the default collation for latin1 is latin1_swedish_ci. (Default collations are also displayed by SHOW CHARACTER SET.)

    A collation can be case-sensitive (a and A are different), case-insensitive (a and A are the same), or binary (two characters are the same or different based on whether their numeric values are equal). A collation name ending in ci, cs, or bin is case-insensitive, case-sensitive, or binary, respectively.

    A binary collation provides a sort order for nonbinary strings that is something like the order for binary strings, in the sense that comparisons for binary strings and binary collations both use numeric values. However, there is a difference: binary string comparisons are always based on single-byte units, whereas a binary collation compares nonbinary strings using character numeric values; depending on the character set, some of these might be multibyte values.

    More Database Articles Articles
    More By O'Reilly Media


     

    Buy this book now. This article is excerpted from chapter five of the MySQL Cookbook, Second Edition, written by Paul DuBois (O'Reilly; ISBN: 059652708X). Check it out today at your favorite bookstore. Buy this book now.

    DATABASE ARTICLES ARTICLES

    - More on Query Optimization for Oracle Databa...
    - Query Optimization in Oracle
    - Clusters and Other Data Structures for Oracle
    - Using Indexes with an Oracle Database
    - The Basics of Data Structures in Oracle
    - Oracle Data Structures
    - Best Practices for PL/SQL Variables
    - What`s Code Without Variables?
    - Clauses, Sorting, and SQL Queries
    - The From Clause and SQL Queries
    - Query Primer
    - Full Text Searches and Strings
    - Searching with Strings
    - Pattern Matching with Strings
    - Working with Cases of Strings





    © 2003-2010 by Developer Shed. All rights reserved. DS Cluster 3 Hosted by Hostway
    For more Enterprise Application Development news, visit eWeek