10.4.1. Collation Implementation Types

MySQL implements several types of collations:

Simple collations for 8-bit character sets

This kind of collation is implemented using an array of 256 weights that defines a one-to-one mapping from character codes to weights. latin1_swedish_ci is an example. It is a case-insensitive collation, so the uppercase and lowercase versions of a character have the same weights and they compare as equal.

mysql> SET NAMES 'latin1' COLLATE 'latin1_swedish_ci';
Query OK, 0 rows affected (0.01 sec)

mysql> SELECT HEX(WEIGHT_STRING('a')), HEX(WEIGHT_STRING('A'));
+-------------------------+-------------------------+
| HEX(WEIGHT_STRING('a')) | HEX(WEIGHT_STRING('A')) |
+-------------------------+-------------------------+
| 41                      | 41                      |
+-------------------------+-------------------------+
1 row in set (0.01 sec)

mysql> SELECT 'a' = 'A';
+-----------+
| 'a' = 'A' |
+-----------+
|         1 |
+-----------+
1 row in set (0.12 sec)
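The idea of a 256-entry weight array can be sketched outside the server. The following Python model is only an illustration: the table contents, `WEIGHTS`, `weight_string`, and `collate_equal` are assumptions made for this example and are not the actual latin1_swedish_ci weights or any MySQL API. It starts from an identity mapping and gives each lowercase ASCII letter the weight of its uppercase counterpart.

```python
# Hypothetical 256-entry weight table for an 8-bit character set.
# Start from the identity mapping, then give each lowercase ASCII
# letter the weight of its uppercase counterpart.
WEIGHTS = list(range(256))
for code in range(ord("a"), ord("z") + 1):
    WEIGHTS[code] = code - 0x20  # 'a' (0x61) -> 0x41, same as 'A'


def weight_string(data: bytes) -> bytes:
    """Map every byte to its weight: one character code, one weight."""
    return bytes(WEIGHTS[b] for b in data)


def collate_equal(left: bytes, right: bytes) -> bool:
    """Two strings compare as equal when their weight strings match."""
    return weight_string(left) == weight_string(right)


print(weight_string(b"a").hex().upper())  # 41
print(weight_string(b"A").hex().upper())  # 41
print(collate_equal(b"a", b"A"))          # True
```

Because the comparison looks only at weights, any two characters that share a table entry are indistinguishable to the collation, which is what makes it case insensitive.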

For implementation instructions, see Section 10.4.3, "Adding a Simple Collation to an 8-Bit Character Set".

Complex collations for 8-bit character sets

This kind of collation is implemented using functions in a C source file that define how to order characters, as described in Section 10.3, "Adding a Character Set".

Collations for non-Unicode multi-byte character sets

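For this type of collation, 8-bit (single-byte) and multi-byte characters are handled differently. For 8-bit characters, character codes map to weights in case-insensitive fashion. (For example, the single-byte characters 'a' and 'A' both have a weight of 0x41.) For multi-byte characters, there are two types of relationship between character codes and weights:

Weights equal character codes. sjis_japanese_ci is an example of this kind of collation.

Character codes map one-to-one to weights, but a code is not necessarily equal to the weight. gbk_chinese_ci is an example of this kind of collation.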

For implementation instructions, see Section 10.3, "Adding a Character Set".
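The split between single-byte and multi-byte handling can be sketched with a toy encoding. Everything here is an assumption made for illustration: the rule that bytes of 0x80 and above start a two-byte character, the names `split_chars` and `weight`, and the sample codes and table values. The sketch shows two possible treatments of a multi-byte character: its weight is the character code itself, or the weight is looked up in a one-to-one table.

```python
# Toy double-byte encoding (an assumption for illustration only):
# bytes below 0x80 are single-byte characters; any other byte is the
# lead byte of a two-byte character.
def split_chars(data: bytes) -> list:
    """Return the character codes in data, one int per character."""
    codes = []
    i = 0
    while i < len(data):
        if data[i] < 0x80:
            codes.append(data[i])
            i += 1
        else:
            codes.append((data[i] << 8) | data[i + 1])
            i += 2
    return codes


def weight(code: int, table=None) -> int:
    """Weight of one character code under the toy collation."""
    if code < 0x80:
        # Single-byte: fold lowercase ASCII letters to uppercase.
        return code - 0x20 if ord("a") <= code <= ord("z") else code
    if table is None:
        return code               # multi-byte: weight equals code
    return table.get(code, code)  # multi-byte: one-to-one lookup


sample = b"a\x82\xc0"               # 'a' plus one two-byte character
hypothetical_table = {0x82C0: 0x9001}  # made-up weight

print([hex(weight(c)) for c in split_chars(sample)])
print([hex(weight(c, hypothetical_table)) for c in split_chars(sample)])
```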

Collations for Unicode multi-byte character sets

Some of these collations are based on the Unicode Collation Algorithm (UCA), others are not.

Non-UCA collations have a one-to-one mapping from character code to weight. In MySQL, such collations are case insensitive and accent insensitive. utf8_general_ci is an example: 'a', 'A', 'À', and 'á' each have different character codes but all have a weight of 0x0041 and compare as equal.

mysql> SET NAMES 'utf8' COLLATE 'utf8_general_ci';
Query OK, 0 rows affected (0.00 sec)

mysql> CREATE TABLE t1
    -> (c1 CHAR(1) CHARACTER SET UTF8 COLLATE utf8_general_ci);
Query OK, 0 rows affected (0.01 sec)

mysql> INSERT INTO t1 VALUES ('a'),('A'),('À'),('á');
Query OK, 4 rows affected (0.00 sec)
Records: 4  Duplicates: 0  Warnings: 0

mysql> SELECT c1, HEX(c1), HEX(WEIGHT_STRING(c1)) FROM t1;
+------+---------+------------------------+
| c1   | HEX(c1) | HEX(WEIGHT_STRING(c1)) |
+------+---------+------------------------+
| a    | 61      | 0041                   |
| A    | 41      | 0041                   |
| À    | C380    | 0041                   |
| á    | C3A1    | 0041                   |
+------+---------+------------------------+
4 rows in set (0.00 sec)
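A one-to-one, case-insensitive and accent-insensitive mapping of this kind can be approximated in a few lines of Python. This is a rough model, not MySQL's actual weight table: it strips accents with Unicode NFD decomposition and folds case with `str.upper()`, and the function names `general_ci_weight` and `weight_string_hex` are hypothetical. It will not match utf8_general_ci for every character, and it assumes characters in the Basic Multilingual Plane so that each weight fits in four hex digits.

```python
import unicodedata


def general_ci_weight(ch: str) -> int:
    """Approximate a case- and accent-insensitive weight for one character."""
    base = unicodedata.normalize("NFD", ch)[0]  # keep the base letter only
    return ord(base.upper()[0])                 # fold case


def weight_string_hex(text: str) -> str:
    """Concatenate one 16-bit weight per character, as uppercase hex."""
    return "".join(f"{general_ci_weight(ch):04X}" for ch in text)


# 'a', 'A', U+00C0 (A with grave), U+00E1 (a with acute)
for ch in ("a", "A", "\u00c0", "\u00e1"):
    print(ch, ch.encode("utf-8").hex().upper(), weight_string_hex(ch))
```

For these four characters the sketch yields the same weight, 0041, for each, although their UTF-8 encodings differ.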

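UCA-based collations in MySQL have these properties:

If a character has weights, each weight uses 2 bytes (16 bits).

A character may have zero weights (or an empty weight). In this case, the character is ignorable.

A character may have one weight.

A character may have many weights. This is an expansion.

Many characters may have one weight. This is a contraction.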

A many-characters-to-many-weights mapping is also possible (this is contraction with expansion), but is not supported by MySQL.
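The terms expansion and contraction are easier to follow with a toy weight table. In the sketch below, an expansion is one character that yields several weights, a contraction is a sequence of characters that yields a single weight, and an ignorable character yields none. The table `TOY_WEIGHTS`, the function `toy_weight_string`, and every weight value are made up for illustration; real UCA collations define their own tables, and contractions such as 'ch' apply only in particular languages.

```python
# Toy weight table; all weight values are made up for illustration.
# Keys are character sequences, values are tuples of 16-bit weights.
TOY_WEIGHTS = {
    "\u0000": (),                # no weight: ignorable
    "a": (0x0E33,),              # one character -> one weight
    "c": (0x0E60,),
    "h": (0x0EE1,),
    "s": (0x0FEA,),
    "\u00df": (0x0FEA, 0x0FEA),  # expansion: one character, two weights
    "ch": (0x0EE2,),             # contraction: two characters, one weight
}
MAX_KEY_LEN = max(len(key) for key in TOY_WEIGHTS)


def toy_weight_string(text: str) -> str:
    """Return the concatenated weights of text as uppercase hex."""
    weights = []
    i = 0
    while i < len(text):
        # Try the longest key first so that contractions take priority.
        for length in range(min(MAX_KEY_LEN, len(text) - i), 0, -1):
            key = text[i:i + length]
            if key in TOY_WEIGHTS:
                weights.extend(TOY_WEIGHTS[key])
                i += length
                break
        else:
            raise KeyError(f"no toy weight for {text[i]!r}")
    return "".join(f"{w:04X}" for w in weights)


print(toy_weight_string("a"))       # one weight
print(toy_weight_string("\u00df"))  # expansion
print(toy_weight_string("ch"))      # contraction
print(toy_weight_string("ca"))      # no contraction: 'c' then 'a'
```

A many-characters-to-many-weights entry would simply be a multi-character key with a multi-element tuple in such a table.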

For implementation instructions for a non-UCA collation, see Section 10.3, "Adding a Character Set". For a UCA collation, see Section 10.4.4, "Adding a UCA Collation to a Unicode Character Set".

Miscellaneous collations

There are also a few collations that do not fall into any of the previous categories.