Spec-Zone .ru
спецификации, руководства, описания, API
|
This section describes how _bin
collations for nonbinary strings differ from the
binary
"collation" for binary
strings.
Nonbinary strings (as stored in the CHAR
, VARCHAR
, and TEXT
data types) have a character set and collation. A given character set can have several collations, each of which
defines a particular sorting and comparison order for the characters in the set. One of these is the binary
collation for the character set, indicated by a _bin
suffix in the collation name.
For example, latin1
and utf8
have binary collations
named latin1_bin
and utf8_bin
.
Binary strings (as stored in the BINARY
, VARBINARY
,
and BLOB
data types) have no character set or collation in the sense that nonbinary strings do. (Applied to a binary
string, the CHARSET()
and COLLATION()
functions both
return a value of binary
.) Binary strings are sequences of bytes and the numeric
values of those bytes determine sort order.
The _bin
collations differ from the binary
collation
in several respects.
The unit for sorting and comparison. Binary strings are sequences of
bytes. Sorting and comparison is always based on numeric byte values. Nonbinary strings are sequences of
characters, which might be multi-byte. Collations for nonbinary strings define an ordering of the character
values for sorting and comparison. For the _bin
collation, this ordering is based
solely on binary code values of the characters (which is similar to ordering for binary strings except that a
_bin
collation must take into account that a character might contain multiple
bytes). For other collations, character ordering might take additional factors such as lettercase into account.
Character set conversion. A nonbinary string has a character set and
is converted to another character set in many cases, even when the string has a _bin
collation:
When assigning column values from another column that has a different character set:
UPDATE t1 SET utf8_bin_column=latin1_column;INSERT INTO t1 (latin1_column) SELECT utf8_bin_column FROM t2;
When assigning column values for INSERT
or UPDATE
using a string literal:
SET NAMES latin1;INSERT INTO t1 (utf8_bin_column) VALUES ('string-in-latin1');
When sending results from the server to a client:
SET NAMES latin1;SELECT utf8_bin_column FROM t2;
For binary string columns, no conversion occurs. For the preceding cases, the string value is copied byte-wise.
Lettercase conversion. Collations provide information about lettercase
of characters, so characters in a nonbinary string can be converted from one lettercase to another, even for
_bin
collations that ignore lettercase for ordering:
mysql>SET NAMES latin1 COLLATE latin1_bin;
Query OK, 0 rows affected (0.02 sec)mysql>SELECT LOWER('aA'), UPPER('zZ');
+-------------+-------------+| LOWER('aA') | UPPER('zZ') |+-------------+-------------+| aa | ZZ |+-------------+-------------+1 row in set (0.13 sec)
The concept of lettercase does not apply to bytes in a binary string. To perform lettercase conversion, the string must be converted to a nonbinary string:
mysql>SET NAMES binary;
Query OK, 0 rows affected (0.00 sec)mysql>SELECT LOWER('aA'), LOWER(CONVERT('aA' USING latin1));
+-------------+-----------------------------------+| LOWER('aA') | LOWER(CONVERT('aA' USING latin1)) |+-------------+-----------------------------------+| aA | aa |+-------------+-----------------------------------+1 row in set (0.00 sec)
Trailing space handling in comparisons. Nonbinary strings have PADSPACE
behavior for all collations, including _bin
collations. Trailing spaces are insignificant in comparisons:
mysql>SET NAMES utf8 COLLATE utf8_bin;
Query OK, 0 rows affected (0.00 sec)mysql>SELECT 'a ' = 'a';
+------------+| 'a ' = 'a' |+------------+| 1 |+------------+1 row in set (0.00 sec)
For binary strings, all characters are significant in comparisons, including trailing spaces:
mysql>SET NAMES binary;
Query OK, 0 rows affected (0.00 sec)mysql>SELECT 'a ' = 'a';
+------------+| 'a ' = 'a' |+------------+| 0 |+------------+1 row in set (0.00 sec)
Trailing space handling for inserts and retrievals. CHAR(
columns store nonbinary strings.
Values shorter than N
)N
characters are extended with spaces on
insertion. For retrieval, trailing spaces are removed.
BINARY(
columns store binary strings.
Values shorter than N
)N
bytes are extended with 0x00
bytes on insertion. For retrieval, nothing is removed; a value of the declared length is always returned.
mysql>CREATE TABLE t1 (
->a CHAR(10) CHARACTER SET utf8 COLLATE utf8_bin,
->b BINARY(10)
->);
Query OK, 0 rows affected (0.09 sec)mysql>INSERT INTO t1 VALUES ('a','a');
Query OK, 1 row affected (0.01 sec)mysql>SELECT HEX(a), HEX(b) FROM t1;
+--------+----------------------+| HEX(a) | HEX(b) |+--------+----------------------+| 61 | 61000000000000000000 |+--------+----------------------+1 row in set (0.04 sec)