In the great majority of statements, it is obvious what collation MySQL uses to resolve a comparison operation.
For example, in the following cases, it should be clear that the collation is the collation of column x:
SELECT x FROM T ORDER BY x;
SELECT x FROM T WHERE x = x;
SELECT DISTINCT x FROM T;
However, with multiple operands, there can be ambiguity. For example:
SELECT x FROM T WHERE x = 'Y';
Should the comparison use the collation of the column x, or of the string literal 'Y'? Both x and 'Y' have collations, so which collation takes precedence?
Standard SQL resolves such questions using what used to be called "coercibility" rules. MySQL assigns coercibility values as follows:
An explicit COLLATE clause has a coercibility of 0. (Not coercible at all.)
The concatenation of two strings with different collations has a coercibility of 1.
The collation of a column or a stored routine parameter or local variable has a coercibility of 2.
A "system constant" (the string returned by functions such as USER() or VERSION()) has a coercibility of 3.
The collation of a literal has a coercibility of 4.
NULL or an expression that is derived from NULL has a coercibility of 5.
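The values listed above can be written down as a small lookup. The following Python sketch is only an illustrative model, not MySQL code: the names `Coercibility` and `takes_precedence` are hypothetical, and only the numeric values come from the list.

```python
from enum import IntEnum


class Coercibility(IntEnum):
    """Hypothetical names for the coercibility values listed above."""
    EXPLICIT = 0         # explicit COLLATE clause (not coercible at all)
    CONCATENATION = 1    # concatenation of two strings with different collations
    COLUMN = 2           # column, stored routine parameter, or local variable
    SYSTEM_CONSTANT = 3  # string returned by USER(), VERSION(), and similar
    LITERAL = 4          # string literal
    NULL = 5             # NULL or an expression derived from NULL


def takes_precedence(a, b):
    """Return the coercibility whose collation wins, or None on a tie."""
    if a == b:
        return None  # a tie needs further rules to resolve
    return min(a, b)  # the lower value is less coercible, so it wins


# WHERE x = 'Y': the column (2) beats the literal (4).
print(takes_precedence(Coercibility.COLUMN, Coercibility.LITERAL).name)  # COLUMN
```

A lower value means the collation is harder to override, which is why the column's collation is used when a column is compared with a literal.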
MySQL uses coercibility values with the following rules to resolve ambiguities:
Use the collation with the lowest coercibility value.
If both sides have the same coercibility, then:
If both sides are Unicode, or both sides are not Unicode, it is an error.
If one side has a Unicode character set and the other side has a non-Unicode character set, the side with the Unicode character set wins, and automatic character set conversion is applied to the non-Unicode side. For example, the following statement does not return an error:
SELECT CONCAT(utf8_column, latin1_column) FROM t1;
It returns a result that has a character set of utf8 and the same collation as utf8_column. Values of latin1_column are automatically converted to utf8 before concatenating.
For an operation with operands from the same character set but that mix a _bin collation and a _ci or _cs collation, the _bin collation is used. This is similar to how operations that mix nonbinary and binary strings evaluate the operands as binary strings, except that it is for collations rather than data types.
Although automatic conversion is not in the SQL standard, the SQL standard document does say that every character set is (in terms of supported characters) a "subset" of Unicode. Because it is a well-known principle that "what applies to a superset can apply to a subset," we believe that a collation for Unicode can apply for comparisons with non-Unicode strings.
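The resolution rules can be sketched as a small function. This is a simplified Python model of the rules as described here, not MySQL's actual implementation. The `Operand` type, the `resolve_collation` name, and the contents of `UNICODE_CHARSETS` are assumptions made for illustration. Treating two disagreeing explicit COLLATE clauses as an error is also an assumption, chosen to match the error case listed in the examples.

```python
from dataclasses import dataclass

# Assumption: a small illustrative set of Unicode character set names.
UNICODE_CHARSETS = {"utf8", "ucs2"}


@dataclass(frozen=True)
class Operand:
    charset: str       # e.g. "latin1"
    collation: str     # e.g. "latin1_swedish_ci"
    coercibility: int  # 0 (explicit COLLATE) through 5 (NULL)


def resolve_collation(a: Operand, b: Operand) -> str:
    """Return the collation used for an operation on a and b, or raise ValueError."""
    # The collation with the lowest coercibility value wins.
    if a.coercibility != b.coercibility:
        return a.collation if a.coercibility < b.coercibility else b.collation
    # Same coercibility and same collation: nothing to resolve.
    if a.collation == b.collation:
        return a.collation
    # Assumption: two different explicit COLLATE clauses are always an error.
    if a.coercibility == 0:
        raise ValueError(f"illegal mix of collations: {a.collation} and {b.collation}")
    # Exactly one Unicode side: the Unicode side wins.
    a_unicode = a.charset in UNICODE_CHARSETS
    b_unicode = b.charset in UNICODE_CHARSETS
    if a_unicode != b_unicode:
        return a.collation if a_unicode else b.collation
    # Same character set, _bin mixed with _ci or _cs: the _bin collation is used.
    if a.charset == b.charset:
        a_bin = a.collation.endswith("_bin")
        b_bin = b.collation.endswith("_bin")
        if a_bin != b_bin:
            return a.collation if a_bin else b.collation
    raise ValueError(f"illegal mix of collations: {a.collation} and {b.collation}")


# CONCAT(utf8_column, latin1_column): both are columns (coercibility 2).
utf8_column = Operand("utf8", "utf8_general_ci", 2)
latin1_column = Operand("latin1", "latin1_swedish_ci", 2)
print(resolve_collation(utf8_column, latin1_column))  # utf8_general_ci
```

The model returns only the winning collation. It does not perform the character set conversion that MySQL applies to the non-Unicode side.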
Examples:
Comparison | Collation Used |
---|---|
column1 = 'A' | Use collation of column1 |
column1 = 'A' COLLATE x | Use collation of 'A' COLLATE x |
column1 COLLATE x = 'A' COLLATE y | Error |
The COERCIBILITY() function can be used to determine the coercibility of a string expression:
mysql> SELECT COERCIBILITY('A' COLLATE latin1_swedish_ci);
        -> 0
mysql> SELECT COERCIBILITY(VERSION());
        -> 3
mysql> SELECT COERCIBILITY('A');
        -> 4
See Section 12.14, "Information Functions".
For implicit conversion of a numeric or temporal value to a string, such as occurs for the argument 1 in the expression CONCAT(1, 'abc'), the result is a character (nonbinary) string that has a character set and collation determined by the character_set_connection and collation_connection system variables. See Section 12.2, "Type Conversion in Expression Evaluation".