Spec-Zone .ru
спецификации, руководства, описания, API
|
Every character string literal has a character set and a collation.
A character string literal may have an optional character set introducer and COLLATE
clause:
[_charset_name
]'string
' [COLLATEcollation_name
]
Examples:
SELECT 'string
';SELECT _latin1'string
';SELECT _latin1'string
' COLLATE latin1_danish_ci;
For the simple statement SELECT '
,
the string has the character set and collation defined by the string
'character_set_connection
and collation_connection
system variables.
The _
expression is formally
called an introducer. It tells the parser, "the string that is about to follow uses character set charset_name
X
." Because this has confused people in the past, we
emphasize that an introducer does not change the string to the introducer character set like CONVERT()
would do. It does not change the string's value, although padding
may occur. The introducer is just a signal. An introducer is also legal before standard hex literal and numeric
hex literal notation (x'
and
literal
'0x
), or before bit-field literal
notation (nnnn
b'
and literal
'0b
). nnnn
Examples:
SELECT _latin1 x'AABBCC';SELECT _latin1 0xAABBCC;SELECT _latin1 b'1100011';SELECT _latin1 0b1100011;
MySQL determines a literal's character set and collation in the following manner:
If both _X
and COLLATE
are specified, character set Y
X
and collation Y
are used.
If _X
is specified but COLLATE
is not specified, character set X
and its default collation are used. To see the default
collation for each character set, use the SHOW COLLATION
statement.
Otherwise, the character set and collation given by the character_set_connection
and collation_connection
system variables are used.
Examples:
A string with latin1
character set and latin1_german1_ci
collation:
SELECT _latin1'Müller' COLLATE latin1_german1_ci;
A string with latin1
character set and its default
collation (that is, latin1_swedish_ci
):
SELECT _latin1'Müller';
A string with the connection default character set and collation:
SELECT 'Müller';
Character set introducers and the COLLATE
clause are implemented according to
standard SQL specifications.
An introducer indicates the character set for the following string, but does not change now how the parser
performs escape processing within the string. Escapes are always interpreted by the parser according to the
character set given by character_set_connection
.
The following examples show that escape processing occurs using character_set_connection
even in the presence of an introducer. The examples use
SET NAMES
(which changes character_set_connection
, as discussed in Section
10.1.4, "Connection Character Sets and Collations"), and display the resulting strings using the HEX()
function so that
the exact string contents can be seen.
Example 1:
mysql>SET NAMES latin1;
Query OK, 0 rows affected (0.01 sec)mysql>SELECT HEX('à\n'), HEX(_sjis'à\n');
+------------+-----------------+| HEX('à\n') | HEX(_sjis'à\n') |+------------+-----------------+| E00A | E00A |+------------+-----------------+1 row in set (0.00 sec)
Here, "à
" (hex value E0
) is followed by "\n
", the escape sequence for newline. The escape sequence is
interpreted using the character_set_connection
value of latin1
to
produce a literal newline (hex value 0A
). This happens even for the second string.
That is, the introducer of _sjis
does not affect the parser's escape processing.
Example 2:
mysql>SET NAMES sjis;
Query OK, 0 rows affected (0.00 sec)mysql>SELECT HEX('à\n'), HEX(_latin1'à\n');
+------------+-------------------+| HEX('à\n') | HEX(_latin1'à\n') |+------------+-------------------+| E05C6E | E05C6E |+------------+-------------------+1 row in set (0.04 sec)
Here, character_set_connection
is sjis
, a character set in which the sequence of "à
" followed by "\
" (hex values 05
and 5C
) is a valid multi-byte character. Hence, the first two bytes of the string
are interpreted as a single sjis
character, and the "\
" is not interpreted as an escape character.
The following "n
" (hex value
6E
) is not interpreted as part of an escape sequence. This is true even for the
second string; the introducer of _latin1
does not affect escape processing.