Spec-Zone .ru
спецификации, руководства, описания, API
|
This section discusses the procedure for adding a character set to MySQL. The proper procedure depends on whether the character set is simple or complex:
If the character set does not need special string collating routines for sorting and does not need multi-byte character support, it is simple.
If the character set needs either of those features, it is complex.
For example, greek
and swe7
are simple character sets,
whereas big5
and czech
are complex character sets.
To use the following instructions, you must have a MySQL source distribution. In the instructions, MYSET
represents the name of the character set that you want to add.
Add a <charset>
element for MYSET
to the sql/share/charsets/Index.xml
file. Use the existing contents in the file as a guide to adding new contents. A partial listing for the
latin1
<charset>
element follows:
<charset name="latin1"> <family>Western</family> <description>cp1252 West European</description> ... <collation name="latin1_swedish_ci" id="8" order="Finnish, Swedish"> <flag>primary</flag> <flag>compiled</flag> </collation> <collation name="latin1_danish_ci" id="15" order="Danish"/> ... <collation name="latin1_bin" id="47" order="Binary"> <flag>binary</flag> <flag>compiled</flag> </collation> ...</charset>
The <charset>
element must list all the collations for the
character set. These must include at least a binary collation and a default (primary) collation. The
default collation is often named using a suffix of general_ci
(general,
case insensitive). It is possible for the binary collation to be the default collation, but usually
they are different. The default collation should have a primary
flag.
The binary collation should have a binary
flag.
You must assign a unique ID number to each collation. The range of IDs from 1024 to 2047 is reserved for user-defined collations. To find the maximum of the currently used collation IDs, use this query:
SELECT MAX(ID) FROM INFORMATION_SCHEMA.COLLATIONS;
This step depends on whether you are adding a simple or complex character set. A simple character set requires only a configuration file, whereas a complex character set requires C source file that defines collation functions, multi-byte functions, or both.
For a simple character set, create a configuration file,
, that describes the character set
properties. Create this file in the MYSET
.xmlsql/share/charsets
directory. You
can use a copy of latin1.xml
as the basis for this file. The syntax
for the file is very simple:
Comments are written as ordinary XML comments (
<!--
). text
-->
Words within <map>
array elements
are separated by arbitrary amounts of whitespace.
Each word within <map>
array
elements must be a number in hexadecimal format.
The <map>
array element for the
<ctype>
element has 257 words. The other <map>
array elements after that have 256 words. See Section 10.3.1, "Character Definition
Arrays".
For each collation listed in the
<charset>
element for the character set in Index.xml
,
must contain a MYSET
.xml<collation>
element that defines the
character ordering.
For a complex character set, create a C source file that describes the character set properties and defines the support routines necessary to properly perform operations on the character set:
Create the file ctype-
in the MYSET
.cstrings
directory. Look at one of the existing ctype-*.c
files (such
as ctype-big5.c
) to see what needs to be defined. The arrays
in your file must have names like ctype_
,
MYSET
to_lower_
, and
so on. These correspond to the arrays for a simple character set. See Section
10.3.1, "Character Definition Arrays". MYSET
For each <collation>
element listed
in the <charset>
element for the character set in Index.xml
, the ctype-
file must provide an
implementation of the collation. MYSET
.c
If the character set requires string collating functions, see Section 10.3.2, "String Collating Support for Complex Character Sets".
If the character set requires multi-byte character support, see Section 10.3.3, "Multi-Byte Character Support for Complex Character Sets".
Modify the configuration information. Use the existing configuration information as
a guide to adding information for MYSYS
. The example here
assumes that the character set has default and binary collations, but more lines are needed if MYSET
has additional collations.
Edit mysys/charset-def.c
, and "register" the collations for the new
character set.
Add these lines to the "declaration" section:
#ifdef HAVE_CHARSET_MYSET
extern CHARSET_INFO my_charset_MYSET
_general_ci;extern CHARSET_INFO my_charset_MYSET
_bin;#endif
Add these lines to the "registration" section:
#ifdef HAVE_CHARSET_MYSET
add_compiled_collation(&my_charset_MYSET
_general_ci); add_compiled_collation(&my_charset_MYSET
_bin);#endif
If the character set uses ctype-
, edit MYSET
.cstrings/CMakeLists.txt
and add ctype-
to the definition of the MYSET
.cSTRINGS_SOURCES
variable.
Edit cmake/character_sets.cmake
:
Add MYSET
to the
value of with CHARSETS_AVAILABLE
in alphabetic
order.
Add MYSET
to the
value of CHARSETS_COMPLEX
in alphabetic order. This
is needed even for simple character sets, or CMake will not recognize -DDEFAULT_CHARSET=
.MYSET
Reconfigure, recompile, and test.