Spec-Zone .ru
спецификации, руководства, описания, API

12.9.7. Adding a Collation for Full-Text Indexing

This section describes how to add a new collation for full-text searches. The sample collation is like latin1_swedish_ci but treats the '-' character as a letter rather than as a punctuation character so that it can be indexed as a word character. General information about adding collations is given in Section 10.4, "Adding a Collation to a Character Set"; it is assumed that you have read it and are familiar with the files involved.

To add a collation for full-text indexing, use this procedure:

  1. Add a collation to the Index.xml file. The collation ID must be unused, so choose a value different from 1000 if that ID is already taken on your system.

    <charset name="latin1">...<collation name="latin1_fulltext_ci" id="1000"/></charset>
  2. Declare the sort order for the collation in the latin1.xml file. In this case, the order can be copied from latin1_swedish_ci:

    <collation name="latin1_fulltext_ci"><map>00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F30 31 32 33 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F40 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F50 51 52 53 54 55 56 57 58 59 5A 5B 5C 5D 5E 5F60 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F50 51 52 53 54 55 56 57 58 59 5A 7B 7C 7D 7E 7F80 81 82 83 84 85 86 87 88 89 8A 8B 8C 8D 8E 8F90 91 92 93 94 95 96 97 98 99 9A 9B 9C 9D 9E 9FA0 A1 A2 A3 A4 A5 A6 A7 A8 A9 AA AB AC AD AE AFB0 B1 B2 B3 B4 B5 B6 B7 B8 B9 BA BB BC BD BE BF41 41 41 41 5C 5B 5C 43 45 45 45 45 49 49 49 4944 4E 4F 4F 4F 4F 5D D7 D8 55 55 55 59 59 DE DF41 41 41 41 5C 5B 5C 43 45 45 45 45 49 49 49 4944 4E 4F 4F 4F 4F 5D F7 D8 55 55 55 59 59 DE FF</map></collation>
  3. Modify the ctype array in latin1.xml. Change the value corresponding to 0x2D (which is the code for the '-' character) from 10 (punctuation) to 01 (small letter). In the following array, this is the element in the fourth row down, third value from the end.

    <ctype><map>0020 20 20 20 20 20 20 20 20 28 28 28 28 28 20 2020 20 20 20 20 20 20 20 20 20 20 20 20 20 20 2048 10 10 10 10 10 10 10 10 10 10 10 10 01 10 1084 84 84 84 84 84 84 84 84 84 10 10 10 10 10 1010 81 81 81 81 81 81 01 01 01 01 01 01 01 01 0101 01 01 01 01 01 01 01 01 01 01 10 10 10 10 1010 82 82 82 82 82 82 02 02 02 02 02 02 02 02 0202 02 02 02 02 02 02 02 02 02 02 10 10 10 10 2010 00 10 02 10 10 10 10 10 10 01 10 01 00 01 0000 10 10 10 10 10 10 10 10 10 02 10 02 00 02 0148 10 10 10 10 10 10 10 10 10 10 10 10 10 10 1010 10 10 10 10 10 10 10 10 10 10 10 10 10 10 1001 01 01 01 01 01 01 01 01 01 01 01 01 01 01 0101 01 01 01 01 01 01 10 01 01 01 01 01 01 01 0202 02 02 02 02 02 02 02 02 02 02 02 02 02 02 0202 02 02 02 02 02 02 10 02 02 02 02 02 02 02 02</map></ctype>
  4. Restart the server.

  5. To employ the new collation, include it in the definition of columns that are to use it:

    mysql> DROP TABLE IF EXISTS
                        t1;Query OK, 0 rows affected (0.13 sec)mysql> CREATE
                        TABLE t1 (    a TEXT CHARACTER SET latin1 COLLATE
                        latin1_fulltext_ci,    FULLTEXT
                        INDEX(a)    ) ENGINE=InnoDB;Query OK, 0 rows affected (0.47 sec)
  6. Test the collation to verify that hyphen is considered as a word character:

    mysql> INSERT INTO t1 VALUEs
                        ('----'),('....'),('abcd');Query OK, 3 rows affected (0.22 sec)Records: 3  Duplicates: 0  Warnings: 0mysql> SELECT * FROM t1 WHERE MATCH a AGAINST ('----' IN BOOLEAN MODE);+------+| a    |+------+| ---- |+------+1 row in set (0.00 sec)