Spec-Zone .ru
спецификации, руководства, описания, API
|
This section describes how InnoDB
tables, indexes, and their associated metadata is
represented at the physical level. This information is primarily useful for performance tuning and
troubleshooting.
MySQL stores its data dictionary information for tables in .frm files in database
directories. Unlike other MySQL storage engines, InnoDB
also encodes
information about the table in its own internal data dictionary inside the tablespace. When MySQL drops a
table or a database, it deletes one or more .frm
files as well as the
corresponding entries inside the InnoDB
data dictionary. You cannot move InnoDB
tables between databases simply by moving the .frm
files.
Every InnoDB
table has a special index called the clustered
index where the data for the rows is stored. Typically, the clustered index is synonymous with the primary key. To get the best
performance from queries, inserts, and other database operations, you must understand how InnoDB uses the
clustered index to optimize the most common lookup and DML operations for each table.
When you define a PRIMARY KEY
on your table, InnoDB
uses it as the clustered index. Define a primary key for each
table that you create. If there is no logical unique and non-null column or set of columns, add a
new auto-increment column, whose values are filled in automatically.
If you do not define a PRIMARY KEY
for your table,
MySQL locates the first UNIQUE
index where all the key columns are
NOT NULL
and InnoDB
uses it as the
clustered index.
If the table has no PRIMARY KEY
or suitable UNIQUE
index, InnoDB
internally generates a
hidden clustered index on a synthetic column containing row ID values. The rows are ordered by the
ID that InnoDB
assigns to the rows in such a table. The row ID is a
6-byte field that increases monotonically as new rows are inserted. Thus, the rows ordered by the
row ID are physically in insertion order.
Accessing a row through the clustered index is fast because the index search leads directly to the page with
all the row data. If a table is large, the clustered index architecture often saves a disk I/O operation
when compared to storage organizations that store row data using a different page from the index record.
(For example, MyISAM
uses one file for data rows and another for index
records.)
All indexes other than the clustered index are known as secondary indexes. In
InnoDB
, each record in a secondary index contains the primary key columns for
the row, as well as the columns specified for the secondary index. InnoDB
uses
this primary key value to search for the row in the clustered index.
If the primary key is long, the secondary indexes use more space, so it is advantageous to have a short primary key.
For coding guidelines to take advantage of InnoDB
clustered and secondary
indexes, see Section 8.3.2, "Using Primary Keys" Section
8.3, "Optimization and Indexes" Section 8.5,
"Optimizing for InnoDB
Tables" Section
8.3.2, "Using Primary Keys".
A special kind of index, the FULLTEXT
index, helps InnoDB
deal with queries and DML operations involving text-based columns and the
words they contain. These indexes are physically represented as entire InnoDB
tables, which are acted upon by SQL keywords such as the FULLTEXT
clause of the
CREATE INDEX
statement, the MATCH() ... AGAINST
syntax in a SELECT
statement, and the OPTIMIZE TABLE
statement. For usage information, see Section
12.9, "Full-Text Search Functions".
You can examine FULLTEXT
indexes by querying tables in the INFORMATION_SCHEMA
database. You can see basic index information for FULLTEXT
indexes by querying INNODB_SYS_INDEXES
. Although InnoDB
FULLTEXT
indexes are represented by tables, which show up in INNODB_SYS_TABLES
queries, the way to monitor the special text-processing
aspects of a FULLTEXT
index is to query the tables INNODB_FT_CONFIG
, INNODB_FT_INDEX_TABLE
, INNODB_FT_INDEX_CACHE
, INNODB_FT_DEFAULT_STOPWORD
, INNODB_FT_DELETED
, and INNODB_FT_BEING_DELETED
.
InnoDB
FULLTEXT
indexes are updated by the OPTIMIZE TABLE
command, using a special mode controlled by the configuration
options innodb_ft_num_word_optimize
and innodb_optimize_fulltext_only
.
All InnoDB
indexes are B-trees where the index records
are stored in the leaf pages of the tree. The default size of an index page is 16KB. When new records are
inserted, InnoDB
tries to leave 1/16 of the page free for future insertions and
updates of the index records.
If index records are inserted in a sequential order (ascending or descending), the resulting index pages are
about 15/16 full. If records are inserted in a random order, the pages are from 1/2 to 15/16 full. If the fill factor of an index
page drops below 1/2, InnoDB
tries to contract the index tree to free the page.
You can specify the page
size for all InnoDB
tablespaces in a MySQL instance by setting the innodb_page_size
configuration option before creating the instance. Once the page size for a MySQL instance is set, you
cannot change it. Supported sizes are 16KB, 8KB, and 4KB, corresponding to the option values 16k
, 8k
, and 4k
.
A MySQL instance using a particular InnoDB
page size cannot use
data files or log files from an instance that uses a different page size.
Database applications often insert new rows in the ascending order of the primary key. In this case, due to the layout of the clustered index in the same order as the primary key, insertions into an InnoDB table do not require random reads from a disk.
On the other hand, secondary indexes are usually nonunique, and insertions into secondary indexes happen in
a relatively random order. In the same way, deletes and updates can affect data pages that are not adjacent
in secondary indexes. This would cause a lot of random disk I/O operations without a special mechanism used
in InnoDB
.
When an index record is inserted, marked for deletion, or deleted from a nonunique secondary index, InnoDB
checks whether the secondary index page is in the buffer
pool. If that is the case, InnoDB
applies the change directly to the index
page. If the index page is not found in the buffer pool, InnoDB
records the
change in a special structure known as the insert buffer. The insert buffer is
kept small so that it fits entirely in the buffer pool, and changes can be applied very quickly. This
process is known as change
buffering. (Formerly, it applied only to inserts and was called insert buffering. The data structure
is still called the insert buffer.)
Periodically, the insert buffer is merged into the secondary index trees in the database. Often, it is possible to merge several changes into the same page of the index tree, saving disk I/O operations. It has been measured that the insert buffer can speed up insertions into a table up to 15 times.
The insert buffer merging may continue to happen after the
transaction has been committed. In fact, it may continue to happen after a server shutdown and restart (see
Section 14.2.4.6, "Starting InnoDB
on a Corrupted Database").
Insert buffer merging may take many hours when many secondary indexes must be updated and many rows have
been inserted. During this time, disk I/O will be increased, which can cause significant slowdown on
disk-bound queries. Another significant background I/O operation is the purge
thread (see Section 14.2.3.11, "InnoDB
Multi-Versioning").
The feature known as the adaptive hash index (AHI) lets InnoDB
perform more like an in-memory database on systems with appropriate
combinations of workload and ample memory for the buffer pool, without
sacrificing any transactional features or reliability. This feature is enabled by the innodb_adaptive_hash_index
option, or turned off by the --skip-innodb_adaptive_hash_index
at server startup.
Based on the observed pattern of searches, MySQL builds a hash index using a prefix of the index key. The prefix of the key can be any length, and it may be that only some of the values in the B-tree appear in the hash index. Hash indexes are built on demand for those pages of the index that are often accessed.
If a table fits almost entirely in main memory, a hash index can speed up queries by enabling direct lookup
of any element, turning the index value into a sort of pointer. InnoDB
has a
mechanism that monitors index searches. If InnoDB
notices that queries could
benefit from building a hash index, it does so automatically.
With some workloads, the speedup
from hash index lookups greatly outweighs the extra work to monitor index lookups and maintain the hash
index structure. Sometimes, the read/write lock that guards access to the adaptive hash index can become a
source of contention under heavy workloads, such as multiple concurrent joins. Queries with LIKE
operators and %
wildcards also tend not to
benefit from the AHI. For workloads where the adaptive hash index is not needed, turning it off reduces
unnecessary performance overhead. Because it is difficult to predict in advance whether this feature is
appropriate for a particular system, consider running benchmarks with it both enabled and disabled, using a
realistic workload. The architectural changes in MySQL 5.6 and higher make more workloads suitable for
disabling the adaptive hash index than in earlier releases, although it is still enabled by default.
The hash index is always built based on an existing B-tree index on the table. InnoDB
can build a hash index on a prefix of any length of the key defined for
the B-tree, depending on the pattern of searches that InnoDB
observes for the
B-tree index. A hash index can be partial, covering only those pages of the index that are often accessed.
You can monitor the use of the adaptive hash index and the contention for its use in the SEMAPHORES
section of the output of the SHOW ENGINE INNODB STATUS
command. If you see many threads waiting on an
RW-latch created in btr0sea.c
, then it might be useful to disable adaptive
hash indexing.
For more information about the performance characteristics of hash indexes, see Section 8.3.8, "Comparison of B-Tree and Hash Indexes".
The physical row structure for an InnoDB
table depends on the row format
specified when the table was created. By default, InnoDB
uses the Antelope file format and its
COMPACT
row format. The REDUNDANT
format is
available to retain compatibility with older versions of MySQL. When you enable the innodb_file_per_table
setting, you can also make use of the newer
Barracuda file format, with its DYNAMIC
and COMPRESSED
row formats, as explained in Section
5.4.8, "How InnoDB
Stores Variable-Length Columns" and Section
5.4.6, "Working with InnoDB
Compressed Tables".
To check the row format of an InnoDB
table, use SHOW TABLE STATUS
.
The COMPACT
row format decreases row storage space by about 20% at the cost of
increasing CPU use for some operations. If your workload is a typical one that is limited by cache hit rates
and disk speed, COMPACT
format is likely to be faster. If the workload is a
rare case that is limited by CPU speed, COMPACT
format might be slower.
Rows in InnoDB
tables that use REDUNDANT
row
format have the following characteristics:
Each index record contains a 6-byte header. The header is used to link together consecutive records, and also in row-level locking.
Records in the clustered index contain fields for all user-defined columns. In addition, there is a 6-byte transaction ID field and a 7-byte roll pointer field.
If no primary key was defined for a table, each clustered index record also contains a 6-byte row ID field.
Each secondary index record also contains all the primary key fields defined for the clustered index key that are not in the secondary index.
A record contains a pointer to each field of the record. If the total length of the fields in a record is less than 128 bytes, the pointer is one byte; otherwise, two bytes. The array of these pointers is called the record directory. The area where these pointers point is called the data part of the record.
Internally, InnoDB
stores fixed-length character
columns such as CHAR(10)
in a fixed-length format. InnoDB
does not truncate trailing spaces from
VARCHAR
columns.
An SQL NULL
value reserves one or two bytes in the
record directory. Besides that, an SQL NULL
value reserves zero bytes
in the data part of the record if stored in a variable length column. In a fixed-length column, it
reserves the fixed length of the column in the data part of the record. Reserving the fixed space
for NULL
values enables an update of the column from NULL
to a non-NULL
value to be done in
place without causing fragmentation of the index page.
Rows in InnoDB
tables that use COMPACT
row format
have the following characteristics:
Each index record contains a 5-byte header that may be preceded by a variable-length header. The header is used to link together consecutive records, and also in row-level locking.
The variable-length part of the record header contains a bit vector for
indicating NULL
columns. If the number of columns in the index that can
be NULL
is N
, the bit vector
occupies CEILING(
bytes.
(For example, if there are anywhere from 9 to 15 columns that can be N
/8)NULL
, the bit vector uses two bytes.) Columns that are NULL
do not occupy space other than the bit in this vector. The
variable-length part of the header also contains the lengths of variable-length columns. Each length
takes one or two bytes, depending on the maximum length of the column. If all columns in the index
are NOT NULL
and have a fixed length, the record header has no
variable-length part.
For each non-NULL
variable-length field, the
record header contains the length of the column in one or two bytes. Two bytes will only be needed
if part of the column is stored externally in overflow pages or the maximum length exceeds 255 bytes
and the actual length exceeds 127 bytes. For an externally stored column, the 2-byte length
indicates the length of the internally stored part plus the 20-byte pointer to the externally stored
part. The internal part is 768 bytes, so the length is 768+20. The 20-byte pointer stores the true
length of the column.
The record header is followed by the data contents of the non-NULL
columns.
Records in the clustered index contain fields for all user-defined columns. In addition, there is a 6-byte transaction ID field and a 7-byte roll pointer field.
If no primary key was defined for a table, each clustered index record also contains a 6-byte row ID field.
Each secondary index record also contains all the primary key fields defined for the clustered index key that are not in the secondary index. If any of these primary key fields are variable length, the record header for each secondary index will have a variable-length part to record their lengths, even if the secondary index is defined on fixed-length columns.
Internally, InnoDB
stores fixed-length,
fixed-width character columns such as CHAR(10)
in a fixed-length format. InnoDB
does not truncate trailing spaces from VARCHAR
columns.
Internally, InnoDB
attempts to store UTF-8 CHAR(
columns in N
)N
bytes by trimming trailing spaces. (With REDUNDANT
row format, such columns occupy 3 × N
bytes.) Reserving the minimum space N
in many cases enables column updates to be done in place
without causing fragmentation of the index page.