This section summarizes the major InnoDB features and enhancements for performance and scalability. This information is useful to any DBA or developer who is concerned with performance and scalability. Although some of the enhancements do not require any action on your part, knowing this information can still help you diagnose performance issues more quickly and modernize systems and applications that rely on older, inefficient behavior.
InnoDB has always been highly efficient, and includes several unique architectural elements to assure high performance and scalability. The latest InnoDB storage engine includes new features that take advantage of advances in operating systems and hardware platforms, such as multi-core processors and improved memory allocation systems. In addition, new configuration options let you better control some InnoDB internal subsystems to achieve the best performance with your workload.
Starting with MySQL 5.5 and InnoDB 1.1, the built-in InnoDB storage engine within MySQL is upgraded to the
full feature set and performance of the former InnoDB Plugin. This change makes these performance and
scalability enhancements available to a much wider audience than before, and eliminates the separate
installation step of the InnoDB Plugin. After learning about the InnoDB performance features in this
section, continue with Chapter 8,
Optimization to learn the best practices for overall MySQL performance, and Section
8.5, "Optimizing for InnoDB
Tables" in particular for InnoDB tips and
guidelines.
Traditionally, the InnoDB
compression feature was
recommended primarily for read-only or read-mostly workloads, such as in a data warehouse
configuration. The rise of SSD storage
devices, which are fast but relatively small and expensive, makes compression attractive also for OLTP
workloads: high-traffic, interactive web sites can reduce their storage
requirements and their I/O operations per second (IOPS) by using compressed tables with
applications that do frequent INSERT, UPDATE, and DELETE operations.
New configuration options in MySQL 5.6 let you adjust the way compression works for a particular MySQL instance, with an emphasis on performance and scalability for write-intensive operations:
innodb_compression_level
lets you turn the degree of compression up
or down. A higher value lets you fit more data onto a storage device, at the expense of more CPU
overhead during compression. A lower value lets you reduce CPU overhead when storage space is not
critical, or you expect the data is not especially compressible.
innodb_compression_failure_threshold_pct
specifies a cutoff point for
compression
failures during updates to a compressed table. When this threshold is passed, MySQL begins
to leave additional free space within each new compressed page, dynamically adjusting the amount of
free space up to the percentage of page size specified by innodb_compression_pad_pct_max.
innodb_compression_pad_pct_max
lets you adjust the maximum amount of
space reserved within each page to
record changes to compressed rows, without needing to compress the entire page again. The higher the
value, the more changes can be recorded without recompressing the page. MySQL uses a variable amount
of free space for the pages within each compressed table, only when a designated percentage of
compression operations "fail"
at runtime, requiring an expensive operation to split the compressed page.
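For example, a write-intensive server might trade some storage savings for lower CPU overhead and more padding headroom. All three options are dynamic global variables; the values below are illustrative assumptions, not recommendations:
SET GLOBAL innodb_compression_level = 4;                   -- less CPU work per page, larger compressed data
SET GLOBAL innodb_compression_failure_threshold_pct = 10;  -- tolerate more failures before padding begins
SET GLOBAL innodb_compression_pad_pct_max = 75;            -- allow up to 75% of each page as reserved free space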
Because working with compressed data sometimes involves keeping both compressed and uncompressed versions of
a page in memory at the same time, when using compression with an OLTP-style workload, be prepared to
increase the value of the innodb_buffer_pool_size
configuration option.
For more information on MySQL data compression, see Section
5.4.6, "Working with InnoDB
Compressed Tables". For the performance
aspects, see especially Section 5.4.6.3, "Tuning Compression for InnoDB Tables".
When a transaction is known to be read-only, InnoDB
can avoid the overhead
associated with setting up the transaction ID (TRX_ID
field). The transaction ID is only needed for a transaction
that might perform write operations or locking reads such as SELECT ... FOR UPDATE
. Eliminating these unnecessary transaction IDs reduces
the size of internal data structures that are consulted each time a query or DML statement constructs a read view.
Currently, InnoDB
detects the read-only nature of the transaction and applies
this optimization when any of the following conditions are met:
The transaction is started with the START TRANSACTION READ ONLY
statement. In this case, attempting to
make any changes to the database (for InnoDB, MyISAM, or other types of tables) causes an error, and the transaction continues in read-only state:
ERROR 1792 (25006): Cannot execute statement in a READ ONLY transaction.
You can still make changes to session-specific temporary tables in a read-only transaction, or issue locking queries for them, because those changes and locks are not visible to any other transaction.
The autocommit setting is turned on, so that the transaction is guaranteed to be a single statement, and the single statement making up the transaction is a "non-locking" SELECT statement; that is, a SELECT that does not use a FOR UPDATE or LOCK IN SHARE MODE clause.
Thus, for a read-intensive application such as a report generator, you can tune a sequence of InnoDB
queries by grouping them inside START TRANSACTION
READ ONLY
and COMMIT
, or by turning on the autocommit
setting before running the SELECT
statements, or simply by avoiding any DML statements interspersed with the
queries.
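For example, a reporting job might group its read-only queries like this (the table and column names are hypothetical):
START TRANSACTION READ ONLY;
SELECT customer_id, SUM(amount) FROM orders GROUP BY customer_id;
SELECT COUNT(*) FROM orders WHERE status = 'OPEN';
COMMIT;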
Transactions that qualify as auto-commit, non-locking, and read-only (AC-NL-RO) are kept out of
certain internal InnoDB
data structures and are therefore not listed in SHOW ENGINE INNODB STATUS
output. These transactions are only visible
in the Information Schema.
This feature optionally moves the InnoDB
undo log out
of the system
tablespace into one or more separate tablespaces. The I/O patterns for the
undo log make these new tablespaces good candidates to move to SSD storage, while keeping
the system tablespace on hard disk storage. Users cannot drop the separate tablespaces created to hold InnoDB
undo logs, or the individual segments inside
those tablespaces.
Because these files handle I/O operations formerly done inside the system tablespace, we broaden the definition of system tablespace to include these new files.
The undo logs are also known as the rollback segments.
This feature involves the following new or renamed configuration options:
innodb_rollback_segments
becomes innodb_undo_logs
. The old name is still available for compatibility.
To use this feature, follow these steps:
Decide on a path on a fast storage device to hold the undo logs. You will
specify that path as the argument to the innodb_undo_directory
option in your MySQL configuration file or
startup script.
Decide on a non-zero starting value for the innodb_undo_logs
option. You can start with a relatively low value
and increase it over time to examine the effect on performance.
Decide on a non-zero value for the innodb_undo_tablespaces
option. The multiple undo logs specified by
the innodb_undo_logs
value are divided between this many separate
tablespaces (represented by .ibd
files). This
value is fixed for the life of the MySQL instance, so if you are uncertain about the optimal value,
estimate on the high side.
Set up an entirely new MySQL instance for testing, using the values you chose in the configuration file or in your MySQL startup script. Use a realistic workload with data volume similar to your production servers.
Benchmark the performance of I/O intensive workloads.
Periodically increase the value of innodb_undo_logs
and re-do the performance tests. Find the value where
you stop experiencing gains in I/O performance.
Deploy a new production instance using the ideal settings for these options. Set it up as a slave server in a replication configuration, or transfer data from an earlier production instance.
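As a sketch, the option file for the new instance might contain settings like the following. The directory path and counts are illustrative assumptions; these options must be in place when the instance is initialized:
[mysqld]
innodb_undo_directory = /ssd/mysql-undo   # path on a fast storage device
innodb_undo_logs = 32                     # starting value; increase while benchmarking
innodb_undo_tablespaces = 4               # fixed for the life of the instance; estimate high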
Keeping the undo logs in separate files allows the MySQL team to implement I/O and memory optimizations
related to this transactional data. For example, because the undo data is written to disk and then rarely
used (only in case of crash recovery), it does not need to be kept in the filesystem memory cache, in turn
allowing a higher percentage of system memory to be devoted to the InnoDB
buffer pool.
The typical SSD best practice of keeping the InnoDB system tablespace on a hard drive and moving the per-table tablespaces to SSD is assisted by moving the undo information into separate tablespace files.
The physical tablespace files are named undoN, where N is the space ID, including leading zeros.
Currently, MySQL instances containing separate undo tablespaces cannot be downgraded to earlier releases such as MySQL 5.5 or 5.1.
The benefits of the InnoDB
file-per-table setting
come with the tradeoff that each .ibd
file is
extended as the data inside the table grows. This I/O operation can be a bottleneck for busy systems with
many InnoDB
tables. When all InnoDB
tables are
stored inside the system
tablespace, this extension operation happens less frequently, as space freed by DELETE
or TRUNCATE
operations within one table
can be reused by another table.
MySQL 5.6 improves the concurrency of the extension operation, so that multiple .ibd
files can be extended simultaneously, and this operation does not block
read or write operations performed by other threads.
The code that detects deadlocks in InnoDB transactions has been modified to use a fixed-size work area rather than a recursive algorithm, making the detection operation faster. You do not need to do anything to take advantage of this enhancement.
Under both the old and new detection mechanisms, you might encounter a "search too deep" error that is not a true deadlock, but requires you to retry the transaction the same way as with a deadlock.
You can use the configuration setting innodb_checksum_algorithm=crc32 to change the checksum algorithm to a faster one that scans the block 32 bits at a time rather than 8 bits at a time. When the CRC32 algorithm is enabled,
data blocks that are written to disk by InnoDB
contain different values in
their checksum fields than before. This process could be gradual, with a mix of old and new checksum values
within the same table or database.
For maximum downward compatibility, this setting is off by default:
Current versions of MySQL Enterprise Backup (up to 3.8.0) do not support backing up tablespaces that use crc32 checksums.
.ibd
files containing crc32 checksums could cause
problems downgrading to MySQL versions prior to 5.6.3. MySQL 5.6.3 and up recognizes either the new
or old checksum values for the block as correct when reading the block from disk, ensuring that data
blocks are compatible during upgrade and downgrade regardless of the algorithm setting. If data written with new checksum values is processed by a version of MySQL earlier than 5.6.3, it could be reported as corrupted.
When you set up a new MySQL instance, and can be sure that all the InnoDB
data
is created using the CRC32 checksum algorithm, you can use the setting innodb_checksum_algorithm=strict_crc32
, which can be faster than the
crc32
setting because it does not do the extra checksum calculations to support
both old and new values.
The innodb_checksum_algorithm
option has other values that allow it to replace the innodb_checksums
option. innodb_checksum_algorithm=none
is the same as innodb_checksums=OFF
. innodb_checksum_algorithm=innodb
is the same as innodb_checksums=ON
. To avoid conflicts, remove references to
innodb_checksums
from your configuration file and MySQL startup scripts. The
new option also accepts values strict_none and strict_innodb, again offering better performance in situations where all InnoDB data in an instance is created with the same checksum algorithm.
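For example, to switch a running server to the faster algorithm, or to enforce it strictly on an instance whose data files are known to all carry CRC32 checksums (a minimal sketch; both variables are dynamic):
SET GLOBAL innodb_checksum_algorithm = 'crc32';
-- Only when every InnoDB data file was written with CRC32 checksums:
SET GLOBAL innodb_checksum_algorithm = 'strict_crc32';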
The following table illustrates the difference between the none, innodb, and crc32 option values and their strict_ counterparts. none, innodb, and crc32 write the specified type of checksum value into each data block, but for
compatibility accept any of the other checksum values when verifying a block during a read operation. The
strict_
form of each parameter only recognizes one kind of checksum, which
makes verification faster but requires that all InnoDB
data files in an
instance be created under the identical innodb_checksum_algorithm
value.
Table 14.3. Allowed Settings for innodb_checksum_algorithm
Value | Generated checksum (when writing) | Allowed checksums (when reading)
---|---|---
none | A constant number. | Any of the checksums generated by none, innodb, or crc32.
innodb | A checksum calculated in software, using the original algorithm from InnoDB. | Any of the checksums generated by none, innodb, or crc32.
crc32 | A checksum calculated using the crc32 algorithm, possibly done with a hardware assist. | Any of the checksums generated by none, innodb, or crc32.
strict_none | A constant number. | Only the checksum generated by none.
strict_innodb | A checksum calculated in software, using the original algorithm from InnoDB. | Only the checksum generated by innodb.
strict_crc32 | A checksum calculated using the crc32 algorithm, possibly done with a hardware assist. | Only the checksum generated by crc32.
After you restart a busy server, there is typically a warmup period with steadily
increasing throughput, as disk pages that were in the InnoDB
buffer
pool are brought back into memory as the same data is queried, updated, and so on. Once the buffer
pool holds a similar set of pages as before the restart, many operations are performed in memory rather than
involving disk I/O, and throughput stabilizes at a high level.
This feature shortens the warmup period by immediately reloading disk pages that were in the buffer pool before the restart, rather than waiting for DML operations to access the corresponding rows. The I/O requests can be performed in large batches, making the overall I/O faster. The page loading happens in the background, and does not delay the database startup.
In addition to saving the buffer pool state at shutdown and restoring it at startup, you can also save or restore the state at any time. For example, you might save the state of the buffer pool after reaching a stable throughput under a steady workload. You might restore the previous buffer pool state after running reports or maintenance jobs that bring data pages into the buffer pool that are only needed during the time period for those operations, or after some other period with a non-typical workload.
Although the buffer pool itself could be many gigabytes in size, the data that InnoDB
saves on disk to restore the buffer pool is tiny by comparison: just the
tablespace and page IDs necessary to locate the appropriate pages on disk. This information is derived from
the information_schema
table innodb_buffer_page_lru
.
Because the data is cached in and aged out of the buffer pool the same as with regular database operations, there is no problem if the disk pages were updated recently, or if a DML operation involves data that has not yet been loaded. The loading mechanism skips any requested pages that no longer exist.
This feature involves the configuration variables innodb_buffer_pool_dump_at_shutdown, innodb_buffer_pool_load_at_startup, innodb_buffer_pool_dump_now, innodb_buffer_pool_load_now, innodb_buffer_pool_filename, and innodb_buffer_pool_load_abort, and the status variables Innodb_buffer_pool_dump_status and Innodb_buffer_pool_load_status.
To save the current state of the InnoDB
buffer pool, issue the statement:
SET GLOBAL innodb_buffer_pool_dump_now=ON;
The underlying mechanism involves a background thread that is dispatched to perform the dump and load operations.
By default, the buffer pool state is saved in a file ib_buffer_pool
in the
InnoDB
data directory.
Disk pages from compressed tables are loaded into the buffer pool in their compressed form. Decompression happens as usual when the page contents are accessed in the course of DML operations. Because decompression is a CPU-intensive process, it is more efficient for concurrency to perform that operation in one of the connection threads rather than the single thread that performs the buffer pool restore operation.
Example 14.1. Examples of Dumping and Restoring the InnoDB Buffer Pool
Trigger a dump of the buffer pool manually:
SET GLOBAL innodb_buffer_pool_dump_now=ON;
Specify that a dump should be taken at shutdown:
SET GLOBAL innodb_buffer_pool_dump_at_shutdown=ON;
Specify that a dump should be loaded at startup (this variable is not dynamic, so set it in the MySQL option file rather than with SET):
innodb_buffer_pool_load_at_startup=ON
Trigger a load of the buffer pool manually:
SET GLOBAL innodb_buffer_pool_load_now=ON;
Specify which filename to use for storing the dump to and loading the dump from:
SET GLOBAL innodb_buffer_pool_filename='filename';
Display progress of dump:
SHOW STATUS LIKE 'innodb_buffer_pool_dump_status';
or:
SELECT variable_value FROM information_schema.global_status WHERE variable_name = 'INNODB_BUFFER_POOL_DUMP_STATUS';
Outputs any of: not started, Dumping buffer pool 5/7, page 237/2873, Finished at 110505 12:18:02
Display progress of load:
SHOW STATUS LIKE 'innodb_buffer_pool_load_status';
or:
SELECT variable_value FROM information_schema.global_status WHERE variable_name = 'INNODB_BUFFER_POOL_LOAD_STATUS';
Outputs any of: not started, Loaded 123/22301 pages, Finished at 110505 12:23:24
Abort a buffer pool load:
SET GLOBAL innodb_buffer_pool_load_abort=ON;
The new configuration options innodb_flush_neighbors
and innodb_lru_scan_depth
let you fine-tune certain aspects of the flushing process for the InnoDB
buffer
pool. These options primarily help write-intensive workloads. With heavy DML activity, flushing can fall behind if it
is not aggressive enough, resulting in excessive memory use in the buffer pool; or, disk writes due to
flushing can saturate your I/O capacity if that mechanism is too aggressive. The ideal settings depend on
your workload, data access patterns, and storage configuration (for example, whether data is stored on HDD or SSD devices).
For systems with constant heavy workloads,
or workloads that fluctuate widely, several new configuration options let you fine-tune the flushing
behavior for InnoDB
tables: innodb_adaptive_flushing_lwm
, innodb_max_dirty_pages_pct_lwm
, innodb_io_capacity_max
, and innodb_flushing_avg_loops
. These options feed into an improved formula
used by the innodb_adaptive_flushing
option.
The existing innodb_adaptive_flushing
, innodb_io_capacity
and innodb_max_dirty_pages_pct
options work as before, except that they are
limited or extended by other options: innodb_adaptive_flushing_lwm
, innodb_io_capacity_max
and innodb_max_dirty_pages_pct_lwm
:
The InnoDB
adaptive
flushing mechanism is not appropriate in all cases. It gives the most benefit when the redo log is in danger of
filling up. The innodb_adaptive_flushing_lwm
option specifies a percentage of
redo log capacity; when that threshold is crossed, InnoDB
turns on
adaptive flushing even if not specified by the innodb_adaptive_flushing
option.
If flushing activity falls far behind, InnoDB
can
flush more aggressively than specified by innodb_io_capacity
. innodb_io_capacity_max
represents an upper limit on the I/O
capacity used in such emergency situations, so that the spike in I/O does not consume all the
capacity of the server.
When the innodb_max_dirty_pages_pct
threshold is crossed, InnoDB
can begin aggressively flushing pages to disk. The innodb_max_dirty_pages_pct_lwm
option specifies a higher value at
which InnoDB
begins gently flushing pages, ideally preventing the
percentage of dirty pages from reaching innodb_max_dirty_pages_pct
. A value of innodb_max_dirty_pages_pct_lwm=0
disables this "preflushing" behavior.
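For example, all of these options are dynamic global variables, so they can be adjusted at runtime; the values below are illustrative assumptions:
SET GLOBAL innodb_adaptive_flushing_lwm = 10;     -- turn on adaptive flushing at 10% of redo log capacity
SET GLOBAL innodb_io_capacity_max = 4000;         -- cap the I/O used for emergency flushing
SET GLOBAL innodb_max_dirty_pages_pct_lwm = 40;   -- start gentle "preflushing" at 40% dirty pages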
All of these options are most applicable for servers running heavy workloads for long periods of time, when
there is rarely enough idle time to catch up with changes waiting to be written to disk. The innodb_flushing_avg_loops option lets you distinguish between a server that is
running at full capacity 24x7 and one that experiences periodic spikes in workload. For a server with a
consistently high workload, keep this value high so that the adaptive flushing algorithm responds
immediately to changes in the I/O rate. For a server that experiences peaks and troughs in its workload,
keep this value low so that InnoDB
does not overreact to sudden spikes in DML activity.
Plan stability is a desirable goal for your biggest and most important queries. InnoDB has always computed statistics for each InnoDB table to help the optimizer find the most efficient query execution plan. Now you can make these statistics persistent, so that the index usage and join order for a particular query is less likely to change.
This feature is on by default, enabled by the configuration option innodb_stats_persistent
.
You control how much sampling is done to collect the statistics by setting the configuration options innodb_stats_persistent_sample_pages
and innodb_stats_transient_sample_pages
.
The configuration option innodb_stats_auto_recalc
determines whether the statistics are calculated
automatically whenever a table undergoes substantial changes (to more than 10% of the rows). If that setting
is disabled, ensure the accuracy of optimizer statistics by issuing the ANALYZE TABLE
statement for each applicable table after creating an index
or making substantial changes to indexed columns. You might run this statement in your setup scripts after
representative data has been loaded into the table, and run it periodically after DML operations
significantly change the contents of indexed columns, or on a schedule at times of low activity.
To ensure statistics are gathered when a new index is created, either enable the innodb_stats_auto_recalc
option, or run ANALYZE TABLE
after creating each new index when the persistent statistics mode is enabled.
You can also set the innodb_stats_persistent, innodb_stats_auto_recalc, and innodb_stats_sample_pages options at the session level before creating a table, or use the STATS_PERSISTENT, STATS_AUTO_RECALC, and STATS_SAMPLE_PAGES clauses on the CREATE TABLE and ALTER TABLE statements, to override the system-wide setting and configure persistent statistics for individual tables.
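For example, a table can opt in to persistent statistics with explicit sampling regardless of the server-wide defaults; the schema here is hypothetical:
CREATE TABLE t1 (
  id INT PRIMARY KEY,
  name VARCHAR(64),
  INDEX idx_name (name)
) ENGINE=InnoDB
  STATS_PERSISTENT=1 STATS_AUTO_RECALC=1 STATS_SAMPLE_PAGES=25;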
Formerly, these statistics were cleared on each server restart and after some other operations, and recomputed when the table was next accessed. The statistics are computed using a random sampling technique that could produce different estimates the next time, leading to different choices in the execution plan and thus variations in query performance.
To revert to the previous method of collecting statistics that are periodically erased, run the command ALTER TABLE tbl_name STATS_PERSISTENT=0.
The persistent statistics feature relies on the internally managed tables in the mysql
database, named innodb_table_stats
and innodb_index_stats
. These tables are set up automatically in all install,
upgrade, and build-from-source procedures.
The innodb_table_stats
and innodb_index_stats
tables both include a last_update
column showing when index statistics were
last updated, as shown in the following example:
mysql> select * from mysql.innodb_table_stats \G
*************************** 1. row ***************************
           database_name: sakila
              table_name: actor
             last_update: 2013-05-28 16:16:44
                  n_rows: 200
    clustered_index_size: 1
sum_of_other_index_sizes: 1
...
mysql> select * from mysql.innodb_index_stats \G
*************************** 1. row ***************************
database_name: sakila
   table_name: actor
   index_name: PRIMARY
  last_update: 2013-05-28 16:16:44
    stat_name: n_diff_pfx01
   stat_value: 200
  sample_size: 1
...
If you manually update the statistics in the tables during troubleshooting or tuning, issue the command FLUSH TABLE tbl_name to make MySQL reload the updated statistics.
In MySQL and InnoDB, multiple threads of execution access shared data structures. InnoDB synchronizes these accesses with its own implementation of mutexes and read/write locks. InnoDB has historically protected the internal state of a read/write lock with an InnoDB mutex. On Unix and Linux platforms, the internal state of an InnoDB mutex is protected by a Pthreads mutex, as in IEEE Std 1003.1c (POSIX.1c).
On many platforms, there is a more efficient way to implement mutexes and read/write locks. Atomic operations can often be used to synchronize the actions of multiple threads more efficiently than Pthreads. Each operation to acquire or release a lock can be done in fewer CPU instructions, and thus result in less wasted time when threads are contending for access to shared data structures. This in turn means greater scalability on multi-core platforms.
Where the compiler and target processor support them, InnoDB implements its mutexes and read/write locks with atomic operations, such as the GCC built-in functions for atomic memory access, rather than with pthread_mutex_t.
On 32-bit Microsoft Windows, InnoDB has implemented mutexes (but not read/write locks) with hand-written assembler instructions. Beginning with Microsoft Windows 2000, functions for Interlocked Variable Access are available that are similar to the GCC built-in functions, and InnoDB uses them on Windows 2000 and higher. Solaris 10 introduced library functions for atomic operations, and InnoDB uses these functions by default. When MySQL is compiled on Solaris 10 with a compiler that does not support the built-in functions for atomic memory access, InnoDB uses the Solaris library functions instead.
This change improves the scalability of InnoDB on multi-core systems. This feature is enabled out-of-the-box on the platforms where it is supported. You do not have to set any parameter or option to take advantage of the improved performance. On platforms where the GCC, Windows, or Solaris functions for atomic memory access are not available, InnoDB uses the traditional Pthreads method of implementing mutexes and read/write locks.
When MySQL starts, InnoDB writes a message to the log file indicating whether atomic memory access is used
for mutexes, for mutexes and read/write locks, or neither. If suitable tools are used to build InnoDB and
the target CPU supports the atomic operations required, InnoDB uses the built-in functions for mutexing. If,
in addition, the compare-and-swap operation can be used on thread identifiers (pthread_t
),
then InnoDB uses the instructions for read-write locks as well.
Note: If you are building from source, ensure that the build process properly takes advantage of your platform capabilities.
For more information about the performance implications of locking, see Section 8.10, "Optimizing Locking Operations".
When InnoDB was developed, the memory allocators supplied with operating systems and run-time libraries were
often lacking in performance and scalability. At that time, there were no memory allocator libraries tuned
for multi-core CPUs. Therefore, InnoDB implemented its own memory allocator in the mem
subsystem. This allocator is guarded by a single mutex, which may become
a bottleneck. InnoDB also
implements a wrapper interface around the system allocator (malloc
and free
) that is likewise guarded by a single mutex.
Today, as multi-core systems have become more widely available, and as operating systems have matured,
significant improvements have been made in the memory allocators provided with operating systems. New memory
allocators perform better and are more scalable than they were in the past. The leading high-performance
memory allocators include Hoard, libumem, mtmalloc, ptmalloc, tbbmalloc, and TCMalloc. Most workloads, especially those where memory is frequently
allocated and released (such as multi-table joins), benefit from using a more highly tuned memory allocator
as opposed to the internal, InnoDB-specific memory allocator.
You can control whether InnoDB uses its own memory allocator or an allocator of the operating system, by
setting the value of the system configuration parameter innodb_use_sys_malloc
in the MySQL option file (my.cnf
or my.ini
). If set to ON
or 1
(the default), InnoDB uses the malloc
and free
functions of the underlying system rather than manage memory pools itself. This parameter is not dynamic,
and takes effect only when the system is started. To continue to use the InnoDB memory allocator, set innodb_use_sys_malloc
to 0
.
When the InnoDB memory allocator is disabled, InnoDB ignores the value of the parameter innodb_additional_mem_pool_size
. The InnoDB memory allocator uses an
additional memory pool for satisfying allocation requests without having to fall back to the system
memory allocator. When the InnoDB memory allocator is disabled, all such allocation requests are
fulfilled by the system memory allocator.
On Unix-like systems that use dynamic linking, replacing the memory allocator may be as easy as
making the environment variable LD_PRELOAD
or LD_LIBRARY_PATH
point to the dynamic library that implements the allocator. On other systems, some relinking may be
necessary. Please refer to the documentation of the memory allocator library of your choice.
Since InnoDB cannot track all memory use when the system memory allocator is used (innodb_use_sys_malloc
is ON
), the section "BUFFER POOL AND MEMORY"
in the output of the SHOW ENGINE INNODB STATUS
command only includes the
buffer pool statistics in the "Total memory allocated". Any memory allocated using the mem
subsystem or using ut_malloc
is
excluded.
For more information about the performance implications of InnoDB memory usage, see Section 8.9, "Buffering and Caching".
When INSERT, UPDATE, and DELETE operations are done to a table, often the values of indexed columns (particularly the values of secondary
keys) are not in sorted order, requiring substantial I/O to bring secondary indexes up to date. InnoDB has
an insert buffer that
caches changes to secondary index entries when the relevant page is not in the buffer pool, thus
avoiding I/O operations by not reading in the page from the disk. The buffered changes are merged when the
page is loaded to the buffer pool, and the updated page is later flushed to disk using the normal mechanism.
The InnoDB main thread merges buffered changes when the server is nearly idle, and during a slow
shutdown.
Because it can result in fewer disk reads and writes, this feature is most valuable for workloads that are I/O-bound, for example applications with a high volume of DML operations such as bulk inserts.
However, the insert buffer occupies a part of the buffer pool, reducing the memory available to cache data pages. If the working set almost fits in the buffer pool, or if your tables have relatively few secondary indexes, it may be useful to disable insert buffering. If the working set entirely fits in the buffer pool, insert buffering does not impose any extra overhead, because it only applies to pages that are not in the buffer pool.
You can control the extent to which InnoDB performs insert buffering with the system configuration parameter
innodb_change_buffering
.
You can turn on and off buffering for inserts, delete operations (when index records are initially marked
for deletion) and purge operations (when index records are physically deleted). An update operation is
represented as a combination of an insert and a delete. In MySQL 5.5 and higher, the default value is
changed from inserts
to all
.
The allowed values of innodb_change_buffering
are:
all
The default value: buffer inserts, delete-marking operations, and purges.
none
Do not buffer any operations.
inserts
Buffer insert operations.
deletes
Buffer delete-marking operations.
changes
Buffer both inserts and delete-marking.
purges
Buffer the physical deletion operations that happen in the background.
You can set the value of this parameter in the MySQL option file (my.cnf
or
my.ini
) or change it dynamically with the SET
GLOBAL
command, which requires the SUPER
privilege. Changing the
setting affects the buffering of new operations; the merging of already buffered entries is not affected.
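For example, on a server whose working set fits comfortably in the buffer pool, you might restrict buffering to inserts only, or disable it entirely; the two statements below are alternatives, shown as a sketch:
SET GLOBAL innodb_change_buffering = 'inserts';  -- buffer only insert operations
SET GLOBAL innodb_change_buffering = 'none';     -- disable change buffering altogether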
For more information about speeding up INSERT, UPDATE, and DELETE statements, see Section 8.2.2, "Optimizing DML Statements".
If a table fits almost entirely in main memory, the fastest way to perform queries on it is to use hash indexes rather than B-tree lookups. MySQL monitors searches
on each index defined for an InnoDB table. If it notices that certain index values are being accessed
frequently, it automatically builds an in-memory hash table for that index. See Section
14.2.3.12.6, "Adaptive Hash Indexes" for background information and usage guidelines for the adaptive hash
index feature and the innodb_adaptive_hash_index
configuration option.
InnoDB uses operating system threads to process requests from user transactions. (Transactions may issue many requests to InnoDB before they commit or roll back.) On modern operating systems and servers with multi-core processors, where context switching is efficient, most workloads run well without any limit on the number of concurrent threads. Scalability improvements in MySQL 5.5 and up reduce the need to limit the number of concurrently executing threads inside InnoDB.
In situations where it is helpful to minimize context switching between threads, InnoDB can use a number of techniques to limit the number of concurrently executing operating system threads (and thus the number of requests that are processed at any one time). When InnoDB receives a new request from a user session, if the number of threads concurrently executing is at a pre-defined limit, the new request sleeps for a short time before it tries again. A request that cannot be rescheduled after the sleep is put in a first-in/first-out queue and eventually is processed. Threads waiting for locks are not counted in the number of concurrently executing threads.
You can limit the number of concurrent threads by setting the configuration parameter innodb_thread_concurrency
. Once the number of executing threads reaches this
limit, additional threads sleep for a number of microseconds, set by the configuration parameter innodb_thread_sleep_delay, before being placed into the queue.
Previously, it required experimentation to find the optimal value for innodb_thread_sleep_delay
,
and the optimal value could change depending on the workload. In MySQL 5.6.3 and higher, you can set the
configuration option innodb_adaptive_max_sleep_delay
to the highest value you would allow for
innodb_thread_sleep_delay
, and InnoDB automatically adjusts innodb_thread_sleep_delay
up or down depending on the current
thread-scheduling activity. This dynamic adjustment helps the thread scheduling mechanism to work smoothly
during times when the system is lightly loaded and when it is operating near full capacity.
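For example, a sketch of limiting concurrency while letting InnoDB manage the sleep delay automatically (the values are illustrative assumptions):
SET GLOBAL innodb_thread_concurrency = 16;            -- cap the number of concurrently executing threads
SET GLOBAL innodb_adaptive_max_sleep_delay = 150000;  -- let InnoDB adjust innodb_thread_sleep_delay up to 150 ms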
The default value for innodb_thread_concurrency
and the implied default limit on the number of
concurrent threads has been changed in various releases of MySQL and InnoDB. Currently, the default value of
innodb_thread_concurrency
is 0
, so that by default there is no limit on the number of concurrently
executing threads, as shown in Table 14.4,
"Changes to innodb_thread_concurrency
".
Table 14.4. Changes to innodb_thread_concurrency
InnoDB Version | MySQL Version | Default value | Default limit of concurrent threads | Value to allow unlimited threads
---|---|---|---|---
Built-in | Earlier than 5.1.11 | 20 | No limit | 20 or higher
Built-in | 5.1.11 and newer | 8 | 8 | 0
InnoDB before 1.0.3 | (corresponding to Plugin) | 8 | 8 | 0
InnoDB 1.0.3 and newer | (corresponding to Plugin) | 0 | No limit | 0
Note that InnoDB causes threads to sleep only when the number of concurrent threads is limited. When there
is no limit on the number of threads, all contend equally to be scheduled. That is, if innodb_thread_concurrency
is 0
, the value of
innodb_thread_sleep_delay
is ignored.
When there is a limit on the number of threads, InnoDB reduces context switching overhead by permitting
multiple requests made during the execution of a single SQL statement to enter InnoDB without observing the
limit set by innodb_thread_concurrency
. Since an SQL statement (such as a join) may
comprise multiple row operations within InnoDB, InnoDB assigns "tickets"
that allow a thread to be scheduled repeatedly with minimal overhead.
When a new SQL statement starts, a thread has no tickets, and it must observe innodb_thread_concurrency
. Once the thread is entitled to enter InnoDB, it is
assigned a number of tickets that it can use for subsequently entering InnoDB. If the tickets run out, innodb_thread_concurrency
is observed again and further tickets are assigned. The number of tickets to assign is specified by the
global option innodb_concurrency_tickets
, which is 500 by default. A thread that is
waiting for a lock is given one ticket once the lock becomes available.
The correct values of these variables depend on your environment and workload. Try a range of different values to determine what value works for your applications. Before limiting the number of concurrently executing threads, review configuration options that may improve the performance of InnoDB on multi-core and multi-processor computers, such as innodb_use_sys_malloc and innodb_adaptive_hash_index.
For general performance information about MySQL thread handling, see Section 8.11.5.1, "How MySQL Uses Threads for Client Connections".
A read-ahead request is an I/O request to prefetch multiple pages in the buffer pool asynchronously, in anticipation that these pages will be needed soon. The requests bring in all the pages in one extent. InnoDB uses two read-ahead algorithms to improve I/O performance:
Linear read-ahead is a technique that predicts what pages might
be needed soon based on pages in the buffer pool being accessed sequentially. You control when InnoDB
performs a read-ahead operation by adjusting the number of sequential page accesses required to trigger an
asynchronous read request, using the configuration parameter innodb_read_ahead_threshold
. Before this parameter was added, InnoDB
would only calculate whether to issue an asynchronous prefetch request for the entire next extent when it
read in the last page of the current extent.
The new configuration parameter innodb_read_ahead_threshold
controls how sensitive InnoDB is in detecting
patterns of sequential page access. If the number of pages read sequentially from an extent is greater than
or equal to innodb_read_ahead_threshold
, InnoDB initiates an asynchronous read-ahead
operation of the entire following extent. It can be set to any value from 0-64. The default value is 56. The
higher the value, the more strict the access pattern check. For example, if you set the value to 48, InnoDB
triggers a linear read-ahead request only when 48 pages in the current extent have been accessed
sequentially. If the value is 8, InnoDB would trigger an asynchronous read-ahead even if as few as 8 pages
in the extent were accessed sequentially. You can set the value of this parameter in the MySQL configuration
file, or change it dynamically with the SET GLOBAL
command, which
requires the SUPER
privilege.
Random read-ahead is a technique that predicts when pages might
be needed soon based on pages already in the buffer pool, regardless of the order in which those pages were
read. If 13 consecutive pages from the same extent are found in the buffer pool, InnoDB asynchronously
issues a request to prefetch the remaining pages of the extent. This feature was initially turned off in
MySQL 5.5. It is available once again starting in MySQL 5.1.59 and 5.5.16 and higher, turned off by default.
To enable this feature, set the configuration variable innodb_random_read_ahead to ON.
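For example, both read-ahead settings are dynamic global variables (the threshold value here is an illustrative assumption):
SET GLOBAL innodb_read_ahead_threshold = 48;  -- require 48 sequential page accesses before linear read-ahead
SET GLOBAL innodb_random_read_ahead = ON;     -- re-enable random read-ahead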
The SHOW ENGINE INNODB STATUS
command displays statistics to help you evaluate
the effectiveness of the read-ahead algorithm. With the return of random read-ahead in MySQL 5.6, the SHOW ENGINE INNODB STATUS
command once again includes Innodb_buffer_pool_read_ahead_rnd
.
Innodb_buffer_pool_read_ahead
keeps its current name. (In earlier releases, it
was listed as Innodb_buffer_pool_read_ahead_seq
.) See Section
14.2.5.11, "More Read-Ahead Statistics" for more information.
For more information about I/O performance, see Section 8.5.7,
"Optimizing InnoDB
Disk I/O" and Section
8.11.3, "Optimizing Disk I/O".
InnoDB uses background threads to
service various types of I/O requests. You can configure the number of background threads that service read
and write I/O on data pages, using the configuration parameters innodb_read_io_threads
and innodb_write_io_threads
. These parameters signify the number of
background threads used for read and write requests respectively. They are effective on all supported
platforms. You can set the value of these parameters in the MySQL option file (my.cnf
or my.ini
); you cannot change them
dynamically. The default value for these parameters is 4
and the permissible
values range from 1-64
.
The purpose of this change is to make InnoDB more scalable on high end systems. Each background thread can
handle up to 256 pending I/O requests. A major source of background I/O is the read-ahead
requests. InnoDB tries to balance the load of incoming requests in such way that most of the background
threads share work equally. InnoDB also attempts to allocate read requests from the same extent to the same
thread to increase the chances of coalescing the requests together. If you have a high end I/O subsystem and
you see more than 64 × innodb_read_io_threads
pending read requests in SHOW
ENGINE INNODB STATUS
, you might gain by increasing the value of innodb_read_io_threads
.
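For example, because these parameters cannot be changed dynamically, a high-end I/O subsystem might be configured in the option file like this (the values are illustrative assumptions):
[mysqld]
innodb_read_io_threads = 8    # default is 4; permissible range is 1-64
innodb_write_io_threads = 8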
For more information about InnoDB I/O performance, see Section
8.5.7, "Optimizing InnoDB
Disk I/O".
Starting in InnoDB 1.1 with MySQL 5.5, the asynchronous I/O capability that
InnoDB has had on Windows systems is now available on Linux systems. (Other Unix-like systems continue to
use synchronous I/O calls.) This feature improves the scalability of heavily I/O-bound systems, which
typically show many pending reads/writes in the output of the command SHOW ENGINE
INNODB STATUS\G
.
Running with a large number of InnoDB
I/O threads, and especially running
multiple such instances on the same server machine, can exceed capacity limits on Linux systems. In this
case, you can fix the error:
EAGAIN: The specified maxevents exceeds the user's limit of available events.
by writing a higher limit to /proc/sys/fs/aio-max-nr
.
In general, if a problem with the asynchronous I/O subsystem in the OS prevents InnoDB from starting, set
the option innodb_use_native_aio=0
in the configuration file. This new configuration option applies to Linux systems only, and cannot be
changed once the server is running.
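For example, to fall back to synchronous I/O when the operating system's asynchronous I/O subsystem is misbehaving, add this to the option file (a minimal sketch):
[mysqld]
innodb_use_native_aio = 0   # Linux only; cannot be changed while the server is running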
For more information about InnoDB I/O performance, see Section
8.5.7, "Optimizing InnoDB
Disk I/O".
InnoDB, like any other ACID-compliant database engine, flushes the redo log of a transaction before it is committed. Historically, InnoDB used group commit functionality to group multiple such flush requests together to avoid one flush for each commit. With group commit, InnoDB issues a single write to the log file to perform the commit action for multiple user transactions that commit at about the same time, significantly improving throughput.
Group commit in InnoDB worked until MySQL 4.x, and works once again with MySQL 5.1 with the InnoDB Plugin, and MySQL 5.5 and higher. The introduction of support for the distributed transactions and Two Phase Commit (2PC) in MySQL 5.0 interfered with the InnoDB group commit functionality. This issue is now resolved.
The group commit functionality inside InnoDB works with the Two Phase Commit protocol in MySQL. Re-enabling
of the group commit functionality fully ensures that the ordering of commit in the MySQL binlog and the
InnoDB logfile is the same as it was before. It means it is totally safe to use
the MySQL Enterprise Backup product with InnoDB 1.0.4 (that is, the InnoDB Plugin with
MySQL 5.1) and above. When the binlog is enabled, you typically also set the configuration option sync_binlog=0
, because group commit for the binary log is only supported if
it is set to 0.
Group commit is transparent; you do not need to do anything to take advantage of this significant performance improvement.
For more information about performance of COMMIT
and other transactional
operations, see Section 8.5.2, "Optimizing InnoDB
Transaction Management".
The master thread in InnoDB is a thread that performs various tasks in the background. Most of these tasks are I/O related, such as flushing dirty pages from the buffer pool or writing changes from the insert buffer to the appropriate secondary indexes. The master thread attempts to perform these tasks in a way that does not adversely affect the normal working of the server. It tries to estimate the free I/O bandwidth available and tune its activities to take advantage of this free capacity. Historically, InnoDB has used a hard coded value of 100 IOPs (input/output operations per second) as the total I/O capacity of the server.
The parameter innodb_io_capacity
indicates the overall I/O capacity available to InnoDB, per buffer pool instance. This parameter should be
set to approximately the number of I/O operations that the system can perform per second. The value depends
on your system configuration. When innodb_io_capacity is set, the master thread estimates the I/O bandwidth available for background tasks based on the set value. Setting the value to 100 reverts to the old behavior.
You can set the value of innodb_io_capacity
to any number 100 or greater. The default value is 200
, reflecting that the performance of typical modern I/O devices is higher than
in the early days of MySQL. Typically, values around the previous default of 100 are appropriate for
consumer-level storage devices, such as hard drives up to 7200 RPMs. Faster hard drives, RAID
configurations, and SSDs benefit from higher values.
You can set the value of this parameter in the MySQL option file (my.cnf
or
my.ini
) or change it dynamically with the SET
GLOBAL
command, which requires the SUPER
privilege.
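For example, a server backed by an SSD or a multi-disk RAID array might raise the limit well above the default (the value is an illustrative assumption):
SET GLOBAL innodb_io_capacity = 2000;   -- roughly the sustainable IOPS of the storage subsystem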
Formerly, the InnoDB
master thread also performed any needed purge
operations. In MySQL 5.6.5 and higher, those I/O operations are moved to other background threads, whose
number is controlled by the innodb_purge_threads
configuration option.
For more information about InnoDB I/O performance, see Section
8.5.7, "Optimizing InnoDB
Disk I/O".
InnoDB performs certain tasks in the background, including flushing of dirty
pages (those pages that have been changed but are not yet written to the database files) from the buffer pool, a task performed
by the master thread.
Currently, InnoDB aggressively flushes buffer pool pages if the percentage of dirty pages in the buffer pool
exceeds innodb_max_dirty_pages_pct
.
InnoDB uses a new algorithm to estimate the required rate of flushing, based on the speed of redo log generation and the current rate of flushing. The intent is to smooth overall performance by ensuring that buffer flush activity keeps up with the need to keep the buffer pool "clean". Automatically adjusting the rate of flushing can help to avoid sudden dips in throughput, when excessive buffer pool flushing limits the I/O capacity available for ordinary read and write activity.
InnoDB uses its log files in a circular fashion. Before reusing a portion of a log file, InnoDB flushes to
disk all dirty buffer pool pages whose redo entries are contained in that portion of the log file, a process
known as a sharp
checkpoint. If a workload is write-intensive, it generates a lot of redo information, all written to
the log file. If all available space in the log files is used up, a sharp checkpoint occurs, causing a
temporary reduction in throughput. This situation can happen even though innodb_max_dirty_pages_pct
is not reached.
InnoDB uses a heuristic-based algorithm to avoid such a scenario, by measuring the number of dirty pages in the buffer pool and the rate at which redo is being generated. Based on these numbers, InnoDB decides how many dirty pages to flush from the buffer pool each second. This self-adapting algorithm is able to deal with sudden changes in the workload.
Internal benchmarking has also shown that this algorithm not only maintains throughput over time, but can also improve overall throughput significantly.
Because adaptive flushing is a new feature that can significantly affect the I/O pattern of a workload, a
new configuration parameter lets you turn off this feature. The default value of the boolean parameter innodb_adaptive_flushing
is TRUE
, enabling the new algorithm. You can set the value of this parameter in
the MySQL option file (my.cnf
or my.ini
) or change
it dynamically with the SET GLOBAL
command, which requires the SUPER
privilege.
For more information about InnoDB I/O performance, see Section
8.5.7, "Optimizing InnoDB
Disk I/O".
Synchronization inside InnoDB frequently involves the use of spin loops: while waiting,
InnoDB executes a tight loop of instructions repeatedly to avoid having the InnoDB process
and threads be rescheduled by the
operating system. If the spin loops are executed too quickly, system resources are wasted, imposing a
performance penalty on transaction throughput. Most modern processors implement the PAUSE
instruction for use in spin loops, so the processor can be more efficient.
InnoDB uses a PAUSE
instruction in its spin loops on all platforms where such
an instruction is available. This technique increases overall performance with CPU-bound workloads, and has
the added benefit of minimizing power consumption during the execution of the spin loops.
You do not have to do anything to take advantage of this performance improvement.
For performance considerations for InnoDB locking operations, see Section 8.10, "Optimizing Locking Operations".
Many InnoDB mutexes and rw-locks are reserved for a short time. On a multi-core system, it can be more efficient for a thread to continuously check if it can acquire a mutex or rw-lock for a while before sleeping. If the mutex or rw-lock becomes available during this polling period, the thread can continue immediately, in the same time slice. However, too-frequent polling of a shared object by multiple threads can cause "cache ping pong", where different processors invalidate portions of each other's cache. InnoDB minimizes this issue by waiting a random time between subsequent polls. The delay is implemented as a busy loop.
You can control the maximum delay between testing a mutex or rw-lock using the parameter innodb_spin_wait_delay
. The duration of the delay loop depends on the C
compiler and the target processor. (In the 100MHz Pentium era, the unit of delay was one microsecond.) On a
system where all processor cores share a fast cache memory, you might reduce the maximum delay or disable
the busy loop altogether by setting innodb_spin_wait_delay=0
. On a system with
multiple processor chips, the effect of cache invalidation can be more significant and you might increase
the maximum delay.
The default value of innodb_spin_wait_delay
is 6
. The spin wait delay
is a dynamic global parameter that you can specify in the MySQL option file (my.cnf
or my.ini
) or change at runtime with the
command SET GLOBAL innodb_spin_wait_delay=delay, where delay is the desired maximum delay. Changing the setting requires the SUPER privilege.
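For example, the two statements below show opposite tunings for the two hardware scenarios just described; treat them as alternatives, not a sequence:
SET GLOBAL innodb_spin_wait_delay = 0;   -- cores share a fast cache: disable the busy loop
SET GLOBAL innodb_spin_wait_delay = 12;  -- multiple processor chips: lengthen the maximum delay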
For performance considerations for InnoDB locking operations, see Section 8.10, "Optimizing Locking Operations".
Rather than using a strictly LRU algorithm, InnoDB uses a technique to minimize the amount of data that is brought into the buffer pool and never accessed again. The goal is to make sure that frequently accessed ("hot") pages remain in the buffer pool, even as read-ahead and full table scans bring in new blocks that might or might not be accessed afterward.
Newly read blocks are inserted into the middle of the LRU list representing the buffer pool.
All newly read pages are inserted at a location that by default is 3/8
from the
tail of the LRU list. The pages are moved to the front of the list (the most-recently used end) when they
are accessed in the buffer pool for the first time. Thus pages that are never accessed never make it to the
front portion of the LRU list, and "age out" sooner
than with a strict LRU approach. This arrangement divides the LRU list into two segments, where the pages
downstream of the insertion point are considered "old"
and are desirable victims for LRU eviction.
For an explanation of the inner workings of the InnoDB buffer pool and the specifics of its LRU replacement
algorithm, see Section 8.9.1, "The InnoDB
Buffer Pool".
You can control the insertion point in the LRU list, and choose whether InnoDB applies the same optimization
to blocks brought into the buffer pool by table or index scans. The configuration parameter innodb_old_blocks_pct
controls the percentage of "old" blocks in the LRU list. The default value of
innodb_old_blocks_pct
is 37
, corresponding to the original fixed ratio of 3/8. The value range is
5
(new pages in the buffer pool age out very quickly) to 95
(only 5% of the buffer pool is reserved for hot pages, making the algorithm close to the familiar LRU
strategy).
The optimization that keeps the buffer pool from being churned by read-ahead can avoid similar problems due
to table or index scans. In these scans, a data page is typically accessed a few times in quick succession
and is never touched again. The configuration parameter innodb_old_blocks_time
specifies the time window (in milliseconds) after
the first access to a page during which it can be accessed without being moved to the front (most-recently
used end) of the LRU list. The default value of innodb_old_blocks_time
is 0
, corresponding
to the original behavior of moving a page to the most-recently used end of the buffer pool list when it is
first accessed in the buffer pool. Increasing this value makes more and more blocks likely to age out faster
from the buffer pool.
Both innodb_old_blocks_pct
and innodb_old_blocks_time
are dynamic, global and can be specified in the MySQL option file (my.cnf
or
my.ini
) or changed at runtime with the SET GLOBAL
command. Changing the setting requires the SUPER
privilege.
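For example, before running a large one-time scan you might shrink the "old" sublist and require pages to prove themselves for a full second (illustrative values):
SET GLOBAL innodb_old_blocks_pct = 5;      -- reserve only 5% of the buffer pool for newly read pages
SET GLOBAL innodb_old_blocks_time = 1000;  -- pages stay "old" for 1000 ms after their first access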
To help you gauge the effect of setting these parameters, the SHOW ENGINE INNODB
STATUS
command reports additional statistics. The BUFFER POOL AND MEMORY
section looks like:
Total memory allocated 1107296256; in additional pool allocated 0
Dictionary memory allocated 80360
Buffer pool size   65535
Free buffers       0
Database pages     63920
Old database pages 23600
Modified db pages  34969
Pending reads 32
Pending writes: LRU 0, flush list 0, single page 0
Pages made young 414946, not young 2930673
1274.75 youngs/s, 16521.90 non-youngs/s
Pages read 486005, created 3178, written 160585
2132.37 reads/s, 3.40 creates/s, 323.74 writes/s
Buffer pool hit rate 950 / 1000, young-making rate 30 / 1000 not 392 / 1000
Pages read ahead 1510.10/s, evicted without access 0.00/s
LRU len: 63920, unzip_LRU len: 0
I/O sum[43690]:cur[221], unzip sum[0]:cur[0]
Old database pages is the number of pages in the "old" segment of the LRU list.
Pages made young and not young are the total numbers of "old" pages that have been made young, or not, respectively.
youngs/s and non-youngs/s are the rates at which accesses to the "old" pages have resulted in making such pages young, or not, since the last invocation of the command.
young-making rate and not provide the same rates, but in terms of overall buffer pool accesses rather than accesses just to the "old" pages.
Per second averages provided in InnoDB
Monitor output are based on
the elapsed time between the current time and the last time InnoDB
Monitor
output was printed.
Because the effects of these parameters can vary widely based on your hardware configuration, your data, and the details of your workload, always benchmark to verify the effectiveness before changing these settings in any performance-critical or production environment.
In mixed workloads where most of the activity is OLTP type with periodic batch reporting queries which
result in large scans, setting the value of innodb_old_blocks_time
during the batch runs can help keep the working
set of the normal workload in the buffer pool.
When scanning large tables that cannot fit entirely in the buffer pool, setting innodb_old_blocks_pct
to a small value keeps the data that is only read once
from consuming a significant portion of the buffer pool. For example, setting innodb_old_blocks_pct=5
restricts this data that is only read once to 5% of the buffer pool.
When scanning small tables that do fit into memory, there is less overhead for moving pages around within
the buffer pool, so you can leave innodb_old_blocks_pct
at its default value, or even higher, such as innodb_old_blocks_pct=50
.
The effect of the innodb_old_blocks_time
parameter is harder to predict than the innodb_old_blocks_pct
parameter, is relatively small, and varies more with the workload. To arrive at an optimal value, conduct
your own benchmarks if the performance improvement from adjusting innodb_old_blocks_pct
is not sufficient.
For more information about the InnoDB buffer pool, see Section
8.9.1, "The InnoDB
Buffer Pool".
A number of optimizations speed up certain steps of the recovery that happens on the next startup after a crash. In particular, scanning and applying the redo log are faster than in MySQL 5.1 and earlier, due to improved algorithms for memory management. You do not need to take any action to benefit from this performance enhancement. If you kept the size of your redo log files artificially low because recovery took a long time, consider increasing the file size.
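For example, the redo log size could be raised in the option file; the figures below are purely illustrative, and on older releases changing this setting requires a clean shutdown and removal of the old log files before restarting:
[mysqld]
innodb_log_file_size=256M
innodb_log_files_in_group=2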
For more information about InnoDB recovery, see Section
14.2.3.13, "The InnoDB
Recovery Process".
Starting with InnoDB 1.1 with MySQL 5.5, you can profile certain internal InnoDB operations using the MySQL Performance Schema feature. This type of tuning is primarily for expert users, those who push the limits of MySQL performance, read the MySQL source code, and evaluate optimization strategies to overcome performance bottlenecks. DBAs can also use this feature for capacity planning, to see whether their typical workload encounters any performance bottlenecks with a particular combination of CPU, RAM, and disk storage; and if so, to judge whether performance can be improved by increasing the capacity of some part of the system.
To use this feature to examine InnoDB performance:
You must be running MySQL 5.5 or higher. You must build the database server
from source, enabling the Performance Schema feature by building with the --with-perfschema
option. Since the Performance Schema feature introduces some performance overhead, you should use it
on a test or development system rather than on a production system.
You must be running InnoDB 1.1 or higher.
You must be generally familiar with how to use the Performance
Schema feature, for example to query tables in the performance_schema
database.
Examine the following kinds of InnoDB objects by querying the appropriate performance_schema
tables. The items associated with InnoDB all contain
the substring innodb
in the EVENT_NAME
column.
For the definitions of the *_instances
tables, see Section
21.9.2, "Performance Schema Instance Tables". For the definitions of the *_summary_*
tables, see Section
21.9.8, "Performance Schema Summary Tables". For the definition of the thread
table, see Section
21.9.9, "Performance Schema Miscellaneous Tables". For the definition of the *_current_*
and *_history_*
tables, see
Section 21.9.3, "Performance Schema Wait
Event Tables".
Mutexes
in the mutex_instances
table. (Mutexes and RW-locks related
to the InnoDB
buffer pool are not included in this
coverage; the same applies to the output of the SHOW ENGINE INNODB
MUTEX
command.)
RW-locks
in the rwlock_instances
table.
File I/O operations in the file_instances
, file_summary_by_event_name
,
and file_summary_by_instance
tables.
Threads
in the PROCESSLIST
table.
During performance testing, examine the performance data in the events_waits_current
and events_waits_history_long
tables. If you are interested especially in
InnoDB-related objects, use the clause WHERE EVENT_NAME LIKE '%innodb%'
to see just those entries; otherwise, examine the performance statistics for the overall MySQL
server.
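For example, a summary query along the following lines aggregates the InnoDB-related waits; TIMER_WAIT is the wait duration column of the events_waits_history_long table:
SELECT EVENT_NAME, COUNT(*) AS occurrences, SUM(TIMER_WAIT) AS total_wait
FROM performance_schema.events_waits_history_long
WHERE EVENT_NAME LIKE '%innodb%'
GROUP BY EVENT_NAME
ORDER BY total_wait DESC;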
For more information about the MySQL Performance Schema, see Chapter 21, MySQL Performance Schema.
This performance enhancement is primarily useful for people with a large buffer
pool size, typically in the multi-gigabyte range. To take advantage of this speedup, you must set the
new innodb_buffer_pool_instances
configuration option, and you might also adjust the innodb_buffer_pool_size
value.
When the InnoDB buffer pool is large, many data requests can be satisfied by retrieving from memory. You might encounter bottlenecks from multiple threads trying to access the buffer pool at once. Starting in InnoDB 1.1 and MySQL 5.5, you can enable multiple buffer pools to minimize this contention. Each page that is stored in or read from the buffer pool is assigned to one of the buffer pools randomly, using a hashing function. Each buffer pool manages its own free lists, flush lists, LRUs, and all other data structures connected to a buffer pool, and is protected by its own buffer pool mutex.
To enable this feature, set the innodb_buffer_pool_instances
configuration
option to a value greater than 1 (the default) up to 64 (the maximum). This option takes effect only when
you set the innodb_buffer_pool_size
to a size of 1 gigabyte or more. The total
size you specify is divided among all the buffer pools. For best efficiency, specify a combination of innodb_buffer_pool_instances
and innodb_buffer_pool_size
so that each buffer pool instance is at least 1 gigabyte.
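For example, an option-file configuration that follows the one-gigabyte-per-instance guideline might look like this; the sizes are illustrative:
[mysqld]
innodb_buffer_pool_size=8G
innodb_buffer_pool_instances=8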
For more information about the InnoDB buffer pool, see Section
8.9.1, "The InnoDB
Buffer Pool".
Starting in InnoDB 1.1 with MySQL 5.5, the limit on concurrent transactions is greatly expanded, removing a bottleneck with the InnoDB rollback segment that affected high-capacity systems. The limit applies to concurrent transactions that change any data; read-only transactions do not count against that maximum.
The single rollback segment is now divided into 128 segments, each of which can support up to 1023 transactions that perform writes, for a total of approximately 128K concurrent transactions. The original transaction limit was 1023.
Each transaction is assigned to one of the rollback segments, and remains tied to that rollback segment for the duration. This enhancement improves both scalability (higher number of concurrent transactions) and performance (less contention when different transactions access the rollback segments).
To take advantage of this feature, you do not need to create any new database or tables, or reconfigure anything. You must, however, perform a slow shutdown, either before upgrading from MySQL 5.1 or earlier or at some point afterward. InnoDB makes the required changes inside the system tablespace automatically, the first time you restart after performing a slow shutdown.
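A slow shutdown means stopping the server with innodb_fast_shutdown set to 0, for example:
SET GLOBAL innodb_fast_shutdown = 0;
-- then stop the server, for example with: mysqladmin shutdown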
If your workload was not constrained by the original limit of 1023 concurrent transactions, you can reduce
the number of rollback segments used within a MySQL instance or within a session by setting the
configuration option innodb_rollback_segments
.
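For instance, a sketch of reducing the rollback segments at runtime; the value 32 is arbitrary and illustrative:
SET GLOBAL innodb_rollback_segments = 32;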
For more information about performance of InnoDB under high transactional load, see Section
8.5.2, "Optimizing InnoDB
Transaction Management".
The purge operations (a type of garbage collection) that InnoDB performs automatically are now done in one or more separate threads, rather than as part of the master thread. This change improves scalability, because the main database operations run independently from maintenance work happening in the background.
To control this feature, set the configuration option innodb_purge_threads=n. If DML activity is concentrated on a single table or a few tables, keep the setting low so that the threads do not contend with each other for access to the busy tables. If DML operations are spread across many tables, increase the setting. Its maximum value is 32.
A related configuration option, innodb_purge_batch_size, has a default of 20 and a maximum of 5000. This option is mainly intended for experimentation and tuning of purge operations, and is unlikely to be of interest to typical users.
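An illustrative option-file fragment; the values are examples only, not recommendations:
[mysqld]
innodb_purge_threads=4
innodb_purge_batch_size=20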
For more information about InnoDB I/O performance, see Section
8.5.7, "Optimizing InnoDB
Disk I/O".
This is another performance improvement that comes for free, with no user action or configuration needed. The details here are intended for performance experts who delve into the InnoDB source code, or interpret reports with keywords such as "mutex" and "log_sys".
The mutex known as the log sys mutex has
historically done double duty, controlling access to internal data structures related to log records and the
LSN, as well as pages in the buffer
pool that are changed when a mini-transaction is committed. Starting
in InnoDB 1.1 with MySQL 5.5, these two kinds of operations are protected by separate mutexes, with a new
log_buf
mutex controlling writes to buffer pool pages due to mini-transactions.
For performance considerations for InnoDB locking operations, see Section 8.10, "Optimizing Locking Operations".
Starting with InnoDB 1.1 with MySQL 5.5, concurrent access to the buffer pool is faster. Operations involving the flush list, a data structure related to the buffer pool, are now controlled by a separate mutex and do not block access to the buffer pool. You do not need to configure anything to take advantage of this speedup; it is fully automatic.
For more information about the InnoDB buffer pool, see Section
8.9.1, "The InnoDB
Buffer Pool".
The mutex controlling concurrent access to
the InnoDB
kernel is now divided into separate mutexes and rw-locks
to reduce contention. You do not need to configure anything to take advantage of this speedup; it is fully
automatic.
To ease the memory load on systems with huge numbers of tables, InnoDB now frees up the memory associated
with an opened table, using an LRU algorithm to select tables that have gone the longest without being
accessed. To reserve more memory to hold metadata for open InnoDB tables, increase the value of the table_definition_cache
configuration option. InnoDB treats this value as a "soft limit". The actual number of tables with cached metadata could
be higher, because metadata for InnoDB system tables, and parent and child tables in foreign key
relationships, is never evicted from memory.
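For example, to reserve room for roughly 2000 cached table definitions (an illustrative figure), you might set:
SET GLOBAL table_definition_cache = 2000;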
Several new features extend the file-per-table mode enabled by the innodb_file_per_table
configuration option, allowing more flexibility in how
the .ibd
files are placed, exported, and restored. We characterize this as a
performance enhancement because it solves the common customer request to put data from different tables onto
different storage devices, for best price/performance depending on the access patterns of the data. For
example, tables with high levels of random reads and writes might be placed on an SSD
device, while less-often-accessed data or data processed with large batches of sequential I/O might be
placed on an HDD device. See Section
5.4.1, "Managing InnoDB Tablespaces" for details.
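For example, assuming innodb_file_per_table is enabled and /ssd1/mysql is a hypothetical mount point for fast storage, a table can be placed there at creation time:
CREATE TABLE hot_lookups (
  id INT NOT NULL PRIMARY KEY,
  payload VARCHAR(255)
) ENGINE=InnoDB
  DATA DIRECTORY = '/ssd1/mysql';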
The memcached daemon is frequently used as an in-memory
caching layer in front of a MySQL database server. Now MySQL allows direct access to InnoDB
tables using the familiar memcached protocol and client
libraries. Instead of formulating queries in SQL, you can perform simple get, set, and increment operations
that avoid the performance overhead of SQL parsing and constructing a query optimization plan. You can also
access the underlying InnoDB
tables through SQL to load data, generate reports,
or perform multi-step transactional computations.
This technique allows the data to be stored in MySQL for reliability and consistency, while coding application logic that uses the database as a fast key-value store.
This feature combines the best of both worlds:
Data that is written using the memcached protocol is transparently written to an InnoDB table, without going through the MySQL SQL layer. You can control the frequency of writes to achieve higher raw performance when updating non-critical data.
Data that is requested through the memcached protocol is transparently queried from an InnoDB table, without going through the MySQL SQL layer.
Subsequent requests for the same data will be served from the InnoDB
buffer pool. The buffer pool handles the in-memory caching. You
can tune the performance of data-intensive operations using the familiar InnoDB
configuration options.
InnoDB can handle composing and decomposing multiple column values into a
single memcached item value, reducing the amount of
string parsing and concatenation required in your application. For example, you might store a string
value 2|4|6|8
in the memcached cache, and InnoDB splits that value based
on a separator character, then stores the result into four numeric columns.
For details on using this NoSQL-style interface to MySQL, see Section 14.2.9, "InnoDB Integration with memcached". For additional background on memcached and considerations for writing applications for its API, see Section 15.6, "Using MySQL with memcached".
This feature is a continuation of the "Fast Index Creation" feature introduced in MySQL 5.1 with the InnoDB Plugin.
For full details, see Section 5.5, "Online DDL for InnoDB
Tables".
The DDL operations enhanced by this feature are these variations on the ALTER TABLE
statement:
Create secondary indexes: CREATE INDEX name ON table (col_list), or ALTER TABLE table ADD INDEX name (col_list). (Creating a primary key or a FULLTEXT index still requires locking the table.)
Drop secondary indexes: DROP INDEX name ON table, or ALTER TABLE table DROP INDEX name.
Creating and dropping secondary indexes on InnoDB tables has avoided the table-copying behavior since the days of MySQL 5.1 with the InnoDB Plugin. Now, the table remains available for read and write operations while the index is being created or dropped. The CREATE INDEX or DROP INDEX statement only finishes after all transactions that are modifying the table are completed, so that the initial state of the index reflects the most recent contents of the table.
Previously, modifying the table while an index was being created or dropped typically resulted in a deadlock that cancelled the insert, update, or delete statement on the table.
Changing the auto-increment value for a column: ALTER TABLE table AUTO_INCREMENT=next_value;
Especially in a distributed system using replication or sharding, you sometimes reset the auto-increment counter for a table to a specific value. The next row inserted into the table uses the specified value for its auto-increment column. You might also use this technique in a data warehousing environment where you periodically empty all the tables and reload them, and you can restart the auto-increment sequence from 1.
Adding or dropping a foreign key constraint:
ALTER TABLE tbl1 ADD CONSTRAINT fk_name FOREIGN KEY index (col1) REFERENCES tbl2 (col2) referential_actions;
ALTER TABLE tbl DROP FOREIGN KEY fk_name;
Dropping a foreign key can be performed online with the foreign_key_checks
option enabled or disabled. Creating a foreign
key online requires foreign_key_checks
to be disabled.
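A sketch of an online foreign key creation, with hypothetical parent and child tables, showing the setting that makes the operation online:
SET foreign_key_checks = 0;
ALTER TABLE child ADD CONSTRAINT fk_parent FOREIGN KEY (parent_id) REFERENCES parent (id);
SET foreign_key_checks = 1;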
If you do not know the names of the foreign key constraints on a particular table, issue the
following statement and find the constraint name in the CONSTRAINT
clause for each foreign key:
SHOW CREATE TABLE table\G
Or, query the information_schema.table_constraints
table and use the constraint_name
and constraint_type
columns to identify the foreign key names.
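For example, a query along these lines lists the foreign key names; 'test' and 'child' are hypothetical schema and table names:
SELECT constraint_name
FROM information_schema.table_constraints
WHERE table_schema = 'test'
  AND table_name = 'child'
  AND constraint_type = 'FOREIGN KEY';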
As a consequence of this enhancement, you can now also drop a foreign key and its associated index in a single statement, which previously required separate statements in a strict order:
ALTER TABLE table DROP FOREIGN KEY constraint, DROP INDEX index;
Renaming a column: ALTER TABLE tbl CHANGE old_col_name new_col_name datatype
When you keep the same data type and only change the column name, this operation can always be performed online. As part of this enhancement, you can now rename a column that is part of a foreign key constraint, which was not allowed before.
Some other ALTER TABLE operations are non-blocking, and are faster than before because the table-copying operation is optimized, even though a table copy is still required (an example follows this list):
Changing the ROW_FORMAT
or KEY_BLOCK_SIZE
properties for a table.
Changing the nullable status for a column.
Adding, dropping, or reordering columns.
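For instance, the first operation in the list might look like the following; the table name and block size are illustrative, and KEY_BLOCK_SIZE assumes a file-per-table setup that supports compressed tables:
ALTER TABLE t1 ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8;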
As your database schema evolves with new columns, data types, constraints, indexes, and so on,
keep your CREATE TABLE
statements up to date with the latest table definitions.
Even with the performance improvements of online DDL, it is more efficient to create stable database
structures at the beginning, rather than creating part of the schema and then issuing ALTER TABLE
statements afterward.
The main exception to this guideline is for secondary indexes on tables with large numbers of rows. It is typically most efficient to create the table with all details specified except the secondary indexes, load the data, then create the secondary indexes.
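A minimal sketch of that sequence, with hypothetical table and file names:
CREATE TABLE t1 (
  id INT NOT NULL PRIMARY KEY,
  a INT,
  b VARCHAR(50)
) ENGINE=InnoDB;
LOAD DATA INFILE '/tmp/t1.dat' INTO TABLE t1;
-- create the secondary indexes after the bulk load
CREATE INDEX i_a ON t1 (a);
CREATE INDEX i_b ON t1 (b);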
Whatever sequence of CREATE TABLE, CREATE INDEX, ALTER TABLE, and similar statements went into putting a table together, you can capture the SQL needed to reconstruct the current form of the table by issuing the statement SHOW CREATE TABLE table\G (the \G is required for tidy formatting). This output shows clauses such as numeric precision, NOT NULL, and CHARACTER SET that are sometimes added behind the scenes, and that you might otherwise leave out when cloning the table on a new system or setting up foreign key columns with identical types.