
14.2.4.2. InnoDB Performance and Scalability Enhancements

This section summarizes the major InnoDB features and enhancements for performance and scalability. This information is useful to any DBA or developer who is concerned with performance and scalability. Although some of the enhancements do not require any action on your part, knowing this information can still help you diagnose performance issues more quickly and modernize systems and applications that rely on older, inefficient behavior.

14.2.4.2.1. Overview of InnoDB Performance

InnoDB has always been highly efficient, and includes several unique architectural elements to assure high performance and scalability. The latest InnoDB storage engine includes new features that take advantage of advances in operating systems and hardware platforms, such as multi-core processors and improved memory allocation systems. In addition, new configuration options let you better control some InnoDB internal subsystems to achieve the best performance with your workload.

Starting with MySQL 5.5 and InnoDB 1.1, the built-in InnoDB storage engine within MySQL is upgraded to the full feature set and performance of the former InnoDB Plugin. This change makes these performance and scalability enhancements available to a much wider audience than before, and eliminates the separate installation step of the InnoDB Plugin. After learning about the InnoDB performance features in this section, continue with Chapter 8, Optimization to learn the best practices for overall MySQL performance, and Section 8.5, "Optimizing for InnoDB Tables" in particular for InnoDB tips and guidelines.

14.2.4.2.2. Compression Enhancements for OLTP Workloads

Traditionally, the InnoDB compression feature was recommended primarily for read-only or read-mostly workloads, such as in a data warehouse configuration. The rise of SSD storage devices, which are fast but relatively small and expensive, makes compression attractive also for OLTP workloads: high-traffic, interactive web sites can reduce their storage requirements and their I/O operations per second (IOPS) by using compressed tables with applications that do frequent INSERT, UPDATE, and DELETE operations.

New configuration options in MySQL 5.6 let you adjust the way compression works for a particular MySQL instance, with an emphasis on performance and scalability for write-intensive operations:

  • innodb_compression_level lets you turn the degree of compression up or down. A higher value lets you fit more data onto a storage device, at the expense of more CPU overhead during compression. A lower value lets you reduce CPU overhead when storage space is not critical, or you expect the data is not especially compressible.

  • innodb_compression_failure_threshold_pct specifies a cutoff point for compression failures during updates to a compressed table. When this threshold is exceeded, MySQL begins to leave additional free space within each new compressed page, dynamically adjusting the amount of free space up to the percentage of page size specified by innodb_compression_pad_pct_max.

  • innodb_compression_pad_pct_max lets you adjust the maximum amount of space reserved within each page to record changes to compressed rows, without needing to compress the entire page again. The higher the value, the more changes can be recorded without recompressing the page. MySQL uses a variable amount of free space for the pages within each compressed table, only when a designated percentage of compression operations "fail" at runtime, requiring an expensive operation to split the compressed page.
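
All three of these options are dynamic global variables in MySQL 5.6, so you can experiment with them at runtime. A minimal sketch; the values shown are illustrative starting points, not recommendations:

SET GLOBAL innodb_compression_level = 4;                  -- lower than the default 6: less CPU, less compression
SET GLOBAL innodb_compression_failure_threshold_pct = 10; -- begin padding pages once 10% of compressions fail
SET GLOBAL innodb_compression_pad_pct_max = 50;           -- never reserve more than 50% of a page as padding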

Because working with compressed data sometimes involves keeping both compressed and uncompressed versions of a page in memory at the same time, when using compression with an OLTP-style workload, be prepared to increase the value of the innodb_buffer_pool_size configuration option.

For more information on MySQL data compression, see Section 5.4.6, "Working with InnoDB Compressed Tables". For the performance aspects, see especially Section 5.4.6.3, "Tuning Compression for InnoDB Tables".

14.2.4.2.3. Optimizations for Read-Only Transactions

When a transaction is known to be read-only, InnoDB can avoid the overhead associated with setting up the transaction ID (TRX_ID field). The transaction ID is only needed for a transaction that might perform write operations or locking reads such as SELECT ... FOR UPDATE. Eliminating these unnecessary transaction IDs reduces the size of internal data structures that are consulted each time a query or DML statement constructs a read view.

Currently, InnoDB detects the read-only nature of the transaction and applies this optimization when any of the following conditions are met:

  • The transaction is started with the START TRANSACTION READ ONLY statement. In this case, attempting to make any changes to the database (for InnoDB, MyISAM, or other types of tables) causes an error, and the transaction continues in read-only state:

    ERROR 1792 (25006): Cannot execute statement in a READ ONLY transaction.

    You can still make changes to session-specific temporary tables in a read-only transaction, or issue locking queries for them, because those changes and locks are not visible to any other transaction.

  • The autocommit setting is turned on, so that the transaction is guaranteed to be a single statement, and the single statement making up the transaction is a "non-locking" SELECT statement; that is, a SELECT that does not use a FOR UPDATE or LOCK IN SHARE MODE clause.

Thus, for a read-intensive application such as a report generator, you can tune a sequence of InnoDB queries by grouping them inside START TRANSACTION READ ONLY and COMMIT, or by turning on the autocommit setting before running the SELECT statements, or simply by avoiding any DML statements interspersed with the queries.
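
For example, a report generator might bracket its queries as follows (the table and column names are hypothetical):

START TRANSACTION READ ONLY;
SELECT COUNT(*) FROM orders WHERE order_date = CURDATE();
SELECT SUM(total_price) FROM orders WHERE order_date = CURDATE();
COMMIT;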

Note

Transactions that qualify as auto-commit, non-locking, and read-only (AC-NL-RO) are kept out of certain internal InnoDB data structures and are therefore not listed in SHOW ENGINE INNODB STATUS output. These transactions are only visible in the Information Schema.

14.2.4.2.4. Separate Tablespaces for InnoDB Undo Logs

This feature optionally moves the InnoDB undo log out of the system tablespace into one or more separate tablespaces. The I/O patterns for the undo log make these new tablespaces good candidates to move to SSD storage, while keeping the system tablespace on hard disk storage. Users cannot drop the separate tablespaces created to hold InnoDB undo logs, or the individual segments inside those tablespaces.

Because these files handle I/O operations formerly done inside the system tablespace, we broaden the definition of system tablespace to include these new files.

The undo logs are also known as the rollback segments.

This feature involves the following new or renamed configuration options:

  • innodb_undo_directory

  • innodb_undo_logs (renamed from innodb_rollback_segments)

  • innodb_undo_tablespaces

Usage Notes

To use this feature, follow these steps:

  1. Decide on a path on a fast storage device to hold the undo logs. You will specify that path as the argument to the innodb_undo_directory option in your MySQL configuration file or startup script.

  2. Decide on a non-zero starting value for the innodb_undo_logs option. You can start with a relatively low value and increase it over time to examine the effect on performance.

  3. Decide on a non-zero value for the innodb_undo_tablespaces option. The multiple undo logs specified by the innodb_undo_logs value are divided between this many separate tablespaces (each represented by an undo file). This value is fixed for the life of the MySQL instance, so if you are uncertain about the optimal value, estimate on the high side.

  4. Set up an entirely new MySQL instance for testing, using the values you chose in the configuration file or in your MySQL startup script. Use a realistic workload with data volume similar to your production servers.

  5. Benchmark the performance of I/O intensive workloads.

  6. Periodically increase the value of innodb_undo_logs and re-do the performance tests. Find the value where you stop experiencing gains in I/O performance.

  7. Deploy a new production instance using the ideal settings for these options. Set it up as a slave server in a replication configuration, or transfer data from an earlier production instance.
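
Putting the choices from steps 1-3 together, the option file for the test instance might contain a fragment such as the following sketch; the path and numbers are purely illustrative, and these options take effect only when the instance is initialized:

[mysqld]
# Step 1: directory on fast (SSD) storage for the undo tablespace files
innodb_undo_directory = /ssd/mysql/undo
# Step 2: number of undo logs; can be raised between test runs
innodb_undo_logs = 32
# Step 3: number of separate undo tablespaces; fixed for the life of the instance
innodb_undo_tablespaces = 4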

Performance and Scalability Considerations

Keeping the undo logs in separate files allows the MySQL team to implement I/O and memory optimizations related to this transactional data. For example, because the undo data is written to disk and then rarely used (only in case of crash recovery), it does not need to be kept in the filesystem memory cache, in turn allowing a higher percentage of system memory to be devoted to the InnoDB buffer pool.

The typical SSD best practice of keeping the InnoDB system tablespace on a hard drive and moving the per-table tablespaces to SSD is assisted by moving the undo information into separate tablespace files.

Internals

The physical tablespace files are named undoN, where N is the space ID, including leading zeros.

Currently, MySQL instances containing separate undo tablespaces cannot be downgraded to earlier releases such as MySQL 5.5 or 5.1.

14.2.4.2.5. Faster Extension for InnoDB Data Files

The benefits of the InnoDB file-per-table setting come with the tradeoff that each .ibd file is extended as the data inside the table grows. This I/O operation can be a bottleneck for busy systems with many InnoDB tables. When all InnoDB tables are stored inside the system tablespace, this extension operation happens less frequently, as space freed by DELETE or TRUNCATE operations within one table can be reused by another table.

MySQL 5.6 improves the concurrency of the extension operation, so that multiple .ibd files can be extended simultaneously, and this operation does not block read or write operations performed by other threads.

14.2.4.2.6. Non-Recursive Deadlock Detection

The code that detects deadlocks in InnoDB transactions has been modified to use a fixed-size work area rather than a recursive algorithm. The detection operation is faster as a result. You do not need to do anything to take advantage of this enhancement.

Under both the old and new detection mechanisms, you might encounter a search too deep error that is not a true deadlock, but requires you to re-try the transaction the same way as with a deadlock.

14.2.4.2.7. Fast CRC32 Checksum Algorithm

You can enable the innodb_checksum_algorithm=crc32 configuration setting to change the checksum algorithm to a faster one that scans the block 32 bits at a time rather than 8 bits at a time. When the CRC32 algorithm is enabled, data blocks that are written to disk by InnoDB contain different values in their checksum fields than before. This transition can be gradual, with a mix of old and new checksum values within the same table or database.

For maximum downward compatibility, this setting is off by default:

  • Current versions of MySQL Enterprise Backup (up to 3.8.0) do not support backing up tablespaces that use crc32 checksums.

  • .ibd files containing crc32 checksums could cause problems downgrading to MySQL versions prior to 5.6.3. MySQL 5.6.3 and up recognizes either the new or old checksum values for the block as correct when reading the block from disk, ensuring that data blocks are compatible during upgrade and downgrade regardless of the algorithm setting. If data written with new checksum values is processed by a version of MySQL earlier than 5.6.3, it could be reported as corrupted.

When you set up a new MySQL instance, and can be sure that all the InnoDB data is created using the CRC32 checksum algorithm, you can use the setting innodb_checksum_algorithm=strict_crc32, which can be faster than the crc32 setting because it does not do the extra checksum calculations to support both old and new values.

The innodb_checksum_algorithm option has other values that allow it to replace the innodb_checksums option. innodb_checksum_algorithm=none is the same as innodb_checksums=OFF. innodb_checksum_algorithm=innodb is the same as innodb_checksums=ON. To avoid conflicts, remove references to innodb_checksums from your configuration file and MySQL startup scripts. The new option also accepts values strict_none and strict_innodb, again offering better performance in situations where all InnoDB data in an instance is created with the same checksum algorithm.
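
For example, to switch a running MySQL 5.6 server to the faster algorithm and confirm the change (blocks written from this point on carry crc32 checksums, while existing blocks keep their old values until rewritten):

SET GLOBAL innodb_checksum_algorithm = 'crc32';
SHOW GLOBAL VARIABLES LIKE 'innodb_checksum_algorithm';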

The following table illustrates the difference between the none, innodb, and crc32 option values, and their strict_ counterparts. none, innodb, and crc32 write the specified type of checksum value into each data block, but for compatibility accept any of the other checksum values when verifying a block during a read operation. The strict_ form of each parameter only recognizes one kind of checksum, which makes verification faster but requires that all InnoDB data files in an instance be created under the identical innodb_checksum_algorithm value.

Table 14.3. Allowed Settings for innodb_checksum_algorithm

Value | Generated checksum (when writing) | Allowed checksums (when reading)
none | A constant number. | Any of the checksums generated by none, innodb, or crc32.
innodb | A checksum calculated in software, using the original algorithm from InnoDB. | Any of the checksums generated by none, innodb, or crc32.
crc32 | A checksum calculated using the crc32 algorithm, possibly done with a hardware assist. | Any of the checksums generated by none, innodb, or crc32.
strict_none | A constant number. | Only the checksum generated by none.
strict_innodb | A checksum calculated in software, using the original algorithm from InnoDB. | Only the checksum generated by innodb.
strict_crc32 | A checksum calculated using the crc32 algorithm, possibly done with a hardware assist. | Only the checksum generated by crc32.

14.2.4.2.8. Faster Restart by Preloading the InnoDB Buffer Pool

After you restart a busy server, there is typically a warmup period with steadily increasing throughput, as disk pages that were in the InnoDB buffer pool are brought back into memory as the same data is queried, updated, and so on. Once the buffer pool holds a similar set of pages as before the restart, many operations are performed in memory rather than involving disk I/O, and throughput stabilizes at a high level.

This feature shortens the warmup period by immediately reloading disk pages that were in the buffer pool before the restart, rather than waiting for DML operations to access the corresponding rows. The I/O requests can be performed in large batches, making the overall I/O faster. The page loading happens in the background, and does not delay the database startup.

In addition to saving the buffer pool state at shutdown and restoring it at startup, you can also save or restore the state at any time. For example, you might save the state of the buffer pool after reaching a stable throughput under a steady workload. You might restore the previous buffer pool state after running reports or maintenance jobs that bring data pages into the buffer pool that are only needed during the time period for those operations, or after some other period with a non-typical workload.

Although the buffer pool itself could be many gigabytes in size, the data that InnoDB saves on disk to restore the buffer pool is tiny by comparison: just the tablespace and page IDs necessary to locate the appropriate pages on disk. This information is derived from the INFORMATION_SCHEMA table INNODB_BUFFER_PAGE_LRU.

Because the data is cached in and aged out of the buffer pool the same as with regular database operations, there is no problem if the disk pages were updated recently, or if a DML operation involves data that has not yet been loaded. The loading mechanism skips any requested pages that no longer exist.

This feature involves the following configuration variables:

  • innodb_buffer_pool_dump_at_shutdown

  • innodb_buffer_pool_load_at_startup

  • innodb_buffer_pool_dump_now

  • innodb_buffer_pool_load_now

  • innodb_buffer_pool_filename

  • innodb_buffer_pool_load_abort

and the status variables:

  • Innodb_buffer_pool_dump_status

  • Innodb_buffer_pool_load_status

To save the current state of the InnoDB buffer pool, issue the statement:

SET GLOBAL innodb_buffer_pool_dump_now=ON;

The underlying mechanism involves a background thread that is dispatched to perform the dump and load operations.

By default, the buffer pool state is saved in a file ib_buffer_pool in the InnoDB data directory.

Disk pages from compressed tables are loaded into the buffer pool in their compressed form. Decompression happens as usual when the page contents are accessed in the course of DML operations. Because decompression is a CPU-intensive process, it is more efficient for concurrency to perform that operation in one of the connection threads rather than the single thread that performs the buffer pool restore operation.

Example 14.1. Examples of Dumping and Restoring the InnoDB Buffer Pool

Trigger a dump of the buffer pool manually:

SET GLOBAL innodb_buffer_pool_dump_now=ON;

Specify that a dump should be taken at shutdown:

SET GLOBAL innodb_buffer_pool_dump_at_shutdown=ON;

Specify that a dump should be loaded at startup (innodb_buffer_pool_load_at_startup is not dynamic, so set it in the MySQL option file rather than with SET):

innodb_buffer_pool_load_at_startup=ON

Trigger a load of the buffer pool manually:

SET GLOBAL innodb_buffer_pool_load_now=ON;

Specify which filename to use for storing the dump to and loading the dump from:

SET GLOBAL innodb_buffer_pool_filename='filename';

Display progress of dump:

SHOW STATUS LIKE 'innodb_buffer_pool_dump_status';

or:

SELECT variable_value FROM information_schema.global_status
WHERE variable_name = 'INNODB_BUFFER_POOL_DUMP_STATUS';

Outputs any of: not started, Dumping buffer pool 5/7, page 237/2873, Finished at 110505 12:18:02

Display progress of load:

SHOW STATUS LIKE 'innodb_buffer_pool_load_status';

or:

SELECT variable_value FROM information_schema.global_status
WHERE variable_name = 'INNODB_BUFFER_POOL_LOAD_STATUS';

Outputs any of: not started, Loaded 123/22301 pages, Finished at 110505 12:23:24

Abort a buffer pool load:

SET GLOBAL innodb_buffer_pool_load_abort=ON;

14.2.4.2.9. Improvements to Buffer Pool Flushing

The new configuration options innodb_flush_neighbors and innodb_lru_scan_depth let you fine-tune certain aspects of the flushing process for the InnoDB buffer pool. These options primarily help write-intensive workloads. With heavy DML activity, flushing can fall behind if it is not aggressive enough, resulting in excessive memory use in the buffer pool; or, disk writes due to flushing can saturate your I/O capacity if that mechanism is too aggressive. The ideal settings depend on your workload, data access patterns, and storage configuration (for example, whether data is stored on HDD or SSD devices).

For systems with constant heavy workloads, or workloads that fluctuate widely, several new configuration options let you fine-tune the flushing behavior for InnoDB tables: innodb_adaptive_flushing_lwm, innodb_max_dirty_pages_pct_lwm, innodb_io_capacity_max, and innodb_flushing_avg_loops. These options feed into an improved formula used by the innodb_adaptive_flushing option.

The existing innodb_adaptive_flushing, innodb_io_capacity and innodb_max_dirty_pages_pct options work as before, except that they are limited or extended by the new innodb_adaptive_flushing_lwm, innodb_io_capacity_max and innodb_max_dirty_pages_pct_lwm options.

All of these options are most applicable for servers running heavy workloads for long periods of time, when there is rarely enough idle time to catch up with changes waiting to be written to disk. The innodb_flushing_avg_loops option lets you distinguish between a server that is running at full capacity 24x7 and one that experiences periodic spikes in workload. For a server with a consistently high workload, keep this value high so that the adaptive flushing algorithm responds immediately to changes in the I/O rate. For a server that experiences peaks and troughs in its workload, keep this value low so that InnoDB does not overreact to sudden spikes in DML activity.
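
As a sketch of how these options might be combined for a server on fast storage with a consistently heavy write load (every value here is workload-dependent and purely illustrative):

SET GLOBAL innodb_io_capacity = 2000;           -- background flushing budget, in IOPS
SET GLOBAL innodb_io_capacity_max = 4000;       -- ceiling used when flushing falls behind
SET GLOBAL innodb_adaptive_flushing_lwm = 10;   -- % of redo log capacity that activates adaptive flushing
SET GLOBAL innodb_max_dirty_pages_pct_lwm = 10; -- % of dirty pages that triggers early flushing
SET GLOBAL innodb_flushing_avg_loops = 30;      -- keep high for steady load, low for spiky load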

14.2.4.2.10. Persistent Optimizer Statistics for InnoDB Tables

Plan stability is a desirable goal for your biggest and most important queries. InnoDB has always computed statistics for each InnoDB table to help the optimizer find the most efficient query execution plan. Now you can make these statistics persistent, so that the index usage and join order for a particular query is less likely to change.

This feature is on by default, enabled by the configuration option innodb_stats_persistent.

You control how much sampling is done to collect the statistics by setting the configuration options innodb_stats_persistent_sample_pages and innodb_stats_transient_sample_pages.

The configuration option innodb_stats_auto_recalc determines whether the statistics are calculated automatically whenever a table undergoes substantial changes (to more than 10% of the rows). If that setting is disabled, ensure the accuracy of optimizer statistics by issuing the ANALYZE TABLE statement for each applicable table after creating an index or making substantial changes to indexed columns. You might run this statement in your setup scripts after representative data has been loaded into the table, and run it periodically after DML operations significantly change the contents of indexed columns, or on a schedule at times of low activity.

Caution

To ensure statistics are gathered when a new index is created, either enable the innodb_stats_auto_recalc option, or run ANALYZE TABLE after creating each new index when the persistent statistics mode is enabled.

You can also set the innodb_stats_persistent, innodb_stats_auto_recalc, and innodb_stats_sample_pages options at the session level before creating a table, or use the STATS_PERSISTENT, STATS_AUTO_RECALC, and STATS_SAMPLE_PAGES clauses on the CREATE TABLE and ALTER TABLE statements, to override the system-wide setting and configure persistent statistics for individual tables.
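
For instance, a table can opt in to persistent statistics explicitly, with more sampling than usual (the table itself is hypothetical):

CREATE TABLE order_history (
  id BIGINT NOT NULL PRIMARY KEY,
  customer_id INT NOT NULL,
  created_at DATETIME NOT NULL,
  KEY idx_customer (customer_id)
) ENGINE=InnoDB
  STATS_PERSISTENT=1, STATS_AUTO_RECALC=1, STATS_SAMPLE_PAGES=25;

-- Refresh the persistent statistics after a bulk load:
ANALYZE TABLE order_history;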

Formerly, these statistics were cleared on each server restart and after some other operations, and recomputed when the table was next accessed. The statistics are computed using a random sampling technique that could produce different estimates the next time, leading to different choices in the execution plan and thus variations in query performance.

To revert to the previous method of collecting statistics that are periodically erased, run the command ALTER TABLE tbl_name STATS_PERSISTENT=0.

The persistent statistics feature relies on the internally managed tables in the mysql database, named innodb_table_stats and innodb_index_stats. These tables are set up automatically in all install, upgrade, and build-from-source procedures.

The innodb_table_stats and innodb_index_stats tables both include a last_update column showing when index statistics were last updated, as shown in the following example:

mysql> select * from mysql.innodb_table_stats \G
*************************** 1. row ***************************
           database_name: sakila
              table_name: actor
             last_update: 2013-05-28 16:16:44
                  n_rows: 200
    clustered_index_size: 1
sum_of_other_index_sizes: 1
...
mysql> select * from mysql.innodb_index_stats \G
*************************** 1. row ***************************
   database_name: sakila
      table_name: actor
      index_name: PRIMARY
     last_update: 2013-05-28 16:16:44
       stat_name: n_diff_pfx01
      stat_value: 200
     sample_size: 1
...

If you manually update the statistics in the tables during troubleshooting or tuning, issue the command FLUSH TABLE tbl_name to make MySQL reload the updated statistics.

14.2.4.2.11. Faster Locking for Improved Scalability

In MySQL and InnoDB, multiple threads of execution access shared data structures. InnoDB synchronizes these accesses with its own implementation of mutexes and read/write locks. InnoDB has historically protected the internal state of a read/write lock with an InnoDB mutex. On Unix and Linux platforms, the internal state of an InnoDB mutex is protected by a Pthreads mutex, as in IEEE Std 1003.1c (POSIX.1c).

On many platforms, there is a more efficient way to implement mutexes and read/write locks. Atomic operations can often be used to synchronize the actions of multiple threads more efficiently than Pthreads. Each operation to acquire or release a lock can be done in fewer CPU instructions, and thus result in less wasted time when threads are contending for access to shared data structures. This in turn means greater scalability on multi-core platforms.

InnoDB implements mutexes and read/write locks with the built-in functions provided by the GNU Compiler Collection (GCC) for atomic memory access instead of using the Pthreads approach previously used. More specifically, InnoDB compiled with GCC version 4.1.2 or later uses the atomic builtins instead of a pthread_mutex_t to implement InnoDB mutexes and read/write locks.

On 32-bit Microsoft Windows, InnoDB has implemented mutexes (but not read/write locks) with hand-written assembler instructions. Beginning with Microsoft Windows 2000, functions for Interlocked Variable Access are available that are similar to the built-in functions provided by GCC. On Windows 2000 and higher, InnoDB makes use of the Interlocked functions. Unlike the old hand-written assembler code, the new implementation supports read/write locks and 64-bit platforms.

Solaris 10 introduced library functions for atomic operations, and InnoDB uses these functions by default. When MySQL is compiled on Solaris 10 with a compiler that does not support the built-in functions provided by the GNU Compiler Collection (GCC) for atomic memory access, InnoDB uses the library functions.

This change improves the scalability of InnoDB on multi-core systems. This feature is enabled out-of-the-box on the platforms where it is supported. You do not have to set any parameter or option to take advantage of the improved performance. On platforms where the GCC, Windows, or Solaris functions for atomic memory access are not available, InnoDB uses the traditional Pthreads method of implementing mutexes and read/write locks.

When MySQL starts, InnoDB writes a message to the log file indicating whether atomic memory access is used for mutexes, for mutexes and read/write locks, or neither. If suitable tools are used to build InnoDB and the target CPU supports the atomic operations required, InnoDB uses the built-in functions for mutexing. If, in addition, the compare-and-swap operation can be used on thread identifiers (pthread_t), then InnoDB uses the instructions for read-write locks as well.

Note: If you are building from source, ensure that the build process properly takes advantage of your platform capabilities.

For more information about the performance implications of locking, see Section 8.10, "Optimizing Locking Operations".

14.2.4.2.12. Using Operating System Memory Allocators

When InnoDB was developed, the memory allocators supplied with operating systems and run-time libraries were often lacking in performance and scalability. At that time, there were no memory allocator libraries tuned for multi-core CPUs. Therefore, InnoDB implemented its own memory allocator in the mem subsystem. This allocator is guarded by a single mutex, which may become a bottleneck. InnoDB also implements a wrapper interface around the system allocator (malloc and free) that is likewise guarded by a single mutex.

Today, as multi-core systems have become more widely available, and as operating systems have matured, significant improvements have been made in the memory allocators provided with operating systems. New memory allocators perform better and are more scalable than they were in the past. The leading high-performance memory allocators include Hoard, libumem, mtmalloc, ptmalloc, tbbmalloc, and TCMalloc. Most workloads, especially those where memory is frequently allocated and released (such as multi-table joins), benefit from using a more highly tuned memory allocator as opposed to the internal, InnoDB-specific memory allocator.

You can control whether InnoDB uses its own memory allocator or an allocator of the operating system, by setting the value of the system configuration parameter innodb_use_sys_malloc in the MySQL option file (my.cnf or my.ini). If set to ON or 1 (the default), InnoDB uses the malloc and free functions of the underlying system rather than manage memory pools itself. This parameter is not dynamic, and takes effect only when the system is started. To continue to use the InnoDB memory allocator, set innodb_use_sys_malloc to 0.
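
Because the parameter is not dynamic, switching back to the internal allocator requires an option-file entry and a server restart, for example:

[mysqld]
# Use the internal InnoDB memory allocator instead of the system malloc/free
innodb_use_sys_malloc = 0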

Note

When the InnoDB memory allocator is disabled, InnoDB ignores the value of the parameter innodb_additional_mem_pool_size. The InnoDB memory allocator uses an additional memory pool for satisfying allocation requests without having to fall back to the system memory allocator. When the InnoDB memory allocator is disabled, all such allocation requests are fulfilled by the system memory allocator.

On Unix-like systems that use dynamic linking, replacing the memory allocator may be as easy as making the environment variable LD_PRELOAD or LD_LIBRARY_PATH point to the dynamic library that implements the allocator. On other systems, some relinking may be necessary. Please refer to the documentation of the memory allocator library of your choice.

Since InnoDB cannot track all memory use when the system memory allocator is used (innodb_use_sys_malloc is ON), the section "BUFFER POOL AND MEMORY" in the output of the SHOW ENGINE INNODB STATUS command only includes the buffer pool statistics in the "Total memory allocated". Any memory allocated using the mem subsystem or using ut_malloc is excluded.

For more information about the performance implications of InnoDB memory usage, see Section 8.9, "Buffering and Caching".

14.2.4.2.13. Controlling InnoDB Change Buffering

When INSERT, UPDATE, and DELETE operations are done to a table, often the values of indexed columns (particularly the values of secondary keys) are not in sorted order, requiring substantial I/O to bring secondary indexes up to date. InnoDB has an insert buffer that caches changes to secondary index entries when the relevant page is not in the buffer pool, thus avoiding I/O operations by not reading in the page from the disk. The buffered changes are merged when the page is loaded to the buffer pool, and the updated page is later flushed to disk using the normal mechanism. The InnoDB main thread merges buffered changes when the server is nearly idle, and during a slow shutdown.

Because it can result in fewer disk reads and writes, this feature is most valuable for workloads that are I/O-bound, for example applications with a high volume of DML operations such as bulk inserts.

However, the insert buffer occupies a part of the buffer pool, reducing the memory available to cache data pages. If the working set almost fits in the buffer pool, or if your tables have relatively few secondary indexes, it may be useful to disable insert buffering. If the working set entirely fits in the buffer pool, insert buffering does not impose any extra overhead, because it only applies to pages that are not in the buffer pool.

You can control the extent to which InnoDB performs insert buffering with the system configuration parameter innodb_change_buffering. You can turn on and off buffering for inserts, delete operations (when index records are initially marked for deletion) and purge operations (when index records are physically deleted). An update operation is represented as a combination of an insert and a delete. In MySQL 5.5 and higher, the default value is changed from inserts to all.

The allowed values of innodb_change_buffering are:

  • all

    The default value: buffer inserts, delete-marking operations, and purges.

  • none

    Do not buffer any operations.

  • inserts

    Buffer insert operations.

  • deletes

    Buffer delete-marking operations.

  • changes

    Buffer both inserts and delete-marking.

  • purges

    Buffer the physical deletion operations that happen in the background.

You can set the value of this parameter in the MySQL option file (my.cnf or my.ini) or change it dynamically with the SET GLOBAL command, which requires the SUPER privilege. Changing the setting affects the buffering of new operations; the merging of already buffered entries is not affected.
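
For example, to restrict buffering to insert operations only and verify the result (a sketch; the SET statement requires the SUPER privilege):

SET GLOBAL innodb_change_buffering = 'inserts';
SHOW GLOBAL VARIABLES LIKE 'innodb_change_buffering';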

For more information about speeding up INSERT, UPDATE, and DELETE statements, see Section 8.2.2, "Optimizing DML Statements".

14.2.4.2.14. Controlling Adaptive Hash Indexing

If a table fits almost entirely in main memory, the fastest way to perform queries on it is to use hash indexes rather than B-tree lookups. MySQL monitors searches on each index defined for an InnoDB table. If it notices that certain index values are being accessed frequently, it automatically builds an in-memory hash table for that index. See Section 14.2.3.13.6, "Adaptive Hash Indexes" for background information and usage guidelines for the adaptive hash index feature and the innodb_adaptive_hash_index configuration option.

14.2.4.2.15. Changes Regarding Thread Concurrency

InnoDB uses operating system threads to process requests from user transactions. (Transactions may issue many requests to InnoDB before they commit or roll back.) On modern operating systems and servers with multi-core processors, where context switching is efficient, most workloads run well without any limit on the number of concurrent threads. Scalability improvements in MySQL 5.5 and up reduce the need to limit the number of concurrently executing threads inside InnoDB.

In situations where it is helpful to minimize context switching between threads, InnoDB can use a number of techniques to limit the number of concurrently executing operating system threads (and thus the number of requests that are processed at any one time). When InnoDB receives a new request from a user session, if the number of threads concurrently executing is at a pre-defined limit, the new request sleeps for a short time before it tries again. A request that cannot be rescheduled after the sleep is put in a first-in/first-out queue and eventually is processed. Threads waiting for locks are not counted in the number of concurrently executing threads.

You can limit the number of concurrent threads by setting the configuration parameter innodb_thread_concurrency. Once the number of executing threads reaches this limit, additional threads sleep for a number of microseconds, set by the configuration parameter innodb_thread_sleep_delay, before being placed into the queue.

Previously, it required experimentation to find the optimal value for innodb_thread_sleep_delay, and the optimal value could change depending on the workload. In MySQL 5.6.3 and higher, you can set the configuration option innodb_adaptive_max_sleep_delay to the highest value you would allow for innodb_thread_sleep_delay, and InnoDB automatically adjusts innodb_thread_sleep_delay up or down depending on the current thread-scheduling activity. This dynamic adjustment helps the thread scheduling mechanism to work smoothly during times when the system is lightly loaded and when it is operating near full capacity.

The default value for innodb_thread_concurrency and the implied default limit on the number of concurrent threads has been changed in various releases of MySQL and InnoDB. Currently, the default value of innodb_thread_concurrency is 0, so that by default there is no limit on the number of concurrently executing threads, as shown in Table 14.4, "Changes to innodb_thread_concurrency".

Table 14.4. Changes to innodb_thread_concurrency

InnoDB Version | MySQL Version | Default value | Default limit of concurrent threads | Value to allow unlimited threads
Built-in | Earlier than 5.1.11 | 20 | No limit | 20 or higher
Built-in | 5.1.11 and newer | 8 | 8 | 0
InnoDB before 1.0.3 | (corresponding to Plugin) | 8 | 8 | 0
InnoDB 1.0.3 and newer | (corresponding to Plugin) | 0 | No limit | 0

Note that InnoDB causes threads to sleep only when the number of concurrent threads is limited. When there is no limit on the number of threads, all contend equally to be scheduled. That is, if innodb_thread_concurrency is 0, the value of innodb_thread_sleep_delay is ignored.

When there is a limit on the number of threads, InnoDB reduces context switching overhead by permitting multiple requests made during the execution of a single SQL statement to enter InnoDB without observing the limit set by innodb_thread_concurrency. Since an SQL statement (such as a join) may comprise multiple row operations within InnoDB, InnoDB assigns "tickets" that allow a thread to be scheduled repeatedly with minimal overhead.

When a new SQL statement starts, a thread has no tickets, and it must observe innodb_thread_concurrency. Once the thread is entitled to enter InnoDB, it is assigned a number of tickets that it can use for subsequently entering InnoDB. If the tickets run out, innodb_thread_concurrency is observed again and further tickets are assigned. The number of tickets to assign is specified by the global option innodb_concurrency_tickets, which is 500 by default. A thread that is waiting for a lock is given one ticket once the lock becomes available.
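
As an illustration only (the right numbers depend entirely on your environment and workload), a deliberately constrained configuration might look like this:

SET GLOBAL innodb_thread_concurrency = 16;           -- cap on threads executing inside InnoDB
SET GLOBAL innodb_adaptive_max_sleep_delay = 150000; -- let InnoDB adjust the sleep delay up to 150 ms
SET GLOBAL innodb_concurrency_tickets = 5000;        -- entries allowed per admission (default 500)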

The correct values of these variables depend on your environment and workload. Try a range of different values to determine what value works for your applications. Before limiting the number of concurrently executing threads, review configuration options that may improve the performance of InnoDB on multi-core and multi-processor computers, such as innodb_use_sys_malloc and innodb_adaptive_hash_index.

For general performance information about MySQL thread handling, see Section 8.11.5.1, "How MySQL Uses Threads for Client Connections".

14.2.4.2.16. Changes in the Read-Ahead Algorithm

A read-ahead request is an I/O request to prefetch multiple pages in the buffer pool asynchronously, in anticipation that these pages will be needed soon. The requests bring in all the pages in one extent. InnoDB uses two read-ahead algorithms to improve I/O performance:

Linear read-ahead is a technique that predicts what pages might be needed soon based on pages in the buffer pool being accessed sequentially. You control when InnoDB performs a read-ahead operation by adjusting the number of sequential page accesses required to trigger an asynchronous read request, using the configuration parameter innodb_read_ahead_threshold. Before this parameter was added, InnoDB would only calculate whether to issue an asynchronous prefetch request for the entire next extent when it read in the last page of the current extent.

The new configuration parameter innodb_read_ahead_threshold controls how sensitive InnoDB is in detecting patterns of sequential page access. If the number of pages read sequentially from an extent is greater than or equal to innodb_read_ahead_threshold, InnoDB initiates an asynchronous read-ahead operation of the entire following extent. It can be set to any value from 0-64. The default value is 56. The higher the value, the more strict the access pattern check. For example, if you set the value to 48, InnoDB triggers a linear read-ahead request only when 48 pages in the current extent have been accessed sequentially. If the value is 8, InnoDB would trigger an asynchronous read-ahead even if as few as 8 pages in the extent were accessed sequentially. You can set the value of this parameter in the MySQL configuration file, or change it dynamically with the SET GLOBAL command, which requires the SUPER privilege.

Random read-ahead is a technique that predicts when pages might be needed soon based on pages already in the buffer pool, regardless of the order in which those pages were read. If 13 consecutive pages from the same extent are found in the buffer pool, InnoDB asynchronously issues a request to prefetch the remaining pages of the extent. This feature was initially turned off in MySQL 5.5. It is available once again starting in MySQL 5.1.59 and 5.5.16 and higher, turned off by default. To enable this feature, set the configuration variable innodb_random_read_ahead.
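
Both parameters are dynamic. For example, to make linear read-ahead trigger sooner, enable random read-ahead, and then watch the related counters (the threshold value is illustrative):

SET GLOBAL innodb_read_ahead_threshold = 48; -- default is 56
SET GLOBAL innodb_random_read_ahead = ON;    -- OFF by default
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read_ahead%';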

The SHOW ENGINE INNODB STATUS command displays statistics to help you evaluate the effectiveness of the read-ahead algorithm. With the return of random read-ahead in MySQL 5.6, the SHOW ENGINE INNODB STATUS command once again includes Innodb_buffer_pool_read_ahead_rnd. Innodb_buffer_pool_read_ahead keeps its current name. (In earlier releases, it was listed as Innodb_buffer_pool_read_ahead_seq.) See Section 14.2.5.11, "More Read-Ahead Statistics" for more information.

For more information about I/O performance, see Section 8.5.7, "Optimizing InnoDB Disk I/O" and Section 8.11.3, "Optimizing Disk I/O".

14.2.4.2.17. Multiple Background InnoDB I/O Threads

InnoDB uses background threads to service various types of I/O requests. You can configure the number of background threads that service read and write I/O on data pages, using the configuration parameters innodb_read_io_threads and innodb_write_io_threads. These parameters signify the number of background threads used for read and write requests respectively. They are effective on all supported platforms. You can set the value of these parameters in the MySQL option file (my.cnf or my.ini); you cannot change them dynamically. The default value for these parameters is 4 and the permissible values range from 1-64.

The purpose of this change is to make InnoDB more scalable on high end systems. Each background thread can handle up to 256 pending I/O requests. A major source of background I/O is the read-ahead requests. InnoDB tries to balance the load of incoming requests in such a way that most of the background threads share work equally. InnoDB also attempts to allocate read requests from the same extent to the same thread to increase the chances of coalescing the requests together. If you have a high end I/O subsystem and you see more than 64 × innodb_read_io_threads pending read requests in SHOW ENGINE INNODB STATUS, you might gain by increasing the value of innodb_read_io_threads.
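
Because these parameters cannot be changed dynamically, a high-end system would set them in the option file before startup; for example (values illustrative):

[mysqld]
# Background threads for data-page reads and writes (default is 4 each)
innodb_read_io_threads = 8
innodb_write_io_threads = 8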

For more information about InnoDB I/O performance, see Section 8.5.7, "Optimizing InnoDB Disk I/O".

14.2.4.2.18. Asynchronous I/O on Linux

Starting in InnoDB 1.1 with MySQL 5.5, the asynchronous I/O capability that InnoDB has had on Windows systems is now available on Linux systems. (Other Unix-like systems continue to use synchronous I/O calls.) This feature improves the scalability of heavily I/O-bound systems, which typically show many pending reads/writes in the output of the command SHOW ENGINE INNODB STATUS\G.

Running with a large number of InnoDB I/O threads, and especially running multiple such instances on the same server machine, can exceed capacity limits on Linux systems. In this case, you can fix the error:

EAGAIN: The specified maxevents exceeds the user's limit of available events. 

by writing a higher limit to /proc/sys/fs/aio-max-nr.

In general, if a problem with the asynchronous I/O subsystem in the OS prevents InnoDB from starting, set the option innodb_use_native_aio=0 in the configuration file. This new configuration option applies to Linux systems only, and cannot be changed once the server is running.

For more information about InnoDB I/O performance, see Section 8.5.7, "Optimizing InnoDB Disk I/O".

14.2.4.2.19. Group Commit

InnoDB, like any other ACID-compliant database engine, flushes the redo log of a transaction before it is committed. Historically, InnoDB used group commit functionality to group multiple such flush requests together to avoid one flush for each commit. With group commit, InnoDB issues a single write to the log file to perform the commit action for multiple user transactions that commit at about the same time, significantly improving throughput.

Group commit in InnoDB worked through MySQL 4.x, and works once again with MySQL 5.1 with the InnoDB Plugin, and with MySQL 5.5 and higher. The introduction of support for distributed transactions and Two Phase Commit (2PC) in MySQL 5.0 interfered with the InnoDB group commit functionality. This issue is now resolved.

The group commit functionality inside InnoDB works with the Two Phase Commit protocol in MySQL. Re-enabling group commit ensures that the order of commits in the MySQL binlog and in the InnoDB logfile is the same as it was before, which means it is safe to use the MySQL Enterprise Backup product with InnoDB 1.0.4 (that is, the InnoDB Plugin with MySQL 5.1) and above. When the binlog is enabled, you typically also set the configuration option sync_binlog=0, because group commit for the binary log is only supported if it is set to 0.

Group commit is transparent; you do not need to do anything to take advantage of this significant performance improvement.

For more information about performance of COMMIT and other transactional operations, see Section 8.5.2, "Optimizing InnoDB Transaction Management".

14.2.4.2.20. Controlling the InnoDB Master Thread I/O Rate

The master thread in InnoDB is a thread that performs various tasks in the background. Most of these tasks are I/O related, such as flushing dirty pages from the buffer pool or writing changes from the insert buffer to the appropriate secondary indexes. The master thread attempts to perform these tasks in a way that does not adversely affect the normal working of the server. It tries to estimate the free I/O bandwidth available and tune its activities to take advantage of this free capacity. Historically, InnoDB has used a hard-coded value of 100 IOPS (input/output operations per second) as the total I/O capacity of the server.

The parameter innodb_io_capacity indicates the overall I/O capacity available to InnoDB, per buffer pool instance. This parameter should be set to approximately the number of I/O operations that the system can perform per second. The value depends on your system configuration. When innodb_io_capacity is set, the master thread estimates the I/O bandwidth available for background tasks based on the set value. Setting the value to 100 reverts to the old behavior.

You can set the value of innodb_io_capacity to any number 100 or greater. The default value is 200, reflecting that the performance of typical modern I/O devices is higher than in the early days of MySQL. Typically, values around the previous default of 100 are appropriate for consumer-level storage devices, such as hard drives of up to 7200 RPM. Faster hard drives, RAID configurations, and SSDs benefit from higher values.

You can set the value of this parameter in the MySQL option file (my.cnf or my.ini) or change it dynamically with the SET GLOBAL command, which requires the SUPER privilege.

Formerly, the InnoDB master thread also performed any needed purge operations. In MySQL 5.6.5 and higher, those I/O operations are moved to other background threads, whose number is controlled by the innodb_purge_threads configuration option.

For more information about InnoDB I/O performance, see Section 8.5.7, "Optimizing InnoDB Disk I/O".

14.2.4.2.21. Controlling the Flushing Rate of Dirty Pages from the InnoDB Buffer Pool

InnoDB performs certain tasks in the background, including flushing of dirty pages (those pages that have been changed but are not yet written to the database files) from the buffer pool, a task performed by the master thread. Currently, InnoDB aggressively flushes buffer pool pages if the percentage of dirty pages in the buffer pool exceeds innodb_max_dirty_pages_pct.

InnoDB uses a new algorithm to estimate the required rate of flushing, based on the speed of redo log generation and the current rate of flushing. The intent is to smooth overall performance by ensuring that buffer flush activity keeps up with the need to keep the buffer pool "clean". Automatically adjusting the rate of flushing can help to avoid sudden dips in throughput, when excessive buffer pool flushing limits the I/O capacity available for ordinary read and write activity.

InnoDB uses its log files in a circular fashion. Before reusing a portion of a log file, InnoDB flushes to disk all dirty buffer pool pages whose redo entries are contained in that portion of the log file, a process known as a sharp checkpoint. If a workload is write-intensive, it generates a lot of redo information, all written to the log file. If all available space in the log files is used up, a sharp checkpoint occurs, causing a temporary reduction in throughput. This situation can happen even though innodb_max_dirty_pages_pct is not reached.

InnoDB uses a heuristic-based algorithm to avoid such a scenario, by measuring the number of dirty pages in the buffer pool and the rate at which redo is being generated. Based on these numbers, InnoDB decides how many dirty pages to flush from the buffer pool each second. This self-adapting algorithm is able to deal with sudden changes in the workload.

Internal benchmarking has also shown that this algorithm not only maintains throughput over time, but can also improve overall throughput significantly.

Because adaptive flushing is a new feature that can significantly affect the I/O pattern of a workload, a new configuration parameter lets you turn off this feature. The default value of the boolean parameter innodb_adaptive_flushing is TRUE, enabling the new algorithm. You can set the value of this parameter in the MySQL option file (my.cnf or my.ini) or change it dynamically with the SET GLOBAL command, which requires the SUPER privilege.

For more information about InnoDB I/O performance, see Section 8.5.7, "Optimizing InnoDB Disk I/O".

14.2.4.2.22. Using the PAUSE Instruction in InnoDB Spin Loops

Synchronization inside InnoDB frequently involves the use of spin loops: while waiting, InnoDB executes a tight loop of instructions repeatedly to avoid having the InnoDB process and threads be rescheduled by the operating system. If the spin loops are executed too quickly, system resources are wasted, imposing a performance penalty on transaction throughput. Most modern processors implement the PAUSE instruction for use in spin loops, so the processor can be more efficient.

InnoDB uses a PAUSE instruction in its spin loops on all platforms where such an instruction is available. This technique increases overall performance with CPU-bound workloads, and has the added benefit of minimizing power consumption during the execution of the spin loops.

You do not have to do anything to take advantage of this performance improvement.

For performance considerations for InnoDB locking operations, see Section 8.10, "Optimizing Locking Operations".

14.2.4.2.23. Control of Spin Lock Polling

Many InnoDB mutexes and rw-locks are reserved for a short time. On a multi-core system, it can be more efficient for a thread to continuously check if it can acquire a mutex or rw-lock for a while before sleeping. If the mutex or rw-lock becomes available during this polling period, the thread can continue immediately, in the same time slice. However, too-frequent polling by multiple threads of a shared object can cause "cache ping pong", different processors invalidating portions of each others' cache. InnoDB minimizes this issue by waiting a random time between subsequent polls. The delay is implemented as a busy loop.

You can control the maximum delay between successive polls of a mutex or rw-lock using the parameter innodb_spin_wait_delay. The duration of the delay loop depends on the C compiler and the target processor. (In the 100MHz Pentium era, the unit of delay was one microsecond.) On a system where all processor cores share a fast cache memory, you might reduce the maximum delay or disable the busy loop altogether by setting innodb_spin_wait_delay=0. On a system with multiple processor chips, the effect of cache invalidation can be more significant and you might increase the maximum delay.

The default value of innodb_spin_wait_delay is 6. The spin wait delay is a dynamic global parameter that you can specify in the MySQL option file (my.cnf or my.ini) or change at runtime with the command SET GLOBAL innodb_spin_wait_delay=delay, where delay is the desired maximum delay. Changing the setting requires the SUPER privilege.

For performance considerations for InnoDB locking operations, see Section 8.10, "Optimizing Locking Operations".

14.2.4.2.24. Making the Buffer Pool Scan Resistant

Rather than using a strictly LRU algorithm, InnoDB uses a technique to minimize the amount of data that is brought into the buffer pool and never accessed again. The goal is to make sure that frequently accessed ("hot") pages remain in the buffer pool, even as read-ahead and full table scans bring in new blocks that might or might not be accessed afterward.

Newly read blocks are inserted into the middle of the LRU list representing the buffer pool, at a location that by default is 3/8 from the tail of the list. The pages are moved to the front of the list (the most-recently used end) when they are accessed in the buffer pool for the first time. Thus pages that are never accessed never make it to the front portion of the LRU list, and "age out" sooner than with a strict LRU approach. This arrangement divides the LRU list into two segments, where the pages downstream of the insertion point are considered "old" and are desirable victims for LRU eviction.

For an explanation of the inner workings of the InnoDB buffer pool and the specifics of its LRU replacement algorithm, see Section 8.9.1, "The InnoDB Buffer Pool".

You can control the insertion point in the LRU list, and choose whether InnoDB applies the same optimization to blocks brought into the buffer pool by table or index scans. The configuration parameter innodb_old_blocks_pct controls the percentage of "old" blocks in the LRU list. The default value of innodb_old_blocks_pct is 37, corresponding to the original fixed ratio of 3/8. The value range is 5 (new pages in the buffer pool age out very quickly) to 95 (only 5% of the buffer pool is reserved for hot pages, making the algorithm close to the familiar LRU strategy).

The optimization that keeps the buffer pool from being churned by read-ahead can avoid similar problems due to table or index scans. In these scans, a data page is typically accessed a few times in quick succession and is never touched again. The configuration parameter innodb_old_blocks_time specifies the time window (in milliseconds) after the first access to a page during which it can be accessed without being moved to the front (most-recently used end) of the LRU list. The default value of innodb_old_blocks_time is 0, corresponding to the original behavior of moving a page to the most-recently used end of the buffer pool list when it is first accessed in the buffer pool. Increasing this value makes more blocks age out of the buffer pool faster, because repeated accesses within the window do not promote a page to the most-recently used end of the list.

Both innodb_old_blocks_pct and innodb_old_blocks_time are dynamic global parameters that can be specified in the MySQL option file (my.cnf or my.ini) or changed at runtime with the SET GLOBAL command. Changing the setting requires the SUPER privilege.

To help you gauge the effect of setting these parameters, the SHOW ENGINE INNODB STATUS command reports additional statistics. The BUFFER POOL AND MEMORY section looks like:

Total memory allocated 1107296256; in additional pool allocated 0
Dictionary memory allocated 80360
Buffer pool size   65535
Free buffers       0
Database pages     63920
Old database pages 23600
Modified db pages  34969
Pending reads 32
Pending writes: LRU 0, flush list 0, single page 0
Pages made young 414946, not young 2930673
1274.75 youngs/s, 16521.90 non-youngs/s
Pages read 486005, created 3178, written 160585
2132.37 reads/s, 3.40 creates/s, 323.74 writes/s
Buffer pool hit rate 950 / 1000, young-making rate 30 / 1000 not 392 / 1000
Pages read ahead 1510.10/s, evicted without access 0.00/s
LRU len: 63920, unzip_LRU len: 0
I/O sum[43690]:cur[221], unzip sum[0]:cur[0]

  • Old database pages is the number of pages in the "old" segment of the LRU list.

  • Pages made young and not young are the total numbers of "old" pages that have been made young or not, respectively.

  • youngs/s and non-youngs/s are the rates at which accesses to "old" pages have resulted in making such pages young or not, respectively, since the last invocation of the command.

  • young-making rate and not provide the same rates, but in terms of overall buffer pool accesses instead of accesses just to the "old" pages.

Note

Per second averages provided in InnoDB Monitor output are based on the elapsed time between the current time and the last time InnoDB Monitor output was printed.

Because the effects of these parameters can vary widely based on your hardware configuration, your data, and the details of your workload, always benchmark to verify the effectiveness before changing these settings in any performance-critical or production environment.

In mixed workloads, where most of the activity is OLTP with periodic batch reporting queries that result in large scans, setting innodb_old_blocks_time during the batch runs can help keep the working set of the normal workload in the buffer pool.

When scanning large tables that cannot fit entirely in the buffer pool, setting innodb_old_blocks_pct to a small value keeps the data that is only read once from consuming a significant portion of the buffer pool. For example, setting innodb_old_blocks_pct=5 restricts this data that is only read once to 5% of the buffer pool.

When scanning small tables that do fit into memory, there is less overhead for moving pages around within the buffer pool, so you can leave innodb_old_blocks_pct at its default value, or even higher, such as innodb_old_blocks_pct=50.

The effect of the innodb_old_blocks_time parameter is harder to predict than the innodb_old_blocks_pct parameter, is relatively small, and varies more with the workload. To arrive at an optimal value, conduct your own benchmarks if the performance improvement from adjusting innodb_old_blocks_pct is not sufficient.
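
For example, a periodic reporting job in a mixed workload might raise innodb_old_blocks_time only for the duration of the scan; the table name and values here are hypothetical:

    SET GLOBAL innodb_old_blocks_time = 1000;
    SELECT COUNT(*) FROM big_reporting_table;  -- large scan
    SET GLOBAL innodb_old_blocks_time = 0;     -- restore the default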

For more information about the InnoDB buffer pool, see Section 8.9.1, "The InnoDB Buffer Pool".

14.2.4.2.25. Improvements to Crash Recovery Performance

A number of optimizations speed up certain steps of the recovery that happens on the next startup after a crash. In particular, scanning the redo log and applying the redo log are faster than in MySQL 5.1 and earlier, due to improved algorithms for memory management. You do not need to take any actions to take advantage of this performance enhancement. If you kept the size of your redo log files artificially low because recovery took a long time, you can consider increasing the file size.
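
For example, if you previously kept the redo logs small, you might enlarge them in the option file; the values are illustrative, and you should check the redo log resizing procedure for your MySQL version before changing them:

    [mysqld]
    innodb_log_file_size = 512M
    innodb_log_files_in_group = 2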

For more information about InnoDB recovery, see Section 14.2.3.14, "The InnoDB Recovery Process".

14.2.4.2.26. Integration with the MySQL Performance Schema

Starting with InnoDB 1.1 with MySQL 5.5, you can profile certain internal InnoDB operations using the MySQL Performance Schema feature. This type of tuning is primarily for expert users, those who push the limits of MySQL performance, read the MySQL source code, and evaluate optimization strategies to overcome performance bottlenecks. DBAs can also use this feature for capacity planning, to see whether their typical workload encounters any performance bottlenecks with a particular combination of CPU, RAM, and disk storage; and if so, to judge whether performance can be improved by increasing the capacity of some part of the system.

To use this feature to examine InnoDB performance:

  • You must be running MySQL 5.5 or higher, with the Performance Schema feature enabled; for source builds, enable it with the --with-perfschema option. Because the Performance Schema feature introduces some performance overhead, use it on a test or development system rather than on a production system.

  • You must be running InnoDB 1.1 or higher.

  • You must be generally familiar with how to use the Performance Schema feature, for example to query tables in the performance_schema database.

  • Examine the following kinds of InnoDB objects by querying the appropriate performance_schema tables. The items associated with InnoDB all contain the substring innodb in the EVENT_NAME column.

    For the definitions of the *_instances tables, see Section 20.9.2, "Performance Schema Instance Tables". For the definitions of the *_summary_* tables, see Section 20.9.8, "Performance Schema Summary Tables". For the definition of the threads table, see Section 20.9.9, "Performance Schema Miscellaneous Tables". For the definitions of the *_current_* and *_history_* tables, see Section 20.9.3, "Performance Schema Wait Event Tables".

    • Mutexes in the mutex_instances table. (Mutexes and RW-locks related to the InnoDB buffer pool are not included in this coverage; the same applies to the output of the SHOW ENGINE INNODB MUTEX command.)

    • RW-locks in the rwlock_instances table.

    • File I/O operations in the file_instances, file_summary_by_event_name, and file_summary_by_instance tables.

    • Threads in the threads table.

  • During performance testing, examine the performance data in the events_waits_current and events_waits_history_long tables. If you are especially interested in InnoDB-related objects, use the clause WHERE EVENT_NAME LIKE '%innodb%' to see just those entries; otherwise, examine the performance statistics for the overall MySQL server. A sample query follows this list.

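For example, a query such as the following (a sketch, not a definitive recipe) summarizes recent InnoDB-related wait events:

    SELECT EVENT_NAME, COUNT(*) AS occurrences
        FROM performance_schema.events_waits_history_long
        WHERE EVENT_NAME LIKE '%innodb%'
        GROUP BY EVENT_NAME
        ORDER BY occurrences DESC;
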
For more information about the MySQL Performance Schema, see Chapter 20, MySQL Performance Schema.

14.2.4.2.27. Improvements to Performance from Multiple Buffer Pools

This performance enhancement is primarily useful for people with a large buffer pool size, typically in the multi-gigabyte range. To take advantage of this speedup, you must set the new innodb_buffer_pool_instances configuration option, and you might also adjust the innodb_buffer_pool_size value.

When the InnoDB buffer pool is large, many data requests can be satisfied by retrieving from memory. You might encounter bottlenecks from multiple threads trying to access the buffer pool at once. Starting in InnoDB 1.1 and MySQL 5.5, you can enable multiple buffer pools to minimize this contention. Each page that is stored in or read from the buffer pool is assigned to one of the buffer pools randomly, using a hashing function. Each buffer pool manages its own free lists, flush lists, LRUs, and all other data structures connected to a buffer pool, and is protected by its own buffer pool mutex.

To enable this feature, set the innodb_buffer_pool_instances configuration option to a value greater than 1 (the default) up to 64 (the maximum). This option takes effect only when you set the innodb_buffer_pool_size to a size of 1 gigabyte or more. The total size you specify is divided among all the buffer pools. For best efficiency, specify a combination of innodb_buffer_pool_instances and innodb_buffer_pool_size so that each buffer pool instance is at least 1 gigabyte.
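
For example, these option file settings (illustrative values) divide an 8-gigabyte buffer pool into 8 instances of 1 gigabyte each:

    [mysqld]
    innodb_buffer_pool_size = 8G
    innodb_buffer_pool_instances = 8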

For more information about the InnoDB buffer pool, see Section 8.9.1, "The InnoDB Buffer Pool".

14.2.4.2.28. Better Scalability with Multiple Rollback Segments

Starting in InnoDB 1.1 with MySQL 5.5, the limit on concurrent transactions was greatly expanded, removing a bottleneck with the InnoDB rollback segment that affected high-capacity systems. The limit applies to concurrent transactions that change any data; read-only transactions do not count against that maximum.

The single rollback segment was divided into 128 segments. As of MySQL 5.7.2, 32 of the 128 segments are reserved for temporary table transactions. This leaves 96 segments, each of which can support up to 1023 transactions that perform writes, for a total of approximately 96K concurrent transactions. The original transaction limit prior to InnoDB 1.1 with MySQL 5.5 was 1023.

Each transaction is assigned to one of the rollback segments, and remains tied to that rollback segment for the duration. This enhancement improves both scalability (higher number of concurrent transactions) and performance (less contention when different transactions access the rollback segments).

To take advantage of this feature, you do not need to create any new database or tables, or reconfigure anything. You must perform a slow shutdown before upgrading from MySQL 5.1 or earlier, or at some point afterward. InnoDB makes the required changes inside the system tablespace automatically the first time you restart after performing that slow shutdown.
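
A slow shutdown is requested by setting innodb_fast_shutdown to 0 before stopping the server, for example:

    SET GLOBAL innodb_fast_shutdown = 0;
    -- then stop the server normally, for example with mysqladmin shutdown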

If your workload was not constrained by the original limit of 1023 concurrent transactions, you can reduce the number of rollback segments used within a MySQL instance by setting the configuration option innodb_rollback_segments.
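
For example, to use fewer rollback segments (an illustrative value, not a recommendation):

    [mysqld]
    innodb_rollback_segments = 32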

For more information about performance of InnoDB under high transactional load, see Section 8.5.2, "Optimizing InnoDB Transaction Management".

14.2.4.2.29. Better Scalability with Improved Purge Scheduling

The purge operations (a type of garbage collection) that InnoDB performs automatically are now done in one or more separate threads, rather than as part of the master thread. This change improves scalability, because the main database operations run independently from the maintenance work happening in the background.

To control this feature, set the configuration option innodb_purge_threads=n. If DML action is concentrated on a single table or a few tables, keep the setting low so that the threads do not contend with each other for access to the busy tables. If DML operations are spread across many tables, increase the setting. Its maximum value is 32.

There is another related configuration option, innodb_purge_batch_size, with a default of 20 and a maximum of 5000. This option is mainly intended for experimentation and tuning of purge operations; most users have no need to change it.
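
For example, on a server where DML is spread across many tables, you might configure several purge threads (an illustrative value):

    [mysqld]
    innodb_purge_threads = 4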

For more information about InnoDB I/O performance, see Section 8.5.7, "Optimizing InnoDB Disk I/O".

14.2.4.2.30. Improved Log Sys Mutex

This is another performance improvement that comes for free, with no user action or configuration needed. The details here are intended for performance experts who delve into the InnoDB source code, or interpret reports with keywords such as "mutex" and "log_sys".

The mutex known as the log sys mutex has historically done double duty, controlling access to internal data structures related to log records and the LSN, as well as pages in the buffer pool that are changed when a mini-transaction is committed. Starting in InnoDB 1.1 with MySQL 5.5, these two kinds of operations are protected by separate mutexes, with a new log_buf mutex controlling writes to buffer pool pages due to mini-transactions.

For performance considerations for InnoDB locking operations, see Section 8.10, "Optimizing Locking Operations".

14.2.4.2.31. Separate Flush List Mutex

Starting with InnoDB 1.1 with MySQL 5.5, concurrent access to the buffer pool is faster. Operations involving the flush list, a data structure related to the buffer pool, are now controlled by a separate mutex and do not block access to the buffer pool. You do not need to configure anything to take advantage of this speedup; it is fully automatic.

For more information about the InnoDB buffer pool, see Section 8.9.1, "The InnoDB Buffer Pool".

14.2.4.2.32. Kernel Mutex Split

The mutex controlling concurrent access to the InnoDB kernel is now divided into separate mutexes and rw-locks to reduce contention. You do not need to configure anything to take advantage of this speedup; it is fully automatic.

14.2.4.2.33. InnoDB Configurable Data Dictionary Cache

To ease the memory load on systems with huge numbers of tables, InnoDB now frees up the memory associated with an opened table, using an LRU algorithm to select tables that have gone the longest without being accessed. To reserve more memory to hold metadata for open InnoDB tables, increase the value of the table_definition_cache configuration option. InnoDB treats this value as a "soft limit". The actual number of tables with cached metadata could be higher, because metadata for InnoDB system tables, and parent and child tables in foreign key relationships, is never evicted from memory.
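
For example, to allow metadata for more tables to stay cached (an illustrative value):

    SET GLOBAL table_definition_cache = 2000;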

14.2.4.2.34. Improved Tablespace Management

Several new features extend the file-per-table mode enabled by the innodb_file_per_table configuration option, allowing more flexibility in how the .ibd files are placed, exported, and restored. We characterize this as a performance enhancement because it solves the common customer request to put data from different tables onto different storage devices, for best price/performance depending on the access patterns of the data. For example, tables with high levels of random reads and writes might be placed on an SSD device, while less-often-accessed data or data processed with large batches of sequential I/O might be placed on an HDD device. See Section 5.4.1, "Managing InnoDB Tablespaces" for details.
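
For example, a sketch of placing a heavily updated table on an SSD device using the DATA DIRECTORY clause; the table name and path are hypothetical:

    SET GLOBAL innodb_file_per_table = ON;
    CREATE TABLE hot_sessions (
        id BIGINT NOT NULL PRIMARY KEY,
        payload VARBINARY(255)
    ) ENGINE=InnoDB DATA DIRECTORY = '/ssd/data';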

14.2.4.2.35. memcached Plugin for InnoDB

The memcached daemon is frequently used as an in-memory caching layer in front of a MySQL database server. Now MySQL allows direct access to InnoDB tables using the familiar memcached protocol and client libraries. Instead of formulating queries in SQL, you can perform simple get, set, and increment operations that avoid the performance overhead of SQL parsing and constructing a query optimization plan. You can also access the underlying InnoDB tables through SQL to load data, generate reports, or perform multi-step transactional computations.

This technique allows the data to be stored in MySQL for reliability and consistency, while coding application logic that uses the database as a fast key-value store.

This feature combines the best of both worlds:

  • Data that is written using the memcached protocol is transparently written to an InnoDB table, without going through the MySQL SQL layer. You can control the frequency of writes to achieve higher raw performance when updating non-critical data.

  • Data that is requested through the memcached protocol is transparently queried from an InnoDB table, without going through the MySQL SQL layer.

  • Subsequent requests for the same data will be served from the InnoDB buffer pool. The buffer pool handles the in-memory caching. You can tune the performance of data-intensive operations using the familiar InnoDB configuration options.

  • InnoDB can handle composing and decomposing multiple column values into a single memcached item value, reducing the amount of string parsing and concatenation required in your application. For example, you might store a string value 2|4|6|8 in the memcached cache, and InnoDB splits that value based on a separator character, then stores the result into four numeric columns.
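
The plugin is set up by running the configuration script supplied with MySQL and then installing the plugin; a minimal sketch (see Section 14.2.9 for the full procedure):

    INSTALL PLUGIN daemon_memcached SONAME "libmemcached.so";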

For details on using this NoSQL-style interface to MySQL, see Section 14.2.9, "InnoDB Integration with memcached". For additional background on memcached and considerations for writing applications for its API, see Section 15.6, "Using MySQL with memcached".

14.2.4.2.36. Online DDL

This feature is a continuation of the "Fast Index Creation" feature introduced in Fast Index Creation in the InnoDB Storage Engine. Now you can perform other kinds of DDL operations on InnoDB tables online: that is, with minimal delay for operations on that table, without rebuilding the entire table, or both. This enhancement improves responsiveness and availability in busy production environments, where making a table unavailable for minutes or hours whenever its column definitions change is not practical.

For full details, see Section 5.5, "Online DDL for InnoDB Tables".

The DDL operations enhanced by this feature are these variations on the ALTER TABLE statement:

  • Create secondary indexes: CREATE INDEX name ON table (col_list) or ALTER TABLE table ADD INDEX name (col_list). (Creating a primary key or a FULLTEXT index still requires locking the table.) A brief sketch of online index creation and removal appears after this list.

    Drop secondary indexes: DROP INDEX name ON table; or ALTER TABLE table DROP INDEX name

    Creating and dropping secondary indexes on InnoDB tables has avoided the table-copying behavior since the days of MySQL 5.1 with the InnoDB Plugin. Now, the table remains available for read and write operations while the index is being created or dropped. The CREATE INDEX or DROP INDEX statement only finishes after all transactions that are accessing the table have completed, so that the initial state of the index reflects the most recent contents of the table.

    Previously, modifying the table while an index was being created or dropped typically resulted in a deadlock that cancelled the insert, update, or delete statement on the table.

  • Changing the auto-increment value for a column: ALTER TABLE table AUTO_INCREMENT=next_value;

    Especially in a distributed system using replication or sharding, you sometimes reset the auto-increment counter for a table to a specific value. The next row inserted into the table uses the specified value for its auto-increment column. You might also use this technique in a data warehousing environment where you periodically empty all the tables and reload them, restarting the auto-increment sequence from 1.

  • Adding or dropping a foreign key constraint:

    ALTER TABLE tbl1 ADD CONSTRAINT fk_name FOREIGN KEY index (col1) REFERENCES tbl2(col2) referential_actions;
    ALTER TABLE tbl DROP FOREIGN KEY fk_name;

    Dropping a foreign key can be performed online with the foreign_key_checks option enabled or disabled. Creating a foreign key online requires foreign_key_checks to be disabled.

    If you do not know the names of the foreign key constraints on a particular table, issue the following statement and find the constraint name in the CONSTRAINT clause for each foreign key:

    SHOW CREATE TABLE table\G

    Or, query the information_schema.table_constraints table and use the constraint_name and constraint_type columns to identify the foreign key names, as shown in the sketch following this list.

    As a consequence of this enhancement, you can now also drop a foreign key and its associated index in a single statement, which previously required separate statements in a strict order:

    ALTER TABLE table DROP FOREIGN KEY constraint, DROP INDEX index;

  • Renaming a column: ALTER TABLE tbl CHANGE old_col_name new_col_name datatype

    When you keep the same data type and only change the column name, this operation can always be performed online. As part of this enhancement, you can now rename a column that is part of a foreign key constraint, which was not allowed before.

  • Some other ALTER TABLE operations are non-blocking, and are faster than before because the table-copying operation is optimized, even though a table copy is still required:

    • Changing the ROW_FORMAT or KEY_BLOCK_SIZE properties for a table.

    • Changing the nullable status for a column.

    • Adding, dropping, or reordering columns.
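
A brief sketch of creating and dropping a secondary index online, using hypothetical table and index names; the ALGORITHM and LOCK clauses are described in Section 5.5, "Online DDL for InnoDB Tables":

    CREATE TABLE t1 (id INT PRIMARY KEY, c1 VARCHAR(100)) ENGINE=InnoDB;
    -- The table remains available for reads and writes while the index builds:
    ALTER TABLE t1 ADD INDEX idx_c1 (c1), ALGORITHM=INPLACE, LOCK=NONE;
    -- Dropping the index is also an online, in-place operation:
    ALTER TABLE t1 DROP INDEX idx_c1;

    -- Find foreign key names before dropping them (per the bullet above):
    SELECT constraint_name FROM information_schema.table_constraints
        WHERE table_name = 't1' AND constraint_type = 'FOREIGN KEY';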

Note

As your database schema evolves with new columns, data types, constraints, indexes, and so on, keep your CREATE TABLE statements up to date with the latest table definitions. Even with the performance improvements of online DDL, it is more efficient to create stable database structures at the beginning, rather than creating part of the schema and then issuing ALTER TABLE statements afterward.

The main exception to this guideline is for secondary indexes on tables with large numbers of rows. It is typically most efficient to create the table with all details specified except the secondary indexes, load the data, then create the secondary indexes.

Whatever sequence of CREATE TABLE, CREATE INDEX, ALTER TABLE, and similar statements went into putting a table together, you can capture the SQL needed to reconstruct the current form of the table by issuing the statement SHOW CREATE TABLE table\G (uppercase \G required for tidy formatting). This output shows clauses such as numeric precision, NOT NULL, and CHARACTER SET that are sometimes added behind the scenes, and that you might otherwise leave out when cloning the table on a new system or setting up foreign key columns with an identical type.

14.2.4.2.37. DDL Optimization for Temporary Tables

DDL performance for InnoDB temporary tables is improved through optimization of CREATE TABLE, DROP TABLE, TRUNCATE TABLE, and ALTER TABLE statements. Optimizations were achieved by limiting actions performed by DDL statements to only those necessary for temporary tables.

14.2.4.2.38. Improved Memory Allocation for Transaction Instances

Memory for transaction instances (trx_t) is now allocated in blocks of configurable size that are a multiple of the transaction instance size. Transaction instances are also placed in a priority queue, ordered by their address in memory, so that instances allocated from the pool are close together. This enhancement reduces the cost incurred by iterating over transaction instances when allocating instances from the pool.