5.5.2. Performance and Concurrency Considerations for Online DDL

Online DDL improves several aspects of MySQL operation, such as performance, concurrency, availability, and scalability:

Because queries and DML operations on the table can proceed while the DDL is in progress, applications that access the table are more responsive. Reduced locking and waiting for other resources all throughout the MySQL server leads to greater scalability, even for operations not involving the table being altered.
For in-place operations, by avoiding the disk I/O and CPU cycles to rebuild the table, you minimize the overall load on the database and maintain good performance and high throughput during the DDL operation.
For in-place operations, because less data is read into the buffer pool than if all the data was copied, you avoid purging frequently accessed data from memory, which formerly could cause a temporary performance dip after a DDL operation.

Locking Options for Online DDL

While an InnoDB table is being changed by a DDL operation, the table may or may not be locked, depending on the internal workings of that operation and the LOCK clause of the ALTER TABLE statement. By default, MySQL uses as little locking as possible during a DDL operation; you specify the clause either to make the locking more restrictive than it normally would be (thus limiting concurrent DML, or DML and queries), or to ensure that some expected degree of locking is allowed for an operation. If the LOCK clause specifies a level of locking that is not available for that specific kind of DDL operation, such as LOCK=SHARED or LOCK=NONE while creating or dropping a primary key, the clause works like an assertion, causing the statement to fail with an error. The following list shows the different possibilities for the LOCK clause, from the most permissive to the most restrictive:

For DDL operations with LOCK=NONE, both queries and concurrent DML are allowed. This clause makes the ALTER TABLE fail if the kind of DDL operation cannot be performed with the requested type of locking, so specify LOCK=NONE if keeping the table fully available is vital and it is OK to cancel the DDL if that is not possible. For example, you might use this clause in DDLs for tables involving customer signups or purchases, to avoid making those tables unavailable by mistakenly issuing an expensive ALTER TABLE statement.
For DDL operations with LOCK=SHARED, any writes to the table (that is, DML operations) are blocked, but the data in the table can be read. This clause makes the ALTER TABLE fail if the kind of DDL operation cannot be performed with the requested type of locking, so specify LOCK=SHARED if keeping the table available for queries is vital and it is OK to cancel the DDL if that is not possible. For example, you might use this clause in DDLs for tables in a data warehouse, where it is OK to delay data load operations until the DDL is finished, but queries cannot be delayed for long periods.
For DDL operations with LOCK=DEFAULT, or with the LOCK clause omitted, MySQL uses the lowest level of locking that is available for that kind of operation, allowing concurrent queries, DML, or both wherever possible. This is the setting to use when making pre-planned, pre-tested changes that you know will not cause any availability problems based on the workload for that table.
For DDL operations with LOCK=EXCLUSIVE, both queries and DML operations are blocked. This clause makes the ALTER TABLE fail if the kind of DDL operation cannot be performed with the requested type of locking, so specify LOCK=EXCLUSIVE if the primary concern is finishing the DDL in the shortest time possible, and it is OK to make applications wait when they try to access the table. You might also use LOCK=EXCLUSIVE if the server is supposed to be idle, to avoid unexpected accesses to the table.

An online DDL statement for an InnoDB table always waits for currently executing transactions that are accessing the table to commit or roll back, because it requires exclusive access to the table for a brief period while the DDL statement is being prepared. Likewise, it requires exclusive access to the table for a brief time before finishing. Thus, an online DDL statement waits for any transactions that are started while the DDL is in progress, and query or modify the table, to commit or roll back before the DDL completes.

Because there is some processing work involved with recording the changes made by concurrent DML operations, then applying those changes at the end, an online DDL operation could take longer overall than the old-style mechanism that blocks table access from other sessions. The reduction in raw performance is balanced against better responsiveness for applications that use the table. When evaluating the ideal techniques for changing table structure, consider end-user perception of performance, based on factors such as load times for web pages.

A newly created InnoDB secondary index contains only the committed data in the table at the time the CREATE INDEX or ALTER TABLE statement finishes executing. It does not contain any uncommitted values, old versions of values, or values marked for deletion but not yet removed from the old index.

Performance of In-Place versus Table-Copying DDL Operations

The raw performance of an online DDL operation is largely determined by whether the operation is performed in-place, or requires copying and rebuilding the entire table. See Table 5.9, "Summary of Online Status for DDL Operations" to see what kinds of operations can be performed in-place, and any requirements for avoiding table-copy operations.

The performance speedup from in-place DDL applies to operations on secondary indexes, not to the primary key index. The rows of an InnoDB table are stored in a clustered index organized based on the primary key, forming what some database systems call an "index-organized table". Because the table structure is so closely tied to the primary key, redefining the primary key still requires copying the data.

When an operation on the primary key uses ALGORITHM=INPLACE, even though the data is still copied, it is more efficient than using ALGORITHM=COPY because:

No undo logging or associated redo logging is required for ALGORITHM=INPLACE. These operations add overhead to DDL statements that use ALGORITHM=COPY.
The secondary index entries are pre-sorted, and so can be loaded in order.
The change buffer is not used, because there are no random-access inserts into the secondary indexes.

To judge the relative performance of online DDL operations, you can run such operations on a big InnoDB table using current and earlier versions of MySQL. You can also run all the performance tests under the latest MySQL version, simulating the previous DDL behavior for the "before" results, by setting the old_alter_table system variable. Issue the statement set old_alter_table=1 in the session, and measure DDL performance to record the "before" figures. Then set old_alter_table=0 to re-enable the newer, faster behavior, and run the DDL operations again to record the "after" figures.

For a basic idea of whether a DDL operation does its changes in-place or performs a table copy, look at the "rows affected" value displayed after the command finishes. For example, here are lines you might see after doing different types of DDL operations:

Changing the default value of a column (super-fast, does not affect the table data at all):
```
Query OK, 0 rows affected (0.07 sec)
```
Adding an index (takes time, but 0 rows affected shows that the table is not copied):
```
Query OK, 0 rows affected (21.42 sec)
```
Changing the data type of a column (takes substantial time and does require rebuilding all the rows of the table):
```
Query OK, 1671168 rows affected (1 min 35.54 sec)
```

For example, before running a DDL operation on a big table, you might check whether the operation will be fast or slow as follows:

Clone the table structure.
Populate the cloned table with a tiny amount of data.
Run the DDL operation on the cloned table.
Check whether the "rows affected" value is zero or not. A non-zero value means the operation will require rebuilding the entire table, which might require special planning. For example, you might do the DDL operation during a period of scheduled downtime, or on each replication slave server one at a time.

For a deeper understanding of the reduction in MySQL processing, examine the performance_schema and INFORMATION_SCHEMA tables related to InnoDB before and after DDL operations, to see the number of physical reads, writes, memory allocations, and so on.