Spec-Zone .ru
спецификации, руководства, описания, API
|
Online DDL improves several aspects of MySQL operation, such as performance, concurrency, availability, and scalability:
Because queries and DML operations on the table can proceed while the DDL is in progress, applications that access the table are more responsive. Reduced locking and waiting for other resources all throughout the MySQL server leads to greater scalability, even for operations not involving the table being altered.
For in-place operations, by avoiding the disk I/O and CPU cycles to rebuild the table, you minimize the overall load on the database and maintain good performance and high throughput during the DDL operation.
For in-place operations, because less data is read into the buffer pool than if all the data was copied, you avoid purging frequently accessed data from memory, which formerly could cause a temporary performance dip after a DDL operation.
While an InnoDB table is being changed by a DDL operation, the table may or may not be locked,
depending on the internal workings of that operation and the LOCK
clause of the ALTER TABLE
statement. By default, MySQL uses as little locking as possible
during a DDL operation; you specify the clause either to make the locking more restrictive than it normally
would be (thus limiting concurrent DML, or DML and queries), or to ensure that some expected degree of locking
is allowed for an operation. If the LOCK
clause specifies a level of locking that
is not available for that specific kind of DDL operation, such as LOCK=SHARED
or
LOCK=NONE
while creating or dropping a primary key, the clause works like an
assertion, causing the statement to fail with an error. The following list shows the different possibilities for
the LOCK
clause, from the most permissive to the most restrictive:
For DDL operations with LOCK=NONE
, both queries and
concurrent DML are allowed. This clause makes the ALTER TABLE
fail if the kind of DDL operation cannot be performed
with the requested type of locking, so specify LOCK=NONE
if keeping the
table fully available is vital and it is OK to cancel the DDL if that is not possible. For example, you
might use this clause in DDLs for tables involving customer signups or purchases, to avoid making those
tables unavailable by mistakenly issuing an expensive ALTER TABLE
statement.
For DDL operations with LOCK=SHARED
, any writes to the
table (that is, DML operations) are blocked, but the data in the table can be read. This clause makes
the ALTER TABLE
fail if the kind of DDL operation cannot be performed
with the requested type of locking, so specify LOCK=SHARED
if keeping the
table available for queries is vital and it is OK to cancel the DDL if that is not possible. For
example, you might use this clause in DDLs for tables in a data warehouse, where it is OK to delay data
load operations until the DDL is finished, but queries cannot be delayed for long periods.
For DDL operations with LOCK=DEFAULT
, or with the
LOCK
clause omitted, MySQL uses the lowest level of locking that is
available for that kind of operation, allowing concurrent queries, DML, or both wherever possible. This
is the setting to use when making pre-planned, pre-tested changes that you know will not cause any
availability problems based on the workload for that table.
For DDL operations with LOCK=EXCLUSIVE
, both queries
and DML operations are blocked. This clause makes the ALTER TABLE
fail if the kind of DDL operation cannot be performed
with the requested type of locking, so specify LOCK=EXCLUSIVE
if the
primary concern is finishing the DDL in the shortest time possible, and it is OK to make applications
wait when they try to access the table. You might also use LOCK=EXCLUSIVE
if the server is supposed to be idle, to avoid unexpected accesses to the table.
An online DDL statement for an InnoDB table always waits for currently executing transactions that are accessing the table to commit or roll back, because it requires exclusive access to the table for a brief period while the DDL statement is being prepared. Likewise, it requires exclusive access to the table for a brief time before finishing. Thus, an online DDL statement waits for any transactions that are started while the DDL is in progress, and query or modify the table, to commit or roll back before the DDL completes.
Because there is some processing work involved with recording the changes made by concurrent DML operations, then applying those changes at the end, an online DDL operation could take longer overall than the old-style mechanism that blocks table access from other sessions. The reduction in raw performance is balanced against better responsiveness for applications that use the table. When evaluating the ideal techniques for changing table structure, consider end-user perception of performance, based on factors such as load times for web pages.
A newly created InnoDB secondary index contains only the committed data in the table at the time the CREATE
INDEX
or ALTER TABLE
statement finishes executing. It does not contain any uncommitted
values, old versions of values, or values marked for deletion but not yet removed from the old index.
The raw performance of an online DDL operation is largely determined by whether the operation is performed in-place, or requires copying and rebuilding the entire table. See Table 5.9, "Summary of Online Status for DDL Operations" to see what kinds of operations can be performed in-place, and any requirements for avoiding table-copy operations.
The performance speedup from in-place DDL applies to operations on secondary indexes, not to the primary key index. The rows of an InnoDB table are stored in a clustered index organized based on the primary key, forming what some database systems call an "index-organized table". Because the table structure is so closely tied to the primary key, redefining the primary key still requires copying the data.
When an operation on the primary key uses ALGORITHM=INPLACE
, even though the data
is still copied, it is more efficient than using ALGORITHM=COPY
because:
No undo logging or associated redo logging is required for ALGORITHM=INPLACE
.
These operations add overhead to DDL statements that use ALGORITHM=COPY
.
The secondary index entries are pre-sorted, and so can be loaded in order.
The change buffer is not used, because there are no random-access inserts into the secondary indexes.
To judge the relative performance of online DDL operations, you can run such operations on a big InnoDB
table using current and earlier versions of MySQL. You can also run all the
performance tests under the latest MySQL version, simulating the previous DDL behavior for the "before" results, by setting the old_alter_table
system variable. Issue the statement set
old_alter_table=1
in the session, and measure DDL performance to record the "before" figures. Then set
old_alter_table=0
to re-enable the newer, faster behavior, and run the DDL operations again to record
the "after" figures.
For a basic idea of whether a DDL operation does its changes in-place or performs a table copy, look at the "rows affected" value displayed after the command finishes. For example, here are lines you might see after doing different types of DDL operations:
Changing the default value of a column (super-fast, does not affect the table data at all):
Query OK, 0 rows affected (0.07 sec)
Adding an index (takes time, but 0 rows affected
shows
that the table is not copied):
Query OK, 0 rows affected (21.42 sec)
Changing the data type of a column (takes substantial time and does require rebuilding all the rows of the table):
Query OK, 1671168 rows affected (1 min 35.54 sec)
For example, before running a DDL operation on a big table, you might check whether the operation will be fast or slow as follows:
Clone the table structure.
Populate the cloned table with a tiny amount of data.
Run the DDL operation on the cloned table.
Check whether the "rows affected" value is zero or not. A non-zero value means the operation will require rebuilding the entire table, which might require special planning. For example, you might do the DDL operation during a period of scheduled downtime, or on each replication slave server one at a time.
For a deeper understanding of the reduction in MySQL processing, examine the performance_schema
and INFORMATION_SCHEMA
tables related to InnoDB
before
and after DDL operations, to see the number of physical reads, writes, memory allocations, and so on.