Spec-Zone .ru
спецификации, руководства, описания, API
|
When using a replication setup involving multiple masters (including circular replication), it is possible that different masters may try to update the same row on the slave with different data. Conflict resolution in MySQL Cluster Replication provides a means of resolving such conflicts by permitting a user-defined resolution column to be used to determine whether or not an update on a given master should be applied on the slave.
Some types of conflict resolution supported by MySQL Cluster (NDB$OLD()
, NDB$MAX()
, NDB$MAX_DELETE_WIN()
) implement this
user-defined column as a "timestamp" column (although its
type cannot be TIMESTAMP
,
as explained later in this section). These types of conflict resolution are always applied a row-by-row basis
rather than a transactional basis. The epoch-based conflict resolution functions NDB$EPOCH()
and NDB$EPOCH_TRANS()
compare the order in which epochs are replicated (and thus
these functions are transactional). Different methods can be used to compare resolution column values on the
slave when conflicts occur, as explained later in this section; the method used can be set on a per-table basis.
You should also keep in mind that it is the application's responsibility to ensure that the resolution column is correctly populated with relevant values, so that the resolution function can make the appropriate choice when determining whether to apply an update.
Requirements. Preparations for conflict resolution must be made on both the master and the slave. These tasks are described in the following list:
On the master writing the binary logs, you must determine which columns are sent
(all columns or only those that have been updated). This is done for the MySQL Server as a whole by
applying the mysqld startup option --ndb-log-updated-only
(described later in this section) or on a
per-table basis by entries in the mysql.ndb_replication
table (see The ndb_replication
system
table).
If you are replicating tables with very large columns (such as TEXT
or BLOB
columns), --ndb-log-updated-only
can also be useful for reducing the size of
the master and slave binary logs and avoiding possible replication failures due to exceeding max_allowed_packet
.
See Section 16.4.1.20,
"Replication and max_allowed_packet
", for more information
about this issue.
On the slave, you must determine which type of conflict resolution to apply ("latest timestamp wins", "same timestamp wins", "primary wins",
"primary wins, complete transaction", or
none). This is done using the mysql.ndb_replication
system table, on a
per-table basis (see The ndb_replication
system table).
When using the functions NDB$OLD()
, NDB$MAX()
, and
NDB$MAX_DELETE_WIN()
for timestamp-based conflict resolution, we often refer to the
column used for determining updates as a "timestamp"
column. However, the data type of this column is never TIMESTAMP
; instead, its data type should be INT
(INTEGER
) or BIGINT
. The "timestamp"
column should also be UNSIGNED
and NOT NULL
.
The NDB$EPOCH()
and NDB$EPOCH_TRANS()
functions
discussed later in this section work by comparing the relative order of replication epochs applied on a primary
and secondary MySQL Cluster, and do not make use of timestamps.
Master column control. We can see update operations in terms of "before"
and "after" images—that is, the states of the table before
and after the update is applied. Normally, when updating a table with a primary key, the "before" image is not of great interest; however, when
we need to determine on a per-update basis whether or not to use the updated values on a replication slave, we
need to make sure that both images are written to the master's binary log. This is done with the --ndb-log-update-as-write
option for mysqld, as described later in this section.
Whether logging of complete rows or of updated columns only is done is decided when the MySQL server is started, and cannot be changed online; you must either restart mysqld, or start a new mysqld instance with different logging options.
--ndb-log-updated-only
Option)Command-Line Format | --ndb-log-updated-only |
||
Option-File Format | ndb_log_updated_only |
||
System Variable Name | ndb_log_updated_only |
||
Variable Scope | Global | ||
Dynamic Variable | Yes | ||
Permitted Values | |||
Type | boolean |
||
Default | ON |
For purposes of conflict resolution, there are two basic
methods of logging rows, as determined by the setting of the --ndb-log-updated-only
option for mysqld:
Log complete rows
Log only column data that has been updated—that is, column data whose value has been set, regardless of whether or not this value was actually changed. This is the default behavior.
It is usually sufficient—and more efficient—to log updated columns only; however, if you need to log full rows,
you can do so by setting --ndb-log-updated-only
to 0
or OFF
.
--ndb-log-update-as-write
Option: Logging Changed Data as UpdatesCommand-Line Format | --ndb-log-update-as-write |
||
Option-File Format | ndb-log-update-as-write |
||
System Variable Name | ndb_log_update_as_write |
||
Variable Scope | Global | ||
Dynamic Variable | Yes | ||
Permitted Values | |||
Type | boolean |
||
Default | ON |
The setting of the MySQL Server's --ndb-log-update-as-write
option determines whether logging is performed with or
without the "before" image. Because conflict resolution is
done in the MySQL Server's update handler, it is necessary to control logging on the master such that updates
are updates and not writes; that is, such that updates are treated as changes in existing rows rather than the
writing of new rows (even though these replace existing rows). This option is turned on by default; in other
words, updates are treated as writes. (That is, updates are by default written as write_row
events in the binary log, rather than as update_row
events.) To turn off the
option, start the master mysqld with --ndb-log-update-as-write=0
or --ndb-log-update-as-write=OFF
.
Conflict resolution control. Conflict resolution is usually enabled on the server where conflicts can
occur. Like logging method selection, it is enabled by entries in the mysql.ndb_replication
table.
The ndb_replication
system
table. To
enable conflict resolution, it is necessary to create an ndb_replication
table in
the mysql
system database on the master, the slave, or both, depending on the
conflict resolution type and method to be employed. This table is used to control logging and conflict
resolution functions on a per-table basis, and has one row per table involved in replication. ndb_replication
is created and filled with control information on the server
where the conflict is to be resolved. In a simple master-slave setup where data can also be changed locally on
the slave this will typically be the slave. In a more complex master-master (2-way) replication schema this will
usually be all of the masters involved. Each row in mysql.ndb_replication
corresponds to a table being replicated, and specifies how to log and resolve conflicts (that is, which conflict
resolution function, if any, to use) for that table. The definition of the mysql.ndb_replication
table is shown here:
CREATE TABLE mysql.ndb_replication ( db VARBINARY(63), table_name VARBINARY(63), server_id INT UNSIGNED, binlog_type INT UNSIGNED, conflict_fn VARBINARY(128), PRIMARY KEY USING HASH (db, table_name, server_id)) ENGINE=NDBPARTITION BY KEY(db,table_name);
The columns in this table are described in the next few paragraphs.
db
. The name of the database containing the table to be replicated.
table_name
. The name of the table to be replicated.
server_id
. The unique server ID of the MySQL instance (SQL node) where the
table resides.
binlog_type
. The type of binary logging to be employed. This is determined
as shown in the following table:
Value | Internal Value | Description |
---|---|---|
0 | NBT_DEFAULT |
Use server default |
1 | NBT_NO_LOGGING |
Do not log this table in the binary log |
2 | NBT_UPDATED_ONLY |
Only updated attributes are logged |
3 | NBT_FULL |
Log full row, even if not updated (MySQL server default behavior) |
4 | NBT_USE_UPDATE |
(For generating NBT_UPDATED_ONLY_USE_UPDATE and NBT_FULL_USE_UPDATE
values only—not intended for separate use)
|
5 | [Not used] | --- |
6 | NBT_UPDATED_ONLY_USE_UPDATE (equal to NBT_UPDATED_ONLY
| NBT_USE_UPDATE )
|
Use updated attributes, even if values are unchanged |
7 | NBT_FULL_USE_UPDATE (equal to NBT_FULL
|NBT_USE_UPDATE )
|
Use full row, even if values are unchanged |
conflict_fn
. The conflict resolution function to be applied. This function
must be specified as one of those shown in the following list:
These functions are described in the next few paragraphs.
NDB$OLD(
).
If the value of column_name
column_name
is the same on both the master and the slave, then the update is applied; otherwise, the update is not applied
on the slave and an exception is written to the log. This is illustrated by the following pseudocode:
if (master_old_column_value
==slave_current_column_value
) apply_update();else log_exception();
This function can be used for "same value wins" conflict resolution. This type of conflict resolution ensures that updates are not applied on the slave from the wrong master.
The column value from the master's "before" image is used by this function.
NDB$MAX(
).
If the "timestamp"
column value for a given row coming from the master is higher than that on the slave, it is applied; otherwise
it is not applied on the slave. This is illustrated by the following pseudocode: column_name
if (master_new_column_value
>slave_current_column_value
) apply_update();
This function can be used for "greatest timestamp wins" conflict resolution. This type of conflict resolution ensures that, in the event of a conflict, the version of the row that was most recently updated is the version that persists.
The column value from the master's "after" image is used by this function.
NDB$MAX_DELETE_WIN(
). This is a
variation on column_name
NDB$MAX()
. Due to the fact that no timestamp is available for a delete
operation, a delete using NDB$MAX()
is in fact processed as NDB$OLD
.
However, for some use cases, this is not optimal. For NDB$MAX_DELETE_WIN()
, if the
"timestamp" column value for a given row adding or
updating an existing row coming from the master is higher than that on the slave, it is applied. However, delete
operations are treated as always having the higher value. This is illustrated in the following pseudocode:
if ( (master_new_column_value
>slave_current_column_value
) ||operation.type
== "delete") apply_update();
This function can be used for "greatest timestamp, delete wins" conflict resolution. This type of conflict resolution ensures that, in the event of a conflict, the version of the row that was deleted or (otherwise) most recently updated is the version that persists.
As with NDB$MAX()
, the column value from the master's "after" image is the value used by this function.
NDB$EPOCH()
. The NDB$EPOCH()
function tracks the order in which replicated epochs are applied on a
slave MySQL Cluster relative to changes originating on the slave. This relative ordering is used to determine
whether changes originating on the slave are concurrent with any changes that originate locally, and are
therefore potentially in conflict.
Most of what follows in the description of NDB$EPOCH()
also applies to NDB$EPOCH_TRANS()
. Any exceptions are noted in the text.
NDB$EPOCH()
is asymmetric, operating on one MySQL Cluster in a two-cluster circular
replication configuration (sometimes referred to as "active-active" replication). We refer here to cluster on which it operates
as the primary, and the other as the secondary. The slave on the primary is responsible for detecting and
handling conflicts, while the slave on the secondaryis not involved in any conflict detection or handling.
When the slave on the primary detects conflicts, it injects events into its own binary log to compensate for these; this ensures that the secondary MySQL Cluster eventually realigns itself with the primary and so keeps the primary and secondary from diverging. This compensation and realignment mechanism requires that the primary MySQL Cluster always wins any conflicts with the secondary—that is, that the primary's changes are always used rather than those from the secondary in event of a conflict. This "primary always wins" rule has the following implications:
Operations that change data, once committed on the primary, are fully persistent and will not be undone or rolled back by conflict detection and resolution.
Data read from the primary is fully consistent. Any changes committed on the Primary (locally or from the slave) will not be reverted later.
Operations that change data on the secondary may later be reverted if the primary determines that they are in conflict.
Individual rows read on the secondary are self-consistent at all times, each row always reflecting either a state committed by the secondary, or one committed by the primary.
Sets of rows read on the secondary may not necessarily be consistent at a given
single point in time. For NDB$EPOCH_TRANS()
, this is a transient state; for
NDB$EPOCH()
, it can be a persistent state.
Assuming a period of sufficient length without any conflicts, all data on the secondary MySQL Cluster (eventually) becomes consistent with the primary's data.
NDB$EPOCH()
and NDB$EPOCH_TRANS()
do not require any
user schema modifications, or application changes to provide conflict detection. However, careful thought must
be given to the schema used, and the access patterns used, to verify that the complete system behaves within
specified limits.
Each of the NDB$EPOCH()
and NDB$EPOCH_TRANS()
functions can take an optional parameter; this is the number of bits to use to represent the lower 32 bits of
the epoch, and should be set to no less than
CEIL( LOG2(TimeBetweenGlobalCheckpoints
/TimeBetweenEpochs
), 1)
For the default values of these configuration parameters (2000 and 100 milliseconds, respectively), this gives a
value of 5 bits, so the default value (6) should be sufficient, unless other values are used for TimeBetweenGlobalCheckpoints
, TimeBetweenEpochs
, or both. A value that is too small can result in false
positives, while one that is too large could lead to excessive wasted space in the database.
Both NDB$EPOCH()
and NDB$EPOCH_TRANS()
insert entries
for conflicting rows into the relevant exceptions tables, provided that these tables have been defined according
to the same exception table schema rules as described elsewhere in this section (see NDB$OLD(
)). Note that
you need to create any exception table before creating the table with which it is to be used. column_name
As with the other conflict detection functions discussed in this section, NDB$EPOCH()
and NDB$EPOCH_TRANS()
are activated by
including relevant entries in the mysql.ndb_replication
table (see The
ndb_replication
system table). The roles of the primary and secondary MySQL
Clusters in this scenario are fully determined by mysql.ndb_replication
table
entries.
Because the conflict detection algorithms employed by NDB$EPOCH()
and
NDB$EPOCH_TRANS()
are asymmetric, you must use different values for the primary
slave's and secondary slave's server_id
entries.
NDB$EPOCH()
and NDB$EPOCH_TRANS()
status variables. Several status variables can be used to
monitor NDB$EPOCH()
and NDB$EPOCH_TRANS()
conflict
detection. You can see how many rows have been found in conflict by NDB$EPOCH()
since this slave was last restarted from the current value of the Ndb_conflict_fn_epoch
system status variable.
Ndb_conflict_fn_epoch_trans
provides the number of rows that have been found directly in conflict by NDB$EPOCH_TRANS()
; the number of rows actually realigned, including those
affected due to their membership in or dependency on the same transactions as other conflicting rows, is given
by Ndb_conflict_trans_row_reject_count
.
For more information, see Section 17.3.4.4, "MySQL Cluster Status Variables".
Limitations on NDB$EPOCH()
. The following limitations currently apply when using NDB$EPOCH()
to perform conflict detection:
Conflicts are detected using MySQL Cluster epoch boundaries, with granularity
proportional to TimeBetweenEpochs
(default: 100 milliseconds). The minimum conflict
window is the minimum time during which concurrent updates to the same data on both clusters always
report a conflict. This is always a nonzero length of time, and is roughly proportional to 2 * (latency + queueing + TimeBetweenEpochs)
. This implies that—assuming
the default for TimeBetweenEpochs
and ignoring any latency between clusters (as well
as any queuing delays)—the minimum conflict window size is approximately 200 milliseconds. This minimum
window should be considered when looking at expected application "race" patterns.
Additional storage is required for tables using the NDB$EPOCH()
and NDB$EPOCH_TRANS()
functions; from 1 to 32 bits extra space per row is
required, depending on the value passed to the function.
Conflicts between delete operations may result in divergence between the primary and secondary. When a row is deleted on both clusters concurrently, the conflict can be detected, but is not recorded, since the row is deleted. This means that further conflicts during the propagation of any subsequent realignment operations will not be detected, which can lead to divergence.
Deletes should be externally serialized, or routed to one cluster only. Alternatively, a separate row should be updated transactionally with such deletes and any inserts that follow them, so that conflicts can be tracked across row deletes. This may require changes in applications.
Only two MySQL Clusters in a circular "active-active"
configuration are currently supported when using NDB$EPOCH()
or NDB$EPOCH_TRANS()
for conflict detection.
Tables having BLOB
or TEXT
columns are not currently supported with NDB$EPOCH()
or NDB$EPOCH_TRANS()
.
NDB$EPOCH_TRANS()
.
NDB$EPOCH_TRANS()
extends the NDB$EPOCH()
function.
Conflicts are detected and handled in the same way using the "primary wins all"
rule (see NDB$EPOCH()
) but with the extra condition that
any other rows updated in the same transaction in which the conflict occurred are also regarded as being in
conflict. In other words, where NDB$EPOCH()
realigns individual conflicting rows on
the secondary, NDB$EPOCH_TRANS()
realigns conflicting transactions.
In addition, any transactions which are detectably dependent on a conflicting transaction are also regarded as being in conflict, these dependencies being determined by the contents of the secondary cluster's binary log. Since the binary log contains only data modification operations (inserts, updates, and deletes), only overlapping data modifications are used to determine dependencies between transactions.
NDB$EPOCH_TRANS()
is subject to the same conditions and limitations as NDB$EPOCH()
, and in addition requires that Version 2 binary log row events are used (--log-bin-use-v1-row-events
equal to 0), which adds a storage overhead of 2 bytes per event in the binary log. In addition, all transaction
IDs must be recorded in the secondary's binary log (--ndb-log-transaction-id
option), which adds a further variable overhead (up
to 13 bytes per row).
See NDB$EPOCH()
.
NULL
. Indicates that conflict resolution is not to be used for the
corresponding table.
Status information. A server status variable Ndb_conflict_fn_max
provides a count of the number of times that a row was not
applied on the current SQL node due to "greatest timestamp wins"
conflict resolution since the last time that mysqld was started.
The number of times that a row was not applied as the result of "same timestamp wins"
conflict resolution on a given mysqld since the last time it was restarted is given by the
global status variable Ndb_conflict_fn_old
. In addition to incrementing Ndb_conflict_fn_old
, the primary key of the row that was not used is inserted
into an exceptions table, as explained later in this section.
Exceptions table. To use the NDB$OLD()
conflict resolution function, it is
also necessary to create an exceptions table corresponding to each NDB
table for which this type of conflict resolution is to be employed. This
is also true when using NDB$EPOCH()
or NDB$EPOCH_TRANS()
. The name of this table is that of the table for which conflict
resolution is to be applied, with the string $EX
appended. (For example, if the
name of the original table is mytable
, the name of the corresponding exception
table name should be mytable$EX
.) This table is created as follows:
CREATE TABLEoriginal_table
$EX ( server_id INT UNSIGNED, master_server_id INT UNSIGNED, master_epoch BIGINT UNSIGNED, count INT UNSIGNED,original_table_pk_columns
, [additional_columns
,] PRIMARY KEY(server_id, master_server_id, master_epoch, count)) ENGINE=NDB;
The first four columns are required. Following these columns, the columns making up the original table's primary key should be copied in the order in which they are used to define the primary key of the original table.
The names of the first four columns and the columns matching the original table's primary key
columns are not critical; however, we suggest for reasons of clarity and consistency, that you use the names
shown here for the server_id
,
master_server_id
, master_epoch
, and count
columns, and that you use the same names as in the original table for
the columns matching those in the original table's primary key.
The data types for the columns duplicating the primary key columns of the original table should be the same as for (or larger than) the original columns.
Additional columns may optionally be defined following these columns, but not before any of them; any such extra
columns cannot be NOT NULL
. The exception table's primary key must be defined as
shown. The exception table must use the NDB
storage engine. An example that uses NDB$OLD()
with an exceptions table is shown later in this section.
The mysql.ndb_replication
table is read when a data table is set up
for replication, so the row corresponding to a table to be replicated must be inserted into mysql.ndb_replication
before the table
to be replicated is created.
Examples. The following examples assume that you have already a working MySQL Cluster replication setup, as described in Section 17.6.5, "Preparing the MySQL Cluster for Replication", and Section 17.6.6, "Starting MySQL Cluster Replication (Single Replication Channel)".
NDB$MAX()
example. Suppose you wish to enable
"greatest timestamp wins" conflict
resolution on table test.t1
, using column mycol
as the "timestamp". This can be done using the following steps:
Make sure that you have started the master mysqld with --ndb-log-update-as-write=OFF
.
On the master, perform this INSERT
statement:
INSERT INTO mysql.ndb_replication VALUES ('test', 't1', 0, NULL, 'NDB$MAX(mycol)');
Inserting a 0 into the server_id
indicates that all SQL nodes accessing this
table should use conflict resolution. If you want to use conflict resolution on a
specific mysqld
only, use the actual server ID.
Inserting NULL
into the binlog_type
column has the same effect as inserting 0 (NBT_DEFAULT
); the server default is used.
Create the test.t1
table:
CREATE TABLE test.t1 (columns
mycol INT UNSIGNED,columns
) ENGINE=NDB;
Now, when updates are done on this table, conflict resolution is applied, and the
version of the row having the greatest value for mycol
is
written to the slave.
Other binlog_type
options—such as NBT_UPDATED_ONLY_USE_UPDATE
should be used to control logging on the master using the ndb_replication
table rather than by using command-line options.
NDB$OLD()
example. Suppose an NDB
table such as the one defined here is being replicated, and you wish
to enable "same timestamp wins" conflict
resolution for updates to this table:
CREATE TABLE test.t2 ( a INT UNSIGNED NOT NULL, b CHAR(25) NOT NULL,columns
, mycol INT UNSIGNED NOT NULL,columns
, PRIMARY KEY pk (a, b)) ENGINE=NDB;
The following steps are required, in the order shown:
First—and prior to creating
test.t2
—you must insert a row into the mysql.ndb_replication
table, as shown here:
INSERT INTO mysql.ndb_replication VALUES ('test', 't2', 0, NULL, 'NDB$OLD(mycol)');
Possible values for the binlog_type
column are shown
earlier in this section. The value 'NDB$OLD(mycol)'
should
be inserted into the conflict_fn
column.
Create an appropriate exceptions table for test.t2
.
The table creation statement shown here includes all required columns; any additional
columns must be declared following these columns, and before the definition of the table's
primary key.
CREATE TABLE test.t2$EX ( server_id SMALLINT UNSIGNED, master_server_id INT UNSIGNED, master_epoch BIGINT UNSIGNED, count BIGINT UNSIGNED, a INT UNSIGNED NOT NULL, b CHAR(25) NOT NULL, [additional_columns
,] PRIMARY KEY(server_id, master_server_id, master_epoch, count)) ENGINE=NDB;
Create the table test.t2
as shown
previously.
These steps must be followed for every table for which you wish to perform conflict resolution using
NDB$OLD()
. For each such table, there must be a corresponding row in
mysql.ndb_replication
, and there must be an exceptions table in the
same database as the table being replicated.