The [ndbd] and [ndbd default] sections are used to configure the behavior of the cluster's data nodes. [ndbd] and [ndbd default] are always used as the section names whether you are using ndbd or ndbmtd binaries for the data node processes.
There are many parameters which control buffer sizes, pool sizes, timeouts, and so forth. The only mandatory parameter is either ExecuteOnComputer or HostName; this must be defined in the local [ndbd] section.
The parameter NoOfReplicas should be defined in the [ndbd default] section, as it is common to all Cluster data nodes. It is not strictly necessary to set NoOfReplicas, but it is good practice to set it explicitly.
Most data node parameters are set in the [ndbd default] section. Only those parameters explicitly stated as being able to take local values may be changed in the [ndbd] section. Where present, HostName, NodeId, and ExecuteOnComputer must be defined in the local [ndbd] section, and not in any other section of config.ini. In other words, settings for these parameters are specific to one data node.
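As an illustrative sketch, a minimal config.ini honoring these rules might look like this (hostnames, node IDs, and values are placeholders, not recommendations):

[ndbd default]
NoOfReplicas=2          # common to all data nodes; set it here
DataMemory=80M          # cluster-wide buffer settings also belong here

[ndbd]
NodeId=2                # local settings: one [ndbd] section per data node
HostName=ndb-host-1

[ndbd]
NodeId=3
HostName=ndb-host-2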
For those parameters affecting memory usage or buffer sizes, it is possible to use K, M, or G as a suffix to indicate units of 1024, 1024×1024, or 1024×1024×1024. (For example, 100K means 100 × 1024 = 102400.) Parameter names and values are currently case-sensitive.
Information about configuration parameters specific to MySQL Cluster Disk Data tables can be found later in this section (see Disk Data Configuration Parameters).
All of these parameters also apply to ndbmtd (the multi-threaded version of ndbd). Three additional data node configuration parameters—MaxNoOfExecutionThreads, ThreadConfig, and NoOfFragmentLogParts—apply to ndbmtd only; these have no effect when used with ndbd. For more information, see Multi-Threading Configuration Parameters (ndbmtd). See also Section 17.4.3, "ndbmtd — The MySQL Cluster Data Node Daemon (Multi-Threaded)".
Identifying data nodes. The NodeId or Id value (that is, the data node identifier) can be allocated on the command line when the node is started or in the configuration file.
A unique node ID is used as the node's address for all cluster internal messages. For data nodes, this is an integer in the range 1 to 48 inclusive. Each node in the cluster must have a unique identifier.
NodeId is the preferred parameter name to use when identifying data nodes. Although the older Id is still supported for backward compatibility, it is now deprecated, and generates a warning when used. Id is also subject to removal in a future MySQL Cluster release.
ExecuteOnComputer refers to the Id set for one of the computers defined in a [computer] section.
Effective Version | Type/Units | Default | Range/Values |
---|---|---|---|
NDB 7.3.0 | name or IP address | localhost | ... |
Restart Type: S |
Specifying this parameter defines the hostname of the computer on which the data node is to reside. To specify a hostname other than localhost, either this parameter or ExecuteOnComputer is required.
Each node in the cluster uses a port to connect to other nodes. By default, this port is allocated dynamically in such a way as to ensure that no two nodes on the same host computer receive the same port number, so it should normally not be necessary to specify a value for this parameter.
However, if you need to be able to open specific ports in a firewall to permit communication between data nodes and API nodes (including SQL nodes), you can set this parameter to the number of the desired port in an [ndbd] section or (if you need to do this for multiple data nodes) the [ndbd default] section of the config.ini file, and then open the port having that number for incoming connections from SQL nodes, API nodes, or both.
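As a sketch, assuming this description refers to the data node ServerPort parameter, fixing the port cluster-wide (the port number is arbitrary) would look like this:

[ndbd default]
ServerPort=2202         # open this port for incoming SQL/API node connections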
Connections from data nodes to management nodes are made using the ndb_mgmd management port (the management server's PortNumber; see Section 17.3.2.5, "Defining a MySQL Cluster Management Server"), so outgoing connections to that port from any data nodes should always be permitted.
Setting this parameter to TRUE or 1 binds IP_ADDR_ANY so that connections can be made from anywhere (for autogenerated connections). The default is FALSE (0).
This parameter can be used to assign a data node to a specific node group. It is read only when the cluster is started for the first time, and cannot be used to reassign a data node to a different node group online. It is generally not desirable to use this parameter in the [ndbd default] section of the config.ini file, and care must be taken not to assign nodes to node groups in such a way that an invalid number of nodes is assigned to any node group.
The NodeGroup parameter is chiefly intended for use in adding a new node group to a running MySQL Cluster without having to perform a rolling restart. For this purpose, you should set it to 65536 (the maximum value). You are not required to set a NodeGroup value for all cluster data nodes, only for those nodes which are to be started and added to the cluster as a new node group at a later time. For more information, see Section 17.5.13.3, "Adding MySQL Cluster Data Nodes Online: Detailed Example".
This global parameter can be set only in the [ndbd default] section, and defines the number of replicas for each table stored in the cluster. This parameter also specifies the size of node groups. A node group is a set of nodes all storing the same information.
Node groups are formed implicitly. The first node group is formed by the set of data nodes with the lowest node IDs, the next node group by the set of the next lowest node IDs, and so on. By way of example, assume that we have 4 data nodes and that NoOfReplicas is set to 2. The four data nodes have node IDs 2, 3, 4, and 5. Then the first node group is formed from nodes 2 and 3, and the second node group from nodes 4 and 5. It is important to configure the cluster in such a manner that nodes in the same node group are not placed on the same computer, because a single hardware failure would then cause the entire cluster to fail.
If no node IDs are provided, the order of the data nodes will be the determining factor for the node groups. Whether or not explicit assignments are made, they can be viewed in the output of the management client's SHOW command.
The default value for NoOfReplicas is 2, which is the recommended setting in most common usage scenarios.
The maximum possible value is 4; currently, only the values 1 and 2 are actually supported.
Setting NoOfReplicas to 1 means that there is only a single copy of all Cluster data; in this case, the loss of a single data node causes the cluster to fail because there are no additional copies of the data stored by that node.
The value for this parameter must divide evenly into the number of data nodes in the cluster. For example, if there are two data nodes, then NoOfReplicas must be equal to either 1 or 2, since 2/3 and 2/4 both yield fractional values; if there are four data nodes, then NoOfReplicas must be equal to 1, 2, or 4.
This parameter specifies the directory where trace files, log files, pid files and error logs are placed.
The default is the data node process working directory.
This parameter specifies the directory where all files created for metadata, REDO logs, UNDO logs (for Disk Data tables), and data files are placed. The default is the directory specified by DataDir.
This directory must exist before the ndbd process is initiated.
The recommended directory hierarchy for MySQL Cluster includes /var/lib/mysql-cluster, under which a directory for the node's file system is created. The name of this subdirectory contains the node ID. For example, if the node ID is 2, this subdirectory is named ndb_2_fs.
This parameter specifies the directory in which backups are placed.
The string '/BACKUP' is always appended to this value. For example, if you set the value of BackupDataDir to /var/lib/cluster-data, then all backups are stored under /var/lib/cluster-data/BACKUP. This also means that the effective default backup location is the directory named BACKUP under the location specified by the FileSystemPath parameter.
DataMemory and IndexMemory are [ndbd] parameters specifying the size of memory segments used to store the actual records and their indexes. In setting values for these, it is important to understand how DataMemory and IndexMemory are used, as they usually need to be updated to reflect actual usage by the cluster:
This parameter defines the amount of space (in bytes) available for storing database records. The entire amount specified by this value is allocated in memory, so it is extremely important that the machine has sufficient physical memory to accommodate it.
The memory allocated by DataMemory is used to store both the actual records and indexes.
There is a 16-byte overhead on each record; an additional amount for each record is incurred because it is stored in a 32KB page with 128 bytes of page overhead (see below). There is also a small amount wasted per page because each record is stored in only one page.
For variable-size table attributes, the data is stored on separate data pages, allocated from DataMemory. Variable-length records use a fixed-size part with an extra overhead of 4 bytes to reference the variable-size part. The variable-size part has 2 bytes of overhead plus 2 bytes per attribute.
The maximum record size is 14000 bytes.
The memory space defined by DataMemory is also used to store ordered indexes, which use about 10 bytes per record. Each table row is represented in the ordered index. A common error among users is to assume that all indexes are stored in the memory allocated by IndexMemory, but this is not the case: only primary key and unique hash indexes use this memory; ordered indexes use the memory allocated by DataMemory. However, creating a primary key or unique hash index also creates an ordered index on the same keys, unless you specify USING HASH in the index creation statement. This can be verified by running ndb_desc -d db_name table_name in the management client.
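As a brief sketch of suppressing the implicit ordered index (the table, database, and column names here are hypothetical):

CREATE TABLE t1 (
    a INT NOT NULL,
    PRIMARY KEY (a) USING HASH   -- hash index only; no ordered index is created
) ENGINE=NDBCLUSTER;

Running ndb_desc -d test t1 afterward should show no ordered index on column a.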
Currently, MySQL Cluster can use a maximum of 512 MB for hash indexes per partition, which means in some cases it is possible to get Table is full errors in MySQL client applications even when ndb_mgm -e "ALL REPORT MEMORYUSAGE" shows significant free DataMemory. This can also pose a problem with data node restarts on nodes that are heavily loaded with data. You can force NDB to create extra partitions for MySQL Cluster tables, and thus have more memory available for hash indexes, by using the MAX_ROWS option for CREATE TABLE. In general, setting MAX_ROWS to twice the number of rows that you expect to store in the table should be sufficient. You can also use the MinFreePct configuration parameter to help avoid problems with node restarts. (Bug #13436216)
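A hedged sketch of the MAX_ROWS technique, assuming a table expected to hold about one million rows (the names are illustrative):

CREATE TABLE t2 (
    a BIGINT NOT NULL PRIMARY KEY,
    b INT
) ENGINE=NDBCLUSTER
MAX_ROWS=2000000;   -- twice the expected row count, so NDB creates extra partitions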
The memory space allocated by DataMemory consists of 32KB pages, which are allocated to table fragments. Each table is normally partitioned into the same number of fragments as there are data nodes in the cluster. Thus, for each node, there are the same number of fragments as are set in NoOfReplicas.
Once a page has been allocated, it is currently not possible to return it to the pool of free pages, except by deleting the table. (This also means that DataMemory pages, once allocated to a given table, cannot be used by other tables.) Performing a data node recovery also compresses the partition, because all records are inserted into empty partitions from other live nodes.
The DataMemory memory space also contains UNDO information: for each update, a copy of the unaltered record is allocated in DataMemory. There is also a reference to each copy in the ordered table indexes. Unique hash indexes are updated only when the unique index columns are updated, in which case a new entry in the index table is inserted and the old entry is deleted upon commit. For this reason, it is also necessary to allocate enough memory to handle the largest transactions performed by applications using the cluster. In any case, performing a few large transactions holds no advantage over using many smaller ones, for the following reasons:
Large transactions are not any faster than smaller ones
Large transactions increase the number of operations that are lost and must be repeated in the event of transaction failure
Large transactions use more memory
The default value for DataMemory is 80MB; the minimum is 1MB. There is no maximum size, but in reality the maximum size has to be adapted so that the process does not start swapping when the limit is reached. This limit is determined by the amount of physical RAM available on the machine and by the amount of memory that the operating system may commit to any one process. 32-bit operating systems are generally limited to 2–4GB per process; 64-bit operating systems can use more. For large databases, it may be preferable to use a 64-bit operating system for this reason.
This parameter controls the amount of storage used for hash indexes in MySQL Cluster. Hash indexes are always used for primary key indexes, unique indexes, and unique constraints. Note that when defining a primary key and a unique index, two indexes will be created, one of which is a hash index used for all tuple accesses as well as lock handling. It is also used to enforce unique constraints.
The size of the hash index is 25 bytes per record, plus the size of the primary key. For primary keys larger than 32 bytes another 8 bytes is added.
The default value for IndexMemory is 18MB. The minimum is 1MB.
This parameter determines how much memory is allocated for strings such as table names, and is specified in an [ndbd] or [ndbd default] section of the config.ini file. A value between 0 and 100 inclusive is interpreted as a percent of the maximum default value, which is calculated based on a number of factors including the number of tables, maximum table name size, maximum size of .FRM files, MaxNoOfTriggers, maximum column name size, and maximum default column value.
A value greater than 100 is interpreted as a number of bytes.
The default value is 25—that is, 25 percent of the default maximum.
Under most circumstances, the default value should be sufficient, but when you have a great many Cluster tables (1000 or more), it is possible to get Error 773 Out of string memory, please modify StringMemory config parameter: Permanent error: Schema error, in which case you should increase this value. 25 (25 percent) is not excessive, and should prevent this error from recurring in all but the most extreme conditions.
The following example illustrates how memory is used for a table. Consider this table definition:
CREATE TABLE example (
    a INT NOT NULL,
    b INT NOT NULL,
    c INT NOT NULL,
    PRIMARY KEY(a),
    UNIQUE(b)
) ENGINE=NDBCLUSTER;
For each record, there are 12 bytes of data plus 12 bytes of overhead. Having no nullable columns saves 4 bytes of overhead. In addition, we have two ordered indexes on columns a and b consuming roughly 10 bytes each per record. There is a primary key hash index on the base table using roughly 29 bytes per record. The unique constraint is implemented by a separate table with b as primary key and a as a column. This other table consumes an additional 29 bytes of index memory per record in the example table, as well as 8 bytes of record data plus 12 bytes of overhead.
Thus, for one million records, we need 58MB for index memory to handle the hash indexes for the primary key and the unique constraint. We also need 64MB for the records of the base table and the unique index table, plus the two ordered index tables.
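The per-record arithmetic behind those totals, using the sizes given above:

IndexMemory: 29 (base table hash index) + 29 (unique table hash index) = 58 bytes per record
             58 bytes × 1,000,000 records ≈ 58MB
DataMemory:  (12 + 12) base table + (8 + 12) unique table + (10 + 10) two ordered indexes = 64 bytes per record
             64 bytes × 1,000,000 records ≈ 64MB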
You can see that hash indexes take up a fair amount of memory space; however, they provide very fast access to the data in return. They are also used in MySQL Cluster to handle uniqueness constraints.
Currently, the only partitioning algorithm is hashing, and ordered indexes are local to each node. Thus, ordered indexes cannot be used to handle uniqueness constraints in the general case.
An important point for both IndexMemory and DataMemory is that the total database size is the sum of all data memory and all index memory for each node group. Each node group is used to store replicated information, so if there are four nodes with two replicas, there will be two node groups. Thus, the total data memory available is 2 × DataMemory for each data node.
It is highly recommended that DataMemory and IndexMemory be set to the same values for all nodes. Data distribution is even over all nodes in the cluster, so the maximum amount of space available for any node can be no greater than that of the smallest node in the cluster.
DataMemory and IndexMemory can be changed, but decreasing either of these can be risky; doing so can easily lead to a node or even an entire MySQL Cluster that is unable to restart due to there being insufficient memory space. Increasing these values should be acceptable, but it is recommended that such upgrades are performed in the same manner as a software upgrade, beginning with an update of the configuration file, and then restarting the management server followed by restarting each data node in turn.
A proportion (5% by default) of data node resources including DataMemory and IndexMemory is kept in reserve to ensure that the data node does not exhaust its memory when performing a restart. This can be adjusted using the MinFreePct data node configuration parameter (default 5).
Updates do not increase the amount of index memory used. Inserts take effect immediately; however, rows are not actually deleted until the transaction is committed.
Transaction parameters. The next few [ndbd] parameters that we discuss are important because they affect the number of parallel transactions and the sizes of transactions that can be handled by the system. MaxNoOfConcurrentTransactions sets the number of parallel transactions possible in a node. MaxNoOfConcurrentOperations sets the number of records that can be in update phase or locked simultaneously.
Both of these parameters (especially MaxNoOfConcurrentOperations) are likely targets for users setting specific values and not using the default value. The default value is set for systems using small transactions, to ensure that these do not use excessive memory.
MaxDMLOperationsPerTransaction sets the maximum number of DML operations that can be performed in a given transaction.
Each cluster data node requires a transaction record for each active transaction in the cluster. The task of coordinating transactions is distributed among all of the data nodes. The total number of transaction records in the cluster is the number of transactions in any given node times the number of nodes in the cluster.
Transaction records are allocated to individual MySQL servers. Each connection to a MySQL server requires at least one transaction record, plus an additional transaction object per table accessed by that connection. This means that a reasonable minimum for the total number of transactions in the cluster can be expressed as
TotalNoOfConcurrentTransactions = (maximum number of tables accessed in any single transaction + 1) * number of cluster SQL nodes
Suppose that there are 10 SQL nodes using the cluster. A single join involving 10 tables requires 11 transaction records; if there are 10 such joins in a transaction, then 10 × 11 = 110 transaction records are required for this transaction, per MySQL server, or 110 × 10 = 1100 transaction records in total. Each data node can be expected to handle TotalNoOfConcurrentTransactions / number of data nodes. For a MySQL Cluster having 4 data nodes, this would mean setting MaxNoOfConcurrentTransactions on each data node to 1100 / 4 = 275. In addition, you should provide for failure recovery by ensuring that a single node group can accommodate all concurrent transactions; in other words, that each data node's MaxNoOfConcurrentTransactions is sufficient to cover a number of transactions equal to TotalNoOfConcurrentTransactions / number of node groups. If this cluster has a single node group, then MaxNoOfConcurrentTransactions should be set to 1100 (the same as the total number of concurrent transactions for the entire cluster).
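Continuing this example, on a cluster with a single node group the setting would appear in config.ini as follows (using the value worked out above):

[ndbd default]
MaxNoOfConcurrentTransactions=1100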
In addition, each transaction involves at least one operation; for this reason, the value set for MaxNoOfConcurrentTransactions should always be no more than the value of MaxNoOfConcurrentOperations.
This parameter must be set to the same value for all cluster data nodes. This is due to the fact that, when a data node fails, the oldest surviving node re-creates the transaction state of all transactions that were ongoing in the failed node.
Changing the value of MaxNoOfConcurrentTransactions requires a complete shutdown and restart of the cluster.
The default value is 4096.
It is a good idea to adjust the value of this parameter according to the size and number of transactions. When performing transactions which consist of only a few operations and do not involve a great many records, there is no need to set this parameter very high. When performing large transactions involving many records, you need to set this parameter higher.
Records are kept for each transaction updating cluster data, both in the transaction coordinator and in the nodes where the actual updates are performed. These records contain state information needed to find UNDO records for rollback, lock queues, and other purposes.
This parameter should be set at a minimum to the number of records to be updated simultaneously in transactions, divided by the number of cluster data nodes. For example, in a cluster which has four data nodes and which is expected to handle one million concurrent updates using transactions, you should set this value to 1000000 / 4 = 250000. To help provide resiliency against failures, it is suggested that you set this parameter to a value that is high enough to permit an individual data node to handle the load for its node group. In other words, you should set the value equal to total number of concurrent operations / number of node groups. (In the case where there is a single node group, this is the same as the total number of concurrent operations for the entire cluster.)
Because each transaction always involves at least one operation, the value of MaxNoOfConcurrentOperations should always be greater than or equal to the value of MaxNoOfConcurrentTransactions.
Read queries which set locks also cause operation records to be created. Some extra space is allocated within individual nodes to accommodate cases where the distribution is not perfect over the nodes.
When queries make use of the unique hash index, there are actually two operation records used per record in the transaction. The first record represents the read in the index table and the second handles the operation on the base table.
The default value is 32768.
This parameter actually handles two values that can be configured separately. The first of these specifies how many operation records are to be placed with the transaction coordinator. The second part specifies how many operation records are to be local to the database.
A very large transaction performed on an eight-node cluster requires as many operation records in the transaction coordinator as there are reads, updates, and deletes involved in the transaction. However, the operation records are spread over all eight nodes. Thus, if it is necessary to configure the system for one very large transaction, it is a good idea to configure the two parts separately. MaxNoOfConcurrentOperations will always be used to calculate the number of operation records in the transaction coordinator portion of the node.
It is also important to have an idea of the memory requirements for operation records. These consume about 1KB per record.
MaxNoOfLocalOperations
Effective Version | Type/Units | Default | Range/Values |
---|---|---|---|
NDB 7.3.0 | integer | UNDEFINED | 32 - 4G |
Restart Type: N |
By default, this parameter is calculated as 1.1 × MaxNoOfConcurrentOperations. This fits systems with many simultaneous transactions, none of them being very large. If there is a need to handle one very large transaction at a time and there are many nodes, it is a good idea to override the default value by explicitly specifying this parameter.
MaxDMLOperationsPerTransaction
Effective Version | Type/Units | Default | Range/Values |
---|---|---|---|
NDB 7.3.0 | operations (DML) | 4294967295 | 32 - 4294967295 |
Restart Type: N |
This parameter limits the size of a transaction. The transaction is aborted if it requires more than this many DML operations. The minimum number of operations per transaction is 32; however, you can set MaxDMLOperationsPerTransaction to 0 to disable any limitation on the number of DML operations per transaction. The maximum (and default) is 4294967295.
Transaction temporary storage. The next set of [ndbd] parameters is used to determine temporary storage when executing a statement that is part of a Cluster transaction. All records are released when the statement is completed and the cluster is waiting for the commit or rollback.
The default values for these parameters are adequate for most situations. However, users with a need to support transactions involving large numbers of rows or operations may need to increase these values to enable better parallelism in the system, whereas users whose applications require relatively small transactions can decrease the values to save memory.
MaxNoOfConcurrentIndexOperations
For queries using a unique hash index, another temporary set of operation records is used during a query's execution phase. This parameter sets the size of that pool of records. Thus, this record is allocated only while executing a part of a query. As soon as this part has been executed, the record is released. The state needed to handle aborts and commits is handled by the normal operation records, where the pool size is set by the parameter MaxNoOfConcurrentOperations.
The default value of this parameter is 8192. Only in rare cases of extremely high parallelism using unique hash indexes should it be necessary to increase this value. Using a smaller value is possible and can save memory if the DBA is certain that a high degree of parallelism is not required for the cluster.
The default value of MaxNoOfFiredTriggers is 4000, which is sufficient for most situations. In some cases it can even be decreased if the DBA feels certain the need for parallelism in the cluster is not high.
A record is created when an operation is performed that affects a unique hash index. Inserting or deleting a record in a table with unique hash indexes or updating a column that is part of a unique hash index fires an insert or a delete in the index table. The resulting record is used to represent this index table operation while waiting for the original operation that fired it to complete. This operation is short-lived but can still require a large number of records in its pool for situations with many parallel write operations on a base table containing a set of unique hash indexes.
The memory affected by this parameter is used for tracking operations fired when updating index tables and reading unique indexes. This memory is used to store the key and column information for these operations. It is only very rarely that the value for this parameter needs to be altered from the default.
The default value for TransactionBufferMemory is 1MB.
Normal read and write operations use a similar buffer, whose usage is even more short-lived. The compile-time parameter ZATTRBUF_FILESIZE (found in ndb/src/kernel/blocks/Dbtc/Dbtc.hpp) is set to 4000 × 128 bytes (500KB). A similar buffer for key information, ZDATABUF_FILESIZE (also in Dbtc.hpp), contains 4000 × 16 bytes = 62.5KB of buffer space. Dbtc is the module that handles transaction coordination.
Scans and buffering. There are additional [ndbd] parameters in the Dblqh module (in ndb/src/kernel/blocks/Dblqh/Dblqh.hpp) that affect reads and updates. These include ZATTRINBUF_FILESIZE, set by default to 10000 × 128 bytes (1250KB), and ZDATABUF_FILE_SIZE, set by default to 10000 × 16 bytes (roughly 156KB) of buffer space. To date, there have been neither any reports from users nor any results from our own extensive tests suggesting that either of these compile-time limits should be increased.
This parameter is used to control the number of parallel scans that can be performed in the cluster. Each transaction coordinator can handle the number of parallel scans defined for this parameter. Each scan query is performed by scanning all partitions in parallel. Each partition scan uses a scan record in the node where the partition is located, the number of records being the value of this parameter times the number of nodes. The cluster should be able to sustain MaxNoOfConcurrentScans scans concurrently from all nodes in the cluster.
Scans are actually performed in two cases. The first of these cases occurs when no hash or ordered indexes exists to handle the query, in which case the query is executed by performing a full table scan. The second case is encountered when there is no hash index to support the query but there is an ordered index. Using the ordered index means executing a parallel range scan. The order is kept on the local partitions only, so it is necessary to perform the index scan on all partitions.
The default value of MaxNoOfConcurrentScans is 256. The maximum value is 500.
MaxNoOfLocalScans
Effective Version | Type/Units | Default | Range/Values |
---|---|---|---|
NDB 7.3.0 | integer | UNDEFINED | 32 - 4G |
Restart Type: N |
Specifies the number of local scan records if many scans are not fully parallelized. When the number of local scan records is not provided, it is calculated as 4 times the product of MaxNoOfConcurrentScans and the number of data nodes in the system. (Previously, it was calculated as the product of MaxNoOfConcurrentScans and the number of data nodes.) The minimum value is 32.
This parameter is used to calculate the number of lock records used to handle concurrent scan operations.
BatchSizePerLocalScan has a strong connection to the BatchSize defined in the SQL nodes.
This is an internal buffer used for passing messages within individual nodes and between nodes. The default is 4MB.
This parameter seldom needs to be changed from the default.
It is possible to configure the maximum number of parallel scans (TUP scans and TUX scans) allowed before they begin queuing for serial handling. You can increase this value to take advantage of any unused CPU when performing large numbers of scans in parallel, and so improve their performance.
The default value for this parameter in MySQL Cluster NDB 7.3 is 256.
This is the maximum size of the memory unit to use when allocating memory for tables. In cases where NDB gives Out of memory errors, but it is evident by examining the cluster logs or the output of DUMP 1000 that all available memory has not yet been used, you can increase the value of this parameter (or MaxNoOfTables, or both) to cause NDB to make sufficient memory available.
Effective Version | Type/Units | Default | Range/Values |
---|---|---|---|
NDB 7.3.0 | LDM threads | 3840 | 0 - 3840 |
Restart Type: N |
MySQL Cluster NDB 7.2.7 and later use a larger default table hash map size (3840) than previous releases (240). Beginning with MySQL Cluster NDB 7.2.11, the size of the table hash maps used by NDB is configurable using this parameter; previously this value was hard-coded. DefaultHashMapSize can take any of three possible values (0, 240, 3840). These values and their effects are described in the following table:
Value | Description / Effect |
---|---|
0 |
Use the lowest value set, if any, for this parameter among all data nodes and API nodes in the cluster; if it is not set on any data or API node, use the default value. |
240 |
Original hash map size, used by default in all MySQL Cluster releases prior to MySQL Cluster NDB 7.2.7. |
3840 |
Larger hash map size as used by default in MySQL Cluster NDB 7.2.7 and later |
The primary intended use for this parameter is to facilitate upgrades and especially downgrades between MySQL Cluster NDB 7.2.7 and later MySQL Cluster versions, in which the larger hash map size (3840) is the default, and earlier releases (in which the default was 240); this change is not otherwise backward compatible (Bug #14800539). By setting this parameter to 240 prior to performing an upgrade from an older version where this value is in use, you can cause the cluster to continue using the smaller size for table hash maps, in which case the tables remain compatible with earlier versions following the upgrade. DefaultHashMapSize can be set for individual data nodes, API nodes, or both, but setting it once only, in the [ndbd default] section of the config.ini file, is the recommended practice.
After increasing this parameter, to have existing tables take advantage of the new size, you can run ALTER TABLE ... REORGANIZE PARTITION on them, after which they can use the larger hash map size. This is in addition to performing a rolling restart, which makes the larger hash maps available to new tables but does not enable existing tables to use them.
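For example (the table name here is hypothetical), after completing the rolling restart you might run the statement shown here for each existing table:

ALTER TABLE t1 REORGANIZE PARTITION;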
Decreasing this parameter online after any tables have been created or modified with DefaultHashMapSize equal to 3840 is not currently supported.
Logging and checkpointing. The following [ndbd] parameters control log and checkpoint behavior.
This parameter sets the number of REDO log files for the node, and thus the amount of space allocated to REDO logging. Because the REDO log files are organized in a ring, it is extremely important that the first and last log files in the set (sometimes referred to as the "head" and "tail" log files, respectively) do not meet. When these approach one another too closely, the node begins aborting all transactions encompassing updates due to a lack of room for new log records.
A REDO log record is not removed until the required number of local checkpoints has been completed since that log record was inserted. (In MySQL Cluster NDB 7.3, only 2 local checkpoints are necessary.) Checkpointing frequency is determined by its own set of configuration parameters discussed elsewhere in this chapter.
The default parameter value is 16, which by default means 16 sets of 4 16MB files for a total of 1024MB. The size of the individual log files is configurable using the FragmentLogFileSize parameter. In scenarios requiring a great many updates, the value for NoOfFragmentLogFiles may need to be set as high as 300 or even higher to provide sufficient space for REDO logs.
If the checkpointing is slow and there are so many writes to the database that the log files are full and the log tail cannot be cut without jeopardizing recovery, all updating transactions are aborted with internal error code 410 (Out of log file space temporarily).
This condition prevails until a checkpoint has completed and the log tail can be moved forward.
This parameter cannot be changed "on the fly"; you must restart the node using --initial. If you wish to change this value for all data nodes in a running cluster, you can do so using a rolling node restart (using --initial when starting each data node).
Setting this parameter enables you to control directly the size of redo log files. This can be useful in situations when MySQL Cluster is operating under a high load and it is unable to close fragment log files quickly enough before attempting to open new ones (only 2 fragment log files can be open at one time); increasing the size of the fragment log files gives the cluster more time before having to open each new fragment log file. The default value for this parameter is 16M.
For more information about fragment log files, see the description for NoOfFragmentLogFiles.
Effective Version | Type/Units | Default | Range/Values |
---|---|---|---|
NDB 7.3.0 | [see values] | SPARSE | SPARSE, FULL |
Restart Type: IN |
By default, fragment log files are created sparsely when performing an initial start of a data node—that is, depending on the operating system and file system in use, not all bytes are necessarily written to disk. However, it is possible to override this behavior and force all bytes to be written, regardless of the platform and file system type being used, by means of this parameter. InitFragmentLogFiles takes either of two values:
SPARSE. Fragment log files are created sparsely. This is the default value.
FULL. Force all bytes of the fragment log file to be written to disk.
Depending on your operating system and file system, setting InitFragmentLogFiles=FULL may help eliminate I/O errors on writes to the REDO log.
This parameter sets a ceiling on how many internal threads to allocate for open files. Any situation requiring a change in this parameter should be reported as a bug.
The default value is 0. However, the minimum value to which this parameter can be set is 20.
This parameter sets the initial number of internal threads to allocate for open files.
The default value is 27.
This parameter sets the maximum number of trace files that are kept before overwriting old ones. Trace files are generated when, for whatever reason, the node crashes.
The default is 25 trace files.
In parallel data node recovery, only table data is actually copied and synchronized in parallel; synchronization of metadata such as dictionary and checkpoint information is done in a serial fashion. In addition, recovery of dictionary and checkpoint information cannot be executed in parallel with performing of local checkpoints. This means that, when starting or restarting many data nodes concurrently, data nodes may be forced to wait while a local checkpoint is performed, which can result in longer node recovery times.
It is possible to force a delay in the local checkpoint to permit more (and possibly all) data nodes to complete metadata synchronization; once each data node's metadata synchronization is complete, all of the data nodes can recover table data in parallel, even while the local checkpoint is being executed. To force such a delay, set MaxLCPStartDelay, which determines the number of seconds the cluster can wait to begin a local checkpoint while data nodes continue to synchronize metadata. This parameter should be set in the [ndbd default] section of the config.ini file, so that it is the same for all data nodes. The maximum value is 600; the default is 0.
Metadata objects. The next set of [ndbd] parameters defines pool sizes for metadata objects, used to define the maximum number of attributes, tables, indexes, and trigger objects used by indexes, events, and replication between clusters. Note that these act merely as "suggestions" to the cluster, and any that are not specified revert to the default values shown.
This parameter sets a suggested maximum number of attributes that can be defined in the cluster; like MaxNoOfTables, it is not intended to function as a hard upper limit.
(In older MySQL Cluster releases, this parameter was sometimes treated as a hard limit for certain operations. This caused problems with MySQL Cluster Replication, when it was possible to create more tables than could be replicated, and sometimes led to confusion when it was possible [or not possible, depending on the circumstances] to create more than MaxNoOfAttributes attributes.)
The default value is 1000, with the minimum possible value being 32. The maximum is 4294967039. Each attribute consumes around 200 bytes of storage per node due to the fact that all metadata is fully replicated on the servers.
When setting MaxNoOfAttributes, it is important to prepare in advance for any ALTER TABLE statements that you might want to perform in the future. This is due to the fact that, during the execution of ALTER TABLE on a Cluster table, 3 times the number of attributes as in the original table are used, and a good practice is to permit double this amount. For example, if the MySQL Cluster table having the greatest number of attributes (greatest_number_of_attributes) has 100 attributes, a good starting point for the value of MaxNoOfAttributes would be 6 × greatest_number_of_attributes = 600.
You should also estimate the average number of attributes per table and multiply this by MaxNoOfTables. If this value is larger than the value obtained in the previous paragraph, you should use the larger value instead.
Assuming that you can create all desired tables without any problems, you should also verify that this number is sufficient by trying an actual ALTER TABLE after configuring the parameter. If this is not successful, increase MaxNoOfAttributes by another multiple of MaxNoOfTables and test it again.
A table object is allocated for each table and for each unique hash index in the cluster. This parameter sets a suggested maximum number of table objects for the cluster as a whole; like MaxNoOfAttributes, it is not intended to function as a hard upper limit.
(In older MySQL Cluster releases, this parameter was sometimes treated as a hard limit for certain operations. This caused problems with MySQL Cluster Replication, when it was possible to create more tables than could be replicated, and sometimes led to confusion when it was possible [or not possible, depending on the circumstances] to create more than MaxNoOfTables tables.)
For each attribute that has a BLOB data type, an extra table is used to store most of the BLOB data. These tables also must be taken into account when defining the total number of tables.
The default value of this parameter is 128. The minimum is 8 and the maximum is 20320. Each table object consumes approximately 20KB per node.
The sum of MaxNoOfTables, MaxNoOfOrderedIndexes, and MaxNoOfUniqueHashIndexes must not exceed 2^32 − 2 (4294967294).
For each ordered index in the cluster, an object is allocated describing what is being indexed and its storage segments. By default, each index so defined also defines an ordered index. Each unique index and primary key has both an ordered index and a hash index. MaxNoOfOrderedIndexes sets the total number of ordered indexes that can be in use in the system at any one time.
The default value of this parameter is 128. Each index object consumes approximately 10KB of data per node.
The sum of MaxNoOfTables, MaxNoOfOrderedIndexes, and MaxNoOfUniqueHashIndexes must not exceed 2^32 − 2 (4294967294).
For each unique index that is not a primary key, a special table is allocated that maps the unique key to the primary key of the indexed table. By default, an ordered index is also defined for each unique index. To prevent this, you must specify the USING HASH option when defining the unique index.
The default value is 64. Each index consumes approximately 15KB per node.
The sum of MaxNoOfTables, MaxNoOfOrderedIndexes, and MaxNoOfUniqueHashIndexes must not exceed 2^32 − 2 (4294967294).
Internal update, insert, and delete triggers are allocated for each unique hash index. (This means that three triggers are created for each unique hash index.) However, an ordered index requires only a single trigger object. Backups also use three trigger objects for each normal table in the cluster.
Replication between clusters also makes use of internal triggers.
This parameter sets the maximum number of trigger objects in the cluster.
The default value is 768.
This parameter is deprecated and subject to removal in a future version of MySQL Cluster. You should use MaxNoOfOrderedIndexes and MaxNoOfUniqueHashIndexes instead.
This parameter is used only by unique hash indexes. There needs to be one record in this pool for each unique hash index defined in the cluster.
The default value of this parameter is 128.
Each NDB table in a MySQL Cluster requires a subscription in the NDB kernel. For some NDB API applications, it may be necessary or desirable to change this parameter. However, for normal usage with MySQL servers acting as SQL nodes, there is no need to do so.
The default value for MaxNoOfSubscriptions is 0, which is treated as equal to MaxNoOfTables.
Each subscription consumes 108 bytes.
This parameter is of interest only when using MySQL Cluster Replication. The default value is 0, which is treated as 2 × MaxNoOfTables; that is, there is one subscription per NDB table for each of two MySQL servers (one acting as the replication master and the other as the slave). Each subscriber uses 16 bytes of memory.
When using circular replication, multi-master replication, and other replication setups involving more than 2 MySQL servers, you should increase this parameter to the number of mysqld processes included in replication (this is often, but not always, the same as the number of clusters). For example, if you have a circular replication setup using three MySQL Clusters, with one mysqld attached to each cluster, and each of these mysqld processes acts as a master and as a slave, you should set MaxNoOfSubscribers equal to 3 × MaxNoOfTables.
For more information, see Section 17.6, "MySQL Cluster Replication".
MaxNoOfConcurrentSubOperations
This parameter sets a ceiling on the number of operations that can be performed by all API nodes in the cluster at one time. The default value (256) is sufficient for normal operations, and might need to be adjusted only in scenarios where there are a great many API nodes each performing a high volume of operations concurrently.
Boolean parameters. The behavior of data nodes is also affected by a set of [ndbd] parameters taking on boolean values. These parameters can each be specified as TRUE by setting them equal to 1 or Y, and as FALSE by setting them equal to 0 or N.
For a number of operating systems, including Solaris and Linux, it is possible to lock a process into memory and so avoid any swapping to disk. This can be used to help guarantee the cluster's real-time characteristics.
This parameter takes one of the integer values 0, 1, or 2, which act as shown in the following list:
0: Disables locking. This is the default value.
1: Performs the lock after allocating memory for the process.
2: Performs the lock before memory for the process is allocated.
If the operating system is not configured to permit unprivileged users to lock pages, then the data node process making use of this parameter may have to be run as system root. (LockPagesInMainMemory uses the mlockall function. From Linux kernel 2.6.9, unprivileged users can lock memory as limited by max locked memory; for more information, see ulimit -l and the mlockall() man page.)
In older MySQL Cluster releases, this parameter was a Boolean; 0 or false was the default setting, and disabled locking, while 1 or true enabled locking of the process after its memory was allocated. In MySQL Cluster NDB 7.3, using true or false as the value of this parameter causes an error.
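For instance, locking data node memory after allocation would look like this in config.ini:

[ndbd default]
LockPagesInMainMemory=1   # lock process memory after it is allocated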
Beginning with glibc 2.10, glibc uses per-thread arenas to reduce lock contention on a shared pool, which consumes real memory. In general, a data node process does not need per-thread arenas, since it does not perform any memory allocation after startup. (This difference in allocators does not appear to affect performance significantly.)
The glibc behavior is intended to be configurable via the MALLOC_ARENA_MAX environment variable, but a bug in this mechanism prior to glibc 2.16 meant that this variable could not be set to less than 8, so that the wasted memory could not be reclaimed. (Bug #15907219)
One possible workaround for this problem is to use the LD_PRELOAD environment variable to preload a jemalloc memory allocation library to take the place of that supplied with glibc.
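One possible form of this workaround, assuming jemalloc is installed at the path shown (the library path and connection string vary by system):

LD_PRELOAD=/usr/lib64/libjemalloc.so ndbd --ndb-connectstring=mgm_host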
Effective Version | Type/Units | Default | Range/Values |
---|---|---|---|
NDB 7.3.0 | boolean | true | true, false |
Restart Type: N |
This parameter specifies whether a data node process should exit or perform an automatic restart when an error condition is encountered.
This feature is enabled by default.
Effective Version | Type/Units | Default | Range/Values |
---|---|---|---|
NDB 7.3.0 | boolean | true | true, false |
Restart Type: S |
When this parameter is enabled, it forces a data node to shut down whenever it encounters a corrupted tuple. In MySQL Cluster NDB 7.3, it is enabled by default.
Effective Version | Type/Units | Default | Range/Values |
---|---|---|---|
NDB 7.3.0 | true|false (1|0) | false | true, false |
Restart Type: IS |
It is possible to specify MySQL Cluster tables as diskless, meaning that tables are not checkpointed to disk and that no logging occurs. Such tables exist only in main memory. A consequence of using diskless tables is that neither the tables nor the records in those tables survive a crash. However, when operating in diskless mode, it is possible to run ndbd on a diskless computer.
This feature causes the entire cluster to operate in diskless mode.
When this feature is enabled, Cluster online backup is disabled. In addition, a partial start of the cluster is not possible.
Diskless is disabled by default.
Effective Version | Type/Units | Default | Range/Values |
---|---|---|---|
NDB 7.3.0 | boolean | false | true, false |
Restart Type: N |
Enabling this parameter causes NDB to attempt using O_DIRECT writes for LCP, backups, and redo logs, often lowering kswapd and CPU usage. When using MySQL Cluster on Linux, enable ODirect if you are using a 2.6 or later kernel.
ODirect is disabled by default.
This feature is accessible only when building the debug version where it is possible to insert errors in the execution of individual blocks of code as part of testing.
This feature is disabled by default.
Effective Version | Type/Units | Default | Range/Values |
---|---|---|---|
NDB 7.3.0 | boolean | false | true, false |
Restart Type: N |
Setting this parameter to 1 causes backup files to be compressed. The compression used is equivalent to gzip --fast, and can save 50% or more of the space required on the data node to store uncompressed backup files. Compressed backups can be enabled for individual data nodes, or for all data nodes (by setting this parameter in the [ndbd default] section of the config.ini file).
You cannot restore a compressed backup to a cluster running a MySQL version that does not support this feature.
The default value is 0 (disabled).
Effective Version | Type/Units | Default | Range/Values |
---|---|---|---|
NDB 7.3.0 | boolean | false | true, false |
Restart Type: N |
Setting this parameter to 1 causes local checkpoint files to be compressed. The compression used is equivalent to gzip --fast, and can save 50% or more of the space required on the data node to store uncompressed checkpoint files. Compressed LCPs can be enabled for individual data nodes, or for all data nodes (by setting this parameter in the [ndbd default] section of the config.ini file).
You cannot restore a compressed local checkpoint to a cluster running a MySQL version that does not support this feature.
The default value is 0 (disabled).
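Enabling both compression options for all data nodes would look like this in config.ini (a sketch; weigh the CPU cost of compression before enabling it in production):

[ndbd default]
CompressedBackup=1
CompressedLCP=1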
There are a number of [ndbd] parameters specifying timeouts and intervals between various actions in Cluster data nodes. Most of the timeout values are specified in milliseconds. Any exceptions to this are mentioned where applicable.
Effective Version | Type/Units | Default | Range/Values |
---|---|---|---|
NDB 7.3.0 | milliseconds | 6000 | 70 - 4G |
Restart Type: N |
To prevent the main thread from getting stuck in an endless loop at some point, a "watchdog" thread checks the main thread. This parameter specifies the number of milliseconds between checks. If the process remains in the same state after three checks, the watchdog thread terminates it.
This parameter can easily be changed for purposes of experimentation or to adapt to local conditions. It can be specified on a per-node basis although there seems to be little reason for doing so.
The default timeout is 6000 milliseconds (6 seconds).
TimeBetweenWatchDogCheckInitial
Effective Version | Type/Units | Default | Range/Values |
---|---|---|---|
NDB 7.3.0 | milliseconds | 6000 | 70 - 4G |
Restart Type: N |
This is similar to the TimeBetweenWatchDogCheck parameter, except that TimeBetweenWatchDogCheckInitial controls the amount of time that passes between execution checks inside a database node in the early start phases during which memory is allocated.
The default timeout is 6000 milliseconds (6 seconds).
Effective Version | Type/Units | Default | Range/Values |
---|---|---|---|
NDB 7.3.0 | milliseconds | 30000 | 0 - 4G |
Restart Type: N |
This parameter specifies how long the Cluster waits for all data nodes to come up before the cluster initialization routine is invoked. This timeout is used to avoid a partial Cluster startup whenever possible.
This parameter is overridden when performing an initial start or initial restart of the cluster.
The default value is 30000 milliseconds (30 seconds). 0 disables the timeout, in which case the cluster may start only if all nodes are available.
Effective Version | Type/Units | Default | Range/Values |
---|---|---|---|
NDB 7.3.0 | milliseconds | 60000 | 0 - 4G |
Restart Type: N |
If the cluster is ready to start after waiting for StartPartialTimeout milliseconds but is still possibly in a partitioned state, the cluster waits until this timeout has also passed. If StartPartitionedTimeout is set to 0, the cluster waits indefinitely.
This parameter is overridden when performing an initial start or initial restart of the cluster.
The default timeout is 60000 milliseconds (60 seconds).
If a data node has not completed its startup sequence within the time specified by this parameter, the node startup fails. Setting this parameter to 0 (the default value) means that no data node timeout is applied.
For nonzero values, this parameter is measured in milliseconds. For data nodes containing extremely large amounts of data, this parameter should be increased. For example, in the case of a data node containing several gigabytes of data, a period as long as 10–15 minutes (that is, 600000 to 1000000 milliseconds) might be required to perform a node restart.
Effective Version | Type/Units | Default | Range/Values |
---|---|---|---|
NDB 7.3.0 | milliseconds | 15000 | 0 - 4294967039 |
Restart Type: N |
When a data node is configured with Nodegroup = 65536, it is regarded as not being assigned to any node group. When that is done, the cluster waits StartNoNodegroupTimeout milliseconds, then treats such nodes as though they had been added to the list passed to the --nowait-nodes option, and starts. The default value is 15000 (that is, the management server waits 15 seconds). Setting this parameter equal to 0 means that the cluster waits indefinitely.
StartNoNodegroupTimeout must be the same for all data nodes in the cluster; for this reason, you should always set it in the [ndbd default] section of the config.ini file, rather than for individual data nodes.
See Section 17.5.13, "Adding MySQL Cluster Data Nodes Online", for more information.
Effective Version | Type/Units | Default | Range/Values |
---|---|---|---|
NDB 7.3.0 | milliseconds | 5000 | 10 - 4G |
Restart Type: N |
One of the primary methods of discovering failed nodes is by the use of heartbeats. This parameter states how often heartbeat signals are sent and how often to expect to receive them. After missing three heartbeat intervals in a row, the node is declared dead. Thus, the maximum time for discovering a failure through the heartbeat mechanism is four times the heartbeat interval.
In MySQL Cluster NDB 7.3, the default heartbeat interval is 5000 milliseconds (5 seconds). This parameter must not be changed drastically and should not vary widely between nodes. If one node uses 5000 milliseconds and the node watching it uses 1000 milliseconds, obviously the node will be declared dead very quickly. This parameter can be changed during an online software upgrade, but only in small increments.
See also Network communication and latency.
Effective Version | Type/Units | Default | Range/Values |
---|---|---|---|
NDB 7.3.0 | milliseconds | 1500 | 100 - 4G |
Restart Type: N |
Each data node sends heartbeat signals to each MySQL server (SQL node) to ensure that it remains in contact. If a MySQL server fails to send a heartbeat in time it is declared "dead," in which case all ongoing transactions are completed and all resources released. The SQL node cannot reconnect until all activities initiated by the previous MySQL instance have been completed. The three-heartbeat criteria for this determination are the same as described for HeartbeatIntervalDbDb.
The default interval is 1500 milliseconds (1.5 seconds). This interval can vary between individual data nodes because each data node watches the MySQL servers connected to it, independently of all other data nodes.
For more information, see Network communication and latency.
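For example, a sketch of the two heartbeat intervals in config.ini (values shown are the defaults; adjust only in small increments, as cautioned above):

[ndbd default]
HeartbeatIntervalDbDb = 5000    # data node to data node; 3 missed beats => declared dead
HeartbeatIntervalDbApi = 1500   # data node to SQL node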
Data nodes send heartbeats to one another in a circular fashion whereby each data node monitors the previous one. If a heartbeat is not detected by a given data node, this node declares the previous data node in the circle "dead" (that is, no longer accessible by the cluster). The determination that a data node is dead is made globally; in other words, once a data node is declared dead, it is regarded as such by all nodes in the cluster.
It is possible for heartbeats between data nodes residing on different hosts to be too slow compared to heartbeats between other pairs of nodes (for example, due to a very low heartbeat interval or a temporary connection problem), such that a data node is declared dead even though the node can still function as part of the cluster.
In this type of situation, it may be that the order in which heartbeats are transmitted between data nodes makes a difference as to whether or not a particular data node is declared dead. If this declaration occurs unnecessarily, it can in turn lead to the unnecessary loss of a node group, and thus to a failure of the cluster.
Consider a setup with four data nodes A, B, C, and D running on two host computers host1 and host2, where these data nodes make up two node groups, as shown in the following table:
 | host1 | host2 |
---|---|---|
Node Group 0: | Node A | Node B |
Node Group 1: | Node C | Node D |
Suppose the heartbeats are transmitted in the order A->B->C->D->A. In this case, the loss of the heartbeat between the hosts causes node B to declare node A dead and node C to declare node B dead. This results in loss of Node Group 0, and so the cluster fails. On the other hand, if the order of transmission is A->B->D->C->A (and all other conditions remain as previously stated), the loss of the heartbeat causes nodes A and D to be declared dead; in this case, each node group has one surviving node, and the cluster survives.
The HeartbeatOrder configuration parameter makes the order of heartbeat transmission user-configurable. The default value for HeartbeatOrder is zero; allowing the default value to be used on all data nodes causes the order of heartbeat transmission to be determined by NDB.
If this parameter is used, it must be set to a nonzero value (maximum 65535) for every data node in the cluster, and this value must be unique for each data node; this causes the heartbeat transmission to proceed from data node to data node in the order of their HeartbeatOrder values from lowest to highest (and then directly from the data node having the highest HeartbeatOrder to the data node having the lowest value, to complete the circle). The values need not be consecutive; for example, to force the heartbeat transmission order A->B->D->C->A in the scenario outlined previously, you could set the HeartbeatOrder values as shown here:
Node | HeartbeatOrder |
---|---|
A | 10 |
B | 20 |
C | 30 |
D | 25 |
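A sketch of how these values might appear in config.ini, assuming hypothetical node IDs 1 through 4 for nodes A through D:

[ndbd]
NodeId = 1           # node A on host1
HeartbeatOrder = 10
[ndbd]
NodeId = 2           # node B on host2
HeartbeatOrder = 20
[ndbd]
NodeId = 3           # node C on host1
HeartbeatOrder = 30
[ndbd]
NodeId = 4           # node D on host2
HeartbeatOrder = 25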
To use this parameter to change the heartbeat transmission order in a running MySQL Cluster, you must first set HeartbeatOrder for each data node in the cluster in the global configuration (config.ini) file (or files). To cause the change to take effect, you must perform either of the following:
A complete shutdown and restart of the entire cluster.
Two rolling restarts of the cluster in succession. All nodes must be restarted in the same order in both rolling restarts.
You can use DUMP 908 to observe the effect of this parameter in the data node logs.
This parameter enables connection checking between data nodes. A data node that fails to respond within an interval of ConnectCheckIntervalDelay seconds is considered suspect, and is considered dead after two such intervals.
The default value for this parameter is 0; this is a change from MySQL Cluster NDB 7.1.
Effective Version | Type/Units | Default | Range/Values |
---|---|---|---|
NDB 7.3.0 | number of 4-byte words, as a base-2 logarithm | 20 | 0 - 31 |
Restart Type: N |
This parameter is an exception in that it does not specify a time to wait before starting a new local checkpoint; rather, it is used to ensure that local checkpoints are not performed in a cluster where relatively few updates are taking place. In most clusters with high update rates, it is likely that a new local checkpoint is started immediately after the previous one has been completed.
The size of all write operations executed since the start of the previous local checkpoint is added. This parameter is also exceptional in that it is specified as the base-2 logarithm of the number of 4-byte words, so that the default value 20 means 4MB (4 × 2^20 bytes) of write operations, 21 would mean 8MB, and so on up to a maximum value of 31, which equates to 8GB of write operations.
All the write operations in the cluster are added together. Setting TimeBetweenLocalCheckpoints to 6 or less means that local checkpoints will be executed continuously without pause, independent of the cluster's workload.
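Because the value is a base-2 logarithm rather than a byte count, a setting such as the following (shown only as an illustration) raises the checkpoint trigger threshold from the default 4MB to 32MB of write operations:

[ndbd default]
TimeBetweenLocalCheckpoints = 23   # 4 bytes x 2^23 = 32MB of writes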
Effective Version | Type/Units | Default | Range/Values |
---|---|---|---|
NDB 7.3.0 | milliseconds | 2000 | 20 - 32000 |
Restart Type: N |
When a transaction is committed, it is committed in main memory in all nodes on which the data is mirrored. However, transaction log records are not flushed to disk as part of the commit. The reasoning behind this behavior is that having the transaction safely committed on at least two autonomous host machines should meet reasonable standards for durability.
It is also important to ensure that even the worst of cases—a complete crash of the cluster—is handled properly. To guarantee that this happens, all transactions taking place within a given interval are put into a global checkpoint, which can be thought of as a set of committed transactions that has been flushed to disk. In other words, as part of the commit process, a transaction is placed in a global checkpoint group. Later, this group's log records are flushed to disk, and then the entire group of transactions is safely committed to disk on all computers in the cluster.
This parameter defines the interval between global checkpoints. The default is 2000 milliseconds.
Effective Version | Type/Units | Default | Range/Values |
---|---|---|---|
NDB 7.3.0 | milliseconds | 100 | 0 - 32000 |
Restart Type: N |
This parameter defines the interval between synchronization epochs for MySQL Cluster Replication. The default value is 100 milliseconds.
TimeBetweenEpochs is part of the implementation of "micro-GCPs", which can be used to improve the performance of MySQL Cluster Replication.
Effective Version | Type/Units | Default | Range/Values |
---|---|---|---|
NDB 7.3.0 | milliseconds | 0 | 0 - 256000 |
Restart Type: N |
This parameter defines a timeout for synchronization epochs for MySQL Cluster Replication. If a node fails to participate in a global checkpoint within the time determined by this parameter, the node is shut down. In MySQL Cluster NDB 7.3, the default value is 0; in other words, the timeout is disabled.
TimeBetweenEpochsTimeout is part of the implementation of "micro-GCPs", which can be used to improve the performance of MySQL Cluster Replication.
The current value of this parameter and a warning are written to the cluster log whenever a GCP save takes longer than 1 minute or a GCP commit takes longer than 10 seconds.
Setting this parameter to zero has the effect of disabling GCP stops caused by save timeouts, commit timeouts, or both. The maximum possible value for this parameter is 256000 milliseconds.
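To keep these related settings together, an illustrative [ndbd default] excerpt for the checkpoint and epoch intervals discussed above (defaults restated, with the epoch timeout explicitly left disabled) might be:

[ndbd default]
TimeBetweenGlobalCheckpoints = 2000   # global checkpoint interval, milliseconds
TimeBetweenEpochs = 100               # micro-GCP epoch interval
TimeBetweenEpochsTimeout = 0          # 0 disables GCP stops on epoch timeouts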
The number of unprocessed epochs by which a subscribing node can lag behind. Exceeding this number causes a lagging subscriber to be disconnected.
The default value of 100 is sufficient for most normal operations. If a subscribing node does lag enough to cause disconnections, it is usually due to network or scheduling issues with regard to processes or threads. (In rare circumstances, the problem may be due to a bug in the NDB client.) It may be desirable to set the value lower than the default when epochs are longer.
Disconnection prevents client issues from affecting the data node service, which might otherwise run out of memory to buffer data and eventually shut down. Instead, only the client is affected as a result of the disconnect (for example, by gap events in the binary log), forcing the client to reconnect or to restart the process.
TimeBetweenInactiveTransactionAbortCheck
Effective Version | Type/Units | Default | Range/Values |
---|---|---|---|
NDB 7.3.0 | milliseconds | 1000 | 1000 - 4G |
Restart Type: N |
Timeout handling is performed by checking a timer on each transaction once for every interval specified by this parameter. Thus, if this parameter is set to 1000 milliseconds, every transaction will be checked for timing out once per second.
The default value is 1000 milliseconds (1 second).
This parameter states the maximum time that is permitted to lapse between operations in the same transaction before the transaction is aborted.
The default for this parameter is 4G (also the maximum). For a real-time database that needs to ensure that no transaction keeps locks for too long, this parameter should be set to a relatively small value. The unit is milliseconds.
TransactionDeadlockDetectionTimeout
Effective Version | Type/Units | Default | Range/Values |
---|---|---|---|
NDB 7.3.0 | milliseconds | 1200 | 50 - 4G |
Restart Type: N |
When a node executes a query involving a transaction, the node waits for the other nodes in the cluster to respond before continuing. A failure to respond can occur for any of the following reasons:
The node is "dead"
The operation has entered a lock queue
The node requested to perform the action is heavily overloaded.
This timeout parameter states how long the transaction coordinator waits for query execution by another node before aborting the transaction, and is important for both node failure handling and deadlock detection.
The default timeout value is 1200 milliseconds (1.2 seconds).
The minimum for this parameter is 50 milliseconds.
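A sketch showing how the transaction timeout parameters discussed above might be set for a latency-sensitive application (the values are illustrative, not recommendations):

[ndbd default]
TimeBetweenInactiveTransactionAbortCheck = 1000   # check transaction timers once per second
TransactionInactiveTimeout = 10000                # abort transactions idle for more than 10 seconds
TransactionDeadlockDetectionTimeout = 1200        # default deadlock/failure handling timeout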
This is the maximum number of bytes to store before flushing data to a local checkpoint file. This is done to prevent write buffering, which can impede performance significantly. This parameter is not intended to take the place of TimeBetweenLocalCheckpoints.
When ODirect is enabled, it is not necessary to set DiskSyncSize; in fact, in such cases its value is simply ignored.
The default value is 4M (4 megabytes).
The amount of data, in bytes per second, that is sent to disk during a local checkpoint. This allocation is shared by DML operations and backups (but not backup logging), which means that backups started during times of intensive DML may be impaired by flooding of the redo log buffer and may fail altogether if the contention is sufficiently severe.
The default value is 10M (10 megabytes per second).
The amount of data, in bytes per second, that is sent to disk during a local checkpoint as part of a restart operation.
The default value is 100M (100 megabytes per second).
NoOfDiskPagesToDiskAfterRestartTUP
(DEPRECATED)
This parameter is deprecated and subject to removal in a future version of MySQL Cluster. Use DiskCheckpointSpeedInRestart and DiskSyncSize instead.
NoOfDiskPagesToDiskAfterRestartACC
(DEPRECATED)
This parameter is deprecated and subject to removal in a future version of MySQL Cluster. Use DiskCheckpointSpeedInRestart and DiskSyncSize instead.
NoOfDiskPagesToDiskDuringRestartTUP
(DEPRECATED)
This parameter is deprecated and subject to removal in a future version of MySQL Cluster. Use DiskCheckpointSpeedInRestart and DiskSyncSize instead.
NoOfDiskPagesToDiskDuringRestartACC
(DEPRECATED)
This parameter is deprecated and subject to removal in a future version of MySQL Cluster. Use DiskCheckpointSpeedInRestart and DiskSyncSize instead.
Effective Version | Type/Units | Default | Range/Values |
---|---|---|---|
NDB 7.3.0 | milliseconds | 7500 | 10 - 4G |
Restart Type: N |
This parameter specifies how long data nodes wait for a response from the arbitrator to an arbitration message. If this is exceeded, the network is assumed to have split.
In MySQL Cluster NDB 7.3, the default value is 7500 milliseconds (7.5 seconds).
Effective Version | Type/Units | Default | Range/Values |
---|---|---|---|
NDB 7.3.0 | enumeration | Default | Default, Disabled, WaitExternal |
Restart Type: N |
The Arbitration parameter enables a choice of arbitration schemes, corresponding to one of three possible values for this parameter:
Default. This enables arbitration to proceed normally, as determined by the ArbitrationRank settings for the management and API nodes. This is the default value.
Disabled. Setting Arbitration = Disabled in the [ndbd default] section of the config.ini file accomplishes the same task as setting ArbitrationRank to 0 on all management and API nodes. When Arbitration is set in this way, any ArbitrationRank settings are ignored.
WaitExternal. The Arbitration parameter also makes it possible to configure arbitration in such a way that the cluster waits until after the time determined by ArbitrationTimeout has passed for an external cluster manager application to perform arbitration instead of handling arbitration internally. This can be done by setting Arbitration = WaitExternal in the [ndbd default] section of the config.ini file. For best results with the WaitExternal setting, it is recommended that ArbitrationTimeout be 2 times as long as the interval required by the external cluster manager to perform arbitration.
This parameter should be used only in the [ndbd default] section of the cluster configuration file. The behavior of the cluster is unspecified when Arbitration is set to different values for individual data nodes.
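For instance, if an external cluster manager needed up to 5 seconds to perform arbitration (an assumed interval for this sketch), an excerpt following the recommendation above could be:

[ndbd default]
Arbitration = WaitExternal
ArbitrationTimeout = 10000   # 2 x the assumed 5-second external arbitration interval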
Buffering and logging. Several [ndbd] configuration parameters enable the advanced user to have more control over the resources used by node processes and to adjust various buffer sizes as needed.
These buffers are used as front ends to the file system when writing log records to disk. If the node is running in diskless mode, these parameters can be set to their minimum values without penalty, because disk writes are "faked" by the NDB storage engine's file system abstraction layer.
The UNDO index buffer, whose size is set by this parameter, is used during local checkpoints. The NDB storage engine uses a recovery scheme based on checkpoint consistency in conjunction with an operational REDO log. To produce a consistent checkpoint without blocking the entire system for writes, UNDO logging is done while performing the local checkpoint. UNDO logging is activated on a single table fragment at a time. This optimization is possible because tables are stored entirely in main memory.
The UNDO index buffer is used for the updates on the primary key hash index. Inserts and deletes rearrange the hash index; the NDB storage engine writes UNDO log records that map all physical changes to an index page so that they can be undone at system restart. It also logs all active insert operations for each fragment at the start of a local checkpoint.
Reads and updates set lock bits and update a header in the hash index entry. These changes are handled by the page-writing algorithm to ensure that these operations need no UNDO logging.
This buffer is 2MB by default. The minimum value is 1MB, which is sufficient for most applications. For applications doing extremely large or numerous inserts and deletes together with large transactions and large primary keys, it may be necessary to increase the size of this buffer. If this buffer is too small, the NDB storage engine issues internal error code 677 (Index UNDO buffers overloaded).
It is not safe to decrease the value of this parameter during a rolling restart.
This parameter sets the size of the UNDO data buffer, which performs a function similar to that of the UNDO index buffer, except the UNDO data buffer is used with regard to data memory rather than index memory. This buffer is used during the local checkpoint phase of a fragment for inserts, deletes, and updates.
Because UNDO log entries tend to grow larger as more operations are logged, this buffer is also larger than its index memory counterpart, with a default value of 16MB.
This amount of memory may be unnecessarily large for some applications. In such cases, it is possible to decrease this size to a minimum of 1MB.
It is rarely necessary to increase the size of this buffer. If there is such a need, it is a good idea to check whether the disks can actually handle the load caused by database update activity. A lack of sufficient disk space cannot be overcome by increasing the size of this buffer.
If this buffer is too small and gets congested, the NDB storage engine issues internal error code 891 (Data UNDO buffers overloaded).
It is not safe to decrease the value of this parameter during a rolling restart.
All update activities also need to be logged. The REDO log makes it possible to replay these updates whenever the system is restarted. The NDB recovery algorithm uses a "fuzzy" checkpoint of the data together with the UNDO log, and then applies the REDO log to play back all changes up to the restoration point.
RedoBuffer sets the size of the buffer in which the REDO log is written. The default value is 32MB; the minimum value is 1MB.
If this buffer is too small, the NDB storage engine issues error code 1221 (REDO log buffers overloaded). For this reason, you should exercise care if you attempt to decrease the value of RedoBuffer as part of an online change in the cluster's configuration.
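An illustrative excerpt grouping the three logging buffers (the defaults restated; recall that decreasing the UNDO buffers during a rolling restart is not safe):

[ndbd default]
UndoIndexBuffer = 2M   # UNDO index buffer (minimum 1M)
UndoDataBuffer = 16M   # UNDO data buffer (minimum 1M)
RedoBuffer = 32M       # REDO log buffer (minimum 1M)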
Controlling log messages. In managing the cluster, it is very important to be able to control the number of log messages sent for various event types to stdout. For each event category, there are 16 possible event levels (numbered 0 through 15). Setting event reporting for a given event category to level 15 means all event reports in that category are sent to stdout; setting it to 0 means that no event reports are made in that category.
By default, only the startup message is sent to stdout, with the remaining event reporting level defaults being set to 0. The reason for this is that these messages are also sent to the management server's cluster log.
An analogous set of levels can be set for the management client to determine which event levels to record in the cluster log.
LogLevelStartup
The reporting level for events generated during startup of the process.
The default level is 1.
LogLevelShutdown
The reporting level for events generated as part of graceful shutdown of a node.
The default level is 0.
LogLevelStatistic
The reporting level for statistical events such as number of primary key reads, number of updates, number of inserts, information relating to buffer usage, and so on.
The default level is 0.
LogLevelCheckpoint
The reporting level for events generated by local and global checkpoints.
The default level is 0.
LogLevelNodeRestart
The reporting level for events generated during node restart.
The default level is 0.
LogLevelConnection
The reporting level for events generated by connections between cluster nodes.
The default level is 0.
LogLevelError
The reporting level for events generated by errors and warnings by the cluster as a whole. These errors do not cause any node failure but are still considered worth reporting.
The default level is 0.
LogLevelCongestion
The reporting level for events generated by congestion. These errors do not cause node failure but are still considered worth reporting.
The default level is 0.
LogLevelInfo
The reporting level for events generated for information about the general state of the cluster.
The default level is 0.
This parameter controls how often data node memory usage reports are recorded in the cluster log; it is an integer value representing the number of seconds between reports.
Each data node's data memory and index memory usage is logged as both a percentage and a number of 32 KB pages of DataMemory and IndexMemory, respectively, as set in the config.ini file. For example, if DataMemory is equal to 100 MB, and a given data node is using 50 MB for data memory storage, the corresponding line in the cluster log might look like this:
2006-12-24 01:18:16 [MgmSrvr] INFO -- Node 2: Data usage is 50%(1280 32K pages of total 2560)
MemReportFrequency is not a required parameter. If used, it can be set for all cluster data nodes in the [ndbd default] section of config.ini, and can also be set or overridden for individual data nodes in the corresponding [ndbd] sections of the configuration file. The minimum value, which is also the default value, is 0, in which case memory reports are logged only when memory usage reaches certain percentages (80%, 90%, and 100%), as mentioned in the discussion of statistics events in Section 17.5.6.2, "MySQL Cluster Log Events".
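As an illustration, MemReportFrequency might be set cluster-wide and then overridden for one node (node ID 2 here is hypothetical):

[ndbd default]
MemReportFrequency = 600   # report memory usage every 10 minutes
[ndbd]
NodeId = 2
MemReportFrequency = 60    # this node reports every minute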
When a data node is started with the --initial option, it initializes the redo log file during Start Phase 4 (see Section 17.5.1, "Summary of MySQL Cluster Start Phases"). When very large values are set for NoOfFragmentLogFiles, FragmentLogFileSize, or both, this initialization can take a long time. You can force reports on the progress of this process to be logged periodically, by means of the StartupStatusReportFrequency configuration parameter. In this case, progress is reported in the cluster log, in terms of both the number of files and the amount of space that have been initialized, as shown here:
2009-06-20 16:39:23 [MgmSrvr] INFO -- Node 1: Local redo log file initialization status:
#Total files: 80, Completed: 60
#Total MBytes: 20480, Completed: 15557
2009-06-20 16:39:23 [MgmSrvr] INFO -- Node 2: Local redo log file initialization status:
#Total files: 80, Completed: 60
#Total MBytes: 20480, Completed: 15570
These reports are logged every StartupStatusReportFrequency seconds during Start Phase 4. If StartupStatusReportFrequency is 0 (the default), then reports are written to the cluster log only at the beginning and at the completion of the redo log file initialization process.
Debugging Parameters. In MySQL Cluster NDB 7.3, it is possible to cause logging of traces for events generated by creating and dropping tables using DictTrace. This parameter is useful only in debugging NDB kernel code. DictTrace takes an integer value; currently, 0 (the default; no logging) and 1 (logging enabled) are the only supported values.
Backup parameters. The [ndbd] parameters discussed in this section define memory buffers set aside for execution of online backups.
In creating a backup, there are two buffers used for sending data to the disk. The backup data buffer is used to fill in data recorded by scanning a node's tables. Once this buffer has been filled to the level specified as BackupWriteSize, the pages are sent to disk. While flushing data to disk, the backup process can continue filling this buffer until it runs out of space. When this happens, the backup process pauses the scan and waits until some disk writes have completed and freed up memory, so that scanning may continue.
The default value for this parameter is 16MB.
The backup log buffer fulfills a role similar to that played by the backup data buffer, except that it is used for generating a log of all table writes made during execution of the backup. The same principles apply for writing these pages as with the backup data buffer, except that when there is no more space in the backup log buffer, the backup fails. For that reason, the size of the backup log buffer must be large enough to handle the load caused by write activities while the backup is being made. See Section 17.5.3.3, "Configuration for MySQL Cluster Backups".
The default value for this parameter should be sufficient for most applications. In fact, it is more likely for a backup failure to be caused by insufficient disk write speed than it is for the backup log buffer to become full. If the disk subsystem is not configured for the write load caused by applications, the cluster is unlikely to be able to perform the desired operations.
It is preferable to configure cluster nodes in such a manner that the processor becomes the bottleneck rather than the disks or the network connections.
The default value for this parameter is 16MB.
This parameter is simply the sum of BackupDataBufferSize and BackupLogBufferSize. The default value of this parameter in MySQL Cluster NDB 7.3 is 16MB + 16MB = 32MB.
If BackupDataBufferSize and BackupLogBufferSize taken together exceed the default value for BackupMemory, then this parameter must be set explicitly in the config.ini file to their sum.
This parameter controls how often backup status reports are issued in the management client during a backup, as well as how often such reports are written to the cluster log (provided cluster event logging is configured to permit it; see Logging and checkpointing). BackupReportFrequency represents the time in seconds between backup status reports.
The default value is 0.
This parameter specifies the default size of messages written to disk by the backup log and backup data buffers.
The default value for this parameter is 256KB.
This parameter specifies the maximum size of messages written to disk by the backup log and backup data buffers.
The default value for this parameter is 1MB.
When specifying these parameters, the following relationships must hold true. Otherwise, the data node will be unable to start.
BackupDataBufferSize >= BackupWriteSize + 188KB
BackupLogBufferSize >= BackupWriteSize + 16KB
BackupMaxWriteSize >= BackupWriteSize
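A sketch of backup buffer settings satisfying all three relationships (the values are the documented defaults, shown together for clarity):

[ndbd default]
BackupDataBufferSize = 16M   # >= BackupWriteSize + 188KB
BackupLogBufferSize = 16M    # >= BackupWriteSize + 16KB
BackupMemory = 32M           # sum of the two buffer sizes above
BackupWriteSize = 256K
BackupMaxWriteSize = 1M      # >= BackupWriteSize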
The [ndbd] parameters discussed in this section are used in scheduling and locking of threads to specific CPUs on multiprocessor data node hosts. To make use of these parameters, the data node process must be run as system root.
When used with ndbd, this parameter (now a string) specifies the ID of the CPU assigned to handle the NDBCLUSTER execution thread. When used with ndbmtd, the value of this parameter is a comma-separated list of CPU IDs assigned to handle execution threads. Each CPU ID in the list should be an integer in the range 0 to 65535 (inclusive). The number of IDs specified should match the number of execution threads determined by MaxNoOfExecutionThreads. However, there is no guarantee that threads are assigned to CPUs in any given order when using this parameter. You can obtain more finely-grained control of this type using ThreadConfig.
LockExecuteThreadToCPU has no default value.
This parameter specifies the ID of the CPU assigned to handle NDBCLUSTER maintenance threads.
The value of this parameter is an integer in the range 0 to 65535 (inclusive). In MySQL Cluster NDB 7.3, there is no default value.
Effective Version | Type/Units | Default | Range/Values |
---|---|---|---|
NDB 7.3.0 | boolean | false | true, false |
Restart Type: N |
Setting this parameter to 1 enables real-time scheduling of NDBCLUSTER threads.
This parameter is not intended or recommended for use with data nodes running ndbmtd, and enabling it in such cases is known to have adverse effects.
The default is 0 (scheduling disabled).
This parameter specifies the time in microseconds for threads to be executed in the scheduler before being sent. Setting it to 0 minimizes the response time; to achieve higher throughput, you can increase the value at the expense of longer response times.
The default is 50 μsec, which our testing shows to increase throughput slightly in high-load cases without materially delaying requests.
This parameter specifies the time in microseconds for threads to be executed in the scheduler before sleeping.
The default value is 0.
This parameter determines the number of threads to create when rebuilding indexes during a system or node start. It is supported only when there is more than one fragment for the table per data node (for example, when the MAX_ROWS option has been used with CREATE TABLE).
Setting this parameter to 0 (which is also the default value) disables multi-threaded building of ordered indexes. The maximum allowed value is 128.
This parameter is supported when using ndbd or ndbmtd.
You can enable multi-threaded builds during data node initial restarts by setting the TwoPassInitialNodeRestartCopy data node configuration parameter to TRUE.
Effective Version | Type/Units | Default | Range/Values |
---|---|---|---|
NDB 7.3.0 | boolean | false | true, false |
Restart Type: N |
Multi-threaded building of ordered indexes can be enabled for initial restarts of data nodes by setting this configuration parameter to TRUE, which enables two-pass copying of data during initial node restarts. You must also set BuildIndexThreads to a nonzero value.
NDB is extremely sensitive to Non-Uniform Memory Access (NUMA) settings on multi-CPU systems due to the timeouts that NUMA can cause. For this reason, and because most MySQL Cluster users do not employ numactl, support for NUMA is ignored by default by ndbd when running on a Linux system. If your Linux system provides NUMA support and you wish for data node memory to be subject to NUMA control, you can set this parameter equal to 0.
The Numa configuration parameter is supported only on Linux systems where libnuma.so is installed.
Multi-Threading Configuration Parameters (ndbmtd). ndbmtd runs by default as a single-threaded process and must be configured to use multiple threads, using either of two methods, both of which require setting configuration parameters in the config.ini file. The first method is simply to set an appropriate value for the MaxNoOfExecutionThreads configuration parameter. MySQL Cluster NDB 7.3 also supports a second method, whereby it is possible to set up more complex rules for ndbmtd multi-threading using ThreadConfig. The next few paragraphs provide information about these parameters and their use with multi-threaded data nodes.
This parameter directly controls the number of execution threads used by ndbmtd, up to a maximum of 36. Although this parameter is set in [ndbd] or [ndbd default] sections of the config.ini file, it is exclusive to ndbmtd and does not apply to ndbd.
Setting MaxNoOfExecutionThreads sets the number of threads by type as determined in the following table:
MaxNoOfExecutionThreads Value | LQH Threads | TC Threads | Send Threads | Receive Threads |
---|---|---|---|---|
0 .. 3 | 1 | 1 | 0 | 1 |
4 .. 6 | 2 | 1 | 0 | 1 |
7 .. 8 | 4 | 1 | 0 | 1 |
9 | 4 | 2 | 0 | 1 |
10 | 4 | 2 | 1 | 1 |
11 | 4 | 3 | 1 | 1 |
12 | 4 | 3 | 1 | 2 |
13 | 4 | 3 | 2 | 2 |
14 | 4 | 4 | 2 | 2 |
15 | 4 | 5 | 2 | 2 |
16 | 8 | 3 | 1 | 2 |
17 | 8 | 4 | 1 | 2 |
18 | 8 | 4 | 2 | 2 |
19 | 8 | 5 | 2 | 2 |
20 | 8 | 5 | 2 | 3 |
21 | 8 | 5 | 3 | 3 |
22 | 8 | 6 | 3 | 3 |
23 | 8 | 7 | 3 | 3 |
24 | 12 | 5 | 2 | 3 |
25 | 12 | 6 | 2 | 3 |
26 | 12 | 6 | 3 | 3 |
27 | 12 | 7 | 3 | 3 |
28 | 12 | 7 | 3 | 4 |
29 | 12 | 8 | 3 | 4 |
30 | 12 | 8 | 4 | 4 |
31 | 12 | 9 | 4 | 4 |
32 | 16 | 8 | 3 | 3 |
33 | 16 | 8 | 3 | 4 |
34 | 16 | 8 | 4 | 4 |
35 | 16 | 9 | 4 | 4 |
36 | 16 | 10 | 4 | 4 |
In MySQL Cluster NDB 7.3 and later, there is always one SUMA (replication) thread.
Note that the number of LQH threads must not exceed NoOfFragmentLogParts. Since the default value of NoOfFragmentLogParts is 4, you must increase it as well when setting MaxNoOfExecutionThreads to 16 or greater; that is, you should set NoOfFragmentLogParts to the number of LQH threads shown for that value of MaxNoOfExecutionThreads in the preceding table.
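For example, per the table above, MaxNoOfExecutionThreads = 16 implies 8 LQH threads, so NoOfFragmentLogParts must be raised to match (a sketch):

[ndbd default]
MaxNoOfExecutionThreads = 16   # 8 LQH threads per the preceding table
NoOfFragmentLogParts = 8       # must not be less than the number of LQH threads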
The thread types are described later in this section (see ThreadConfig).
Setting this parameter outside the permitted range of values causes the management server to abort on startup with the error Error line number: Illegal value value for parameter MaxNoOfExecutionThreads.
For MaxNoOfExecutionThreads, a value of 0 or 1 is rounded up internally by NDB to 2, so that 2 is considered this parameter's default and minimum value.
MaxNoOfExecutionThreads is generally intended to be set equal to the number of CPU threads available, and to allocate a number of threads of each type suitable to typical workloads. It does not assign particular threads to specified CPUs. For cases where it is desirable to vary from the settings provided, or to bind threads to CPUs, you should use ThreadConfig instead, which allows you to allocate each thread directly to a desired type, CPU, or both.
The multi-threaded data node process always spawns at least 5 threads, listed here:
1 local query handler (LQH) thread
1 transaction coordinator (TC) thread
1 send thread
(It is possible to keep any separate send threads from being employed, as explained elsewhere in this section.)
1 receive thread
1 subscription manager (SUMA or replication) thread
Set the number of log file groups for redo logs belonging to this ndbmtd. The value must be an even multiple of 4 between 4 and 16, inclusive.
This parameter is used with ndbmtd to assign threads of different types to different CPUs. Its value is a string whose format has the following syntax:
ThreadConfig := entry[,entry[,...]]
entry := type={param[,param[,...]]}
type := ldm | main | recv | send | rep | io
param := count=number | cpubind=cpu_list
The curly braces ({...}) surrounding the list of parameters are required, even if there is only one parameter in the list.
A param (parameter) specifies the number of threads of the given type (count), the CPUs to which the threads of the given type are to be bound (cpubind), or both.
The type attribute represents an NDB thread type. The thread types supported in MySQL Cluster NDB 7.3 and the range of permitted count values for each are provided in the following list:
ldm: Local query handler (DBLQH kernel block) that handles data. The more LDM threads that are used, the more highly partitioned the data becomes. Each LDM thread maintains its own sets of data and index partitions, as well as its own redo log. In MySQL Cluster NDB 7.3, the maximum is 16 such threads (in MySQL Cluster NDB 7.1, the maximum was 4).
Range: 1 - 16.
tc: Transaction coordinator thread (DBTC kernel block) containing the state of an ongoing transaction. In MySQL Cluster NDB 7.3, the number of TC threads is configurable, with a total of 16 possible.
Optimally, every new transaction can be assigned to a new TC thread. In most cases 1 TC thread per 2 LDM threads is sufficient to guarantee that this can happen. In cases where the number of writes is relatively small when compared to the number of reads, it is possible that only 1 TC thread per 4 LQH threads is required to maintain transaction states. Conversely, in applications that perform a great many updates, it may be necessary for the ratio of TC threads to LDM threads to approach 1 (for example, 3 TC threads to 4 LDM threads).
Range: 1 - 16.
main: Data dictionary and transaction coordinator (DBDIH and DBTC kernel blocks), providing schema management. This is always handled by a single dedicated thread.
Range: 1 only.
recv: Receive thread (CMVMI kernel block). Each receive thread handles one or more sockets for communicating with other nodes in a MySQL Cluster, with one socket per node. MySQL Cluster NDB 7.3 implements multiple receive threads (up to 8).
Range: 1 - 8.
send: Send thread (CMVMI kernel block). To increase throughput, it is possible to perform sends from one or more separate, dedicated threads (maximum 8). Previously, all threads handled their own sending directly; this can still be made to happen by setting the number of send threads to 0 (this also happens when MaxNoOfExecutionThreads is set equal to 9). While doing so can have an adverse impact on throughput, it can also in some cases provide decreased latency.
Range: 0 - 8.
rep: Replication thread (SUMA kernel block). Asynchronous replication operations are always handled by a single, dedicated thread.
Range: 1 only.
io: File system and other miscellaneous operations. These are not demanding tasks, and are always handled as a group by a single, dedicated I/O thread.
Range: 1 only.
Simple examples:
# Example 1.
ThreadConfig=ldm={count=2,cpubind=1,2},main={cpubind=12},rep={cpubind=11}
# Example 2.
ThreadConfig=main={cpubind=0},ldm={count=4,cpubind=1,2,5,6},io={cpubind=3}
It is usually desirable when configuring thread usage for a data node host to reserve one or more CPUs for operating system and other tasks. Thus, for a host machine with 24 CPUs, you might want to use 20 CPU threads (leaving 4 for other uses), with 8 LDM threads, 4 TC threads (half the number of LDM threads), 3 send threads, 3 receive threads, and 1 thread each for schema management, asynchronous replication, and I/O operations. (This is almost the same distribution of threads used when MaxNoOfExecutionThreads is set equal to 20.) The following ThreadConfig setting performs these assignments, additionally binding all of these threads to specific CPUs:
ThreadConfig=ldm={count=8,cpubind=1,2,3,4,5,6,7,8},main={cpubind=9},io={cpubind=9}, \
rep={cpubind=10},tc={count=4,cpubind=11,12,13,14},recv={count=3,cpubind=15,16,17}, \
send={count=3,cpubind=18,19,20}
It should be possible in most cases to bind the main (schema management) thread and the I/O thread to the same CPU, as we have done in the example just shown.
In order to take advantage of the enhanced stability that the use of ThreadConfig offers, it is necessary to ensure that CPUs are isolated, and that they are not subject to interrupts or to being scheduled for other tasks by the operating system. On many Linux systems, you can do this by setting IRQBALANCE_BANNED_CPUS in /etc/sysconfig/irqbalance to 0xFFFFF0, and by using the isolcpus boot option in grub.conf. For specific information, see your operating system or platform documentation.
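As a sketch only (the exact mask and CPU list depend on your hardware, distribution, and which CPUs you choose to reserve), isolating CPUs 1-20 on a hypothetical 24-CPU Linux host might involve lines such as:

# /etc/sysconfig/irqbalance (hypothetical mask banning IRQs on CPUs 1-20)
IRQBALANCE_BANNED_CPUS=0x1FFFFE
# grub.conf kernel line addition (hypothetical; keeps the OS scheduler off those CPUs)
isolcpus=1-20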
Disk Data Configuration Parameters. Configuration parameters affecting Disk Data behavior include the following:
This determines the amount of space used for caching pages on disk, and is set in the [ndbd] or [ndbd default] section of the config.ini file. It is measured in bytes. Each page takes up 32 KB. This means that Cluster Disk Data storage always uses N * 32 KB memory where N is some nonnegative integer. The default value for this parameter is 64M (2048 pages of 32 KB each).
You can query the ndbinfo.diskpagebuffer table to help determine whether the value for this parameter should be increased to minimize unnecessary disk seeks. See Section 17.5.10.8, "The ndbinfo diskpagebuffer Table", for more information.
This parameter determines the amount of memory that is used for log buffers, disk operations (such as page requests and wait queues), and metadata for tablespaces, log file groups, UNDO files, and data files. The shared global memory pool also provides memory used for satisfying the memory requirements of the INITIAL_SIZE and UNDO_BUFFER_SIZE options used with CREATE LOGFILE GROUP and ALTER LOGFILE GROUP statements, including any default value implied for these options by the setting of the InitialLogFileGroup data node configuration parameter. SharedGlobalMemory can be set in the [ndbd] or [ndbd default] section of the config.ini configuration file, and is measured in bytes.
The default value is 128M.
This parameter determines the number of unbound threads used for Disk Data file access. Before DiskIOThreadPool was introduced, exactly one thread was spawned for each Disk Data file, which could lead to performance issues, particularly when using very large data files. With DiskIOThreadPool, you can, for example, access a single large data file using several threads working in parallel.
Currently, this parameter applies to Disk Data I/O threads only, but we plan in the future to make the number of such threads configurable for in-memory data as well.
The optimum value for this parameter depends on your hardware and configuration, and includes these factors:
Physical distribution of Disk Data files. You can obtain better performance by placing data files, undo log files, and the data node file system on separate physical disks. If you do this with some or all of these sets of files, then you can set DiskIOThreadPool higher to enable separate threads to handle the files on each disk.
Disk performance and types. The number of threads that can be accommodated for Disk Data file handling is also dependent on the speed and throughput of the disks. Faster disks and higher throughput allow for more disk I/O threads. Our test results indicate that solid-state disk drives can handle many more disk I/O threads than conventional disks, and thus higher values for DiskIOThreadPool.
The default value for this parameter is 2.
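A hedged example of the Disk Data memory and thread settings discussed above, for a host with data files on fast dedicated disks (the values are illustrative, not tuning recommendations):

[ndbd default]
DiskPageBufferMemory = 256M   # disk page cache, consumed in 32 KB pages
SharedGlobalMemory = 384M     # log buffers, disk operation queues, metadata
DiskIOThreadPool = 8          # unbound Disk Data I/O threads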
Disk Data file system parameters. The parameters in the following list make it possible to place MySQL Cluster Disk Data files in specific directories without the need for using symbolic links.
Effective Version | Type/Units | Default | Range/Values |
---|---|---|---|
NDB 7.3.0 | filename | [see text] | ... |
Restart Type: IN |
If this parameter is specified, then MySQL Cluster Disk Data data files and undo log files are placed in the indicated directory. This can be overridden for data files, undo log files, or both, by specifying values for FileSystemPathDataFiles, FileSystemPathUndoFiles, or both, as explained for these parameters. It can also be overridden for data files by specifying a path in the ADD DATAFILE clause of a CREATE TABLESPACE or ALTER TABLESPACE statement, and for undo log files by specifying a path in the ADD UNDOFILE clause of a CREATE LOGFILE GROUP or ALTER LOGFILE GROUP statement. If FileSystemPathDD is not specified, then FileSystemPath is used.
If a FileSystemPathDD directory is specified for a given data node (including the case where the parameter is specified in the [ndbd default] section of the config.ini file), then starting that data node with --initial causes all files in the directory to be deleted.
Effective Version | Type/Units | Default | Range/Values |
---|---|---|---|
NDB 7.3.0 | filename | [see text] | ... |
Restart Type: IN |
If this parameter is specified, then MySQL Cluster Disk Data data files are placed in the indicated directory. This overrides any value set for FileSystemPathDD. This parameter can be overridden for a given data file by specifying a path in the ADD DATAFILE clause of a CREATE TABLESPACE or ALTER TABLESPACE statement used to create that data file. If FileSystemPathDataFiles is not specified, then FileSystemPathDD is used (or FileSystemPath, if FileSystemPathDD has also not been set).
If a FileSystemPathDataFiles directory is specified for a given data node (including the case where the parameter is specified in the [ndbd default] section of the config.ini file), then starting that data node with --initial causes all files in the directory to be deleted.
Effective Version | Type/Units | Default | Range/Values |
---|---|---|---|
NDB 7.3.0 | filename | [see text] | ... |
Restart Type: IN |
If this parameter is specified, then MySQL Cluster Disk Data undo log files are placed in the indicated directory. This overrides any value set for FileSystemPathDD. This parameter can be overridden for a given undo log file by specifying a path in the ADD UNDOFILE clause of a CREATE LOGFILE GROUP or ALTER LOGFILE GROUP statement used to create that undo log file. If FileSystemPathUndoFiles is not specified, then FileSystemPathDD is used (or FileSystemPath, if FileSystemPathDD has also not been set).
If a FileSystemPathUndoFiles directory is specified for a given data node (including the case where the parameter is specified in the [ndbd default] section of the config.ini file), then starting that data node with --initial causes all files in the directory to be deleted.
For more information, see Section 17.5.12.1, "MySQL Cluster Disk Data Objects".
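A sketch showing how these path parameters might be combined (the directory names are hypothetical):

[ndbd default]
FileSystemPathDD = /data/ndb/diskdata            # default location for data and undo log files
FileSystemPathDataFiles = /data1/ndb/datafiles   # data files on their own physical disk
FileSystemPathUndoFiles = /data2/ndb/undofiles   # undo log files on another disk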
Disk Data object creation parameters. The next two parameters enable you—when starting the cluster for the first time—to cause a Disk Data log file group, tablespace, or both, to be created without the use of SQL statements.
This parameter can be used to specify a log file group that is created when performing an initial start of the cluster. InitialLogFileGroup is specified as shown here:
InitialLogFileGroup = [name=name;] [undo_buffer_size=size;] file-specification-list

file-specification-list:
file-specification[; file-specification[; ...]]

file-specification:
filename:size
The name of the log file group is optional and defaults to DEFAULT-LG. The undo_buffer_size is also optional; if omitted, it defaults to 64M. Each file-specification corresponds to an undo log file, and at least one must be specified in the file-specification-list. Undo log files are placed according to any values that have been set for FileSystemPath, FileSystemPathDD, and FileSystemPathUndoFiles, just as if they had been created as the result of a CREATE LOGFILE GROUP or ALTER LOGFILE GROUP statement.
Consider the following:
InitialLogFileGroup = name=LG1; undo_buffer_size=128M; undo1.log:250M; undo2.log:150M
This is equivalent to the following SQL statements:
CREATE LOGFILE GROUP LG1 ADD UNDOFILE 'undo1.log' INITIAL_SIZE 250M UNDO_BUFFER_SIZE 128M ENGINE NDBCLUSTER;
ALTER LOGFILE GROUP LG1 ADD UNDOFILE 'undo2.log' INITIAL_SIZE 150M ENGINE NDBCLUSTER;
This logfile group is created when the data nodes are started with --initial.
Resources for the initial log file group are taken from the global memory pool whose size is determined by the value of the SharedGlobalMemory data node configuration parameter; if this parameter is set too low and the values set in InitialLogFileGroup for the logfile group's initial size or undo buffer size are too high, the cluster may fail to create the default log file group when starting, or fail to start altogether.
This parameter, if used, should always be set in the [ndbd default] section of the config.ini file. The behavior of a MySQL Cluster when different values are set on different data nodes is not defined.
This parameter can be used to specify a MySQL Cluster Disk Data tablespace that is created when performing an initial start of the cluster. InitialTablespace is specified as shown here:
InitialTablespace = [name=name;] [extent_size=size;] file-specification-list
The name of the tablespace is optional and defaults to DEFAULT-TS. The extent_size is also optional; it defaults to 1M. The file-specification-list uses the same syntax as shown with the InitialLogfileGroup parameter, the only difference being that each file-specification used with InitialTablespace corresponds to a data file. At least one must be specified in the file-specification-list. Data files are placed according to any values that have been set for FileSystemPath, FileSystemPathDD, and FileSystemPathDataFiles, just as if they had been created as the result of a CREATE TABLESPACE or ALTER TABLESPACE statement.
For example, consider the following line specifying InitialTablespace in the [ndbd default] section of the config.ini file (as with InitialLogfileGroup, this parameter should always be set in the [ndbd default] section, as the behavior of a MySQL Cluster when different values are set on different data nodes is not defined):
InitialTablespace = name=TS1; extent_size=8M; data1.dat:2G; data2.dat:4G
This is equivalent to the following SQL statements:
CREATE TABLESPACE TS1 ADD DATAFILE 'data1.dat' EXTENT_SIZE 8M INITIAL_SIZE 2G ENGINE NDBCLUSTER;
ALTER TABLESPACE TS1 ADD DATAFILE 'data2.dat' INITIAL_SIZE 4G ENGINE NDBCLUSTER;
This tablespace is created when the data nodes are started with --initial, and can be used whenever creating MySQL Cluster Disk Data tables thereafter.
Disk Data and GCP Stop errors. Errors encountered when using Disk Data tables, such as Node nodeid killed this node because GCP stop was detected (error 2303), are often referred to as "GCP stop errors". Such errors occur when the redo log is not flushed to disk quickly enough; this is usually due to slow disks and insufficient disk throughput.
You can help prevent these errors from occurring by using faster disks, and by placing Disk Data files on a separate disk from the data node file system. Reducing the value of TimeBetweenGlobalCheckpoints tends to decrease the amount of data to be written for each global checkpoint, and so may provide some protection against redo log buffer overflows when trying to write a global checkpoint; however, reducing this value also permits less time in which to write the GCP, so this must be done with caution.
In addition to the considerations given for DiskPageBufferMemory as explained previously, it is also very important that the DiskIOThreadPool configuration parameter be set correctly; having DiskIOThreadPool set too high is very likely to cause GCP stop errors (Bug #37227).
GCP stops can be caused by save or commit timeouts; the TimeBetweenEpochsTimeout data node configuration parameter determines the timeout for commits. However, it is possible to disable both types of timeouts by setting this parameter to 0.
Parameters for configuring send buffer memory allocation. Send buffer memory is allocated dynamically from a memory pool shared between all transporters, which means that the size of the send buffer can be adjusted as necessary. (Previously, the NDB kernel used a fixed-size send buffer for every node in the cluster, which was allocated when the node started and could not be changed while the node was running.) The TotalSendBufferMemory and OverLoadLimit data node configuration parameters permit the setting of limits on this memory allocation. For more information about the use of these parameters (as well as SendBufferMemory), see Section 17.3.2.12, "Configuring MySQL Cluster Send Buffer Parameters".
This parameter specifies the amount of transporter send buffer memory to allocate in addition to any set using TotalSendBufferMemory, SendBufferMemory, or both.
This parameter is available beginning with MySQL Cluster NDB 6.4.0. It is used to determine the total amount of memory to allocate on this node for shared send buffer memory among all configured transporters.
If this parameter is set, its minimum permitted value is 256KB; the maximum is 4294967039.
This parameter is present in NDBCLUSTER source code beginning with MySQL Cluster NDB 6.4.0. However, it is not currently enabled.
This parameter was deprecated in MySQL Cluster NDB 7.2, and is subject to removal in a future release of MySQL Cluster (Bug #11760629, Bug #53053).
For more detailed information about the behavior and use of TotalSendBufferMemory and about configuring send buffer memory parameters in MySQL Cluster, see Section 17.3.2.12, "Configuring MySQL Cluster Send Buffer Parameters".
See also Section 17.5.13, "Adding MySQL Cluster Data Nodes Online".
Redo log over-commit handling. It is possible to control a data node's handling of operations when too much time is taken flushing redo logs to disk. This occurs when a given redo log flush takes longer than RedoOverCommitLimit seconds more than RedoOverCommitCounter times, causing any pending transactions to be aborted. When this happens, the API node that sent the transaction can handle the operations that should have been committed either by queuing the operations and re-trying them, or by aborting them, as determined by DefaultOperationRedoProblemAction. The data node configuration parameters for setting the timeout and the number of times it may be exceeded before the API node takes this action are described in the following list:
When RedoOverCommitLimit is exceeded when trying to write a given redo log to disk this many times or more, any transactions that were not committed as a result are aborted, and an API node where any of these transactions originated handles the operations making up those transactions according to its value for DefaultOperationRedoProblemAction (by either queuing the operations to be re-tried, or aborting them).
RedoOverCommitCounter defaults to 3. Set it to 0 to disable the limit.
This parameter sets an upper limit in seconds for trying to write a given redo log to disk before timing out. The number of times the data node tries to flush this redo log but takes longer than RedoOverCommitLimit is kept and compared with RedoOverCommitCounter; when flushing takes too long more times than the value of that parameter, any transactions that were not committed as a result of the flush timeout are aborted. When this occurs, the API node where any of these transactions originated handles the operations making up those transactions according to its DefaultOperationRedoProblemAction setting (it either queues the operations to be re-tried, or aborts them).
By default, RedoOverCommitLimit is 20 seconds. Set it to 0 to disable checking for redo log flush timeouts. This parameter was added in MySQL Cluster NDB 7.1.10.
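Taken together, the two parameters might be set as follows (the defaults restated as a sketch):

[ndbd default]
RedoOverCommitCounter = 3   # abort pending transactions after 3 over-long flushes; 0 disables
RedoOverCommitLimit = 20    # seconds allowed per redo log flush; 0 disables checking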
Controlling restart attempts. It is possible to exercise finely-grained control over restart attempts by data nodes when they fail to start, using the MaxStartFailRetries and StartFailRetryDelay data node configuration parameters. MaxStartFailRetries limits the total number of retries made before giving up on starting the data node; StartFailRetryDelay sets the number of seconds between retry attempts. These parameters are described in more detail in the next few paragraphs.
Use this parameter to set the number of seconds between restart attempts by the data node in the event of failure on startup. The default is 0 (no delay).
This parameter is ignored unless StopOnError is equal to 0.
Use this parameter to limit the number of restart attempts made by the data node in the event that it fails on startup. The default is 3 attempts.
This parameter is ignored unless StopOnError is equal to 0.
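Since both parameters take effect only when StopOnError is 0, a minimal sketch combining them (the 5-second delay is illustrative) might be:

[ndbd default]
StopOnError = 0           # required for automatic restart attempts
MaxStartFailRetries = 3   # give up after 3 failed start attempts
StartFailRetryDelay = 5   # wait 5 seconds between attempts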