Spec-Zone .ru
спецификации, руководства, описания, API
|
Storage engines collect statistics about tables for use by the optimizer. Table statistics are based on value groups, where a value group is a set of rows with the same key prefix value. For optimizer purposes, an important statistic is the average value group size.
MySQL uses the average value group size in the following ways:
To estimate how may rows must be read for each ref
access
To estimate how many row a partial join will produce; that is, the number of rows that an operation of this form will produce:
(...) JOINtbl_name
ONtbl_name
.key
=expr
As the average value group size for an index increases, the index is less useful for those two purposes because the average number of rows per lookup increases: For the index to be good for optimization purposes, it is best that each index value target a small number of rows in the table. When a given index value yields a large number of rows, the index is less useful and MySQL is less likely to use it.
The average value group size is related to table cardinality, which is the number of value groups. The SHOW
INDEX
statement displays a cardinality value based on N/S
,
where N
is the number of rows in the table and S
is the average value group size. That ratio yields an
approximate number of value groups in the table.
For a join based on the <=>
comparison operator, NULL
is not treated differently from any other value: NULL <=>
NULL
, just as
for any other N
<=> N
N
.
However, for a join based on the =
operator, NULL
is
different from non-NULL
values:
is
not true when expr1
= expr2
expr1
or expr2
(or both) are NULL
. This affects ref
accesses for comparisons of the form
: MySQL will not access the table if the current value
of tbl_name.key
= expr
expr
is NULL
, because the comparison
cannot be true.
For =
comparisons, it does not matter how many NULL
values are in the table. For optimization purposes, the relevant value is the average size of the non-NULL
value groups. However, MySQL does not currently enable that average size to
be collected or used.
For InnoDB
and MyISAM
tables, you have some control
over collection of table statistics by means of the innodb_stats_method
and myisam_stats_method
system variables, respectively. These variables have
three possible values, which differ as follows:
When the variable is set to nulls_equal
, all NULL
values are treated as identical (that is, they all form a single value
group).
If the NULL
value group size is much higher than the average non-NULL
value group size, this method skews the average value group size
upward. This makes index appear to the optimizer to be less useful than it really is for joins that
look for non-NULL
values. Consequently, the nulls_equal
method may cause the optimizer not to use the index for ref
accesses when it should.
When the variable is set to nulls_unequal
, NULL
values are not considered the same. Instead, each NULL
value forms a separate value group of size 1.
If you have many NULL
values, this method skews the average value group
size downward. If the average non-NULL
value group size is large,
counting NULL
values each as a group of size 1 causes the optimizer to
overestimate the value of the index for joins that look for non-NULL
values. Consequently, the nulls_unequal
method may cause the optimizer
to use this index for ref
lookups when other methods may be better.
When the variable is set to nulls_ignored
, NULL
values are ignored.
If you tend to use many joins that use <=>
rather than =
,
NULL
values are not special in comparisons and one NULL
is equal to another. In this case, nulls_equal
is the appropriate statistics method.
The innodb_stats_method
system variable has a global value; the myisam_stats_method
system variable has both global and session values.
Setting the global value affects statistics collection for tables from the corresponding storage engine. Setting
the session value affects statistics collection only for the current client connection. This means that you can
force a table's statistics to be regenerated with a given method without affecting other clients by setting the
session value of myisam_stats_method
.
To regenerate table statistics, you can use any of the following methods:
Change the table to cause its statistics to go out of date (for example, insert a
row and then delete it), and then set myisam_stats_method
and issue an ANALYZE TABLE
statement
Some caveats regarding the use of innodb_stats_method
and myisam_stats_method
:
You can force table statistics to be collected explicitly, as just described.
However, MySQL may also collect statistics automatically. For example, if during the course of executing
statements for a table, some of those statements modify the table, MySQL may collect statistics. (This
may occur for bulk inserts or deletes, or some ALTER TABLE
statements, for example.) If this happens, the statistics
are collected using whatever value innodb_stats_method
or myisam_stats_method
has at the time. Thus, if you collect statistics
using one method, but the system variable is set to the other method when a table's statistics are
collected automatically later, the other method will be used.
There is no way to tell which method was used to generate statistics for a given table.
These variables apply only to InnoDB
and MyISAM
tables. Other storage engines have only one method for collecting
table statistics. Usually it is closer to the nulls_equal
method.