Spec-Zone .ru
спецификации, руководства, описания, API
|
Certain optimizations are applicable to comparisons that use the IN
operator to
test subquery results (or that use =ANY
, which is equivalent). This section
discusses these optimizations, particularly with regard to the challenges that NULL
values present. The last part of the discussion includes suggestions on what you can do to help the optimizer.
Consider the following subquery comparison:
outer_expr
IN (SELECTinner_expr
FROM ... WHEREsubquery_where
)
MySQL evaluates queries "from outside to inside." That is,
it first obtains the value of the outer expression outer_expr
, and
then runs the subquery and captures the rows that it produces.
A very useful optimization is to "inform" the subquery
that the only rows of interest are those where the inner expression inner_expr
is equal to outer_expr
. This is done by pushing down an appropriate equality
into the subquery's WHERE
clause. That is, the comparison is converted to this:
EXISTS (SELECT 1 FROM ... WHEREsubquery_where
ANDouter_expr
=inner_expr
)
After the conversion, MySQL can use the pushed-down equality to limit the number of rows that it must examine when evaluating the subquery.
More generally, a comparison of N
values to a subquery that returns
N
-value rows is subject to the same conversion. If oe_i
and ie_i
represent
corresponding outer and inner expression values, this subquery comparison:
(oe_1
, ...,oe_N
) IN (SELECTie_1
, ...,ie_N
FROM ... WHEREsubquery_where
)
Becomes:
EXISTS (SELECT 1 FROM ... WHEREsubquery_where
ANDoe_1
=ie_1
AND ... ANDoe_N
=ie_N
)
The following discussion assumes a single pair of outer and inner expression values for simplicity.
The conversion just described has its limitations. It is valid only if we ignore possible NULL
values. That is, the "pushdown" strategy works as long as
both of these two conditions are true:
When either or both of those conditions do not hold, optimization is more complex.
Suppose that outer_expr
is known to be a non-NULL
value but the subquery does not produce a row such that outer_expr
=
inner_expr
. Then
evaluates as follows:outer_expr
IN (SELECT ...)
In this situation, the approach of looking for rows with
is no longer valid. It is necessary to look for
such rows, but if none are found, also look for rows where outer_expr
= inner_expr
inner_expr
is NULL
. Roughly speaking, the subquery can be converted to:
EXISTS (SELECT 1 FROM ... WHEREsubquery_where
AND (outer_expr
=inner_expr
ORinner_expr
IS NULL))
The need to evaluate the extra IS
NULL
condition is why MySQL has the ref_or_null
access method:
mysql>EXPLAIN
->SELECT
->outer_expr
IN (SELECT t2.maybe_null_keyFROM t2, t3 WHERE ...)
-> FROM t1;*************************** 1. row *************************** id: 1 select_type: PRIMARY table: t1...*************************** 2. row *************************** id: 2 select_type: DEPENDENT SUBQUERY table: t2 type: ref_or_nullpossible_keys: maybe_null_key key: maybe_null_key key_len: 5 ref: func rows: 2 Extra: Using where; Using index...
The unique_subquery
and index_subquery
subquery-specific access methods also have "or NULL
" variants.
However, they are not visible in EXPLAIN
output, so you must use EXPLAIN
EXTENDED
followed by SHOW
WARNINGS
(note the checking NULL
in the warning message):
mysql>EXPLAIN EXTENDED
->SELECT
*************************** 1. row *************************** id: 1 select_type: PRIMARY table: t1...*************************** 2. row *************************** id: 2 select_type: DEPENDENT SUBQUERY table: t2 type: index_subquerypossible_keys: maybe_null_key key: maybe_null_key key_len: 5 ref: func rows: 2 Extra: Using indexmysql>outer_expr
IN (SELECT maybe_null_key FROM t2) FROM t1\GSHOW WARNINGS\G
*************************** 1. row *************************** Level: Note Code: 1003Message: select (`test`.`t1`.`outer_expr`, (((`test`.`t1`.`outer_expr`) in t2 on maybe_null_key checking NULL))) AS `outer_expr IN (SELECT maybe_null_key FROM t2)` from `test`.`t1`
The additional OR ... IS NULL
condition makes query execution slightly more
complicated (and some optimizations within the subquery become inapplicable), but generally this is tolerable.
The situation is much worse when outer_expr
can be NULL
. According to the SQL interpretation of NULL
as
"unknown value," NULL IN (SELECT
should evaluate to:inner_expr
...)
For proper evaluation, it is necessary to be able to check whether the SELECT
has produced any rows at all, so
cannot be pushed down into the subquery. This is a
problem, because many real world subqueries become very slow unless the equality can be pushed down. outer_expr
= inner_expr
Essentially, there must be different ways to execute the subquery depending on the value of outer_expr
.
The optimizer chooses SQL compliance over speed, so it accounts for the possibility that outer_expr
might be NULL
.
If outer_expr
is NULL
, to evaluate the
following expression, it is necessary to run the SELECT
to determine whether it produces any rows:
NULL IN (SELECTinner_expr
FROM ... WHEREsubquery_where
)
It is necessary to run the original SELECT
here, without any pushed-down equalities of the kind mentioned earlier.
On the other hand, when outer_expr
is not NULL
, it is absolutely essential that this comparison:
outer_expr
IN (SELECTinner_expr
FROM ... WHEREsubquery_where
)
be converted to this expression that uses a pushed-down condition:
EXISTS (SELECT 1 FROM ... WHEREsubquery_where
ANDouter_expr
=inner_expr
)
Without this conversion, subqueries will be slow. To solve the dilemma of whether to push down or not push down conditions into the subquery, the conditions are wrapped in "trigger" functions. Thus, an expression of the following form:
outer_expr
IN (SELECTinner_expr
FROM ... WHEREsubquery_where
)
is converted into:
EXISTS (SELECT 1 FROM ... WHEREsubquery_where
AND trigcond(outer_expr
=inner_expr
))
More generally, if the subquery comparison is based on several pairs of outer and inner expressions, the conversion takes this comparison:
(oe_1
, ...,oe_N
) IN (SELECTie_1
, ...,ie_N
FROM ... WHEREsubquery_where
)
and converts it to this expression:
EXISTS (SELECT 1 FROM ... WHEREsubquery_where
AND trigcond(oe_1
=ie_1
) AND ... AND trigcond(oe_N
=ie_N
) )
Each trigcond(
is a special function
that evaluates to the following values:X
)
X
when the "linked" outer expression oe_i
is not NULL
TRUE
when the "linked"
outer expression oe_i
is NULL
Note that trigger functions are not triggers of the kind that you create
with CREATE TRIGGER
.
Equalities that are wrapped into trigcond()
functions are not first class
predicates for the query optimizer. Most optimizations cannot deal with predicates that may be turned on and off
at query execution time, so they assume any trigcond(
to be an unknown function and ignore it. At the moment,
triggered equalities can be used by those optimizations:X
)
Reference optimizations: trigcond(
can be used to construct X
=Y
[OR Y
IS NULL])ref
, eq_ref
, or ref_or_null
table accesses.
Index lookup-based subquery execution engines: trigcond(
can be
used to construct X
=Y
)unique_subquery
or index_subquery
accesses.
Table-condition generator: If the subquery is a join of several tables, the triggered condition will be checked as soon as possible.
When the optimizer uses a triggered condition to create some kind of index lookup-based access (as for the first
two items of the preceding list), it must have a fallback strategy for the case when the condition is turned
off. This fallback strategy is always the same: Do a full table scan. In EXPLAIN
output, the fallback shows up as Full scan on
NULL key
in the Extra
column:
mysql>EXPLAIN SELECT t1.col1,
->t1.col1 IN (SELECT t2.key1 FROM t2 WHERE t2.col2=t1.col2) FROM t1\G
*************************** 1. row *************************** id: 1 select_type: PRIMARY table: t1 ...*************************** 2. row *************************** id: 2 select_type: DEPENDENT SUBQUERY table: t2 type: index_subquerypossible_keys: key1 key: key1 key_len: 5 ref: func rows: 2 Extra: Using where; Full scan on NULL key
If you run EXPLAIN EXTENDED
followed by SHOW WARNINGS
, you can see the triggered condition:
*************************** 1. row *************************** Level: Note Code: 1003Message: select `test`.`t1`.`col1` AS `col1`, <in_optimizer>(`test`.`t1`.`col1`, <exists>(<index_lookup>(<cache>(`test`.`t1`.`col1`) in t2 on key1 checking NULL where (`test`.`t2`.`col2` = `test`.`t1`.`col2`) having trigcond(<is_not_null_test>(`test`.`t2`.`key1`))))) AS `t1.col1 IN (select t2.key1 from t2 where t2.col2=t1.col2)` from `test`.`t1`
The use of triggered conditions has some performance implications. A NULL IN (SELECT
...)
expression now may cause a full table scan (which is slow) when it previously did not. This is the
price paid for correct results (the goal of the trigger-condition strategy was to improve compliance and not
speed).
For multiple-table subqueries, execution of NULL IN (SELECT ...)
will be
particularly slow because the join optimizer does not optimize for the case where the outer expression is NULL
. It assumes that subquery evaluations with NULL
on the left side are very rare, even if there are statistics that indicate otherwise. On the other hand, if the
outer expression might be NULL
but never actually is, there is no performance
penalty.
To help the query optimizer better execute your queries, use these tips:
Declare a column as NOT NULL
if it really is. (This
also helps other aspects of the optimizer by simplifying condition testing for the column.)
If you do not need to distinguish a NULL
from FALSE
subquery result, you can easily avoid the slow execution path. Replace
a comparison that looks like this:
outer_expr
IN (SELECTinner_expr
FROM ...)
with this expression:
(outer_expr
IS NOT NULL) AND (outer_expr
IN (SELECTinner_expr
FROM ...))
Then NULL IN (SELECT ...)
will never be evaluated because MySQL stops
evaluating AND
parts as soon as the expression result is clear.
The subquery_materialization_cost_based
enables control over the choice between
subquery materialization and IN -> EXISTS
subquery transformation. See Section 8.8.5.2, "Controlling Switchable
Optimizations".