Spec-Zone .ru
спецификации, руководства, описания, API
|
Subquery optimization for IN
is not as effective as
for the =
operator or for the IN(
operator.
value_list
)
A typical case for poor IN
subquery performance is when the subquery
returns a small number of rows but the outer query returns a large number of rows to be compared to
the subquery result.
The problem is that, for a statement that uses an IN
subquery, the
optimizer rewrites it as a correlated subquery. Consider the following statement that uses an
uncorrelated subquery:
SELECT ... FROM t1 WHERE t1.a IN (SELECT b FROM t2);
The optimizer rewrites the statement to a correlated subquery:
SELECT ... FROM t1 WHERE EXISTS (SELECT 1 FROM t2 WHERE t2.b = t1.a);
If the inner and outer queries return M
and N
rows, respectively, the execution time becomes on the
order of O(M
×N
),
rather than O(M
+N
) as it would be for an uncorrelated subquery.
An implication is that an IN
subquery can be much slower than a query
written using an IN(
operator that lists the same
values that the subquery would return. value_list
)
In general, you cannot modify a table and select from the same table in a subquery. For example, this limitation applies to statements of the following forms:
DELETE FROM t WHERE ... (SELECT ... FROM t ...);UPDATE t ... WHERE col = (SELECT ... FROM t ...);{INSERT|REPLACE} INTO t (SELECT ... FROM t ...);
Exception: The preceding prohibition does not apply if you are using a subquery for the modified
table in the FROM
clause. Example:
UPDATE t ... WHERE col = (SELECT * FROM (SELECT ... FROM t...) AS _t ...);
Here the result from the subquery in the FROM
clause is stored as a
temporary table, so the relevant rows in t
have already been selected
by the time the update to t
takes place.
Row comparison operations are only partially supported:
For
,
expr
[NOT] IN subquery
expr
can be an n
-tuple (specified using row constructor
syntax) and the subquery can return rows of n
-tuples.
The permitted syntax is therefore more specifically expressed as row_constructor
[NOT] IN table_subquery
For
, expr
op
{ALL|ANY|SOME} subquery
expr
must be a scalar value and the subquery must
be a column subquery; it cannot return multiple-column rows.
In other words, for a subquery that returns rows of n
-tuples,
this is supported:
(expr_1
, ...,expr_n
) [NOT] INtable_subquery
But this is not supported:
(expr_1
, ...,expr_n
)op
{ALL|ANY|SOME}subquery
The reason for supporting row comparisons for IN
but not for the others
is that IN
is implemented by rewriting it as a sequence of =
comparisons and AND
operations. This approach cannot be used for ALL
, ANY
, or SOME
.
Subqueries in the FROM
clause cannot be correlated
subqueries. They are materialized in whole (evaluated to produce a result set) during query execution,
so they cannot be evaluated per row of the outer query. Before MySQL 5.6.3, materialization takes place
before evaluation of the outer query. As of 5.6.3, the optimizer delays materialization until the result
is needed, which may permit materialization to be avoided. See Section
8.13.16.3, "Optimizing Subqueries in the FROM
Clause (Derived Tables)".
MySQL does not support LIMIT
in subqueries for certain
subquery operators:
mysql>SELECT * FROM t1
->WHERE s1 IN (SELECT s2 FROM t2 ORDER BY s1 LIMIT 1);
ERROR 1235 (42000): This version of MySQL doesn't yet support 'LIMIT & IN/ALL/ANY/SOME subquery'
The optimizer is more mature for joins than for subqueries, so in many cases a statement that uses a subquery can be executed more efficiently if you rewrite it as a join.
An exception occurs for the case where an IN
subquery can be rewritten
as a SELECT DISTINCT
join. Example:
SELECT col FROM t1 WHERE id_col IN (SELECT id_col2 FROM t2 WHERE condition
);
That statement can be rewritten as follows:
SELECT DISTINCT col FROM t1, t2 WHERE t1.id_col = t2.id_col AND condition
;
MySQL permits a subquery to refer to a stored function that has data-modifying side
effects such as inserting rows into a table. For example, if f()
inserts
rows, the following query can modify data:
SELECT ... WHERE x IN (SELECT f() ...);
This behavior is an extension to the SQL standard. In MySQL, it can produce indeterminate results
because f()
might be executed a different number of times for different
executions of a given query depending on how the optimizer chooses to handle it.
For statement-based or mixed-format replication, one implication of this indeterminism is that such a query can produce different results on the master and its slaves.
Before MySQL 5.6.3, a subquery in the FROM
clause is
evaluated by materializing the result into a temporary table, and this table does not use indexes. As of
5.6.3, the optimizer creates an index on the materialized table if this will result in faster query
execution. See Section
8.13.16.3, "Optimizing Subqueries in the FROM
Clause (Derived Tables)".