Spec-Zone .ru
спецификации, руководства, описания, API
|
As of MySQL 5.6.5, the optimizer uses semi-join strategies to improve subquery execution, as described in this section.
For an inner join between two tables, the join returns a row from one table as many times as there are matches
in the other table. But for some questions, the only information that matters is whether there is a match, not
the number of matches. Suppose that there are tables named class
and roster
that list classes in a course curriculum and class rosters (students
enrolled in each class), respectively. To list the classes that actually have students enrolled, you could use
this join:
SELECT class.class_num, class.class_nameFROM class INNER JOIN rosterWHERE class.class_num = roster.class_num;
However, the result lists each class once for each enrolled student. For the question being asked, this is unnecessary duplication of information.
Assuming that class_num
is a primary key in the class
table, duplicate suppression could be achieved by using SELECT
DISTINCT
, but it is inefficient to generate all matching rows first only to eliminate duplicates
later.
The same duplicate-free result can be obtained by using a subquery:
SELECT class_num, class_nameFROM classWHERE class_num IN (SELECT class_num FROM roster);
Here, the optimizer can recognize that the IN
clause requires the subquery to
return only one instance of each class number from the roster
table. In this case,
the query can be executed as a semi-join—that is, an operation that returns only one
instance of each row in class
that is matched by rows in roster
.
Before MySQL 5.6.6, the outer query specification was limited to simple table scans or inner joins using comma syntax, and view references were not possible. As of 5.6.6, outer join and inner join syntax is permitted in the outer query specification, and the restriction that table references must be base tables has been lifted.
In MySQL, a subquery must satisfy these criteria to be handled as a semi-join:
It must be an IN
(or =ANY
)
subquery that appears at the top level of the WHERE
or ON
clause, possibly as a term in an AND
expression. For example:
SELECT ...FROM ot1, ...WHERE (oe1, ...) IN (SELECT ie1, ... FROM it1, ... WHERE ...);
Here, ot_
and i
it_
represent tables in the
outer and inner parts of the query, and i
oe_
and i
ie_
represent expressions that refer to columns
in the outer and inner tables. i
It must not contain a GROUP BY
or HAVING
clause or aggregate functions.
It must not have ORDER BY
with LIMIT
.
The number of outer and inner tables together must be less than the maximum number of tables permitted in a join.
The subquery may be correlated or uncorrelated. DISTINCT
is permitted, as is LIMIT
unless ORDER BY
is also used.
If a subquery meets the preceding criteria, MySQL converts it to a semi-join and makes a cost-based choice from these strategies:
Convert the subquery to a join, or use table pullout and run the query as an inner join between subquery tables and outer tables. Table pullout pulls a table out from the subquery to the outer query.
Duplicate Weedout: Run the semi-join as if it was a join and remove duplicate records using a temporary table.
FirstMatch: When scanning the inner tables for row combinations and there are multiple instances of a given value group, choose one rather than returning them all. This "shortcuts" scanning and eliminates production of unnecessary rows.
LooseScan: Scan a subquery table using an index that enables a single value to be chosen from each subquery's value group.
Materialize the subquery into a temporary table with an index and use the temporary table to perform a join. The index is used to remove duplicates. The index might also be used later for lookups when joining the temporary table with the outer tables; if not, the table is scanned.
Each of these strategies except Duplicate Weedout can be enabled or disabled using the optimizer_switch
system variable. The semijoin
flag
controls whether semi-joins are used. If it is set to on
, the firstmatch
, loosescan
, and materialization
flags enable finer control over the permitted semi-join strategies. These flags are on
by default. See Section
8.8.5.2, "Controlling Switchable Optimizations".
The use of semi-join strategies is indicated in EXPLAIN
output as follows:
Semi-joined tables show up in the outer select. EXPLAIN EXTENDED
plus SHOW WARNINGS
shows the rewritten query, which displays the semi-join
structure. From this you can get an idea about which tables were pulled out of the semi-join. If a
subquery was converted to a semi-join, you will see that the subquery predicate is gone and its tables
and WHERE
clause were merged into the outer query join list and WHERE
clause.
Temporary table use for Duplicate Weedout is indicated by Start
temporary
and End temporary
in the Extra
column. Tables that were not pulled out and are in the range of EXPLAIN
output rows covered by Start temporary
and End
temporary
will have their rowid
in the temporary table.
FirstMatch(
in the tbl_name
)Extra
column indicates join shortcutting.
LooseScan(
in the m
..n
)Extra
column
indicates use of the LooseScan strategy. m
and n
are key part numbers.
As of MySQL 5.6.7, temporary table use for materialization is indicated by rows
with a select_type
value of MATERIALIZED
and
rows with a table
value of <subquery
. N
>
Before MySQL 5.6.7, temporary table use for materialization is indicated in the Extra
column by Materialize
if a single
table is used, or by Start materialize
and End
materialize
if multiple tables are used. If Scan
is present,
no temporary table index is used for table reads. Otherwise, an index lookup is used.