Spec-Zone .ru
спецификации, руководства, описания, API

8.13.16.1. Optimizing Subqueries with Semi-Join Transformations

As of MySQL 5.6.5, the optimizer uses semi-join strategies to improve subquery execution, as described in this section.

For an inner join between two tables, the join returns a row from one table as many times as there are matches in the other table. But for some questions, the only information that matters is whether there is a match, not the number of matches. Suppose that there are tables named class and roster that list classes in a course curriculum and class rosters (students enrolled in each class), respectively. To list the classes that actually have students enrolled, you could use this join:

SELECT class.class_num, class.class_nameFROM class INNER JOIN rosterWHERE class.class_num = roster.class_num;

However, the result lists each class once for each enrolled student. For the question being asked, this is unnecessary duplication of information.

Assuming that class_num is a primary key in the class table, duplicate suppression could be achieved by using SELECT DISTINCT, but it is inefficient to generate all matching rows first only to eliminate duplicates later.

The same duplicate-free result can be obtained by using a subquery:

SELECT class_num, class_nameFROM classWHERE class_num IN (SELECT class_num FROM roster);

Here, the optimizer can recognize that the IN clause requires the subquery to return only one instance of each class number from the roster table. In this case, the query can be executed as a semi-join—that is, an operation that returns only one instance of each row in class that is matched by rows in roster.

Before MySQL 5.6.6, the outer query specification was limited to simple table scans or inner joins using comma syntax, and view references were not possible. As of 5.6.6, outer join and inner join syntax is permitted in the outer query specification, and the restriction that table references must be base tables has been lifted.

In MySQL, a subquery must satisfy these criteria to be handled as a semi-join:

The subquery may be correlated or uncorrelated. DISTINCT is permitted, as is LIMIT unless ORDER BY is also used.

If a subquery meets the preceding criteria, MySQL converts it to a semi-join and makes a cost-based choice from these strategies:

Each of these strategies except Duplicate Weedout can be enabled or disabled using the optimizer_switch system variable. The semijoin flag controls whether semi-joins are used. If it is set to on, the firstmatch, loosescan, and materialization flags enable finer control over the permitted semi-join strategies. These flags are on by default. See Section 8.8.5.2, "Controlling Switchable Optimizations".

The use of semi-join strategies is indicated in EXPLAIN output as follows: