Spec-Zone .ru
спецификации, руководства, описания, API
|
The syntax for expressing joins permits nested joins. The following discussion refers to the join syntax
described in Section 13.2.9.2, "JOIN
Syntax".
The syntax of table_factor
is extended in comparison with the SQL
Standard. The latter accepts only table_reference
, not a list of them
inside a pair of parentheses. This is a conservative extension if we consider each comma in a list of table_reference
items as equivalent to an inner join. For example:
SELECT * FROM t1 LEFT JOIN (t2, t3, t4) ON (t2.a=t1.a AND t3.b=t1.b AND t4.c=t1.c)
is equivalent to:
SELECT * FROM t1 LEFT JOIN (t2 CROSS JOIN t3 CROSS JOIN t4) ON (t2.a=t1.a AND t3.b=t1.b AND t4.c=t1.c)
In MySQL, CROSS JOIN
is a syntactic equivalent to INNER
JOIN
(they can replace each other). In standard SQL, they are not equivalent. INNER
JOIN
is used with an ON
clause; CROSS JOIN
is
used otherwise.
In general, parentheses can be ignored in join expressions containing only inner join operations. After removing parentheses and grouping operations to the left, the join expression:
t1 LEFT JOIN (t2 LEFT JOIN t3 ON t2.b=t3.b OR t2.b IS NULL) ON t1.a=t2.a
transforms into the expression:
(t1 LEFT JOIN t2 ON t1.a=t2.a) LEFT JOIN t3 ON t2.b=t3.b OR t2.b IS NULL
Yet, the two expressions are not equivalent. To see this, suppose that the tables t1
, t2
, and t3
have the
following state:
Table t1
contains rows (1)
, (2)
Table t2
contains row (1,101)
Table t3
contains row (101)
In this case, the first expression returns a result set including the rows (1,1,101,101)
, (2,NULL,NULL,NULL)
, whereas the second
expression returns the rows (1,1,101,101)
, (2,NULL,NULL,101)
:
mysql>SELECT *
->FROM t1
->LEFT JOIN
->(t2 LEFT JOIN t3 ON t2.b=t3.b OR t2.b IS NULL)
->ON t1.a=t2.a;
+------+------+------+------+| a | a | b | b |+------+------+------+------+| 1 | 1 | 101 | 101 || 2 | NULL | NULL | NULL |+------+------+------+------+mysql>SELECT *
->FROM (t1 LEFT JOIN t2 ON t1.a=t2.a)
->LEFT JOIN t3
->ON t2.b=t3.b OR t2.b IS NULL;
+------+------+------+------+| a | a | b | b |+------+------+------+------+| 1 | 1 | 101 | 101 || 2 | NULL | NULL | 101 |+------+------+------+------+
In the following example, an outer join operation is used together with an inner join operation:
t1 LEFT JOIN (t2, t3) ON t1.a=t2.a
That expression cannot be transformed into the following expression:
t1 LEFT JOIN t2 ON t1.a=t2.a, t3.
For the given table states, the two expressions return different sets of rows:
mysql>SELECT *
->FROM t1 LEFT JOIN (t2, t3) ON t1.a=t2.a;
+------+------+------+------+| a | a | b | b |+------+------+------+------+| 1 | 1 | 101 | 101 || 2 | NULL | NULL | NULL |+------+------+------+------+mysql>SELECT *
->FROM t1 LEFT JOIN t2 ON t1.a=t2.a, t3;
+------+------+------+------+| a | a | b | b |+------+------+------+------+| 1 | 1 | 101 | 101 || 2 | NULL | NULL | 101 |+------+------+------+------+
Therefore, if we omit parentheses in a join expression with outer join operators, we might change the result set for the original expression.
More exactly, we cannot ignore parentheses in the right operand of the left outer join operation and in the left operand of a right join operation. In other words, we cannot ignore parentheses for the inner table expressions of outer join operations. Parentheses for the other operand (operand for the outer table) can be ignored.
The following expression:
(t1,t2) LEFT JOIN t3 ON P(t2.b,t3.b)
is equivalent to this expression:
t1, t2 LEFT JOIN t3 ON P(t2.b,t3.b)
for any tables t1,t2,t3
and any condition P
over
attributes t2.b
and t3.b
.
Whenever the order of execution of the join operations in a join expression (join_table
)
is not from left to right, we talk about nested joins. Consider the following queries:
SELECT * FROM t1 LEFT JOIN (t2 LEFT JOIN t3 ON t2.b=t3.b) ON t1.a=t2.a WHERE t1.a > 1SELECT * FROM t1 LEFT JOIN (t2, t3) ON t1.a=t2.a WHERE (t2.b=t3.b OR t2.b IS NULL) AND t1.a > 1
Those queries are considered to contain these nested joins:
t2 LEFT JOIN t3 ON t2.b=t3.bt2, t3
The nested join is formed in the first query with a left join operation, whereas in the second query it is formed with an inner join operation.
In the first query, the parentheses can be omitted: The grammatical structure of the join expression will
dictate the same order of execution for join operations. For the second query, the parentheses cannot be
omitted, although the join expression here can be interpreted unambiguously without them. (In our extended
syntax the parentheses in (t2, t3)
of the second query are required, although
theoretically the query could be parsed without them: We still would have unambiguous syntactical structure for
the query because LEFT JOIN
and ON
would play the role
of the left and right delimiters for the expression (t2,t3)
.)
The preceding examples demonstrate these points:
For join expressions involving only inner joins (and not outer joins), parentheses can be removed. You can remove parentheses and evaluate left to right (or, in fact, you can evaluate the tables in any order).
The same is not true, in general, for outer joins or for outer joins mixed with inner joins. Removal of parentheses may change the result.
Queries with nested outer joins are executed in the same pipeline manner as queries with inner joins. More
exactly, a variation of the nested-loop join algorithm is exploited. Recall by what algorithmic schema the
nested-loop join executes a query. Suppose that we have a join query over 3 tables T1,T2,T3
of the form:
SELECT * FROM T1 INNER JOIN T2 ON P1(T1,T2) INNER JOIN T3 ON P2(T2,T3) WHERE P(T1,T2,T3).
Here, P1(T1,T2)
and P2(T3,T3)
are some join conditions
(on expressions), whereas P(t1,t2,t3)
is a condition over columns of tables T1,T2,T3
.
The nested-loop join algorithm would execute this query in the following manner:
FOR each row t1 in T1 { FOR each row t2 in T2 such that P1(t1,t2) { FOR each row t3 in T3 such that P2(t2,t3) { IF P(t1,t2,t3) { t:=t1||t2||t3; OUTPUT t; } } }}
The notation t1||t2||t3
means "a row constructed by concatenating the columns of rows t1
, t2
, and t3
." In some of the following examples, NULL
where a row name appears means that NULL
is used
for each column of that row. For example, t1||t2||NULL
means "a row constructed by concatenating the columns of rows t1
and t2
, and NULL
for each column of t3
."
Now let's consider a query with nested outer joins:
SELECT * FROM T1 LEFT JOIN (T2 LEFT JOIN T3 ON P2(T2,T3)) ON P1(T1,T2) WHERE P(T1,T2,T3).
For this query, we modify the nested-loop pattern to get:
FOR each row t1 in T1 { BOOL f1:=FALSE; FOR each row t2 in T2 such that P1(t1,t2) { BOOL f2:=FALSE; FOR each row t3 in T3 such that P2(t2,t3) { IF P(t1,t2,t3) { t:=t1||t2||t3; OUTPUT t; } f2=TRUE; f1=TRUE; } IF (!f2) { IF P(t1,t2,NULL) { t:=t1||t2||NULL; OUTPUT t; } f1=TRUE; } } IF (!f1) { IF P(t1,NULL,NULL) { t:=t1||NULL||NULL; OUTPUT t; } }}
In general, for any nested loop for the first inner table in an outer join operation, a flag is introduced that
is turned off before the loop and is checked after the loop. The flag is turned on when for the current row from
the outer table a match from the table representing the inner operand is found. If at the end of the loop cycle
the flag is still off, no match has been found for the current row of the outer table. In this case, the row is
complemented by NULL
values for the columns of the inner tables. The result row is
passed to the final check for the output or into the next nested loop, but only if the row satisfies the join
condition of all embedded outer joins.
In our example, the outer join table expressed by the following expression is embedded:
(T2 LEFT JOIN T3 ON P2(T2,T3))
Note that for the query with inner joins, the optimizer could choose a different order of nested loops, such as this one:
FOR each row t3 in T3 { FOR each row t2 in T2 such that P2(t2,t3) { FOR each row t1 in T1 such that P1(t1,t2) { IF P(t1,t2,t3) { t:=t1||t2||t3; OUTPUT t; } } }}
For the queries with outer joins, the optimizer can choose only such an order where loops for outer tables precede loops for inner tables. Thus, for our query with outer joins, only one nesting order is possible. For the following query, the optimizer will evaluate two different nestings:
SELECT * T1 LEFT JOIN (T2,T3) ON P1(T1,T2) AND P2(T1,T3) WHERE P(T1,T2,T3)
The nestings are these:
FOR each row t1 in T1 { BOOL f1:=FALSE; FOR each row t2 in T2 such that P1(t1,t2) { FOR each row t3 in T3 such that P2(t1,t3) { IF P(t1,t2,t3) { t:=t1||t2||t3; OUTPUT t; } f1:=TRUE } } IF (!f1) { IF P(t1,NULL,NULL) { t:=t1||NULL||NULL; OUTPUT t; } }}
and:
FOR each row t1 in T1 { BOOL f1:=FALSE; FOR each row t3 in T3 such that P2(t1,t3) { FOR each row t2 in T2 such that P1(t1,t2) { IF P(t1,t2,t3) { t:=t1||t2||t3; OUTPUT t; } f1:=TRUE } } IF (!f1) { IF P(t1,NULL,NULL) { t:=t1||NULL||NULL; OUTPUT t; } }}
In both nestings, T1
must be processed in the outer loop because it is used in an
outer join. T2
and T3
are used in an inner join, so
that join must be processed in the inner loop. However, because the join is an inner join, T2
and T3
can be processed in either order.
When discussing the nested-loop algorithm for inner joins, we omitted some details whose impact on the
performance of query execution may be huge. We did not mention so-called "pushed-down" conditions. Suppose that our WHERE
condition P(T1,T2,T3)
can be represented by a conjunctive formula:
P(T1,T2,T2) = C1(T1) AND C2(T2) AND C3(T3).
In this case, MySQL actually uses the following nested-loop schema for the execution of the query with inner joins:
FOR each row t1 in T1 such that C1(t1) { FOR each row t2 in T2 such that P1(t1,t2) AND C2(t2) { FOR each row t3 in T3 such that P2(t2,t3) AND C3(t3) { IF P(t1,t2,t3) { t:=t1||t2||t3; OUTPUT t; } } }}
You see that each of the conjuncts C1(T1)
, C2(T2)
,
C3(T3)
are pushed out of the most inner loop to the most outer loop where it can be
evaluated. If C1(T1)
is a very restrictive condition, this condition pushdown may
greatly reduce the number of rows from table T1
passed to the inner loops. As a
result, the execution time for the query may improve immensely.
For a query with outer joins, the WHERE
condition is to be checked only after it
has been found that the current row from the outer table has a match in the inner tables. Thus, the optimization
of pushing conditions out of the inner nested loops cannot be applied directly to queries with outer joins. Here
we have to introduce conditional pushed-down predicates guarded by the flags that are turned on when a match has
been encountered.
For our example with outer joins with:
P(T1,T2,T3)=C1(T1) AND C(T2) AND C3(T3)
the nested-loop schema using guarded pushed-down conditions looks like this:
FOR each row t1 in T1 such that C1(t1) { BOOL f1:=FALSE; FOR each row t2 in T2 such that P1(t1,t2) AND (f1?C2(t2):TRUE) { BOOL f2:=FALSE; FOR each row t3 in T3 such that P2(t2,t3) AND (f1&&f2?C3(t3):TRUE) { IF (f1&&f2?TRUE:(C2(t2) AND C3(t3))) { t:=t1||t2||t3; OUTPUT t; } f2=TRUE; f1=TRUE; } IF (!f2) { IF (f1?TRUE:C2(t2) && P(t1,t2,NULL)) { t:=t1||t2||NULL; OUTPUT t; } f1=TRUE; } } IF (!f1 && P(t1,NULL,NULL)) { t:=t1||NULL||NULL; OUTPUT t; }}
In general, pushed-down predicates can be extracted from join conditions such as P1(T1,T2)
and P(T2,T3)
. In this case, a pushed-down
predicate is guarded also by a flag that prevents checking the predicate for the NULL
-complemented row generated by the corresponding outer join operation.
Note that access by key from one inner table to another in the same nested join is prohibited if it is induced
by a predicate from the WHERE
condition. (We could use conditional key access in
this case, but this technique is not employed yet in MySQL 5.6.)