Spec-Zone .ru
спецификации, руководства, описания, API
|
In some cases, MySQL can use an index to satisfy an ORDER BY
clause without doing
any extra sorting.
The index can also be used even if the ORDER BY
does not match the index exactly,
as long as all of the unused portions of the index and all the extra ORDER BY
columns are constants in the WHERE
clause. The following queries use the index to
resolve the ORDER BY
part:
SELECT * FROM t1 ORDER BYkey_part1
,key_part2
,... ;SELECT * FROM t1 WHEREkey_part1
=constant
ORDER BYkey_part2
;SELECT * FROM t1 ORDER BYkey_part1
DESC,key_part2
DESC;SELECT * FROM t1 WHEREkey_part1
= 1 ORDER BYkey_part1
DESC,key_part2
DESC;SELECT * FROM t1 WHEREkey_part1
>constant
ORDER BYkey_part1
ASC;SELECT * FROM t1 WHEREkey_part1
<constant
ORDER BYkey_part1
DESC;SELECT * FROM t1 WHEREkey_part1
=constant1
ANDkey_part2
>constant2
ORDER BYkey_part2
;
In some cases, MySQL cannot use indexes to resolve the ORDER BY
, although it still uses indexes to find the rows that match the WHERE
clause. These cases include the following:
You use ORDER BY
on different keys:
SELECT * FROM t1 ORDER BYkey1
,key2
;
You use ORDER BY
on nonconsecutive parts of a key:
SELECT * FROM t1 WHEREkey2
=constant
ORDER BYkey_part2
;
You mix ASC
and DESC
:
SELECT * FROM t1 ORDER BYkey_part1
DESC,key_part2
ASC;
The key used to fetch the rows is not the same as the one used in the ORDER BY
:
SELECT * FROM t1 WHEREkey2
=constant
ORDER BYkey1
;
You use ORDER BY
with an expression that includes
terms other than the key column name:
SELECT * FROM t1 ORDER BY ABS(key
);SELECT * FROM t1 ORDER BY -key
;
You are joining many tables, and the columns in the ORDER
BY
are not all from the first nonconstant table that is used to retrieve rows. (This is the first
table in the EXPLAIN
output that does not have a const
join type.)
You have different ORDER BY
and GROUP
BY
expressions.
You index only a prefix of a column named in the ORDER
BY
clause. In this case, the index cannot be used to fully resolve the sort order. For example,
if you have a CHAR(20)
column, but index only the first 10 bytes, the index cannot distinguish values past the 10th byte and a
filesort
will be needed.
The type of table index used does not store rows in order. For example, this is
true for a HASH
index in a MEMORY
table.
Availability of an index for sorting may be affected by the use of column aliases. Suppose that the column t1.a
is indexed. In this statement, the name of the column in the select list is
a
. It refers to t1.a
, so for the reference to a
in the ORDER BY
, the index can be used:
SELECT a FROM t1 ORDER BY a;
In this statement, the name of the column in the select list is also a
, but it is
the alias name. It refers to ABS(a)
, so for the reference to a
in the ORDER BY
, the index cannot be used:
SELECT ABS(a) AS a FROM t1 ORDER BY a;
In the following statement, the ORDER BY
refers to a name that is not the name of a
column in the select list. But there is a column in t1
named a
,
so the ORDER BY
uses that, and the index can be used. (The resulting sort order may
be completely different from the order for ABS(a)
, of course.)
SELECT ABS(a) AS b FROM t1 ORDER BY a;
By default, MySQL sorts all GROUP BY
queries as if you specified col1
, col2
, ...ORDER
BY
in
the query as well. If you include an col1
, col2
, ...ORDER BY
clause explicitly that contains the
same column list, MySQL optimizes it away without any speed penalty, although the sorting still occurs. If a
query includes GROUP BY
but you want to avoid the overhead of sorting the result,
you can suppress sorting by specifying ORDER BY NULL
. For example:
INSERT INTO fooSELECT a, COUNT(*) FROM bar GROUP BY a ORDER BY NULL;
Relying on implicit GROUP BY
sorting in MySQL 5.6 is deprecated. To achieve a
specific sort order of grouped results, it is preferable to use an explicit ORDER
BY
clause. GROUP BY
sorting is a MySQL extension that may change in a
future release; for example, to make it possible for the optimizer to order groupings in whatever manner it
deems most efficient and to avoid the sorting overhead.
With EXPLAIN
SELECT ... ORDER BY
, you can check whether MySQL can use indexes to resolve the query. It cannot if
you see Using filesort
in the Extra
column. See Section
8.8.1, "Optimizing Queries with EXPLAIN
". Filesort uses a fixed-length
row-storage format similar to that used by the MEMORY
storage engine. Variable-length types such as VARCHAR
are stored using a fixed length.
MySQL has two filesort
algorithms for sorting and retrieving results. The original
method uses only the ORDER BY
columns. The modified method uses not just the ORDER BY
columns, but all the columns used in the query.
The optimizer selects which filesort
algorithm to use. It normally uses the
modified algorithm except when BLOB
or TEXT
columns are involved, in which case it uses the original algorithm.
The original filesort
algorithm works as follows:
Read all rows according to key or by table scanning. Rows that do not match the
WHERE
clause are skipped.
For each row, store a pair of values in a buffer (the sort key and the row
pointer). The size of the buffer is the value of the sort_buffer_size
system variable.
When the buffer gets full, run a qsort (quicksort) on it and store the result in a temporary file. Save a pointer to the sorted block. (If all pairs fit into the sort buffer, no temporary file is created.)
Repeat the preceding steps until all rows have been read.
Do a multi-merge of up to MERGEBUFF
(7) regions to one
block in another temporary file. Repeat until all blocks from the first file are in the second file.
Repeat the following until there are fewer than MERGEBUFF2
(15) blocks left.
On the last multi-merge, only the pointer to the row (the last part of the sort key) is written to a result file.
Read the rows in sorted order by using the row pointers in the result file. To
optimize this, we read in a big block of row pointers, sort them, and use them to read the rows in
sorted order into a row buffer. The size of the buffer is the value of the read_rnd_buffer_size
system variable. The code for this step is in
the sql/records.cc
source file.
One problem with this approach is that it reads rows twice: One time when evaluating the WHERE
clause, and again after sorting the pair values. And even if the rows were accessed successively the first time
(for example, if a table scan is done), the second time they are accessed randomly. (The sort keys are ordered,
but the row positions are not.)
The modified filesort
algorithm incorporates an optimization such that it records
not only the sort key value and row position, but also the columns required for the query. This avoids reading
the rows twice. The modified filesort
algorithm works like this:
Read the rows that match the WHERE
clause.
For each row, record a tuple of values consisting of the sort key value and row position, and also the columns required for the query.
Sort the tuples by sort key value
Retrieve the rows in sorted order, but read the required columns directly from the sorted tuples rather than by accessing the table a second time.
Using the modified filesort
algorithm, the tuples are longer than the pairs used in
the original method, and fewer of them fit in the sort buffer (the size of which is given by sort_buffer_size
). As a result, it is possible for the extra I/O to make the
modified approach slower, not faster. To avoid a slowdown, the optimization is used only if the total size of
the extra columns in the sort tuple does not exceed the value of the max_length_for_sort_data
system variable. (A symptom of setting the value of
this variable too high is a combination of high disk activity and low CPU activity.)
For slow queries for which filesort
is not used, try lowering max_length_for_sort_data
to a value that is appropriate to trigger a filesort
.
If you want to increase ORDER BY
speed, check whether you can get MySQL to use
indexes rather than an extra sorting phase. If this is not possible, you can try the following strategies:
Increase the size of the sort_buffer_size
variable.
Increase the size of the read_rnd_buffer_size
variable.
Use less RAM per row by declaring columns only as large as they need to be to hold
the values stored in them. For example, CHAR(16)
is better than CHAR(200)
if values never exceed 16 characters.
Change tmpdir
to point to a dedicated file system with large amounts of free
space. Also, this option accepts several paths that are used in round-robin fashion, so you can use this
feature to spread the load across several directories. Paths should be separated by colon characters
(":
") on Unix and
semicolon characters (";
") on Windows. The paths should be for directories in file
systems that are located on different physical disks, not
different partitions on the same disk.
If an index is not used for ORDER BY
but a LIMIT
clause is also present, the optimizer may be able to avoid using a merge file and sort the rows in memory. For
details, see Section 8.2.1.3, "Optimizing LIMIT
Queries".