This section lists a number of miscellaneous tips for improving query processing speed:
Use persistent connections to the database to avoid connection overhead. If you
cannot use persistent connections and you are initiating many new connections to the database, you may
want to change the value of the
thread_cache_size variable. See Section
8.11.2, "Tuning Server Parameters".
Always check whether all your queries really use the indexes that you have created
in the tables. In MySQL, you can do this with the
EXPLAIN statement. See Section
8.8.1, "Optimizing Queries with
Try to avoid complex
SELECT queries on
MyISAM tables that are
updated frequently, to avoid problems with table locking that occur due to contention between readers
MyISAM supports concurrent inserts: If a table has no
free blocks in the middle of the data file, you can
INSERT new rows into it at the same time that other threads are
reading from the table. If it is important to be able to do this, consider using the table in ways that
avoid deleting rows. Another possibility is to run
OPTIMIZE TABLE to defragment the table after you have deleted a lot
of rows from it. This behavior is altered by setting the
concurrent_insert variable. You can force new rows to be appended
(and therefore permit concurrent inserts), even in tables that have deleted rows. See Section
8.10.3, "Concurrent Inserts".
ALTER TABLE ... ORDER BY if you usually retrieve rows in
order. By using this option after extensive
changes to the table, you may be able to get higher performance.
In some cases, it may make sense to introduce a column that is "hashed" based on information from other columns. If this column is short, reasonably unique, and indexed, it may be much faster than a "wide" index on many columns. In MySQL, it is very easy to use this extra column:
SELECT * FROM
MyISAM tables that change frequently, try to avoid
all variable-length columns (
TEXT). The table uses dynamic row format if it includes even a single
variable-length column. See Chapter 14, Storage
It is normally not useful to split a table into different tables just because the
rows become large. In accessing a row, the biggest performance hit is the disk seek needed to find the
first byte of the row. After finding the data, most modern disks can read the entire row fast enough for
most applications. The only cases where splitting up a table makes an appreciable difference is if it is
MyISAM table using dynamic row format that you can change to a fixed row
size, or if you very often need to scan the table but do not need most of the columns. See Chapter 14, Storage
If you often need to calculate results such as counts based on information from a lot of rows, it may be preferable to introduce a new table and update the counter in real time. An update of the following form is very fast:
This is very important when you use MySQL storage engines such as
MyISAM that has only table-level locking (multiple readers with single
writers). This also gives better performance with most database systems, because the row locking
manager in this case has less to do.
If you need to collect statistics from large log tables, use summary tables instead of scanning the entire log table. Maintaining the summaries should be much faster than trying to calculate statistics "live." Regenerating new summary tables from the logs when things change (depending on business decisions) is faster than changing the running application.
If possible, classify reports as "live" or as "statistical," where data needed for statistical reports is created only from summary tables that are generated periodically from the live data.
Take advantage of the fact that columns have default values. Insert values explicitly only when the value to be inserted differs from the default. This reduces the parsing that MySQL must do and improves the insert speed.
In some cases, it is convenient to pack and store data into a
BLOB column. In this case, you must provide code in your application to
pack and unpack information, but this may save a lot of accesses at some stage. This is practical when
you have data that does not conform well to a rows-and-columns table structure.
Normally, try to keep all data nonredundant (observing what is referred to in database theory as third normal form). However, there may be situations in which it can be advantageous to duplicate information or create summary tables to gain more speed.
Stored routines or UDFs (user-defined functions) may be a good way to gain performance for some tasks. See Section 18.2, "Using Stored Routines (Procedures and Functions)", and Section 22.3, "Adding New Functions to MySQL", for more information.
You can increase performance by caching queries or answers in your application and then executing many inserts or updates together. If your database system supports table locks (as does MySQL), this should help to ensure that the index cache is only flushed once after all updates. You can also take advantage of MySQL's query cache to achieve similar results; see Section 8.9.3, "The MySQL Query Cache".
INSERT statements to store many rows with one SQL statement. (This is a
relatively portable technique.)
AUTO_INCREMENT columns so that each row in a table
can be identified by a single unique value.
MEMORY tables when possible to get more speed. See
Section 14.4, "The
MEMORY tables are useful for noncritical data that
is accessed often, such as information about the last displayed banner for users who don't have cookies
enabled in their Web browser. User sessions are another alternative available in many Web application
environments for handling volatile state data.
With Web servers, images and other binary assets should normally be stored as files. That is, store only a reference to the file rather than the file itself in the database. Most Web servers are better at caching files than database contents, so using files is generally faster.
Columns with identical information in different tables should be declared to have identical data types so that joins based on the corresponding columns will be faster.
Try to keep column names simple. For example, in a table named
customer, use a column name of
name instead of
customer_name. To make your names portable to other SQL servers, consider
keeping them shorter than 18 characters.
If you need really high speed, look at the low-level interfaces for data storage
that the different SQL servers support. For example, by accessing the MySQL
MyISAM storage engine directly, you could get a speed increase of two to
five times compared to using the SQL interface. To be able to do this, the data must be on the same
server as the application, and usually it should only be accessed by one process (because external file
locking is really slow). One could eliminate these problems by introducing low-level
MyISAM commands in the MySQL server (this could be one easy way to get
more performance if needed). By carefully designing the database interface, it should be quite easy to
support this type of optimization.
If you are using numeric data, it is faster in many cases to access information from a database (using a live connection) than to access a text file. Information in the database is likely to be stored in a more compact format than in the text file, so accessing it involves fewer disk accesses. You also save code in your application because you need not parse your text files to find line and column boundaries.
Replication can provide a performance benefit for some operations. You can distribute client retrievals among replication servers to split up the load. To avoid slowing down the master while making backups, you can make backups using a slave server. See Chapter 16, Replication.
MyISAM table with the
DELAY_KEY_WRITE=1 table option makes index updates faster because they are
not flushed to disk until the table is closed. The downside is that if something kills the server while
such a table is open, you must ensure that the table is okay by running the server with the
--myisam-recover-options option, or by running myisamchk before restarting the server. (However,
even in this case, you should not lose anything by using
because the key information can always be generated from the data rows.)
HIGH_PRIORITY have an
effect only for nontransactional storage engines that use only table-level locking.