Disk seeks are a huge performance bottleneck. This problem becomes more apparent when the amount of data starts to grow so large that effective caching becomes impossible. For large databases where you access data more or less randomly, you can be sure that you need at least one disk seek to read and a couple of disk seeks to write things. To minimize this problem, use disks with low seek times.
Increase the number of available disk spindles (and thereby reduce the seek overhead) by either symlinking files to different disks or striping the disks:
Using symbolic links
This means that, for
MyISAM tables, you symlink the index
file and data files from their usual location in the data directory to another disk
(that may also be striped). This makes both the seek and read times better, assuming
that the disk is not used for other purposes as well. See Section
126.96.36.199, "Using Symbolic Links".
Striping means that you have many disks and put the first block on the first disk, the
second block on the second disk, and the
block on the (
) disk, and so on.
This means if your normal data size is less than the stripe size (or perfectly aligned),
you get much better performance. Striping is very dependent on the operating system and
the stripe size, so benchmark your application with different stripe sizes. See Section 8.12.3, "Using Your Own
The speed difference for striping is very dependent on the parameters. Depending on how you set the striping parameters and number of disks, you may get differences measured in orders of magnitude. You have to choose to optimize for random or sequential access.
For reliability, you may want to use RAID 0+1 (striping plus mirroring), but in
this case, you need 2 ×
N drives to hold
N drives of data. This is probably the best option if you
have the money for it. However, you may also have to invest in some volume-management software to handle
A good option is to vary the RAID level according to how critical a type of data
is. For example, store semi-important data that can be regenerated on a RAID 0 disk, but store really
important data such as host information and logs on a RAID 0+1 or RAID
N disk. RAID
can be a problem if you have many writes, due to the time required to update the parity bits.
On Linux, you can get much better performance by using
hdparm to configure your disk's interface. (Up to 100% under load is not
uncommon.) The following
hdparm options should be quite good for MySQL, and
probably for many other applications:
hdparm -m 16 -d 1
Note that performance and reliability when using this command depend on your hardware, so we
strongly suggest that you test your system thoroughly after using
hdparm. Please consult the
page for more information. If
hdparm is not used wisely, file system
corruption may result, so back up everything before experimenting!
You can also set the parameters for the file system that the database uses:
If you do not need to know when files were last accessed (which is not really useful on a database
server), you can mount your file systems with the
-o noatime option.
That skips updates to the last access time in inodes on the file system, which avoids some disk
On many operating systems, you can set a file system to be updated asynchronously by mounting it
-o async option. If your computer is reasonably stable, this
should give you better performance without sacrificing too much reliability. (This flag is on by
default on Linux.)