15.5. Using ZFS Replication

15.5.1. Using ZFS for File System Replication
15.5.2. Configuring MySQL for ZFS Replication
15.5.3. Handling MySQL Recovery with ZFS

In a high availability environment, providing an instant copy of the information on both the currently active machine and the hot backup is a critical part of the HA solution. There are many ways to achieve this, including Chapter 16, Replication and Section 15.2, "Overview of MySQL with DRBD/Pacemaker/Corosync/Oracle Linux".

The ZFS file system provides functionality to create a snapshot of the file system contents, transfer the snapshot to another machine, and extract the snapshot to recreate the file system. You can create a snapshot at any time, and you can create as many snapshots as you like. By continually creating, transferring, and restoring snapshots, you can provide synchronization between one or more machines in a fashion similar to DRBD.
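
For example, a complete synchronization cycle of this kind can be reduced to a short sequence of commands. The following is a minimal sketch, assuming a standby machine named standby-host that is reachable over ssh and already has a pool of the same name; both names are placeholders, and each individual step is described in more detail in the rest of this section:

# standby-host and the target pool name are placeholders; adjust for your environment
root-shell> zfs snapshot scratchpool@now
root-shell> zfs send scratchpool@now | ssh root@standby-host zfs recv -F scratchpool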

The following example shows a simple Solaris system running with a single ZFS pool, mounted at /scratchpool:

Filesystem             size   used  avail capacity  Mounted on
/dev/dsk/c0d0s0        4.6G   3.7G   886M    82%    /
/devices                 0K     0K     0K     0%    /devices
ctfs                     0K     0K     0K     0%    /system/contract
proc                     0K     0K     0K     0%    /proc
mnttab                   0K     0K     0K     0%    /etc/mnttab
swap                   1.4G   892K   1.4G     1%    /etc/svc/volatile
objfs                    0K     0K     0K     0%    /system/object
/usr/lib/libc/libc_hwcap1.so.1
                       4.6G   3.7G   886M    82%    /lib/libc.so.1
fd                       0K     0K     0K     0%    /dev/fd
swap                   1.4G    40K   1.4G     1%    /tmp
swap                   1.4G    28K   1.4G     1%    /var/run
/dev/dsk/c0d0s7         26G   913M    25G     4%    /export/home
scratchpool             16G    24K    16G     1%    /scratchpool

The MySQL data is stored in a directory on /scratchpool. To help demonstrate some of the basic replication functionality, other items are stored in /scratchpool as well:

total 17
drwxr-xr-x  31 root     bin           50 Jul 21 07:32 DTT/
drwxr-xr-x   4 root     bin            5 Jul 21 07:32 SUNWmlib/
drwxr-xr-x  14 root     sys           16 Nov  5 09:56 SUNWspro/
drwxrwxrwx  19 1000     1000          40 Nov  6 19:16 emacs-22.1/

To create a snapshot of the file system, you use zfs snapshot, specifying the pool and the snapshot name:

root-shell> zfs snapshot scratchpool@snap1
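
If the pool contains additional ZFS file systems beneath it, the standard -r option to zfs snapshot creates a snapshot of the pool and every descendant file system in a single step (shown here against the same example pool):

root-shell> zfs snapshot -r scratchpool@snap1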

To list the snapshots already taken:

root-shell> zfs list -t snapshot
NAME                USED  AVAIL  REFER  MOUNTPOINT
scratchpool@snap1      0      -  24.5K  -
scratchpool@snap2      0      -  24.5K  -

The snapshots themselves are stored within the file system metadata, and the space required to keep them varies over time because of the way they are created. The initial creation of a snapshot is very quick, because instead of copying all of the data and metadata required to hold an entire duplicate of the file system, ZFS records only the point in time at which the snapshot was created.

As more changes to the original file system are made, the size of the snapshot increases because more space is required to keep the record of the old blocks. If you create lots of snapshots, say one per day, and then delete the snapshots from earlier in the week, the size of the newer snapshots might also increase, as the changes that make up the newer state have to be included in the more recent snapshots, rather than being spread over the seven snapshots that make up the week.
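
Deleting snapshots you no longer need frees that space again. A typical pruning step, reusing the snapshot names from the earlier listing, is to destroy the oldest snapshot and then re-check the space accounting:

root-shell> zfs destroy scratchpool@snap1
root-shell> zfs list -t snapshot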

You cannot directly back up the snapshots because they exist within the file system metadata rather than as regular files. To get the snapshot into a format that you can copy to another file system, tape, and so on, you use the zfs send command to create a stream version of the snapshot.

For example, to write the snapshot out to a file:

root-shell> zfs send scratchpool@snap1 >/backup/scratchpool-snap1

Or to write it directly to a tape device:

root-shell> zfs send scratchpool@snap1 >/dev/rmt/0

You can also write out just the incremental changes between two snapshots by using zfs send with the -i option:

root-shell> zfs send -i scratchpool@snap1 scratchpool@snap2 >/backup/scratchpool-changes

To recover a snapshot, you use zfs recv, which applies the snapshot information either to a new file system or to an existing one.
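
For example, to recreate the file system from the full stream written out earlier, and then bring it up to date by applying the incremental stream, you might receive both into a new file system under the pool (the target name data-copy is purely illustrative):

# receive the full snapshot stream into a new file system
root-shell> zfs recv scratchpool/data-copy </backup/scratchpool-snap1
# then apply the incremental stream from snap1 to snap2 on top of it
root-shell> zfs recv scratchpool/data-copy </backup/scratchpool-changes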