one of the reasons i like zfs…

I got home from work tonight and zfs had reported that one of the disks in my main pool was dying.

no great problem. insert a disk i had spare, and run zpool replace like so:

ganesh:~# zpool replace -f export scsi-SATA_WDC_WD10EARS-00_WD-WMAV50933036 scsi-SATA_ST31000528AS_9VP16X03

and now it’s just a matter of waiting until the data has “resilvered” and i can remove the old dying drive.

ganesh:~# zpool status export -v
  pool: export
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scan: resilver in progress since Mon Nov 28 19:25:49 2011
    75.1G scanned out of 2.29T at 88.3M/s, 7h17m to go
    18.0G resilvered, 3.21% done
config:

        NAME                                                   STATE     READ WRITE CKSUM
        export                                                 ONLINE       0     0     0
          raidz1-0                                             ONLINE       0     0     0
            scsi-SATA_WDC_WD10EACS-00_WD-WCASJ2114122          ONLINE       0     0     0
            scsi-SATA_WDC_WD10EACS-00_WD-WCASJ2195141          ONLINE       0     0     0
            scsi-SATA_WDC_WD10EARS-00_WD-WMAV50817803          ONLINE       0     0     0
            replacing-3                                        ONLINE       0     0     0
              scsi-SATA_WDC_WD10EARS-00_WD-WMAV50933036        ONLINE       0     0     0
              scsi-SATA_ST31000528AS_9VP16X03                  ONLINE       0     0     0  (resilvering)
        logs
          scsi-SATA_Patriot_Torqx_278BF0715010800025492-part6  ONLINE       0     0     0
        cache
          scsi-SATA_Patriot_Torqx_278BF0715010800025492-part7  ONLINE       0     0     0

errors: No known data errors

(I use /dev/disk/by-id device names so I don’t have to care about the kernel detecting the disks and disk controllers in a different order on each reboot…it also makes it easier to identify which disk is having a problem, as I can squint at the front of the drives in my Lian Li 4-in-3 hotswap bays and read the serial number sticker.)
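since the by-id name ends with the drive’s serial number, matching a device name to the sticker on the front of a drive is just string surgery. a minimal sketch, using one of the device names from the pool above:

```shell
# the /dev/disk/by-id name encodes bus, model, and serial; the serial
# is the last underscore-separated field, which matches the sticker
name=scsi-SATA_ST31000528AS_9VP16X03
serial=${name##*_}    # strip everything up to and including the last underscore
echo "$serial"        # prints 9VP16X03
```

`ls -l /dev/disk/by-id` shows which kernel device (sdX) each stable name currently points at.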

OK, yeah, sure…it’s not that much different from replacing a dying drive in an mdadm or hardware RAID array, but one nice advantage of zfs over traditional raid is that it only resyncs the data in use, not the empty or unused blocks. given that these are 1TB drives in a raidz vdev and the pool is 63% used, that means syncing only about 630GB per drive rather than the full 1TB.
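the saving is easy to ballpark: the replacement drive only has to rewrite its share of the allocated blocks, so a 63%-full pool means roughly 63% of the member’s capacity. a back-of-envelope sketch with the figures from above (treating 1TB as 1000GB):

```shell
# resilver only touches allocated data, so a 63%-full pool means
# each replaced 1TB member rewrites roughly 63% of its capacity
awk 'BEGIN { printf "%.0fGB instead of 1000GB\n", 0.63 * 1000 }'
# prints: 630GB instead of 1000GB
```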

ps: in the time it’s taken to write this, it’s up to 10.11% done. 88MB/s reading data from the existing raidz drives and writing to a single drive isn’t too bad at all.
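the ETA zpool prints is, as far as I can tell, just remaining data divided by throughput. plugging in the figures from the status output above (and treating T and G as binary multiples of MB) lands within a minute of zpool’s own 7h17m; the small difference is presumably rounding in the reported numbers:

```shell
# remaining = total to scan minus already scanned, in MB;
# divide by throughput to get seconds, then format as hours/minutes
awk 'BEGIN {
    left = 2.29*1024*1024 - 75.1*1024   # MB left to scan
    s    = left / 88.3                  # seconds at 88.3 MB/s
    printf "%dh%02dm\n", int(s/3600), int((s%3600)/60)
}'
# prints: 7h18m
```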

3 Comments

    1. cas

      I’m using LLNL’s ZFS for Linux

      I’m compiling local packages for debian from the daily releases at the Ubuntu zfsonlinux ppa.

      There’s only one minor change to the debian/control file required to get them to build for debian instead of ubuntu: edit the Depends line for zfs-initramfs so that it depends on grub rather than zfs-grub.

      The zfs-grub package doesn’t exist in debian…debian’s grub package contains the zfs.mod and zfsinfo.mod modules anyway. I’m not booting from a ZFS pool (I still have an ext4 /boot and an XFS /), so I don’t know if it’s functionally the same as Ubuntu’s zfs-grub package.
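      the edit itself is a one-liner. a hedged sketch of the change described above (the Depends line here is simplified for illustration; the real zfs-initramfs stanza lists more packages):

      ```shell
      # swap the zfs-grub dependency for debian's grub package in the
      # zfs-initramfs stanza (simplified Depends line for illustration)
      printf 'Package: zfs-initramfs\nDepends: zfs-grub\n' > control.example
      sed -i 's/^Depends: zfs-grub$/Depends: grub/' control.example
      grep '^Depends:' control.example
      # prints: Depends: grub
      ```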

  1. cas

    Update: The resilver operation finished successfully last night just before 3am

    ganesh:~# zpool status export -v
      pool: export
     state: ONLINE
     scan: resilvered 557G in 7h30m with 0 errors on Tue Nov 29 02:56:43 2011
    config:
    
            NAME                                                   STATE     READ WRITE CKSUM
            export                                                 ONLINE       0     0     0
              raidz1-0                                             ONLINE       0     0     0
                scsi-SATA_WDC_WD10EACS-00_WD-WCASJ2114122          ONLINE       0     0     0
                scsi-SATA_WDC_WD10EACS-00_WD-WCASJ2195141          ONLINE       0     0     0
                scsi-SATA_WDC_WD10EARS-00_WD-WMAV50817803          ONLINE       0     0     0
                scsi-SATA_ST31000528AS_9VP16X03                    ONLINE       0     0     0
            logs
              scsi-SATA_Patriot_Torqx_278BF0715010800025492-part6  ONLINE       0     0     0
            cache
              scsi-SATA_Patriot_Torqx_278BF0715010800025492-part7  ONLINE       0     0     0
    
    errors: No known data errors

Comments are closed.