Although ZFS offers the possibility to define devices as "spares" for MIRROR / RAIDZ / RAIDZ2 storage pools, and FreeBSD will happily accept this, such "spare" devices will *NOT* automagically take over if a RAID pool device fails.

According to http://docs.sun.com/app/docs/doc/819-5461/gcvcw?a=view , I understand that the replacement of a failed device with a spare might be performed not by the kernel ZFS module but by an external agent/daemon:

« Automatic replacement - When a fault is received, an FMA agent examines the pool to see if it has any available hot spares. If so, it replaces the faulted device with an available spare. »

I am unable to find such a tool in FreeBSD; if it exists at all, it is not active by default. So, as things currently stand, ZFS "spares" have to be activated and deactivated manually when a disk fails or is replaced. Not only is this suboptimal, it presents a data loss risk for people who assume that "spares" will just do what they are intended for in all usual RAID implementations... where in fact they won't, and will just sit there idle if a disk dies, until the admin manually activates them. This preferably deserves a fix, but at the very least a prominent WARNING note.

Also, although the Sun documentation states « Multiple pools can share devices that are designated as hot spares », in the current FreeBSD implementation ZFS will refuse to assign to a pool a "spare" which is already assigned to another pool, reporting the device as "busy", i.e.:

# zpool status
  pool: syspool
 state: ONLINE
(Blah-blah)
        NAME        STATE     READ WRITE CKSUM
        syspool     ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            aacd1   ONLINE       0     0     0
            aacd2   ONLINE       0     0     0
        spares
          da15      AVAIL
(Blah-blah)

# zpool add vol01 spare da15
invalid vdev specification
use '-f' to override the following errors:
da15 is in use (r1w1e1)

# zpool add -f vol01 spare da15
invalid vdev specification
the following errors must be manually repaired:
da15 is in use (r1w1e1)

How-To-Repeat: Create any redundant ZFS storage pool with a spare device. Hot-remove (or manually "offline") an active device from the pool. The spare won't take over unless a manual "zpool replace <pool_name> <failed_device> <spare_device>" is issued.
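A quick way to reproduce this without sacrificing real disks is with file-backed vdevs (a sketch; the /tmp paths, pool name and sizes below are illustrative, not from the original report):

# truncate -s 128m /tmp/d0 /tmp/d1 /tmp/sp0
# zpool create testpool mirror /tmp/d0 /tmp/d1 spare /tmp/sp0
# zpool offline testpool /tmp/d0
# zpool status testpool
(the spare still shows AVAIL; nothing takes over the OFFLINE device)
# zpool replace testpool /tmp/d0 /tmp/sp0
(only after this manual step does the spare resilver and show INUSE)
# zpool destroy testpool
(clean up the test pool)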
Responsible Changed From-To: freebsd-bugs->freebsd-fs

Over to maintainer(s).
Maybe some ZFS developers could comment on this?
A partial/potential solution is described here: http://lists.freebsd.org/pipermail/freebsd-stable/2010-March/055686.html
delphij and I have verified that this issue is resolved on the zfsd svn branch, but that work hasn't been merged to CURRENT; it contains a number of changes to GEOM and a handful of changes to ZFS. I'll leave it to the reader to determine where between GEOM and ZFS things are getting hung up. Thanks, -Garrett
Is this really such a hard issue that it cannot be solved in three years, even though a possible zfsd solution exists in another branch? FreeBSD is used as ZFS storage more and more often, and many users are under the false impression that they have hot spares configured and fully working.
What is the purpose of ZFS spares on FreeBSD if they don't work at all? I cannot see any sense in using a spare - I can just run the "zpool replace" subcommand, right?
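For reference, a spare still saves you from hunting for a free device when a disk dies, since it is already attached to the pool; the manual workflow looks roughly like this (the pool and device names are illustrative):

# zpool replace vol01 da3 da15
(da3 has failed; bring the pre-assigned spare da15 into service by hand)
# zpool detach vol01 da3
(once resilvering completes, make da15 a permanent pool member)
-- or --
# zpool detach vol01 da15
(after da3 itself has been repaired or replaced, return da15 to the spare list)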
zfsd(8) is available in FreeBSD 11.0. Just add zfsd_enable="YES" to /etc/rc.conf and spares will automatically take over when a disk fails.
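For completeness, a minimal way to enable it (a sketch using sysrc(8), which is equivalent to editing /etc/rc.conf by hand):

# sysrc zfsd_enable="YES"
# service zfsd start

After that, when a pool device faults, zfsd should activate an AVAIL spare on its own, and "zpool status" will show it as INUSE.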