Summary: | [ZFS] Crash upon detection of failed drive. | ||
---|---|---|---|
Product: | Base System | Reporter: | Sean Bruno <sbruno> |
Component: | bin | Assignee: | freebsd-fs (Nobody) <fs> |
Status: | Closed FIXED | ||
Severity: | Affects Some People | CC: | allanjude, andyf, re |
Priority: | --- | ||
Version: | CURRENT | ||
Hardware: | amd64 | ||
OS: | Any | ||
Bug Depends on: | |||
Bug Blocks: | 228911 |
Description
Sean Bruno
2018-09-20 14:24:55 UTC
```
ftp-mirror0.isc.freebsd.org 12.0-ALPHA6 FreeBSD 12.0-ALPHA6 #0 r338761M: Tue Sep 18 22:55:31 UTC 2018
```

I've put the kernel/crashdump/debuginfo in freefall.freebsd.org:~sbruno/ftpmirror_crash.tgz

da3 is the disk that is failing. Starting up the jails and indexing files on the zftp pool will cause the panic in this ticket. I *can* remove the drive, but I want to wait for some diagnostics from ZFS folks first.

```
root@ftp-mirror0.isc:~ # zpool status
  pool: zftp
 state: ONLINE
  scan: scrub repaired 150K in 0 days 02:03:31 with 0 errors on Fri Aug 17 05:19:19 2018
config:

        NAME        STATE     READ WRITE CKSUM
        zftp        ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            da0p1   ONLINE       0     0     0
            da1p1   ONLINE       0     0     0
            da2p1   ONLINE       0     0     0
            da3p1   ONLINE       0     0     0
            da4p1   ONLINE       0     0     0
            da5p1   ONLINE       0     0     0
            da6p1   ONLINE       0     0     0
            da7p1   ONLINE       0     0     0

errors: No known data errors

  pool: zroot
 state: ONLINE
  scan: scrub repaired 0 in 0 days 03:16:08 with 0 errors on Fri Aug 17 06:31:58 2018
config:

        NAME        STATE     READ WRITE CKSUM
        zroot       ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            ada0p3  ONLINE       0     0     0
            ada1p3  ONLINE       0     0     0

errors: No known data errors
```

Recently saw a ZFS crash due to 2 dying/dead disks (of 4) in a RAIDZ1:

```
FreeBSD 11.1-STABLE #0 r331113: Sun Mar 18 19:57:45 AEST 2018

panic: solaris assert: remove_reference(hdr, ((void *)0), tag) == 0 (0xd == 0x0), file: /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c, line: 3463
cpuid = 7
KDB: stack backtrace:
#0 0xffffffff805cf167 at kdb_backtrace+0x67
#1 0xffffffff8058cc46 at vpanic+0x186
#2 0xffffffff8058cab3 at panic+0x43
#3 0xffffffff815cb22c at assfail3+0x2c
#4 0xffffffff8124cb50 at arc_buf_destroy+0x1a0
#5 0xffffffff8125c56a at dbuf_read_done+0x4a
#6 0xffffffff8124f236 at arc_read_done+0x296
#7 0xffffffff812e357e at zio_done+0x80e
#8 0xffffffff812df05c at zio_execute+0xac
#9 0xffffffff805e0714 at taskqueue_run_locked+0x154
#10 0xffffffff805e1858 at taskqueue_thread_loop+0x98
#11 0xffffffff805596a2 at fork_exit+0x82
#12 0xffffffff8074a13e at fork_trampoline+0xe
Uptime: 53s
```

The culprits:
```
(da2:mps0:0:5:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)
(da2:mps0:0:5:0): Error 5, Unretryable error
(da2:mps0:0:5:0): READ(16). CDB: 88 00 00 00 00 01 cf 2f 5c dc 00 00 00 55 00 00
(da2:mps0:0:5:0): CAM status: SCSI Status Error
(da2:mps0:0:5:0): SCSI status: Check Condition
(da2:mps0:0:5:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
(da2:mps0:0:5:0): Error 6, Retries exhausted
(da2:mps0:0:5:0): Invalidating pack
...
Sep 21 11:50:16 <daemon.crit> drunkfish smartd[1060]: Device: /dev/da3 [SAT], 39538 Currently unreadable (pending) sectors
```

Yet another resilvering crash due to disk errors. I've dropped the crashinfo, kernel and debug in: freefall.freebsd.org:~sbruno/ftp-mirror_zfs.tar.gz

(In reply to Andy Farkas from comment #3)
This crash is different, and should get its own bugzilla number.

(In reply to Sean Bruno from comment #0)
I have submitted the fix for this to Release Engineering.

A commit references this bug:

Author: allanjude
Date: Sat Sep 29 01:26:07 UTC 2018
New revision: 339009
URL: https://svnweb.freebsd.org/changeset/base/339009

Log:
```
  Avoid panic when adjusting priority of a read in the face of an IO error

  PR:             231516
  Reported by:    sbruno
  Approved by:    re (rgrimes)
  Obtained from:  ZFS-on-Linux
  X-MFC-with:     334844
  Sponsored by:   Klara Systems

  MFV/ZoL: Fix zio->io_priority failed (7 < 6) assert

  commit c26cf0966d131b722c32f8ccecfe5791a789d975
  Author: Tony Hutter <hutter2@llnl.gov>
  Date:   Tue May 29 18:13:48 2018 -0700

      Fix zio->io_priority failed (7 < 6) assert

      This fixes an assert in vdev_queue_change_io_priority():

        VERIFY3(zio->io_priority < ZIO_PRIORITY_NUM_QUEUEABLE) failed (7 < 6)
        PANIC at vdev_queue.c:832:vdev_queue_change_io_priority()

      Reviewed-by: Tom Caputi <tcaputi@datto.com>
      Reviewed-by: George Melikov <mail@gmelikov.ru>
      Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
      Signed-off-by: Tony Hutter <hutter2@llnl.gov>
```

Changes:
  head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_queue.c
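The assertion quoted in the ZoL commit message reports a zio whose priority was 7 reaching vdev_queue_change_io_priority() while ZIO_PRIORITY_NUM_QUEUEABLE was 6. The following is a simplified user-space sketch of that failure mode, not the actual ZFS code. Assumptions: the enum names (`PRIO_*`), `struct fake_zio` and `change_priority()` are hypothetical; the values are chosen only so that the queueable count is 6 and the extra "issue immediately, never queued" class is 7, reproducing the "(7 < 6)" numbers. The guard shows one plausible way to avoid the panic (skip reprioritizing a zio that was never queued); the real change in vdev_queue.c may differ.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical priority classes. Queueable classes are 0..5, the count
 * PRIO_NUM_QUEUEABLE is 6, and PRIO_NOW (7) marks I/O that bypasses the
 * per-class queues, matching the "(7 < 6)" values in the assert message. */
enum prio {
    PRIO_SYNC_READ = 0,
    PRIO_SYNC_WRITE,
    PRIO_ASYNC_READ,
    PRIO_ASYNC_WRITE,
    PRIO_SCRUB,
    PRIO_REMOVAL,
    PRIO_NUM_QUEUEABLE, /* 6 */
    PRIO_NOW            /* 7: never placed in a queue */
};

/* Hypothetical stand-in for a zio; only the priority field matters here. */
struct fake_zio {
    enum prio priority;
};

/* Guarded priority change. Without the early return, a zio at PRIO_NOW
 * would fail the first range assertion (7 < 6 is false); in this sketch
 * that aborts the program, while the equivalent VERIFY3 check in the
 * kernel panics. Returns true if the priority was changed. */
static bool change_priority(struct fake_zio *zio, enum prio new_prio)
{
    if (zio->priority == PRIO_NOW)
        return false; /* not queued, so there is nothing to reprioritize */

    assert(zio->priority < PRIO_NUM_QUEUEABLE);
    assert(new_prio < PRIO_NUM_QUEUEABLE);
    zio->priority = new_prio;
    return true;
}
```

With the guard, asking to reprioritize a zio that was issued outside the queues becomes a no-op, while zios that really sit in a queue are still range-checked before their priority is changed.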
A commit references this bug:

Author: mav
Date: Thu Apr 25 15:51:01 UTC 2019
New revision: 346679
URL: https://svnweb.freebsd.org/changeset/base/346679

Log:
```
  MFC r339009 (by allanjude):
  Avoid panic when adjusting priority of a read in the face of an IO error

  PR:             231516
  Reported by:    sbruno
  Approved by:    re (rgrimes)
  Obtained from:  ZFS-on-Linux
  X-MFC-with:     334844
  Sponsored by:   Klara Systems

  MFV/ZoL: Fix zio->io_priority failed (7 < 6) assert

  commit c26cf0966d131b722c32f8ccecfe5791a789d975
  Author: Tony Hutter <hutter2@llnl.gov>
  Date:   Tue May 29 18:13:48 2018 -0700

      Fix zio->io_priority failed (7 < 6) assert

      This fixes an assert in vdev_queue_change_io_priority():

        VERIFY3(zio->io_priority < ZIO_PRIORITY_NUM_QUEUEABLE) failed (7 < 6)
        PANIC at vdev_queue.c:832:vdev_queue_change_io_priority()

      Reviewed-by: Tom Caputi <tcaputi@datto.com>
      Reviewed-by: George Melikov <mail@gmelikov.ru>
      Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
      Signed-off-by: Tony Hutter <hutter2@llnl.gov>
```

Changes:
_U  stable/11/
  stable/11/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_queue.c