Bug 245186 - zfs panic despite failmode=continue
Summary: zfs panic despite failmode=continue
Status: Closed Works As Intended
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 12.1-STABLE
Hardware: Any Any
Importance: --- Affects Some People
Assignee: freebsd-fs (Nobody)
URL:
Keywords: crash, needs-qa
Depends on:
Blocks:
 
Reported: 2020-03-30 13:45 UTC by John F. Carr
Modified: 2020-04-07 14:41 UTC
CC List: 1 user

See Also:


Attachments
/var/log/messages around crash (56.03 KB, text/plain)
2020-03-30 13:45 UTC, John F. Carr

Description John F. Carr 2020-03-30 13:45:23 UTC
Created attachment 212861
/var/log/messages around crash

I have a raidz2 ZFS pool with 5 spinning disks encrypted with geli.  One of the disks is failing.  The system crashed twice in the past day.  One crash left no record.  The other left a message:

reboot after panic: I/O to pool 'private' appears to be hung on vdev guid 9598894585529158357 at '/dev/da6.eli'.

I have failmode=continue on the pool.  I expect data to be discarded if the drive returns a write error.  The system should not panic.
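
For reference, the property is managed with the standard zpool commands; roughly (pool name taken from the panic message above):

  zpool get failmode private            # should report "continue"
  zpool set failmode=continue private   # how a property like this gets set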

Kernel version is r359019.  See attached log for other system details.
Comment 1 Andriy Gapon 2020-03-30 20:52:16 UTC
A hung I/O request is not an I/O error; it is a trickier condition.  I believe that ZFS behaves correctly here.
You can disable the panic on hang by turning off vfs.zfs.deadman_enabled.
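A minimal sketch, assuming the sysctl is writable at run time on your kernel (otherwise set it from /boot/loader.conf):

  sysctl vfs.zfs.deadman_enabled=0                       # stop the deadman from panicking, until reboot
  echo 'vfs.zfs.deadman_enabled=0' >> /etc/sysctl.conf   # make it persistent across reboots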
Comment 2 John F. Carr 2020-03-31 21:16:34 UTC
I understand it's a different path internally, but I asked for disk errors not to crash the system and that's what I expect to happen.

The code in spa_misc.c appears to allow 1,000 seconds.  I've seen sync take a significant fraction of that time with working disks.  I/O on a failing disk can be orders of magnitude slower than usual.  It might take what seems like forever to work through the queue, but the driver is still processing I/O requests.
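
If I'm reading it right, that 1,000 seconds is the deadman synctime, which seems to be exposed as a sysctl on this kernel, so it could at least be stretched as a workaround (tunable name assumed from the code, not verified):

  sysctl vfs.zfs.deadman_synctime_ms              # appears to default to 1000000 ms (1,000 s)
  sysctl vfs.zfs.deadman_synctime_ms=3600000      # e.g. raise the hang threshold to one hour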

Unfortunately, based on the comments, the deadman timer is keyed to the oldest pending I/O.  If the kernel used a per-disk timer that counted time with a non-empty queue and no requests completing, it would be able to distinguish a very slow disk from a hung driver.  Or it could maintain a counter of failed I/Os and mark the disk dead when the failure rate got too high.

I think the drive should be kicked out of the pool and its I/O queue flushed in this situation.  When my drive first started failing, that's what happened.  I'd run zpool status and find one of the drives removed.  I could run geli attach and a zpool command to bring it back in until the next time it got kicked out.  More recently the system started crashing instead.
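
For the record, the manual recovery was along these lines (exact geli flags depend on how the provider was initialized):

  geli attach /dev/da6            # prompts for the passphrase; add -k <keyfile> if one is used
  zpool online private da6.eli    # bring the vdev back into the pool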
Comment 3 Andriy Gapon 2020-04-01 14:56:12 UTC
Note that 1000 seconds is a really long time even for a failing disk.
Also, I will just repeat that a hung I/O, whose current state is unknown, is a special case very different from any error.  A request that failed because of a timeout (in hardware or in a driver) is still different from a request that stays in progress forever.