Bug 148368 - [zfs] ZFS hanging forever on 8.1-PRERELEASE
Status: Closed Overcome By Events
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: Unspecified
Hardware: Any Any
Importance: Normal Affects Only Me
Assignee: freebsd-fs (Nobody)
Reported: 2010-07-05 00:10 UTC by Rich Ercolani
Modified: 2017-08-27 04:33 UTC

Description Rich Ercolani 2010-07-05 00:10:04 UTC
Occasionally, much to our chagrin, drives malfunction.

When this happens, ZFS and company appear to "handle" the errors correctly, but in practice the system often needs a reboot before ZFS becomes responsive again. For example, "zpool scrub [affected pool]" hangs forever without returning to the shell, and eventually "zpool status" hangs forever as well.

I've seen this problem before, but at the time we were running an old kernel [circa November 2009] from RELENG_8, and I presumed it would go away on upgrade.

The kernel config is the GENERIC config with the following modifications:
# diff GENERIC DTRACE
19c19
< # $FreeBSD: src/sys/amd64/conf/GENERIC,v 1.531.2.13 2010/05/02 06:24:17 imp Exp $
---
> # $FreeBSD: src/sys/amd64/conf/GENERIC,v 1.531.2.8 2010/01/18 00:53:21 imp Exp $
22c22
< ident         GENERIC
---
> ident         DTRACE
57c57
< options       COMPAT_FREEBSD32        # Compatible with i386 binaries
---
> options       COMPAT_IA32             # Compatible with i386 binaries
76,77c76,78
< #options      KDTRACE_FRAME           # Ensure frames are compiled in
< #options      KDTRACE_HOOKS           # Kernel DTrace hooks
---
> options       KDTRACE_FRAME           # Ensure frames are compiled in
> options       KDTRACE_HOOKS           # Kernel DTrace hooks
> options       DDB_CTF                 # Still more Dtrace-related hooks
227d227
< device                sge             # Silicon Integrated Systems SiS190/191
284d283
< options       USB_DEBUG       # enable debug msgs

I'm sorry I can't include a precise revision number for the kernel. I used cvsup to pull it, and I don't know how to extract the revision number.

I'm going to try pulling and installing the latest RELENG_8 to see if that helps.

For reference, these are the errors printed in the kernel log when the zpool reported read/write errors on a disk:
Jul  4 05:03:29 manticore kernel: arcmsr0:block 'read/write' commandwith gone raid volume Cmd= a, TargetId=1, Lun=4
Jul  4 05:03:29 manticore kernel: arcmsr0:block 'read/write' commandwith gone raid volume Cmd= a, TargetId=1, Lun=4
Jul  4 05:03:29 manticore kernel: arcmsr0:block 'read/write' commandwith gone raid volume Cmd= 8, TargetId=1, Lun=4

Status of the pool now:
  pool: cannoli
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: scrub in progress for 0h13m, 0.87% done, 25h56m to go
config:

        NAME        STATE     READ WRITE CKSUM
        cannoli     ONLINE       0     0     0
          da5       ONLINE       0     0     0
          da6       ONLINE       0     0     0
          da2       ONLINE       0     0     0
          da4       ONLINE       0     0     4

errors: 1 data errors, use '-v' for a list
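
The status output above can also be checked mechanically. This is only a minimal sketch, not part of ZFS or its tools: the function name devices_with_errors is hypothetical, and it assumes the READ/WRITE/CKSUM columns hold plain integers, as they do in the output above. It lists the devices whose counters are nonzero:

```python
def devices_with_errors(status_text):
    """Return {device: (read, write, cksum)} for rows with a nonzero counter.

    Hypothetical helper; assumes the counters are plain integers.
    """
    flagged = {}
    in_config = False
    for line in status_text.splitlines():
        fields = line.split()
        # The device table starts right after the NAME/STATE header row.
        if fields[:2] == ["NAME", "STATE"]:
            in_config = True
            continue
        if not in_config:
            continue
        # The "errors:" summary line marks the end of the table.
        if line.startswith("errors:"):
            break
        if len(fields) >= 5 and all(f.isdigit() for f in fields[2:5]):
            counts = tuple(int(f) for f in fields[2:5])
            if any(counts):
                flagged[fields[0]] = counts
    return flagged


# The config table copied from the "zpool status" output in this report.
SAMPLE = """\
        NAME        STATE     READ WRITE CKSUM
        cannoli     ONLINE       0     0     0
          da5       ONLINE       0     0     0
          da6       ONLINE       0     0     0
          da2       ONLINE       0     0     0
          da4       ONLINE       0     0     4

errors: 1 data errors, use '-v' for a list
"""

print(devices_with_errors(SAMPLE))  # {'da4': (0, 0, 4)}
```

For the pool shown here, only da4 is flagged, with its 4 checksum errors.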


At this point, the system fails to reboot cleanly, presumably because it waits forever for the ZFS filesystems to unmount.

My next kernel will have DDB built in.

How-To-Repeat:
1) Have a disk with a ZFS filesystem on it that occasionally reports uncorrected read/write errors.
2) ZFS will eventually stop responding to all queries made with the "zpool" or "zfs" commands. [Traffic to the mounted filesystems continues to work for much longer, until the entire system becomes unresponsive.]
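
One way to detect the hang described in step 2 without tying up a shell is to run the command under a time limit. This is a minimal sketch: the function name probe and the 30-second default are assumptions chosen for illustration, not anything that ZFS or FreeBSD provides. The timed-out child is deliberately not waited on, because a process stuck inside the kernel may never exit, even after being killed.

```python
import subprocess


def probe(cmd, timeout_s=30.0):
    """Run cmd; return its exit status, or None if it did not finish in time.

    Hypothetical helper for illustration only.
    """
    proc = subprocess.Popen(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    try:
        return proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # Best effort only: a process blocked in the kernel may ignore the
        # signal, so do not wait for it again here.
        proc.kill()
        return None
```

On an affected machine, probe(["zpool", "status"]) returning None would suggest the command is hung rather than merely slow, although a time limit alone cannot prove that.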
Comment 1 Mark Linimon freebsd_committer freebsd_triage 2010-07-05 03:09:06 UTC
Responsible Changed
From-To: freebsd-bugs->freebsd-fs

Over to maintainer(s).