Summary: | ZFS on USB drive prevents shutdown / reboot | ||
---|---|---|---|
Product: | Base System | Reporter: | Jeff Kletsky <freebsd> |
Component: | kern | Assignee: | freebsd-fs (Nobody) <fs> |
Status: | Open --- | ||
Severity: | Affects Some People | CC: | g_amanakis, jeff+freebsd, jfc, pi, raven428, usb, vadim.khondar+freebsd-bugs |
Priority: | --- | ||
Version: | 13.2-STABLE | ||
Hardware: | Any | ||
OS: | Any |
Description
Jeff Kletsky
2012-05-07 15:30:11 UTC
Responsible Changed From-To: freebsd-bugs->freebsd-fs

Over to maintainer(s).

The problem can be replicated by booting off a "memstick" (with a "spare" USB stick as /dev/da1) and then executing

    # dd if=/dev/zero of=/dev/da1 bs=64k
    # zpool create stick /dev/da1
    # reboot

The problem has been reliably reproduced on the Atom 330 previously mentioned, as well as on an AMD A8-3870 with A75 chipset. It can also be replicated using VirtualBox running under Ubuntu on the AMD A8-3870 system. It does not seem specific to one "flavor" of USB controller or driver.

Using /usr/src/release/generate_release.sh and bisection, I have confirmed that

* r227445 does not exhibit the behavior ("Copy stable/9 to releng/9.0 as part of the FreeBSD 9.0-RELEASE release cycle")
* r229097 does not exhibit the behavior
* r229281 -- FAIL by not rebooting under the conditions described above.

Based on these results, I am suspicious of

    r229100 | hselasky | 2011-12-31 06:33:15 -0800 (Sat, 31 Dec 2011) | 6 lines

    MFC r228709, r228711 and r228723:
    - Add missing unlock of USB controller's lock, when doing shutdown, suspend and resume.
    - Add code to wait for USB shutdown to be executed at system shutdown.
    - Add sysctl which can be used to skip this waiting.

as being what brought the issue to the forefront. I am presently building r229099 and r229100 to confirm this suspicion.

A potential, though untested, workaround would be

    # sysctl hw.usb.no_shutdown_wait=1

Not surprisingly:

* r229099 does *not* exhibit the symptom
* r229100 *does* exhibit the symptom

    # sysctl hw.usb.no_shutdown_wait=1

is confirmed as a workaround.

Just a me too:

    FreeBSD fred 9.0-STABLE FreeBSD 9.0-STABLE #0 r237006: Wed Jun 13 20:38:08 BST 2012 root@fred:/usr/obj/usr/src/sys/GENERIC amd64

Setting hw.usb.no_shutdown_wait=1 shuts down properly. Without it, the system simply hangs.
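The sysctl workaround above only lasts until the next boot. A minimal sketch of making it persistent follows, assuming the usual FreeBSD mechanism of listing the setting in /etc/sysctl.conf. The helper name `persist_sysctl` is hypothetical and not part of FreeBSD. Note that the thread leaves open whether skipping the shutdown wait carries any risk of data loss.

```sh
#!/bin/sh
# Sketch only: persist_sysctl is a hypothetical helper, not a FreeBSD tool.
# Usage: persist_sysctl FILE NAME VALUE
# Ensures FILE contains exactly one "NAME=VALUE" line for NAME.

persist_sysctl() {
    file=$1
    name=$2
    value=$3

    touch "$file" || return 1
    tmp=$(mktemp) || return 1

    # Keep every line whose key (text before '=', whitespace removed)
    # differs from NAME; comment lines are kept because of the leading '#'.
    awk -v n="$name" -F= '{ k = $1; gsub(/[ \t]/, "", k); if (k != n) print }' \
        "$file" > "$tmp"

    # Append the desired assignment.
    printf '%s=%s\n' "$name" "$value" >> "$tmp"

    # Overwrite in place so the permissions of FILE are preserved.
    cat "$tmp" > "$file"
    rm -f "$tmp"
}

# Example (as root, on an affected system):
#   sysctl hw.usb.no_shutdown_wait=1
#   persist_sysctl /etc/sysctl.conf hw.usb.no_shutdown_wait 1
```

Calling the helper repeatedly leaves a single entry, so it can be run from a provisioning script without accumulating duplicates.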
FreeBSD 10.0-STABLE #0 r266463: Tue May 20 18:24:03 UTC 2014, and the issue is still here, but to reproduce it a zvol needs to be created on some ZFS pool. The machine shuts down fine if no zvols are present in the system.

Screenshot: http://farm6.staticflickr.com/5506/14550830655_d517ab28f5_b.jpg

batch change:

For bugs that match the following
- Status Is In progress
AND
- Untouched since 2018-01-01.
AND
- Affects Base System OR Documentation

DO:

Reset to open status.

Note: I did a quick pass but if you are getting this email it might be worthwhile to double check to see if this bug ought to be closed.

A slight detour from the focus on ZFS …

hw.usb.no_shutdown_wait=1 – generally, is there _ever_ an elevated risk of loss of data when there's no wait?

I just hit this issue with 13.1-STABLE. I have zpools that include USB disks. On reboot, the system hangs with messages like "Solaris: WARNING: Pool 'pool' has encountered an uncorrectable I/O failure and has been suspended." This happens after 'All buffers synced', when there is a series of messages about detaching USB devices, e.g.: "ukbd0: detached", "uhid0: detached", "umass0: detached".

I also noticed that in single user mode with all ZFS partitions mounted (running /etc/rc.d/zfs start manually after logging in) this does not happen: there are no USB detach messages and the system reboots normally.

*** Bug 211491 has been marked as a duplicate of this bug. ***

(In reply to Jeff Kletsky from comment #0)

Is this reproducible with any supported RELEASE, or branch, of FreeBSD?

(In reply to Vadym Khondar from comment #8)

> … 13.1-STABLE. … include USB disks … uncorrectable I/O failure …

Reproducible with more recent STABLE? I have never encountered the issue with FreeBSD 14.0-CURRENT.

Can you provide additional information about your hardware? Output from commands such as these:

    gpart show
    geom disk list
    zpool list -v

Thanks

In response to the direct query, I have not seen this issue in several years.
However, as hardware configurations here have changed significantly, I cannot confirm that the problem does not exist with current releases.

(In reply to Jeff Kletsky from comment #11)

Thanks for the feedback, and for your patience. All things considered: let's close this report. If anyone finds recurrence of symptoms, we might reopen and reassign to fs@

Please reopen. I am seeing this bug today on 13.2-STABLE. I have a ZFS pool using five disks in a Sabrent enclosure. In terms of drivers,

    uhub5 on uhub0
    uhub5: <VIA Labs, Inc. USB3.1 Hub, class 9/0, rev 3.20/90.13, addr 1> on usbus0
    uhub5: 4 ports with 4 removable, self powered
    usb_msc_auto_quirk: UQ_MSC_NO_GETMAXLUN set for USB mass storage device ASMT ASM235CM (0x174c:0x55aa)
    ugen0.3: <ASMT ASM235CM> at usbus0
    umass0 on uhub5
    umass0: <ASMT ASM235CM, class 0/0, rev 3.10/1.00, addr 2> on usbus0
    umass0: SCSI over Bulk-Only; quirks = 0x0100
    umass0:20:0: Attached to scbus20
    da0 at umass-sim0 bus 0 scbus20 target 0 lun 0
    da0: <ASMT ASM235CM 0> Fixed Direct Access SPC-4 SCSI device
    da0: Serial Number 915000000C05
    da0: 400.000MB/s transfers
    da0: 5723166MB (11721045168 512 byte sectors)
    da0: quirks=0x2<NO_6_BYTE>

Other devices appear as

    uhub6 on uhub5
    umass1 on uhub6 [= da1]
    umass2 on uhub6 [= da2]
    umass3 on uhub6 [= da3]
    umass4 on uhub6 [= da4]

I infer the enclosure has a hub connected to a disk and a second hub. The second hub has the other 4 disks.

When I reboot the system I see (retyped from photo)

    All buffers synced.
    Uptime: 3d3h57m14s
    uhub0: detached
    ukbd0: detached
    ...
    umass0: detached
    umass1: detached
    Solaris: WARNING: Pool 'sabrent' has encountered an uncorrectable I/O failure and has been suspended.

Then the system hangs. Within the not especially long limit of my patience there is no more output. No detach message for uhub5, uhub6, umass2, umass3, and umass4.

I have failmode=continue on this pool precisely to avoid having the system hang.
There might be two bugs: shutdown order, and failure to respect failmode=continue.

This is repeatable.

Seen again on 13.2-STABLE.

If I run "zpool export sabrent" on the pool described in my previous comment the command takes about 7 seconds to complete. ^T reports

    load: 7.17 cmd: zpool 66858 [tx->tx_sync_done_cv] 4.07r 0.00u 2.54s 20% 8008k
    mi_switch+0x152 sleepq_switch+0x104 _cv_wait+0x14a txg_wait_synced_impl+0xeb txg_wait_synced+0xb zil_close+0x128 zfsvfs_teardown+0xb1 zfs_umount+0x129 dounmount+0x396 kern_unmount+0x312 amd64_syscall+0x13c fast_syscall_common+0xf8

    load: 6.76 cmd: zpool 66858 [running] 7.24r 0.00u 3.14s 20% 8024k
    uma_zfree_arg+0xd6 abd_free_chunks+0x44 abd_free+0x45 arc_hdr_free_abd+0x144 arc_evict_state+0xaca arc_flush+0x8c dsl_pool_close+0xe3 spa_unload+0x349 spa_export_common+0x2c8 zfsdev_ioctl_common+0x5ce zfsdev_ioctl+0x12a devfs_ioctl+0xd2 vn_ioctl+0x136 devfs_ioctl_f+0x1e kern_ioctl+0x286 sys_ioctl+0x140 amd64_syscall+0x13c fast_syscall_common+0xf8

I see that after filesystems are unmounted there is still some ARC cleanup to be done. The shutdown path has a call to vfs_unmountall in bufshutdown. Does it have a call out to ZFS to quiesce pools?

The hang is reproducible so I am able to test a fix if anybody has one.
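Earlier in this report, output from `gpart show`, `geom disk list` and `zpool list -v` was requested from anyone who can reproduce the hang. The following is a rough sketch of gathering that output, plus the current value of the `hw.usb.no_shutdown_wait` sysctl, into one file that can be attached to the bug. The helper names `run_section` and `collect_diag` are hypothetical; only the commands themselves come from the thread.

```sh
#!/bin/sh
# Sketch only: run_section and collect_diag are hypothetical helper names.
# Gathers the diagnostics requested in this report into a single file.

run_section() {
    # $1 = output file; remaining arguments = command to run
    out=$1
    shift
    printf '== %s ==\n' "$*" >> "$out"
    if command -v "$1" >/dev/null 2>&1; then
        # Record both stdout and stderr; note a non-zero exit status.
        "$@" >> "$out" 2>&1 || printf '(exit status %d)\n' "$?" >> "$out"
    else
        printf '(%s not found on this system)\n' "$1" >> "$out"
    fi
}

collect_diag() {
    out=$1
    : > "$out"    # start with an empty file
    run_section "$out" gpart show
    run_section "$out" geom disk list
    run_section "$out" zpool list -v
    run_section "$out" sysctl hw.usb.no_shutdown_wait
}

# Example:
#   collect_diag /tmp/usb-zfs-diag.txt
```

Each section is labelled with the command that produced it, and a missing command is recorded instead of aborting the run, so the same script can be used on systems without ZFS or GEOM tools installed.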