Summary: | System hangs after "Uptime" on reboot with ZFS | ||||||||
---|---|---|---|---|---|---|---|---|---|
Product: | Base System | Reporter: | g_amanakis | ||||||
Component: | kern | Assignee: | Graham Perrin <grahamperrin> | ||||||
Status: | Closed DUPLICATE | ||||||||
Severity: | Affects Many People | CC: | delphij, fs, grahamperrin, kungfujesus06, ngie, re, trasz, vangyzen | ||||||
Priority: | --- | Keywords: | needs-qa | ||||||
Version: | 11.0-STABLE | ||||||||
Hardware: | amd64 | ||||||||
OS: | Any | ||||||||
Attachments: |
|
Created attachment 173148 [details]
loader.conf
Of note, despite issuing the "reboot" command, I can still ping the em0 interface on this machine.

I have also seen this on a Dell Precision Workstation running 11-CURRENT. It doesn't always happen, but it seems more likely after a buildworld/installworld. I'm using ZFS on a single SSD and an iSCSI LUN using the in-kernel initiator. I'm also using the nvidia driver, since this is my main desktop. I'll try to reproduce and get more details (i.e. a core dump).

I think I've narrowed it down to having a ZFS pool imported from an iSCSI LUN using the in-kernel software initiator. g_amanakis: Is this true on your system, too?

As an amendment to my last comment, the zpool has to be imported with altroot: zpool import -R /foo bar_pool
I was failing to reproduce this on two other systems by having the pool imported without -R (altroot). When I added the altroot, those two systems began reproducing it consistently. One of those systems is a bhyve VM. On the host, I see that all CPU threads are "vmidle" and consuming no CPU time, so they're blocked on some event, not spinning. (I realize this doesn't help much.)

So far, I've been importing the volume using the iSCSI initiator in a bhyve guest. To determine whether iSCSI is really involved, I reconfigured to import the volume onto the bhyve host and pass it to the VM as a virtio block device. I failed to reproduce the hang. (I'm still importing with altroot.) Also, I have only reproduced the hang on systems with debugging kernel options such as INVARIANTS and WITNESS. To determine whether they're really involved, I'm rebuilding my VM's kernel without these options.

The debugging kernel options are not part of the problem. I removed them and still reproduced the hang.

Steps to reproduce on a stable/11 r303878 GENERIC kernel (and many earlier revs):
1. Attach to an iSCSI LUN with /etc/iscsi.conf and iscsid.
2. Create a ZFS pool on that LUN: zpool create iscsi_test da0
3. Export the pool: zpool export iscsi_test
4. Import the pool with an altroot: zpool import -R /blah iscsi_test
5. shutdown -r now
The system will hang after printing Uptime: XdYmZs. iSCSI and altroot are key to producing the hang.

I noticed this too but not 100% reproducible. I don't have iSCSI setup, but do have zvol. It was a fresh -CURRENT.

I just reproduced this on 10.3-STABLE r303633. I'll try to reproduce on 10.3-RELEASE to see if it would be a new regression in 11.0-RELEASE.

I just reproduced this on 12-CURRENT r303626. I'm now updating that machine to the latest head.

Please don't add -current or -stable to bugs like this; it spams the list unnecessarily (this issue impacts users of iSCSI + ZFS -- which seems a bit niche right now)

On my system ctld is enabled but there are no clients. The bug persists with ctld disabled, too. Also there is a bhyve-VM with passthrough of an onboard NIC but this doesn't affect the bug. The root filesystem is ZFS. Summing up, no iSCSI, no altroot, ZFS on root. I could normally reboot and shutdown the system on 10.3-RELEASE and 10.3-STABLE before upgrading to 11.0-BETA1, which is when I noticed the bug. No kernel panic happens though. Could I get more verbose logging during shutdown to see what is going on? Most strikingly, after the system "hangs" on "Uptime ..." I can still successfully ping one of the onboard ifaces, not the VT-d one.

This bug is not limited to iSCSI. I have updated the summary accordingly.

I could NOT reproduce this on 10.3-RELEASE, so this will be a new regression in 11.0-RELEASE. I can still reproduce it on head at r303895 (9 August). I can't spend any more time on this. I suggest that someone reproduce and bisect the commits between 10.3-RELEASE and 10-STABLE.

I can't seem to be able to reproduce this anymore on -CURRENT (currently at r304072), FYI.

I can still reproduce this on head at r304162.

I too am having this issue, no iscsi, no altroot, just zvols. This is happening on 11.0-RELEASE, and on reboot the zpool complains that it was already imported on another system.

This bug seems to be the same as "https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=167685"
Setting hw.usb.no_shutdown_wait=1 resolves it. Strangely though if I manually export the zpool with 6 SATA drives I can reboot it, even with hw.usb.no_shutdown_wait=0 (the default value). It is unclear to me how the 6 SATA-drive zpool is connected with hw.usb.no_shutdown_wait.

This is amongst the bugs that reportedly need special attention; see for example <https://lists.freebsd.org/archives/freebsd-fs/2023-April/002047.html>.

Re: the opening poster's comment #20
I'll close this as a duplicate.

*** This bug has been marked as a duplicate of bug 167685 *** |
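For anyone wanting to try the hw.usb.no_shutdown_wait workaround reported in this thread, a minimal sketch follows. It assumes the knob can be set at runtime with sysctl(8) and made persistent through /etc/sysctl.conf; neither is confirmed in this report. The helper name set_tunable is hypothetical, and the helper only edits the file it is given.

```shell
#!/bin/sh
# Sketch only: persist a "name=value" setting in a sysctl.conf-style file.
# set_tunable is a hypothetical helper, not part of FreeBSD.
set_tunable() {
    file=$1
    name=$2
    value=$3
    # Skip the append if a line for this name is already present.
    if grep -qF -- "${name}=" "$file" 2>/dev/null; then
        echo "${name} already present in ${file}"
    else
        printf '%s=%s\n' "$name" "$value" >> "$file"
    fi
}

# Intended use on an affected machine (as root); left commented out on purpose:
#   sysctl hw.usb.no_shutdown_wait=1                        # current boot (assumed runtime-settable)
#   set_tunable /etc/sysctl.conf hw.usb.no_shutdown_wait 1  # later boots
```

Whether this helps on systems with no USB storage involved is not established here; the thread only reports that it resolved the hang on one machine.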
Created attachment 173147 [details]
dmesg
System is a Supermicro X9SCM running FreeBSD 11.0-BETA3 amd64 with GENERIC kernel. When issuing "reboot" the system will hang after the "Uptime" message. The system has to be reset by physically holding the power button or issuing a power cycle from the IPMI interface. I am attaching the dmesg and loader.conf output.
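The iSCSI plus altroot reproduction steps given in the comments can be collected into a small script, sketched below. The pool name, device, and altroot are the ones quoted in the thread; the session nickname "target0" and the function names are hypothetical, and the script assumes /etc/iscsi.conf already defines that nickname. It defaults to a dry run that only prints the commands, because the real run destroys data on the device and reboots the machine.

```shell
#!/bin/sh
# Sketch of the reproduction steps from the comments; not a tested harness.
# DRY_RUN=1 (the default) prints each command instead of executing it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

repro_hang() {
    pool=${1:-iscsi_test}   # pool name used in the thread
    dev=${2:-da0}           # device used in the thread
    altroot=${3:-/blah}     # altroot used in the thread
    nick=${4:-target0}      # hypothetical nickname from /etc/iscsi.conf

    run service iscsid onestart              # 1. start the initiator daemon
    run iscsictl -A -n "$nick"               #    and attach the LUN
    run zpool create "$pool" "$dev"          # 2. create a pool on the LUN
    run zpool export "$pool"                 # 3. export it
    run zpool import -R "$altroot" "$pool"   # 4. re-import with an altroot
    run shutdown -r now                      # 5. reboot; hang follows "Uptime:"
}
```

Calling repro_hang with no arguments prints the five steps; setting DRY_RUN=0 as root on a disposable machine would execute them.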