| Summary: | zvol: unbootable system with root on zfs | | |
|---|---|---|---|
| Product: | Base System | Reporter: | Evilham <contact> |
| Component: | kern | Assignee: | freebsd-fs (Nobody) <fs> |
| Status: | Open | | |
| Severity: | Affects Only Me | CC: | avg, contact, pi |
| Priority: | --- | Keywords: | regression |
| Version: | CURRENT | | |
| Hardware: | Any | | |
| OS: | Any | | |
| URL: | https://reviews.freebsd.org/D23478 | | |
| Attachments: | core.txt (attachment 216460), gdb.txt (attachment 216546) | | |
Description
Evilham
2020-06-25 18:29:35 UTC
Added avg to CC since they might still have this in memory / have more hints as to what to check or how to reproduce this.

Comment #2
Andriy Gapon

Do you have swap on a zvol?

Comment #3
Evilham

(In reply to Andriy Gapon from comment #2)

I don't have swap on a zvol; sorry for not sending more complete information in the first place. Please let me know if I can help provide more information.

```
# This partitioning scheme was created by the installer:
# gpart show
=>        40  1000215136  nda0  GPT  (477G)
          40      409600     1  efi  (200M)
      409640        1024     2  freebsd-boot  (512K)
      410664         984        - free -  (492K)
      411648     4194304     3  freebsd-swap  (2.0G)
     4605952   995608576     4  freebsd-zfs  (475G)
  1000214528         648        - free -  (324K)

# This zpool was also created by the installer
# zpool status zroot
  pool: zroot
 state: ONLINE
status: Some supported features are not enabled on the pool. The pool can
        still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done, the
        pool may no longer be accessible by software that does not support
        the features. See zpool-features(7) for details.
  scan: scrub repaired 0 in 0 days 00:14:28 with 0 errors on Thu Dec 19 11:37:05 2019
checkpoint: created Thu Jun 25 18:13:14 2020, consumes 586M
config:

        NAME          STATE     READ WRITE CKSUM
        zroot         ONLINE       0     0     0
          nda0p4.eli  ONLINE       0     0     0

errors: No known data errors

# This is the only zvol in the 'zroot' zpool
# zfs list zroot/things
NAME           USED  AVAIL  REFER  MOUNTPOINT
zroot/things  2,79G  17,2G   265M  -

# The zroot/things zvol is actually geli-encrypted with a zpool in it;
# the 'nested' zpool shouldn't be relevant for the boot sequence though,
# as this zvol is decrypted only manually
# geli attach /dev/zvol/zroot/things
# zpool status things
  pool: things
 state: ONLINE
status: Some supported features are not enabled on the pool. The pool can
        still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done, the
        pool may no longer be accessible by software that does not support
        the features. See zpool-features(7) for details.
  scan: none requested
config:

        NAME                     STATE     READ WRITE CKSUM
        things                   ONLINE       0     0     0
          zvol/zroot/things.eli  ONLINE       0     0     0

errors: No known data errors
```
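For anyone trying to reproduce this, the layout above can be approximated roughly as follows (a sketch; the volume size is a hypothetical value, and geli init/attach will prompt for a passphrase):

```sh
# Create a zvol in the root pool and nest a geli-encrypted pool in it.
zfs create -V 20G zroot/things             # 20G is a hypothetical size
geli init /dev/zvol/zroot/things           # set up encryption on the zvol
geli attach /dev/zvol/zroot/things         # exposes zvol/zroot/things.eli
zpool create things zvol/zroot/things.eli  # the nested pool on the zvol
```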
Comment #4
Andriy Gapon

Can you try pressing Ctrl-T when it hangs? Is there any interesting output? Can Ctrl-C interrupt it and allow the startup to proceed? If yes, then it would be easier to get more information.

Comment #5
Evilham

(In reply to Andriy Gapon from comment #4)

Hello, sorry for the delay. ^C does not help the boot sequence move on; ^T does give some information (transcription):

```
load: 0.26  cmd: sh 7099 [piperd] 172.27r 0.00u 0.00s 0% 2888k
#0 0xffffffff80be6331 at mi_switch+0xc1
#1 0xffffffff80c36551 at sleepq_catch_signals+0x3d1
#2 0xffffffff80c36169 at sleepq_wait_sig+0x9
#3 0xffffffff80be5a4e at _sleep+0x1be
#4 0xffffffff80c49c9a at pipe_read+0x43a
#5 0xffffffff80c46961 at dofileread+0x81
#6 0xffffffff80c464e0 at sys_read+0xc0
#7 0xffffffff8106f089 at amd64_syscall+0x119
#8 0xffffffff81045630 at fast_syscall_common+0x101
```

It doesn't seem to change after some time either. I tried manually running `service zvol start` from a single-user session and it doesn't hang; it exits successfully instead.

Comment #6
Andriy Gapon

Can you get a crash dump? E.g. via Ctrl+Alt+Esc or Ctrl+SysRq? Note that if you can, then I will ask you to run some kgdb commands on it. I do not need a crash dump file (vmcore.*).
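For reference, the debugger route to a dump would go roughly like this (a sketch, assuming a kernel with KDB/DDB compiled in; the dump partition name is taken from the gpart output above):

```sh
# Let Ctrl+Alt+Esc on the console enter the kernel debugger.
sysctl debug.kdb.break_to_debugger=1
# Point crash dumps at the swap partition (normally done by rc.d/dumpon).
dumpon /dev/nda0p3
# Reproduce the hang, press Ctrl+Alt+Esc, then at the db> prompt:
#   db> dump     # write a kernel dump to the dump device
#   db> reset    # reboot; savecore(8) saves it to /var/crash on boot
```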
Comment #7
Evilham

(In reply to Andriy Gapon from comment #6)

Hello, thank you for the reply, I had totally missed it, sorry.

Just gave it a go: by modifying /etc/rc.d/zvol to REQUIRE dumpon, I was able to get the prompt, create a dump, and get a savecore after booting the system normally. I'd be delighted to run kgdb on the crash dump as requested.

Perhaps it's worth mentioning that I have dropped my reverting of the mentioned revisions and am running HEAD without patches; so far I haven't noticed anything off.

Setting rc_debug="YES" in /etc/rc.conf is what allows me to boot the system, and now it's easier for me to test potential fixes, since it's just a matter of (un)commenting that in single-user mode.

Dropping /etc/rc.d/zvol *without* rc_debug="YES" resulted in the hang moving to "Mounting filesystems...", so it's a tad misleading that I'm seeing this with /etc/rc.d/zvol; according to `service -e` it just happens to be the second rc script (the first one being hostid).

I tried reproducing this under bhyve / with a different hard drive and an "identical setup" (as much as I was able to) but was not successful; maybe it has something to do with the nvme being quick enough for something to get deadlocked, and that would also be why rc_debug="YES" allows me to boot normally.
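The rc script tweak mentioned above would look roughly like this (a sketch; the stock rcorder(8) lines in /etc/rc.d/zvol may differ between versions):

```sh
# /etc/rc.d/zvol rcorder(8) metadata. Stock, the script only requires
# hostid; adding dumpon makes the dump device available before the
# script that hangs runs, so a dump can still be taken at the hang:
# PROVIDE: zvol
# REQUIRE: hostid dumpon
```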
Comment #8
Andriy Gapon

(In reply to Evilham from comment #7)

Sorry for the delay. Have you got a core.txt file corresponding to the core (in /var/crash)? If yes, could you please share it? Thanks!
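(If core.txt is missing, it can usually be regenerated from a saved dump with crashinfo(8); the dump index below is an assumption:)

```sh
# Produce /var/crash/core.txt.N from the corresponding saved dump.
crashinfo /var/crash/vmcore.0
```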
Comment #9
Evilham

Created attachment 216460 [details]
core.txt

Here you go!

I've been testing Matt's OpenZFS and haven't upgraded the pool, so I can roll back and keep helping debug this, since non-CURRENT could be affected.
Thank you for looking into it!
Comment #10
Andriy Gapon

(In reply to Evilham from comment #9)

Could you please open the core in kgdb, run the following commands, and share the resulting gdb.txt:

```
set pagination off
set logging on
set logging redirect on
thread apply all bt    <this may take a while to complete>
quit
```
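For completeness, such a session would typically be started like this (a sketch; the kernel path and dump number are assumptions):

```sh
# Open the saved dump against the kernel's symbols, then run the
# commands above; the log ends up as gdb.txt in the working directory.
kgdb /boot/kernel/kernel /var/crash/vmcore.0
```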
Comment #11
Evilham

Created attachment 216546 [details]
gdb.txt
I made a mistake and deleted the BE where the core.txt came from, so this was the result of recompiling >,<.
Please let me know if the matching core.txt is necessary.
Comment #12
Andriy Gapon

(In reply to Evilham from comment #11)

That's not a problem. Thank you very much for the data.

I think that I see a deadlock. It seems to happen as the result of a race condition: two unrelated events can cause the deadlock if they happen at the same time. I will need some more time to analyze the problem and to, hopefully, come up with a fix.

The threads involved:

```
Thread 815 (Thread 100046):
#0  sched_switch (td=0xfffffe003a5b7100, flags=<optimized out>)
    at /freebsd/src/sys/kern/sched_ule.c:2147
#1  0xffffffff80be1481 in mi_switch (flags=260)
    at /freebsd/src/sys/kern/kern_synch.c:542
#2  0xffffffff80bdf761 in _sx_xlock_hard (sx=0xffffffff822bea08 <spa_namespace_lock>, x=<optimized out>, opts=<optimized out>)
    at /freebsd/src/sys/kern/kern_sx.c:858
#3  0xffffffff8221062b in __sx_xlock (sx=<optimized out>, td=0xfffffe003a5b7100, opts=<error reading variable: Cannot access memory at address 0x0>, file=<optimized out>, line=<optimized out>)
    at /freebsd/src/sys/sys/sx.h:168
#4  zvol_close (pp=0xfffff8002710e700, flag=<optimized out>, count=<optimized out>)
    at /freebsd/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c:1270
#5  zvol_geom_access (pp=0xfffff8002710e700, acr=<optimized out>, acw=0, ace=<optimized out>)
    at /freebsd/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c:2753
#6  0xffffffff80b1f304 in g_access (cp=0xfffff8003da02580, dcr=<optimized out>, dcw=<optimized out>, dce=0)
    at /freebsd/src/sys/geom/geom_subr.c:1041
#7  0xffffffff80b22586 in g_label_taste (mp=0xffffffff818f2ac0 <g_label_class>, pp=<optimized out>, flags=<optimized out>)
    at /freebsd/src/sys/geom/label/g_label.c:411
#8  0xffffffff80b1faaa in g_new_provider_event (arg=0xfffff8002710e700, flag=<optimized out>)
    at /freebsd/src/sys/geom/geom_subr.c:596
#9  0xffffffff80b1acee in one_event () at /freebsd/src/sys/geom/geom_event.c:265
#10 g_run_events () at /freebsd/src/sys/geom/geom_event.c:288
#11 0xffffffff80b9360e in fork_exit (callout=0xffffffff80b1d240 <g_event_procbody>, arg=0x0, frame=0xfffffe003a809c00)
    at /freebsd/src/sys/kern/kern_fork.c:1052

Thread 322 (Thread 100542):
#0  sched_switch (td=0xfffffe00dca8c000, flags=<optimized out>)
    at /freebsd/src/sys/kern/sched_ule.c:2147
#1  0xffffffff80be1481 in mi_switch (flags=260)
    at /freebsd/src/sys/kern/kern_synch.c:542
#2  0xffffffff80be0bab in _sleep (ident=0xfffff8002710e800, lock=<optimized out>, priority=0, wmesg=0xffffffff811b9f44 "gtopol", sbt=0, pr=<optimized out>, flags=256)
    at /freebsd/src/sys/kern/kern_synch.c:221
#3  0xffffffff80b1f207 in g_access (cp=0xfffff8003d6b3480, dcr=1, dcw=0, dce=1)
    at /freebsd/src/sys/geom/geom_subr.c:1000
#4  0xffffffff8221277a in vdev_geom_attach (pp=0xfffff8002710e700, vd=0x0, sanity=1)
    at /freebsd/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c:289
#5  0xffffffff822149b9 in vdev_attach_ok (vd=0xfffff8003ddfa000, pp=0xfffff8002710e700)
    at /freebsd/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c:661
#6  0xffffffff822139fa in vdev_geom_attach_by_guids (vd=0xfffff8003ddfa000)
    at /freebsd/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c:735
#7  vdev_geom_open_by_guids (vd=0xfffff8003ddfa000)
    at /freebsd/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c:772
#8  vdev_geom_open (vd=0xfffff8003ddfa000, psize=0xfffffe00dd1342d8, max_psize=0xfffffe00dd1342e0, logical_ashift=<optimized out>, physical_ashift=0xfffffe00dd1342d0)
    at /freebsd/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c:877
#9  0xffffffff821a5081 in vdev_open (vd=0xfffff8003ddfa000)
    at /freebsd/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev.c:1696
#10 0xffffffff821a4f29 in vdev_open_children (vd=0xfffff8003dd6a000)
    at /freebsd/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev.c:1620
#11 0xffffffff821bbc1a in vdev_root_open (vd=0xfffff8003dd6a000, asize=0xfffffe00dd1343b8, max_asize=0xfffffe00dd1343c0, logical_ashift=0xfffffe00dd1343a8, physical_ashift=0xfffffe00dd1343b0)
    at /freebsd/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_root.c:96
#12 0xffffffff821a5081 in vdev_open (vd=0xfffff8003dd6a000)
    at /freebsd/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev.c:1696
#13 0xffffffff82195abc in spa_ld_open_vdevs (spa=0xfffffe00dddfd000)
    at /freebsd/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c:2921
#14 0xffffffff82194309 in spa_ld_mos_init (spa=0xfffffe00dddfd000, type=SPA_IMPORT_EXISTING)
    at /freebsd/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c:3970
#15 0xffffffff82194099 in spa_ld_mos_with_trusted_config (spa=0xfffffe00dddfd000, type=SPA_IMPORT_EXISTING, update_config_cache=0xfffffe00dd13461c)
    at /freebsd/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c:4113
#16 0xffffffff8218d2e9 in spa_load_impl (spa=0xfffffe00dddfd000, type=SPA_IMPORT_EXISTING, ereport=<optimized out>)
    at /freebsd/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c:4167
#17 spa_load (spa=0xfffffe00dddfd000, state=<optimized out>, type=SPA_IMPORT_EXISTING)
    at /freebsd/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c:2433
#18 0xffffffff8218cbd5 in spa_load_best (spa=0xfffffe00dddfd000, state=SPA_LOAD_OPEN, max_request=<optimized out>, rewind_flags=1)
    at /freebsd/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c:4441
#19 0xffffffff82189304 in spa_open_common (pool=0xfffffe00d11f7000 "servers", spapp=0xfffffe00dd134748, tag=0xffffffff8223fb89, nvpolicy=<optimized out>, config=0x0)
    at /freebsd/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c:4576
#20 0xffffffff821eeae9 in pool_status_check (name=<unavailable>, type=<optimized out>, check=POOL_CHECK_SUSPENDED)
    at /freebsd/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c:6804
#21 0xffffffff821f86c1 in zfsdev_ioctl (dev=<optimized out>, zcmd=<optimized out>, arg=<optimized out>, flag=<optimized out>, td=<optimized out>)
    at /freebsd/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c:7117
#22 0xffffffff80a80007 in devfs_ioctl (ap=0xfffffe00dd134878)
    at /freebsd/src/sys/fs/devfs/devfs_vnops.c:824
#23 0xffffffff80cbdab9 in vn_ioctl (fp=0xfffff8003d61df50, com=<optimized out>, data=0xfffffe00dd134a20, active_cred=0xfffff8003b9a9900, td=<unavailable>)
    at /freebsd/src/sys/kern/vfs_vnops.c:1588
#24 0xffffffff80a8063e in devfs_ioctl_f (fp=<unavailable>, com=<unavailable>, data=<unavailable>, cred=<unavailable>, td=0xfffffe00dca8c000)
    at /freebsd/src/sys/fs/devfs/devfs_vnops.c:755
#25 0xffffffff80c42dce in fo_ioctl (fp=0xfffff8003d61df50, com=<optimized out>, data=<unavailable>, active_cred=<unavailable>, td=0xfffffe00dca8c000)
    at /freebsd/src/sys/sys/file.h:339
#26 kern_ioctl (td=<unavailable>, fd=3, com=<optimized out>, data=<unavailable>)
    at /freebsd/src/sys/kern/sys_generic.c:802
#27 0xffffffff80c42aca in sys_ioctl (td=0xfffffe00dca8c000, uap=0xfffffe00dca8c3e8)
    at /freebsd/src/sys/kern/sys_generic.c:710
#28 0xffffffff80ff5569 in syscallenter (td=0xfffffe00dca8c000)
    at /freebsd/src/sys/amd64/amd64/../../kern/subr_syscall.c:162
#29 amd64_syscall (td=0xfffffe00dca8c000, traced=0)
    at /freebsd/src/sys/amd64/amd64/trap.c:1161
```

This is a more subtle variation of a known FreeBSD problem with pools on top of zvols. The deadlock is between GEOM and ZFS's spa_namespace_lock.
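Reading the two backtraces together, the inversion can be sketched as follows (reconstructed from the traces above; the trigger commands are illustrative assumptions, not a reliable reproducer):

```sh
# Thread A: opening the pool that lives on the zvol takes ZFS's
# spa_namespace_lock, then sleeps on "gtopol" in g_access() waiting
# for the GEOM topology.
zpool import servers &
# Thread B: the GEOM event thread tasting the new zvol provider
# (g_label) holds the GEOM topology lock, then blocks in zvol_close()
# trying to take spa_namespace_lock. Any open/close of the provider
# can queue such a taste event, e.g.:
true > /dev/zvol/zroot/things
# Each thread now holds the lock the other one needs.
```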
By the way, what is vfs.zfs.vol.recursive set to on that system?

Comment #13
Evilham

Yay! We found a bug!

```
# sysctl vfs.zfs.vol.recursive
vfs.zfs.vol.recursive: 0
```

Just out of curiosity, and because I haven't seen a mention of it: do you know how support for ZFS in the other branches is going to be maintained after OpenZFS lands in CURRENT? Because, e.g., this doesn't show up with OpenZFS, and I guess spotting future issues before MFC is going to be trickier.

Anyway, good thing that this starts making sense, thank you!

Comment #14
Andriy Gapon

(In reply to Evilham from comment #13)

I don't know for sure, but my guess is that ZFS in 12/11 is going to be mostly frozen, save for bug fixes. I do not expect new features, etc. Those who need them will be encouraged to use the port.