Tested on stable/11 at r346167, but the backtrace below comes from an older stable/11 build (maybe 2 months back).

To reproduce:

md_unit=$(mdconfig -t swap -s 30MB)
geom part create -s GPT "$md_unit"
geom part add -s 10M -t linux-swap -l tst0 "$md_unit"
geom part resize -i 1 -s 20M "$md_unit"
# at this point "glabel status" shows two gpt/tst0 entries,
# one of which has no consumer; trying to correct this causes
# a panic:
glabel stop gpt/tst0
glabel stop gpt/tst0  # BOOM

Trace:

#1  doadump (textdump=<optimized out>) at /usr/src/sys/kern/kern_shutdown.c:315
#2  0xffffffff80468255 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:383
#3  0xffffffff80468621 in vpanic (fmt=<optimized out>, ap=0xfffffe022c5c25f0) at /usr/src/sys/kern/kern_shutdown.c:776
#4  0xffffffff80468463 in panic (fmt=<unavailable>) at /usr/src/sys/kern/kern_shutdown.c:707
#5  0xffffffff80722faf in trap_fatal (frame=0xfffffe022c5c27e0, eva=0) at /usr/src/sys/amd64/amd64/trap.c:904
#6  0xffffffff80723009 in trap_pfault (frame=0xfffffe022c5c27e0, usermode=0) at /usr/src/sys/amd64/amd64/trap.c:744
#7  0xffffffff80722732 in trap (frame=0xfffffe022c5c27e0) at /usr/src/sys/amd64/amd64/trap.c:438
#8  <signal handler called>
#9  g_slice_spoiled (cp=0x0) at /usr/src/sys/geom/geom_slice.c:511
#10 0xffffffff80f826d0 in g_label_destroy (force=<optimized out>, gp=<optimized out>) at /usr/src/sys/geom/label/g_label.c:267
#11 g_label_ctl_destroy (req=<optimized out>, mp=<optimized out>) at /usr/src/sys/geom/label/g_label.c:514
#12 g_label_config (req=0xfffffe0236bb67c0, mp=0xffffffff80f84fd8 <g_label_class>, verb=<optimized out>) at /usr/src/sys/geom/label/g_label.c:545
#13 0xffffffff803f9700 in one_event () at /usr/src/sys/geom/geom_event.c:264
#14 g_run_events () at /usr/src/sys/geom/geom_event.c:286
#15 0xffffffff804374d5 in fork_exit (callout=0xffffffff803fb820 <g_event_procbody>, arg=0x0, frame=0xfffffe022c5c29c0) at /usr/src/sys/kern/kern_fork.c:1072
#16 <signal handler called>

g_label_destroy clearly isn't expecting the case where the label has no consumer at all.
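
The failure mode described here can be illustrated with a toy model. This is only a sketch under stated assumptions: the types and functions (toy_geom, toy_consumer, toy_spoil, toy_label_destroy) are hypothetical stand-ins, not the kernel's GEOM structures and not a proposed patch. It shows a destroy routine that skips the spoil hook when no consumer is attached, instead of passing NULL to a hook that dereferences it.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins; NOT the kernel's struct g_consumer / g_geom. */
struct toy_consumer {
	int spoiled;
};

struct toy_geom {
	struct toy_consumer *first_consumer;	/* NULL if nothing is attached */
	int destroyed;
};

/* A spoil hook that assumes cp is valid; it would fault on cp == NULL. */
static void
toy_spoil(struct toy_consumer *cp)
{
	cp->spoiled = 1;
}

/*
 * Guarded destroy: call the spoil hook only when a consumer exists.
 * Returns 0 on success, EINVAL if gp itself is NULL.
 */
static int
toy_label_destroy(struct toy_geom *gp)
{
	if (gp == NULL)
		return (EINVAL);
	if (gp->first_consumer != NULL)
		toy_spoil(gp->first_consumer);
	gp->destroyed = 1;
	return (0);
}
```

With this guard, destroying a label whose consumer pointer is NULL completes without touching the hook. Whether a guard like this is the right fix in the real code is a separate question.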
Also confirmed on stable/12 r346169.

The resize of the partition provokes these messages, which I had previously missed:

g_access(958): provider gptid/0d17d86a-5edf-11e9-971a-00a0985beaef has error 6 set
g_access(958): provider gptid/0d17d86a-5edf-11e9-971a-00a0985beaef has error 6 set
g_access(958): provider gpt/tst0 has error 6 set
g_access(958): provider gpt/tst0 has error 6 set
g_dev_taste: make_dev_p() failed (gp->name=gpt/tst0, error=17)
g_dev_taste: make_dev_p() failed (gp->name=gptid/0d17d86a-5edf-11e9-971a-00a0985beaef, error=17)
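
For anyone decoding the numbers in these messages: assuming they are ordinary errno values (the messages themselves don't say so), 6 is ENXIO ("Device not configured") and 17 is EEXIST ("File exists"). A minimal sketch of that mapping follows; toy_errname is a hypothetical helper, not an existing kernel or libc function.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Hypothetical helper: name the two error numbers seen in the log. */
static const char *
toy_errname(int error)
{
	switch (error) {
	case ENXIO:	/* 6: provider is gone / not configured */
		return ("ENXIO");
	case EEXIST:	/* 17: the name is already taken */
		return ("EEXIST");
	default:
		return ("unknown");
	}
}
```

Read that way, the g_access lines say the provider was already marked as gone, and the make_dev_p() lines say the /dev name was already in use.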
On CURRENT I see:

g_access(958): provider gptid/b2f6ac01-5ee0-11e9-b200-00a098d53cc9 has error 6 set
g_access(958): provider gptid/b2f6ac01-5ee0-11e9-b200-00a098d53cc9 has error 6 set
g_access(958): provider gptid/b2f6ac01-5ee0-11e9-b200-00a098d53cc9 has error 6 set
g_access(958): provider gpt/tst0 has error 6 set
g_access(958): provider gpt/tst0 has error 6 set
g_access(958): provider gpt/tst0 has error 6 set
g_dev_taste: make_dev_p() failed (gp->name=gpt/tst0, error=17)
g_dev_taste: make_dev_p() failed (gp->name=gptid/b2f6ac01-5ee0-11e9-b200-00a098d53cc9, error=17)
md0p1 resized
# glabel status
                                      Name  Status  Components
                                  gpt/tst0     N/A  N/A
gptid/b2f6ac01-5ee0-11e9-b200-00a098d53cc9     N/A  N/A
                                  gpt/tst0     N/A  md0p1
gptid/b2f6ac01-5ee0-11e9-b200-00a098d53cc9     N/A  md0p1
# glabel stop gpt/tst0
# glabel stop gpt/tst0
...
Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 01
fault virtual address = 0x0
fault code = supervisor read data , page not present
...
current process = 13 (g_event)
...
--- trap 0xc, rip = 0xffffffff80af008d, rsp = 0xfffffe00004ff980, rbp = 0xfffffe00004ff990 ---
g_slice_spoiled() at g_slice_spoiled+0x1d/frame 0xfffffe00004ff990
g_label_config() at g_label_config+0x23f/frame 0xfffffe00004ffa10
g_ctl_req() at g_ctl_req+0x6e/frame 0xfffffe00004ffa30
g_run_events() at g_run_events+0xf5/frame 0xfffffe00004ffa70
fork_exit() at fork_exit+0x84/frame 0xfffffe00004ffab0

So issue exists in head as well.
> g_label_destroy clearly isn't expecting the case where the label has no consumer at all.

This assumption may be OK in general; it looks like the root of the problem is that "gpart resize" fails in an unacceptable way (it doesn't clean up after itself).
(My repro from comment #2 is on a somewhat old CURRENT, FWIW: r345283 + some local changes. But I don't recall any gpart/glabel changes in recent history.)
(In reply to Conrad Meyer from comment #3) The resize does actually succeed, not fail.
(In reply to andrew from comment #5)
> The resize does actually succeed, not fail.

The failure is leaving behind the duplicate copy of "tst0."
I think I have identified the sequence of events:

1. When we resize the partition, it spoils the attached LABEL consumers (correctly, I think, since those might depend on the partition size);

2. The spoiled LABELs are orphaned, orphaning the DEV geoms that are their sole consumers;

3. The DEV geoms destruct, detaching from the LABELs and removing the /dev entries;

4. BEFORE anything significant can happen, DEV re-tastes the now-orphaned LABEL geoms, getting an ENXIO from trying to open them, but for whatever reason attaches to them anyway, creating new /dev entries and attaching new consumers to the orphan LABELs;

5. Since the old LABELs now have attached consumers that aren't going to die, the withering process never completes, and the /dev entries remain attached to the orphaned LABEL geoms that now have no partition under them;

6. When DEV gets to taste the new providers for the new LABEL geoms for the resized partitions, it can't create the /dev entries for them because they already exist.

You'd think that step 4 wouldn't happen because a withering geom shouldn't be offered for tasting, but it turns out there's a code path where this happens, and strangely enough it's in g_resize_provider_event.

So there are several places here where questionable things are happening, even beyond the assumption that actually causes the crash (LABEL's assumption that it always has a consumer).
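
Steps 5 and 6 amount to a name collision: a stale entry keeps the name, so creating the replacement fails. The sketch below models only that part, with a hypothetical in-memory name table (toy_devfs, toy_make_dev, toy_destroy_dev are invented for illustration and are not devfs or GEOM APIs). The ENOSPC return for a full table is likewise just a choice made for this toy.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

#define TOY_MAX_NODES 8

/* Hypothetical flat table of /dev names; not how devfs is implemented. */
struct toy_devfs {
	const char *names[TOY_MAX_NODES];
	int count;
};

/* Create a node; fails with EEXIST if the name is still registered. */
static int
toy_make_dev(struct toy_devfs *fs, const char *name)
{
	for (int i = 0; i < fs->count; i++)
		if (strcmp(fs->names[i], name) == 0)
			return (EEXIST);
	if (fs->count == TOY_MAX_NODES)
		return (ENOSPC);
	fs->names[fs->count++] = name;
	return (0);
}

/* Remove a node by name, filling the gap with the last entry. */
static void
toy_destroy_dev(struct toy_devfs *fs, const char *name)
{
	for (int i = 0; i < fs->count; i++) {
		if (strcmp(fs->names[i], name) == 0) {
			fs->names[i] = fs->names[--fs->count];
			return;
		}
	}
}
```

In this model, a second toy_make_dev for "gpt/tst0" returns EEXIST for as long as the first entry is never destroyed, which matches the error=17 in the g_dev_taste messages if that number is an errno value.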
I can no longer reproduce this on 13.x; possibly it was fixed by https://reviews.freebsd.org/D26658? (As far as I can tell, that was not MFC'd despite the "MFC after" annotation.)
(In reply to andrew from comment #8) It was eventually MFCed into 13.0, for what that's worth. It's not obvious to me that that change fixed the problem though.
This issue is reproducible in FreeBSD 12.3-STABLE. Will there be a patch for 12.x?

# uname
FreeBSD 12.3-STABLE #0 r372168M: Thu Jun 23 09:30:29 EEST 2022
Created attachment 235084
patch against stable/12

If it's reproducible and you can test custom kernels, please try the attached patch.
I can no longer reproduce this on 13-stable, so I'm closing it.