| Summary: | panic in glabel (g_label_destroy) stop after resizing GPT partition | | |
|---|---|---|---|
| Product: | Base System | Reporter: | Andrew "RhodiumToad" Gierth <andrew> |
| Component: | kern | Assignee: | freebsd-geom (Nobody) <geom> |
| Status: | Closed FIXED | | |
| Severity: | Affects Some People | CC: | admin, cem, markj, pawel.worach, yannk |
| Priority: | --- | Keywords: | crash, needs-qa |
| Version: | CURRENT | Flags: | koobs: mfc-stable12?, koobs: mfc-stable11? |
| Hardware: | Any | | |
| OS: | Any | | |
Also confirmed on stable/12 r346169.

The resize of the partition provokes these messages, which I had previously missed:

g_access(958): provider gptid/0d17d86a-5edf-11e9-971a-00a0985beaef has error 6 set
g_access(958): provider gptid/0d17d86a-5edf-11e9-971a-00a0985beaef has error 6 set
g_access(958): provider gpt/tst0 has error 6 set
g_access(958): provider gpt/tst0 has error 6 set
g_dev_taste: make_dev_p() failed (gp->name=gpt/tst0, error=17)
g_dev_taste: make_dev_p() failed (gp->name=gptid/0d17d86a-5edf-11e9-971a-00a0985beaef, error=17)

On CURRENT I see:
g_access(958): provider gptid/b2f6ac01-5ee0-11e9-b200-00a098d53cc9 has error 6 set
g_access(958): provider gptid/b2f6ac01-5ee0-11e9-b200-00a098d53cc9 has error 6 set
g_access(958): provider gptid/b2f6ac01-5ee0-11e9-b200-00a098d53cc9 has error 6 set
g_access(958): provider gpt/tst0 has error 6 set
g_access(958): provider gpt/tst0 has error 6 set
g_access(958): provider gpt/tst0 has error 6 set
g_dev_taste: make_dev_p() failed (gp->name=gpt/tst0, error=17)
g_dev_taste: make_dev_p() failed (gp->name=gptid/b2f6ac01-5ee0-11e9-b200-00a098d53cc9, error=17)
md0p1 resized
# glabel status
Name Status Components
gpt/tst0 N/A N/A
gptid/b2f6ac01-5ee0-11e9-b200-00a098d53cc9 N/A N/A
gpt/tst0 N/A md0p1
gptid/b2f6ac01-5ee0-11e9-b200-00a098d53cc9 N/A md0p1
# glabel stop gpt/tst0
# glabel stop gpt/tst0
...
Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 01
fault virtual address = 0x0
fault code = supervisor read data , page not present
...
current process = 13 (g_event)
...
--- trap 0xc, rip = 0xffffffff80af008d, rsp = 0xfffffe00004ff980, rbp = 0xfffffe00004ff990 ---
g_slice_spoiled() at g_slice_spoiled+0x1d/frame 0xfffffe00004ff990
g_label_config() at g_label_config+0x23f/frame 0xfffffe00004ffa10
g_ctl_req() at g_ctl_req+0x6e/frame 0xfffffe00004ffa30
g_run_events() at g_run_events+0xf5/frame 0xfffffe00004ffa70
fork_exit() at fork_exit+0x84/frame 0xfffffe00004ffab0
So the issue exists in head as well.
> g_label_destroy clearly isn't expecting the case where the label has no consumer at all.
This assumption may be OK in general; the root of the problem seems to be that "gpart resize" fails in an unacceptable way (it doesn't clean up after itself).
(My repro from comment #2 is on a somewhat old CURRENT, FWIW: r345283 plus some local changes. But I don't recall any gpart/glabel changes in recent history.)

(In reply to Conrad Meyer from comment #3)
The resize does actually succeed, not fail.

(In reply to andrew from comment #5)
> The resize does actually succeed, not fail.
The failure is leaving behind the duplicate copy of "tst0".

I think I have identified the sequence of events:

1. When we resize the partition, it spoils the attached LABEL consumers (correctly, I think, since those might depend on the partition size).
2. The spoiled LABELs are orphaned, orphaning the DEV geoms that are their sole consumers.
3. The DEV geoms destruct, detaching from the LABELs and removing the /dev entries.
4. BEFORE anything significant can happen, DEV re-tastes the now-orphaned LABEL geoms, getting an ENXIO from trying to open them, but for whatever reason attaches to them anyway, creating new /dev entries and attaching new consumers to the orphan LABELs.
5. Since the old LABELs now have attached consumers that aren't going to die, the withering process never completes, and the /dev entries remain attached to the orphaned LABEL geoms that now have no partition under them.
6. When DEV gets to taste the new providers for the new LABEL geoms for the resized partitions, it can't create the /dev entries for them because they already exist.

You'd think that step 4 wouldn't happen, because a withering geom shouldn't be offered for tasting, but it turns out there's a code path where this happens, and strangely enough it's in g_resize_provider_event.

So there are several places here where questionable things are happening, even beyond the assumption that actually causes the crash (LABEL's assumption that it always has a consumer).

I can no longer reproduce this on 13.x. Possibly it was fixed by https://reviews.freebsd.org/D26658 (which afaict was not MFC'd despite the "MFC after" annotation).

(In reply to andrew from comment #8)
It was eventually MFCed into 13.0, for what that's worth. It's not obvious to me that that change fixed the problem, though.

This issue is reproducible in FreeBSD 12.3-STABLE. Will there be a patch for 12.x?

# uname
FreeBSD 12.3-STABLE #0 r372168M: Thu Jun 23 09:30:29 EEST 2022

Created attachment 235084
patch against stable/12

If it's reproducible and you can test custom kernels, please try the attached patch.

I can no longer reproduce this on 13-stable, so I'm closing it.
Tested on stable/11 at r346167, but the backtrace below comes from an older stable/11 build (maybe 2 months back).

To reproduce:

md_unit=$(mdconfig -t swap -s 30MB)
geom part create -s GPT "$md_unit"
geom part add -s 10M -t linux-swap -l tst0 "$md_unit"
geom part resize -i 1 -s 20M "$md_unit"
# at this point "glabel status" shows two gpt/tst0 entries,
# one of which has no consumer; trying to correct this causes
# a panic:
glabel stop gpt/tst0
glabel stop gpt/tst0 # BOOM

Trace:

#1 doadump (textdump=<optimized out>) at /usr/src/sys/kern/kern_shutdown.c:315
#2 0xffffffff80468255 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:383
#3 0xffffffff80468621 in vpanic (fmt=<optimized out>, ap=0xfffffe022c5c25f0) at /usr/src/sys/kern/kern_shutdown.c:776
#4 0xffffffff80468463 in panic (fmt=<unavailable>) at /usr/src/sys/kern/kern_shutdown.c:707
#5 0xffffffff80722faf in trap_fatal (frame=0xfffffe022c5c27e0, eva=0) at /usr/src/sys/amd64/amd64/trap.c:904
#6 0xffffffff80723009 in trap_pfault (frame=0xfffffe022c5c27e0, usermode=0) at /usr/src/sys/amd64/amd64/trap.c:744
#7 0xffffffff80722732 in trap (frame=0xfffffe022c5c27e0) at /usr/src/sys/amd64/amd64/trap.c:438
#8 <signal handler called>
#9 g_slice_spoiled (cp=0x0) at /usr/src/sys/geom/geom_slice.c:511
#10 0xffffffff80f826d0 in g_label_destroy (force=<optimized out>, gp=<optimized out>) at /usr/src/sys/geom/label/g_label.c:267
#11 g_label_ctl_destroy (req=<optimized out>, mp=<optimized out>) at /usr/src/sys/geom/label/g_label.c:514
#12 g_label_config (req=0xfffffe0236bb67c0, mp=0xffffffff80f84fd8 <g_label_class>, verb=<optimized out>) at /usr/src/sys/geom/label/g_label.c:545
#13 0xffffffff803f9700 in one_event () at /usr/src/sys/geom/geom_event.c:264
#14 g_run_events () at /usr/src/sys/geom/geom_event.c:286
#15 0xffffffff804374d5 in fork_exit (callout=0xffffffff803fb820 <g_event_procbody>, arg=0x0, frame=0xfffffe022c5c29c0) at /usr/src/sys/kern/kern_fork.c:1072
#16 <signal handler called>

g_label_destroy clearly isn't expecting the case where the label has no consumer at all.