Summary: | panic in glabel (g_label_destroy) stop after resizing GPT partition | ||||||
---|---|---|---|---|---|---|---|
Product: | Base System | Reporter: | Andrew "RhodiumToad" Gierth <andrew> | ||||
Component: | kern | Assignee: | freebsd-geom (Nobody) <geom> | ||||
Status: | Closed FIXED | ||||||
Severity: | Affects Some People | CC: | admin, cem, markj, pawel.worach, yannk | ||||
Priority: | --- | Keywords: | crash, needs-qa | ||||
Version: | CURRENT | Flags: | koobs: mfc-stable12?, koobs: mfc-stable11? | ||||
Hardware: | Any | ||||||
OS: | Any | ||||||
Attachments: |
|
Description
Andrew "RhodiumToad" Gierth
2019-04-13 21:03:59 UTC
Also confirmed on stable/12 r346169

The resize of the partition provokes these messages, which I had previously missed:

    g_access(958): provider gptid/0d17d86a-5edf-11e9-971a-00a0985beaef has error 6 set
    g_access(958): provider gptid/0d17d86a-5edf-11e9-971a-00a0985beaef has error 6 set
    g_access(958): provider gpt/tst0 has error 6 set
    g_access(958): provider gpt/tst0 has error 6 set
    g_dev_taste: make_dev_p() failed (gp->name=gpt/tst0, error=17)
    g_dev_taste: make_dev_p() failed (gp->name=gptid/0d17d86a-5edf-11e9-971a-00a0985beaef, error=17)

On CURRENT I see:

    g_access(958): provider gptid/b2f6ac01-5ee0-11e9-b200-00a098d53cc9 has error 6 set
    g_access(958): provider gptid/b2f6ac01-5ee0-11e9-b200-00a098d53cc9 has error 6 set
    g_access(958): provider gptid/b2f6ac01-5ee0-11e9-b200-00a098d53cc9 has error 6 set
    g_access(958): provider gpt/tst0 has error 6 set
    g_access(958): provider gpt/tst0 has error 6 set
    g_access(958): provider gpt/tst0 has error 6 set
    g_dev_taste: make_dev_p() failed (gp->name=gpt/tst0, error=17)
    g_dev_taste: make_dev_p() failed (gp->name=gptid/b2f6ac01-5ee0-11e9-b200-00a098d53cc9, error=17)
    md0p1 resized
    # glabel status
    Name                                        Status  Components
    gpt/tst0                                    N/A     N/A
    gptid/b2f6ac01-5ee0-11e9-b200-00a098d53cc9  N/A     N/A
    gpt/tst0                                    N/A     md0p1
    gptid/b2f6ac01-5ee0-11e9-b200-00a098d53cc9  N/A     md0p1
    # glabel stop gpt/tst0
    # glabel stop gpt/tst0
    ...
    Fatal trap 12: page fault while in kernel mode
    cpuid = 1; apic id = 01
    fault virtual address = 0x0
    fault code = supervisor read data, page not present
    ...
    current process = 13 (g_event)
    ...
    --- trap 0xc, rip = 0xffffffff80af008d, rsp = 0xfffffe00004ff980, rbp = 0xfffffe00004ff990 ---
    g_slice_spoiled() at g_slice_spoiled+0x1d/frame 0xfffffe00004ff990
    g_label_config() at g_label_config+0x23f/frame 0xfffffe00004ffa10
    g_ctl_req() at g_ctl_req+0x6e/frame 0xfffffe00004ffa30
    g_run_events() at g_run_events+0xf5/frame 0xfffffe00004ffa70
    fork_exit() at fork_exit+0x84/frame 0xfffffe00004ffab0

So issue exists in head as well.
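
For anyone reading the logs above: the numbers in "has error 6 set" and "error=17" are plain errno values. A minimal sketch for decoding them, assuming it is run on FreeBSD or another system where 6 and 17 carry the same meaning (the helper name `decode` is made up for this note):

```python
import errno
import os


def decode(code):
    """Return the symbolic errno name for a numeric error code."""
    return errno.errorcode.get(code, "unknown")


# Error numbers copied from the g_access and g_dev_taste messages above.
for code in (6, 17):
    # os.strerror() text is platform-specific, so it is printed but not relied on.
    print(f"error {code}: {decode(code)} ({os.strerror(code)})")
```

6 decodes to ENXIO and 17 to EEXIST, i.e. the providers were already marked with an error when they were opened, and the device node that g_dev_taste tried to create already existed.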
> g_label_destroy clearly isn't expecting the case where the label has no consumer at all.
This assumption may be ok, generally; it looks like the root of the problem is that "gpart resize" fails in an unacceptable way (doesn't clean up after itself).
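
To make the quoted assumption concrete, here is a toy Python model, not GEOM code and not the eventual fix. `ToyGeom`, `destroy_unchecked` and `destroy_checked` are hypothetical names; the `IndexError` only stands in for the NULL dereference seen in the trap.

```python
from dataclasses import dataclass, field


@dataclass
class ToyGeom:
    """Hypothetical stand-in for a label geom: a name plus its consumers."""
    name: str
    consumers: list = field(default_factory=list)


def destroy_unchecked(geom):
    # Mirrors the assumption discussed above: a label always has a consumer.
    first = geom.consumers[0]  # raises IndexError when the list is empty
    return f"spoiled {first}"


def destroy_checked(geom):
    # Defensive variant: tolerate a label that has lost its consumer.
    if not geom.consumers:
        return "no consumer; nothing to spoil"
    return f"spoiled {geom.consumers[0]}"


orphan = ToyGeom("gpt/tst0")  # label left with no consumer
print(destroy_checked(orphan))
```

Whether a check of this kind belongs in the label code or whether the no-consumer state should never be reachable is exactly the question raised in this comment.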
(My repro from comment #2 is on a somewhat old CURRENT, FWIW: r345283 + some local changes. But I don't recall any gpart/glabel changes in recent history.)

(In reply to Conrad Meyer from comment #3)
The resize does actually succeed, not fail.

(In reply to andrew from comment #5)
> The resize does actually succeed, not fail.

The failure is leaving behind the duplicate copy of "tst0."

I think I have identified the sequence of events:

1. When we resize the partition, it spoils the attached LABEL consumers (correctly, I think, since those might depend on the partition size);
2. The spoiled LABELs are orphaned, orphaning the DEV geoms that are their sole consumers;
3. The DEV geoms destruct, detaching from the LABELs and removing the /dev entries;
4. BEFORE anything significant can happen, DEV re-tastes the now-orphaned LABEL geoms, getting an ENXIO from trying to open them, but for whatever reason attaches to them anyway, creating new /dev entries and attaching new consumers to the orphan LABELs;
5. Since the old LABELs now have attached consumers that aren't going to die, the withering process never completes, and the /dev entries remain attached to the orphaned LABEL geoms that now have no partition under them;
6. When DEV gets to taste the new providers for the new LABEL geoms for the resized partitions, it can't create the /dev entries for them because they already exist.

You'd think that step 4 wouldn't happen because a withering geom shouldn't be offered for tasting, but it turns out there's a code path where this happens: and strangely enough it's in g_resize_provider_event.

So there are several places here where questionable things are happening, even beyond the assumption that actually causes the crash (LABEL's assumption that it always has a consumer).

I can no longer reproduce this on 13.x, possibly it was fixed by https://reviews.freebsd.org/D26658 ?
(which afaict was not MFC'd despite the "MFC after" annotation)

(In reply to andrew from comment #8)
It was eventually MFCed into 13.0, for what that's worth. It's not obvious to me that that change fixed the problem though.

This issue is reproducible in FreeBSD 12.3-STABLE. Will there be a patch for 12.x?

    # uname
    FreeBSD 12.3-STABLE #0 r372168M: Thu Jun 23 09:30:29 EEST 2022

Created attachment 235084
patch against stable/12

If it's reproducible and you can test custom kernels, please try the attached patch.

I can no longer reproduce this on 13-stable, so I'm closing it.
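
For reference, the six-step sequence identified earlier in this thread can be replayed with a toy model of the /dev namespace. This is only a sketch under loud assumptions: `dev_entries`, `make_dev` and `simulate_resize` are invented for illustration, and nothing here calls into or reproduces actual GEOM code. It just shows why a re-taste of the withering label (step 4) leads to the EEXIST failure in step 6.

```python
import errno

dev_entries = {}  # hypothetical /dev namespace: entry name -> backing label geom


def make_dev(name, backing):
    """Create a /dev entry, or return EEXIST if the name is already taken."""
    if name in dev_entries:
        return errno.EEXIST
    dev_entries[name] = backing
    return 0


def simulate_resize(retaste_withering):
    """Replay steps 1-6; return (error from step 6, geom backing the entry)."""
    dev_entries.clear()
    make_dev("gpt/tst0", "old label")  # state before the resize
    # Steps 1-3: the resize spoils the label, the label is orphaned, and its
    # DEV geom destructs, removing the /dev entry.
    del dev_entries["gpt/tst0"]
    # Step 4: DEV re-tastes the orphaned label and attaches anyway.
    if retaste_withering:
        make_dev("gpt/tst0", "old label (orphaned)")
    # Step 6: DEV tastes the provider of the new label for the resized partition.
    err = make_dev("gpt/tst0", "new label")
    return err, dev_entries["gpt/tst0"]


print(simulate_resize(retaste_withering=True))
print(simulate_resize(retaste_withering=False))
```

With the re-taste enabled, the entry stays bound to the orphaned label and the new label gets EEXIST, matching the "error=17" lines in the logs; with it disabled, the new label takes the name cleanly.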