Summary: | FreeBSD -CURRENT panic: panic invalid slot (wg(4) related) | ||||||
---|---|---|---|---|---|---|---|
Product: | Base System | Reporter: | Gordon Bergling <gbe> | ||||
Component: | kern | Assignee: | Kyle Evans <kevans> | ||||
Status: | Closed FIXED | ||||||
Severity: | Affects Many People | CC: | emaste, grahamperrin, jhb, kevans, net | ||||
Priority: | --- | Keywords: | crash | ||||
Version: | CURRENT | Flags: | kevans:
mfc-stable13+
kevans: mfc-stable12- |
||||
Hardware: | Any | ||||||
OS: | Any | ||||||
URL: | https://reviews.freebsd.org/D40708 | ||||||
Attachments: |
|
From the backtrace, it looks like MOD_LOAD is failing and so we MOD_UNLOAD it, but it's only partially initialized and we aren't expecting that. I suspect we should just drop all of the free stuff in wg_module_init() and let MOD_UNLOAD deal with it, though in wg_module_deinit() we also need to avoid calling osd_jail_deregister() if wg_osd_jail_slot == 0 (i.e. we didn't reach that far in initialization). the free_crypto label in there is actually wrong anyways, if cookie_init() failed we don't set `ret` to a failure so MOD_LOAD will succeed even though things are partially borked. I have applied D40708 and haven't got the panic after 7 continuous kyua runs. A commit in branch main references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=b08ee10c0646e683cd03c9e28f537d9a7ba306af commit b08ee10c0646e683cd03c9e28f537d9a7ba306af Author: Kyle Evans <kevans@FreeBSD.org> AuthorDate: 2023-06-21 18:56:58 +0000 Commit: Kyle Evans <kevans@FreeBSD.org> CommitDate: 2023-06-23 17:00:09 +0000 wg: fix a number of issues with module load failure handling If MOD_LOAD fails, then MOD_UNLOAD will be called to unwind module state, but wg_module_init() will have already deinitialized everything it needs to in a manner that renders it unsafe to call MOD_UNLOAD after (e.g., freed zone not reset to NULL, wg_osd_jail_slot not reset to 0). Let's simply stop trying to handle freeing everything in wg_module_init() to simplify it; let the subsequent MOD_UNLOAD deal with it, and let's make that robust against partially-constructed state. jhb@ notes that MOD_UNLOAD being called if MOD_LOAD fails is kind of an anomaly that doesn't match other paradigms in the kernel; e.g., if device_attach() fails, we don't invoke device_detach(). It's likely that a future commit will revert this and instead stop calling MOD_UNLOAD if MOD_LOAD fails, expecting modules to clean up after themselves in MOD_LOAD upon failure. Some other modules already do this and may see similar problems to the wg module (see: carp). The proper fix is decidedly a bit too invasive to do this close to 14 branching, and it requires auditing all kmods (base + ports) for potential leaks. PR: 272089 Reviewed by: emaste MFC after: 3 days Differential Revision: https://reviews.freebsd.org/D40708 sys/dev/wg/if_wg.c | 28 +++++++++------------------- sys/dev/wg/wg_cookie.c | 9 ++++++++- 2 files changed, 17 insertions(+), 20 deletions(-) A commit in branch stable/13 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=b39545f7bc38ec9865ac9cb4e703a6344ad8310b commit b39545f7bc38ec9865ac9cb4e703a6344ad8310b Author: Kyle Evans <kevans@FreeBSD.org> AuthorDate: 2023-06-21 18:56:58 +0000 Commit: Kyle Evans <kevans@FreeBSD.org> CommitDate: 2023-07-11 15:05:45 +0000 wg: fix a number of issues with module load failure handling If MOD_LOAD fails, then MOD_UNLOAD will be called to unwind module state, but wg_module_init() will have already deinitialized everything it needs to in a manner that renders it unsafe to call MOD_UNLOAD after (e.g., freed zone not reset to NULL, wg_osd_jail_slot not reset to 0). Let's simply stop trying to handle freeing everything in wg_module_init() to simplify it; let the subsequent MOD_UNLOAD deal with it, and let's make that robust against partially-constructed state. jhb@ notes that MOD_UNLOAD being called if MOD_LOAD fails is kind of an anomaly that doesn't match other paradigms in the kernel; e.g., if device_attach() fails, we don't invoke device_detach(). It's likely that a future commit will revert this and instead stop calling MOD_UNLOAD if MOD_LOAD fails, expecting modules to clean up after themselves in MOD_LOAD upon failure. Some other modules already do this and may see similar problems to the wg module (see: carp). The proper fix is decidedly a bit too invasive to do this close to 14 branching, and it requires auditing all kmods (base + ports) for potential leaks. PR: 272089 Reviewed by: emaste (cherry picked from commit b08ee10c0646e683cd03c9e28f537d9a7ba306af) sys/dev/wg/if_wg.c | 28 +++++++++------------------- sys/dev/wg/wg_cookie.c | 9 ++++++++- 2 files changed, 17 insertions(+), 20 deletions(-) Thanks for the report! |
Created attachment 242886 [details] kernel panic on -CURRENT I am currently seeing this reproducible kernel panic on -CURRENT. I have set debug.debugger_on_panic=0 so that the machine should reboot and create a core dump, but the kernel just hangs. It's a hyper-v based virtualized system, so I only can upload a screenshot of the kernel panic. The kernel panic can be triggered when running the kyua test suite as root, but sometimes after the machine is idle for a few days the panic could also be seen.