Bug 272089

Summary: FreeBSD -CURRENT panic: panic invalid slot (wg(4) related)
Product: Base System Reporter: Gordon Bergling <gbe>
Component: kernAssignee: Kyle Evans <kevans>
Status: Closed FIXED    
Severity: Affects Many People CC: emaste, grahamperrin, jhb, kevans, net
Priority: --- Keywords: crash
Version: CURRENTFlags: kevans: mfc-stable13+
kevans: mfc-stable12-
Hardware: Any   
OS: Any   
URL: https://reviews.freebsd.org/D40708
Attachments:
Description Flags
kernel panic on -CURRENT none

Description Gordon Bergling freebsd_committer freebsd_triage 2023-06-20 08:44:23 UTC
Created attachment 242886 [details]
kernel panic on -CURRENT

I am currently seeing this reproducible kernel panic on -CURRENT. I have set
debug.debugger_on_panic=0
so that the machine should reboot and create a core dump, but the kernel just
hangs.

It's a hyper-v based virtualized system, so I only can upload a screenshot of
the kernel panic.

The kernel panic can be triggered when running the kyua test suite as root, but
sometimes after the machine is idle for a few days the panic could also be seen.
Comment 1 Kyle Evans freebsd_committer freebsd_triage 2023-06-21 18:22:52 UTC
From the backtrace, it looks like MOD_LOAD is failing and so we MOD_UNLOAD it, but it's only partially initialized and we aren't expecting that. I suspect we should just drop all of the free stuff in wg_module_init() and let MOD_UNLOAD deal with it, though in wg_module_deinit() we also need to avoid calling osd_jail_deregister() if wg_osd_jail_slot == 0 (i.e. we didn't reach that far in initialization).
Comment 2 Kyle Evans freebsd_committer freebsd_triage 2023-06-21 18:25:54 UTC
the free_crypto label in there is actually wrong anyways, if cookie_init() failed we don't set `ret` to a failure so MOD_LOAD will succeed even though things are partially borked.
Comment 3 Gordon Bergling freebsd_committer freebsd_triage 2023-06-23 14:45:25 UTC
I have applied D40708 and haven't got the panic after 7 continuous kyua runs.
Comment 4 commit-hook freebsd_committer freebsd_triage 2023-06-23 17:01:29 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=b08ee10c0646e683cd03c9e28f537d9a7ba306af

commit b08ee10c0646e683cd03c9e28f537d9a7ba306af
Author:     Kyle Evans <kevans@FreeBSD.org>
AuthorDate: 2023-06-21 18:56:58 +0000
Commit:     Kyle Evans <kevans@FreeBSD.org>
CommitDate: 2023-06-23 17:00:09 +0000

    wg: fix a number of issues with module load failure handling

    If MOD_LOAD fails, then MOD_UNLOAD will be called to unwind module
    state, but wg_module_init() will have already deinitialized everything
    it needs to in a manner that renders it unsafe to call MOD_UNLOAD
    after (e.g., freed zone not reset to NULL, wg_osd_jail_slot not reset
    to 0).  Let's simply stop trying to handle freeing everything in
    wg_module_init() to simplify it; let the subsequent MOD_UNLOAD deal with
    it, and let's make that robust against partially-constructed state.

    jhb@ notes that MOD_UNLOAD being called if MOD_LOAD fails is kind of an
    anomaly that doesn't match other paradigms in the kernel; e.g., if
    device_attach() fails, we don't invoke device_detach().  It's likely
    that a future commit will revert this and instead stop calling
    MOD_UNLOAD if MOD_LOAD fails, expecting modules to clean up after
    themselves in MOD_LOAD upon failure.  Some other modules already do this
    and may see similar problems to the wg module (see: carp).  The proper
    fix is decidedly a bit too invasive to do this close to 14 branching,
    and it requires auditing all kmods (base + ports) for potential leaks.

    PR:             272089
    Reviewed by:    emaste
    MFC after:      3 days
    Differential Revision:  https://reviews.freebsd.org/D40708

 sys/dev/wg/if_wg.c     | 28 +++++++++-------------------
 sys/dev/wg/wg_cookie.c |  9 ++++++++-
 2 files changed, 17 insertions(+), 20 deletions(-)
Comment 5 commit-hook freebsd_committer freebsd_triage 2023-07-12 03:11:17 UTC
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=b39545f7bc38ec9865ac9cb4e703a6344ad8310b

commit b39545f7bc38ec9865ac9cb4e703a6344ad8310b
Author:     Kyle Evans <kevans@FreeBSD.org>
AuthorDate: 2023-06-21 18:56:58 +0000
Commit:     Kyle Evans <kevans@FreeBSD.org>
CommitDate: 2023-07-11 15:05:45 +0000

    wg: fix a number of issues with module load failure handling

    If MOD_LOAD fails, then MOD_UNLOAD will be called to unwind module
    state, but wg_module_init() will have already deinitialized everything
    it needs to in a manner that renders it unsafe to call MOD_UNLOAD
    after (e.g., freed zone not reset to NULL, wg_osd_jail_slot not reset
    to 0).  Let's simply stop trying to handle freeing everything in
    wg_module_init() to simplify it; let the subsequent MOD_UNLOAD deal with
    it, and let's make that robust against partially-constructed state.

    jhb@ notes that MOD_UNLOAD being called if MOD_LOAD fails is kind of an
    anomaly that doesn't match other paradigms in the kernel; e.g., if
    device_attach() fails, we don't invoke device_detach().  It's likely
    that a future commit will revert this and instead stop calling
    MOD_UNLOAD if MOD_LOAD fails, expecting modules to clean up after
    themselves in MOD_LOAD upon failure.  Some other modules already do this
    and may see similar problems to the wg module (see: carp).  The proper
    fix is decidedly a bit too invasive to do this close to 14 branching,
    and it requires auditing all kmods (base + ports) for potential leaks.

    PR:             272089
    Reviewed by:    emaste

    (cherry picked from commit b08ee10c0646e683cd03c9e28f537d9a7ba306af)

 sys/dev/wg/if_wg.c     | 28 +++++++++-------------------
 sys/dev/wg/wg_cookie.c |  9 ++++++++-
 2 files changed, 17 insertions(+), 20 deletions(-)
Comment 6 Kyle Evans freebsd_committer freebsd_triage 2023-07-12 03:13:17 UTC
Thanks for the report!