Bug 272151 - panic: use-after-free tty race condition
Summary: panic: use-after-free tty race condition
Status: Open
Alias: None
Product: Base System
Classification: Unclassified
Component: kern (show other bugs)
Version: CURRENT
Hardware: amd64 Any
: --- Affects Many People
Assignee: Robert Wing
URL:
Keywords: crash
Depends on:
Blocks:
 
Reported: 2023-06-22 16:30 UTC by Jake Freeland
Modified: 2023-12-27 12:33 UTC (History)
8 users (show)

See Also:
linimon: mfc-stable14?
linimon: mfc-stable13?


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Jake Freeland 2023-06-22 16:30:42 UTC
There appears to be a race condition during shutdown where the tty is no longer owned by the current thread, resulting in an assertion panic. I was unable to dump for more information, but this panic has happened to me several times, so I will update the report with the dump info next time that it happens.

Here is what I was able to record:
Jun 20 22:11:22 freebsd shutdown[80834]: reboot by root:
Stopping cron.
Waiting for PIDS: 808.
Stopping sshd.
Waiting for PIDS: 804.
Stopping devd.
Waiting for PIDS: 491.
Writing entropy file: .
Writing early boot entropy file: .
.
Jun 20 22:11:22 freebsd syslogd: exiting on signal 15
panic: mutex ttymtx not owned at /usr/src/sys/kern/tty.c:720
cpuid = 1
time = 1687317082
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe018d934860
vpanic() at vpanic+0x150/frame 0xfffffe018d9348b0
panic() at panic+0x43/frame 0xfffffe018d934910
__mtx_assert() at __mtx_assert+0x9c/frame 0xfffffe018d934920
tty_kqops_read_event() at tty_kqops_read_event+0x2b/frame 0xfffffe018d934940
kqueue_register() at kqueue_register+0x8ee/frame 0xfffffe018d9349c0
kqueue_kevent() at kqueue_kevent+0x109/frame 0xfffffe018d934c90
kern_kevent_fp() at kern_kevent_fp+0x95/frame 0xfffffe018d934ce0
kern_kevent() at kern_kevent+0x80/frame 0xfffffe018d934d40
kern_kevent_generic() at kern_kevent_generic+0x6f/frame 0xfffffe018d934da0
sys_kevent() at sys_kevent+0x61/frame 0xfffffe018d934e00
amd64_syscall() at amd64_syscall+0x130/frame 0xfffffe018d934f30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe018d934f30
--- syscall (560, FreeBSD ELF64, kevent), rip = 0x824b57b4a, rsp = 0x821235e38, rbp = 0x821235e80 ---
KDB: enter: panic
[ thread pid 2920 tid 100767 ]
Stopped at      kdb_enter+0x32: movq    $0,0xde1c73(%rip)
db> dump

Dump failed. Partition too small (about 2697MB were needed this time).
Cannot dump: unknown error (error=7).
Comment 1 Kyle Evans freebsd_committer freebsd_triage 2023-06-22 17:13:11 UTC
I guess the implication is that we're hitting https://cgit.freebsd.org/src/tree/sys/kern/kern_event.c#n1732 with a NULL kn->kn_knlist, in which case the previous kn_list_lock() was effectively a nop and we're not meeting the invariant described in kqueue(9):

     The knlist_*() family of functions are for managing knotes associated
     with an object.  A knlist is not required, but is commonly used.  If
     used, the knlist must be initialized with either knlist_init() or
     knlist_init_mtx().  The knlist structure may be embedded into the object
     structure.  *The lock will be held over f_event calls.*

Maybe dchagin@ or markj@ can comment a little further on this one.
Comment 2 Robert Wing freebsd_committer freebsd_triage 2023-06-25 23:39:13 UTC
The issue seems to be caused by knlist_clear() with the way it sets up the knote with EV_ONESHOT. The event for the knote is triggered after the TTY is revoked and the thread no longer holds the TTY lock when the knote event is called.

I'd halfway assume that knotes shouldn't be triggered if the TTY was revoked, which might look something like: https://people.freebsd.org/~rew/tf-revoke.patch

or maybe it makes sense to delete the knotes when the TTY is not opened? something like: https://people.freebsd.org/~rew/tty-knote.patch

or...some other behavior is expected? either way, both of the patches above prevented the panic from occurring.

To reproduce, spin up a vm and execute the following:

- launch nvim
- suspend nvim (ctrl-z)
- poweroff (panic)
Comment 3 Robert Wing freebsd_committer freebsd_triage 2023-08-26 18:43:20 UTC
https://reviews.freebsd.org/D41605

I'm not sure this is the best fix but, I'm guessing we are going to get more reports about this in the future as I've hit it randomly a few times now.
Comment 4 commit-hook freebsd_committer freebsd_triage 2023-12-19 00:49:00 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=acd5638e268a6706f6b7ad84947a8425e8d51ef7

commit acd5638e268a6706f6b7ad84947a8425e8d51ef7
Author:     Robert Wing <rew@FreeBSD.org>
AuthorDate: 2023-12-19 00:40:46 +0000
Commit:     Robert Wing <rew@FreeBSD.org>
CommitDate: 2023-12-19 00:40:46 +0000

    tty: delete knotes when TTY is revoked

    Do not clear knotes from the TTY until it gets dealloc'ed, unless the
    TTY is being revoked, in that case delete the knotes when closed is
    called on the TTY.

    When knotes are cleared from a knlist, those knotes become detached from
    the knlist. And when an event is triggered on a detached knote there
    isn't an associated knlist and therefore no lock will be taken when the
    event is triggered.

    This becomes a problem when a detached knote is triggered on a TTY since
    the mutex for a TTY is also used as the lock for its knlists. This
    scenario ends up calling the TTY event handlers without the TTY lock
    being held and tripping on asserts in the event handlers.

    PR:             272151
    Reviewed by:    kib, markj
    Differential Revision:  https://reviews.freebsd.org/D41605

 sys/kern/tty.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)
Comment 5 Mark Linimon freebsd_committer freebsd_triage 2023-12-27 12:33:00 UTC
^Triage: assign to committer that resolved; set possible MFC flags.