Bug 262587 - panic: lock (sleep mutex) umtxql not locked @ /usr/src/sys/sys/umtxvar.h:262 on 14-current master-n253798-8cdecdecb43
Summary: panic: lock (sleep mutex) umtxql not locked @ /usr/src/sys/sys/umtxvar.h:262 ...
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern (show other bugs)
Version: CURRENT
Hardware: Any Any
: --- Affects Only Me
Assignee: freebsd-bugs (Nobody)
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2022-03-16 08:07 UTC by Masachika ISHIZUKA
Modified: 2022-11-12 12:42 UTC (History)
5 users (show)

See Also:


Attachments
/var/crash/textdump.tar.last (18.50 KB, application/x-xz)
2022-03-16 08:07 UTC, Masachika ISHIZUKA
no flags Details
core.txt (28.06 KB, application/x-xz)
2022-03-19 00:14 UTC, Masachika ISHIZUKA
no flags Details
Attached core.txt.3 file is output of patched linux_futex.c (27.45 KB, application/x-xz)
2022-03-19 00:43 UTC, Masachika ISHIZUKA
no flags Details
proposed patch (2.04 KB, patch)
2022-03-21 16:05 UTC, Mark Johnston
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Masachika ISHIZUKA 2022-03-16 08:07:22 UTC
Created attachment 232481 [details]
/var/crash/textdump.tar.last

It was working on master-n253404-05c3e8e8714.
Today, I updated to master-n253798-8cdecdecb43, it panicked with 'Panic String: lock (sleep mutex) umtxql not locked @ /usr/src/sys/sys/umtxvar.h:262' after starting firefox.
Comment 1 Graham Perrin freebsd_committer freebsd_triage 2022-03-16 09:52:38 UTC
(In reply to Masachika ISHIZUKA from comment #0)

> … updated to master-n253798-8cdecdecb43, it panicked …

<https://cgit.freebsd.org/src/log/?qt=range&q=8cdecdecb43> 
    2022-03-16 00:48:53 +0000

If it helps to bisect: I'm kernel panic-free with very heavy use of Firefox on: 
 
<https://cgit.freebsd.org/src/log/?qt=range&q=d5ad1713cc3>
    2022-03-14 08:38:04 +0000
Comment 2 Mark Johnston freebsd_committer freebsd_triage 2022-03-17 16:06:14 UTC
(In reply to Masachika ISHIZUKA from comment #0)
The panic is:

panic: lock (sleep mutex) umtxql not locked @ /usr/src/sys/sys/umtxvar.h:262                                                                                                                                                                                                            
cpuid = 2                                                                                                                                                                                                                                                                               
time = 1647416763                                                                                                                                                                                                                                                                       
KDB: stack backtrace:                                                                                                                                                                                                                                                                   
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0101153c00                                                                                                                                                                                                          
vpanic() at vpanic+0x17f/frame 0xfffffe0101153c50                                                                                                                                                                                                                                       
panic() at panic+0x43/frame 0xfffffe0101153cb0                                                                                                                                                                                                                                          
witness_unlock() at witness_unlock+0x144/frame 0xfffffe0101153cf0                                                                                                                                                                                                                       
__mtx_unlock_flags() at __mtx_unlock_flags+0x4d/frame 0xfffffe0101153d20                                                                                                                                                                                                                
linux_sys_futex() at linux_sys_futex+0x5e1/frame 0xfffffe0101153e00                                                                                                                                                                                                                     
amd64_syscall() at amd64_syscall+0x12e/frame 0xfffffe0101153f30                                                                                                                                                                                                                         
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe0101153f30                                                                                                                                                                                                              
--- syscall (202, Linux ELF64, linux_sys_futex), rip = 0x803a0ba35, rsp = 0x8061d0c90, rbp = 0x8061d0dc0 ---

I'm not sure which call to umtxq_unlock() this corresponds to.

There are no changes to the linuxulator in that range of revisions, but your version string is "master-n253798-8cdecdecb43-dirty", so you have some local changes.  Any in the linuxulator?  If not, please try installing gdb and running:

# kgdb -q
(kgdb) list *linux_sys_futex+0x5e1

and show the output.
Comment 3 Masachika ISHIZUKA 2022-03-18 01:39:10 UTC
(In reply to Mark Johnston from comment #2)
Thank you for reply.
I installed master-n253798-8cdecdecb43-dirty for two machines.
Both machines crashed with firefox with the same panic string.
Dirty is due to udf2 filesystems patch.
The below is another machine's output of kgdb.

carrot% su
carrot# kgdb -q
Reading symbols from /boot/kernel/kernel...
Reading symbols from /usr/lib/debug//boot/kernel/kernel.debug...
sched_switch (td=td@entry=0xffffffff81e8cde0 <thread0_st>, 
    flags=<optimized out>, flags@entry=260)
    at /usr/src/sys/kern/sched_ule.c:2245
2245                    cpuid = td->td_oncpu = PCPU_GET(cpuid);
(kgdb) list *linux_sys_futex+0x5e1
0xffffffff824f0bb1 is in linux_futex (/usr/src/sys/sys/umtxvar.h:262).
257     umtxq_unlock(struct umtx_key *key)
258     {
259             struct umtxq_chain *uc;
260
261             uc = umtxq_getchain(key);
262             mtx_unlock(&uc->uc_lock);
263     }
264
265     #endif /* _KERNEL */
266     #endif /* !_SYS_UMTXVAR_H_ */
(kgdb) q
carrot#
Comment 4 Mark Johnston freebsd_committer freebsd_triage 2022-03-18 16:46:01 UTC
(In reply to Masachika ISHIZUKA from comment #3)
Hmm, the inlining of umtxq_unlock() makes this hard to identify.  Can you obtain a kernel dump?  If not, you might try adding something like

  printf("linux futex op %d\n", args->op);

to the beginning of linux_futex(), and reproduce the crash.  That will make it easier to narrow down the code path; there are many uses of umtxq_unlock() in the futex code.
Comment 5 Masachika ISHIZUKA 2022-03-18 22:10:37 UTC
(In reply to Mark Johnston from comment #4)
vmcore was stored to https://www.ish.org/files/vmcore-8cdecdecb43.2.xz .

Now, I'm building world with 'printf("linux futex op %d\n", args->op);' to the beginning of /sys/compat/linux/linux_futex.c.
Comment 6 Mark Johnston freebsd_committer freebsd_triage 2022-03-18 22:17:06 UTC
(In reply to Masachika ISHIZUKA from comment #5)
If there is a backtrace in the core.txt file, please just attach that here.
Comment 7 Masachika ISHIZUKA 2022-03-19 00:14:47 UTC
Created attachment 232567 [details]
core.txt
Comment 8 Masachika ISHIZUKA 2022-03-19 00:43:00 UTC
Created attachment 232568 [details]
Attached core.txt.3 file is output of patched linux_futex.c

carrot% diff -ruN linux_futex.c.org linux_futex.c
--- linux_futex.c.org   2021-08-17 09:38:11.416807000 +0900
+++ linux_futex.c       2022-03-19 07:05:43.100300000 +0900
@@ -197,6 +197,9 @@
        struct linux_pemuldata *pem;
        struct proc *p;
 
+#if 1
+       printf("DEBUG:linux futex op %d\n", args->op);
+#endif
        if (args->op & LINUX_FUTEX_PRIVATE_FLAG) {
                args->flags = 0;
                args->op &= ~LINUX_FUTEX_PRIVATE_FLAG;
Comment 9 Mark Johnston freebsd_committer freebsd_triage 2022-03-21 15:15:09 UTC
Hmm, so we were in linux_futex_wait(), presumably having slept (op 137 or 128 are present in the dmesg, but aren't the last ops, last op 129 is a wakeup).  We locked the chain before sleeping, and upon waking up we find we have the wrong chain locked.  So presumably curthread->td_umtxq.uq_key.hash changed while we were sleeping...
Comment 10 Mark Johnston freebsd_committer freebsd_triage 2022-03-21 15:29:27 UTC
There is an instance of FUTEX_CMP_REQUEUE (op 132).  This operation moves sleeping threads from one umtx queue to another, possibly changing the queue keys in the process.  This is implemented by umtx_requeue(), not used by the native umtx interface in FreeBSD.  So this can certainly lead to the bug, since umtxq_sleep() does not expect the queue hash to change.
Comment 11 Mark Johnston freebsd_committer freebsd_triage 2022-03-21 16:05:35 UTC
Created attachment 232612 [details]
proposed patch

Please give this patch a try.  I only compile-tested it.
Comment 12 Dmitry Chagin freebsd_committer freebsd_triage 2022-03-21 17:19:30 UTC
hi, it seems the problem in the latest mav@ change to umtxq_sleep(), ie,
call to msleep() without PDROP flag.
Comment 13 Dmitry Chagin freebsd_committer freebsd_triage 2022-03-21 17:19:45 UTC
hi, it seems the problem in the latest mav@ change to umtxq_sleep(), ie,
call to msleep() without PDROP flag.
Comment 14 Mark Johnston freebsd_committer freebsd_triage 2022-03-21 17:31:04 UTC
(In reply to Dmitry Chagin from comment #13)
Oops, thanks for the pointer.  Indeed, this looks like a regression from https://cgit.freebsd.org/src/commit/?id=56070dd2e4df70e4de2a10c7ace684ceb8b0f0bd .
Comment 15 Alexander Motin freebsd_committer freebsd_triage 2022-03-21 17:53:01 UTC
Dmitry, I've removed both PDROP and following umtxq_lock(&uq->uq_key).  Is lock changes somehow during the msleep()?  I guessed it was made separate to move umtx_abs_timeout_update() out of the lock.  Am I missing some lock oddity there?
Comment 16 Mark Johnston freebsd_committer freebsd_triage 2022-03-21 18:07:59 UTC
(In reply to Alexander Motin from comment #15)
The lock can change if umtxq_requeue() moves the queue to a different hash chain.    With PDROP, the subsequent umtxq_lock(&uq->uq_key) will lock the new chain.  Right now we are re-locking the old one.
Comment 17 Dmitry Chagin freebsd_committer freebsd_triage 2022-03-21 18:11:16 UTC
Hi, Alexander! Yes, umtxq_requeue() relies on previous behaviour of umtxq_sleep(), where the lock dropped and reacquired, as umtxq_requeue() replaces the lock.
A year ago, we discussed this block with kib@ and decided that it was safe to drop/acuire it.
Comment 18 Alexander Motin freebsd_committer freebsd_triage 2022-03-21 19:08:34 UTC
OK.  I still don't understand how can it be safe to change the lock when there seems to be a window between umtxq_getchain() and mtx_lock() inside the umtxq_lock(), but what would you say about this patch:

--- a/sys/kern/kern_umtx.c
+++ b/sys/kern/kern_umtx.c
@@ -808,8 +808,10 @@ umtxq_sleep(struct umtx_q *uq, const char *wmesg,
                        if (error != 0)
                                break;
                }
-               error = msleep_sbt(uq, &uc->uc_lock, PCATCH, wmesg,
+               error = msleep_sbt(uq, &uc->uc_lock, PCATCH | PDROP, wmesg,
                    sbt, 0, flags);
+               uc = umtxq_getchain(&uq->uq_key);
+               mtx_lock(&uc->uc_lock);
                if (error == EINTR || error == ERESTART)
                        break;
                if (error == EWOULDBLOCK && (flags & C_ABSOLUTE) != 0) {

It seems even before my change code could try on the old lock due to uc never being updated, so I did it this way instead of using umtxq_lock().
Comment 19 commit-hook freebsd_committer freebsd_triage 2022-03-21 23:56:36 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=fd6ca665d206b74970e7c01d06ae06fed71500fc

commit fd6ca665d206b74970e7c01d06ae06fed71500fc
Author:     Alexander Motin <mav@FreeBSD.org>
AuthorDate: 2022-03-21 23:43:39 +0000
Commit:     Alexander Motin <mav@FreeBSD.org>
CommitDate: 2022-03-21 23:55:55 +0000

    Fix umtxq_sleep() regression caused by 56070dd2e4d.

    umtxq_requeue() moves the queue to a different hash chain and different
    lock, so we can't rely on msleep_sbt() reacquiring the same old lock.
    We have to use PDROP and update the queue chain and so lock pointer.

    PR:             262587
    MFC after:      2 weeks

 sys/kern/kern_umtx.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)
Comment 20 Masachika ISHIZUKA 2022-03-22 01:19:28 UTC
(In reply to Alexander Motin from comment #18)
Thank you for patch.
This patch applied to master-n253798-8cdecdecb43 solves my problem.
Comment 21 commit-hook freebsd_committer freebsd_triage 2022-11-12 12:42:50 UTC
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=62e1bb424584ca6a46e6224801d25b9fc6ca5ef9

commit 62e1bb424584ca6a46e6224801d25b9fc6ca5ef9
Author:     Alexander Motin <mav@FreeBSD.org>
AuthorDate: 2022-03-21 23:43:39 +0000
Commit:     Hans Petter Selasky <hselasky@FreeBSD.org>
CommitDate: 2022-11-12 11:59:27 +0000

    Fix umtxq_sleep() regression caused by 56070dd2e4d.

    umtxq_requeue() moves the queue to a different hash chain and different
    lock, so we can't rely on msleep_sbt() reacquiring the same old lock.
    We have to use PDROP and update the queue chain and so lock pointer.

    PR:             262587

    (cherry picked from commit fd6ca665d206b74970e7c01d06ae06fed71500fc)

 sys/kern/kern_umtx.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)