Created attachment 232481 [details] /var/crash/textdump.tar.last It was working on master-n253404-05c3e8e8714. Today, I updated to master-n253798-8cdecdecb43, it panicked with 'Panic String: lock (sleep mutex) umtxql not locked @ /usr/src/sys/sys/umtxvar.h:262' after starting firefox.
(In reply to Masachika ISHIZUKA from comment #0) > … updated to master-n253798-8cdecdecb43, it panicked … <https://cgit.freebsd.org/src/log/?qt=range&q=8cdecdecb43> 2022-03-16 00:48:53 +0000 If it helps to bisect: I'm kernel panic-free with very heavy use of Firefox on: <https://cgit.freebsd.org/src/log/?qt=range&q=d5ad1713cc3> 2022-03-14 08:38:04 +0000
(In reply to Masachika ISHIZUKA from comment #0) The panic is: panic: lock (sleep mutex) umtxql not locked @ /usr/src/sys/sys/umtxvar.h:262 cpuid = 2 time = 1647416763 KDB: stack backtrace: db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0101153c00 vpanic() at vpanic+0x17f/frame 0xfffffe0101153c50 panic() at panic+0x43/frame 0xfffffe0101153cb0 witness_unlock() at witness_unlock+0x144/frame 0xfffffe0101153cf0 __mtx_unlock_flags() at __mtx_unlock_flags+0x4d/frame 0xfffffe0101153d20 linux_sys_futex() at linux_sys_futex+0x5e1/frame 0xfffffe0101153e00 amd64_syscall() at amd64_syscall+0x12e/frame 0xfffffe0101153f30 fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe0101153f30 --- syscall (202, Linux ELF64, linux_sys_futex), rip = 0x803a0ba35, rsp = 0x8061d0c90, rbp = 0x8061d0dc0 --- I'm not sure which call to umtxq_unlock() this corresponds to. There are no changes to the linuxulator in that range of revisions, but your version string is "master-n253798-8cdecdecb43-dirty", so you have some local changes. Any in the linuxulator? If not, please try installing gdb and running: # kgdb -q (kgdb) list *linux_sys_futex+0x5e1 and show the output.
(In reply to Mark Johnston from comment #2) Thank you for reply. I installed master-n253798-8cdecdecb43-dirty for two machines. Both machines crashed with firefox with the same panic string. Dirty is due to udf2 filesystems patch. The below is another machine's output of kgdb. carrot% su carrot# kgdb -q Reading symbols from /boot/kernel/kernel... Reading symbols from /usr/lib/debug//boot/kernel/kernel.debug... sched_switch (td=td@entry=0xffffffff81e8cde0 <thread0_st>, flags=<optimized out>, flags@entry=260) at /usr/src/sys/kern/sched_ule.c:2245 2245 cpuid = td->td_oncpu = PCPU_GET(cpuid); (kgdb) list *linux_sys_futex+0x5e1 0xffffffff824f0bb1 is in linux_futex (/usr/src/sys/sys/umtxvar.h:262). 257 umtxq_unlock(struct umtx_key *key) 258 { 259 struct umtxq_chain *uc; 260 261 uc = umtxq_getchain(key); 262 mtx_unlock(&uc->uc_lock); 263 } 264 265 #endif /* _KERNEL */ 266 #endif /* !_SYS_UMTXVAR_H_ */ (kgdb) q carrot#
(In reply to Masachika ISHIZUKA from comment #3) Hmm, the inlining of umtxq_unlock() makes this hard to identify. Can you obtain a kernel dump? If not, you might try adding something like printf("linux futex op %d\n", args->op); to the beginning of linux_futex(), and reproduce the crash. That will make it easier to narrow down the code path; there are many uses of umtxq_unlock() in the futex code.
(In reply to Mark Johnston from comment #4) vmcore was stored to https://www.ish.org/files/vmcore-8cdecdecb43.2.xz . Now, I'm building world with 'printf("linux futex op %d\n", args->op);' to the beginning of /sys/compat/linux/linux_futex.c.
(In reply to Masachika ISHIZUKA from comment #5) If there is a backtrace in the core.txt file, please just attach that here.
Created attachment 232567 [details] core.txt
Created attachment 232568 [details] Attached core.txt.3 file is output of patched linux_futex.c carrot% diff -ruN linux_futex.c.org linux_futex.c --- linux_futex.c.org 2021-08-17 09:38:11.416807000 +0900 +++ linux_futex.c 2022-03-19 07:05:43.100300000 +0900 @@ -197,6 +197,9 @@ struct linux_pemuldata *pem; struct proc *p; +#if 1 + printf("DEBUG:linux futex op %d\n", args->op); +#endif if (args->op & LINUX_FUTEX_PRIVATE_FLAG) { args->flags = 0; args->op &= ~LINUX_FUTEX_PRIVATE_FLAG;
Hmm, so we were in linux_futex_wait(), presumably having slept (op 137 or 128 are present in the dmesg, but aren't the last ops, last op 129 is a wakeup). We locked the chain before sleeping, and upon waking up we find we have the wrong chain locked. So presumably curthread->td_umtxq.uq_key.hash changed while we were sleeping...
There is an instance of FUTEX_CMP_REQUEUE (op 132). This operation moves sleeping threads from one umtx queue to another, possibly changing the queue keys in the process. This is implemented by umtx_requeue(), not used by the native umtx interface in FreeBSD. So this can certainly lead to the bug, since umtxq_sleep() does not expect the queue hash to change.
Created attachment 232612 [details] proposed patch Please give this patch a try. I only compile-tested it.
hi, it seems the problem in the latest mav@ change to umtxq_sleep(), ie, call to msleep() without PDROP flag.
(In reply to Dmitry Chagin from comment #13) Oops, thanks for the pointer. Indeed, this looks like a regression from https://cgit.freebsd.org/src/commit/?id=56070dd2e4df70e4de2a10c7ace684ceb8b0f0bd .
Dmitry, I've removed both PDROP and following umtxq_lock(&uq->uq_key). Is lock changes somehow during the msleep()? I guessed it was made separate to move umtx_abs_timeout_update() out of the lock. Am I missing some lock oddity there?
(In reply to Alexander Motin from comment #15) The lock can change if umtxq_requeue() moves the queue to a different hash chain. With PDROP, the subsequent umtxq_lock(&uq->uq_key) will lock the new chain. Right now we are re-locking the old one.
Hi, Alexander! Yes, umtxq_requeue() relies on previous behaviour of umtxq_sleep(), where the lock dropped and reacquired, as umtxq_requeue() replaces the lock. A year ago, we discussed this block with kib@ and decided that it was safe to drop/acuire it.
OK. I still don't understand how can it be safe to change the lock when there seems to be a window between umtxq_getchain() and mtx_lock() inside the umtxq_lock(), but what would you say about this patch: --- a/sys/kern/kern_umtx.c +++ b/sys/kern/kern_umtx.c @@ -808,8 +808,10 @@ umtxq_sleep(struct umtx_q *uq, const char *wmesg, if (error != 0) break; } - error = msleep_sbt(uq, &uc->uc_lock, PCATCH, wmesg, + error = msleep_sbt(uq, &uc->uc_lock, PCATCH | PDROP, wmesg, sbt, 0, flags); + uc = umtxq_getchain(&uq->uq_key); + mtx_lock(&uc->uc_lock); if (error == EINTR || error == ERESTART) break; if (error == EWOULDBLOCK && (flags & C_ABSOLUTE) != 0) { It seems even before my change code could try on the old lock due to uc never being updated, so I did it this way instead of using umtxq_lock().
A commit in branch main references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=fd6ca665d206b74970e7c01d06ae06fed71500fc commit fd6ca665d206b74970e7c01d06ae06fed71500fc Author: Alexander Motin <mav@FreeBSD.org> AuthorDate: 2022-03-21 23:43:39 +0000 Commit: Alexander Motin <mav@FreeBSD.org> CommitDate: 2022-03-21 23:55:55 +0000 Fix umtxq_sleep() regression caused by 56070dd2e4d. umtxq_requeue() moves the queue to a different hash chain and different lock, so we can't rely on msleep_sbt() reacquiring the same old lock. We have to use PDROP and update the queue chain and so lock pointer. PR: 262587 MFC after: 2 weeks sys/kern/kern_umtx.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
(In reply to Alexander Motin from comment #18) Thank you for patch. This patch applied to master-n253798-8cdecdecb43 solves my problem.
A commit in branch stable/13 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=62e1bb424584ca6a46e6224801d25b9fc6ca5ef9 commit 62e1bb424584ca6a46e6224801d25b9fc6ca5ef9 Author: Alexander Motin <mav@FreeBSD.org> AuthorDate: 2022-03-21 23:43:39 +0000 Commit: Hans Petter Selasky <hselasky@FreeBSD.org> CommitDate: 2022-11-12 11:59:27 +0000 Fix umtxq_sleep() regression caused by 56070dd2e4d. umtxq_requeue() moves the queue to a different hash chain and different lock, so we can't rely on msleep_sbt() reacquiring the same old lock. We have to use PDROP and update the queue chain and so lock pointer. PR: 262587 (cherry picked from commit fd6ca665d206b74970e7c01d06ae06fed71500fc) sys/kern/kern_umtx.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)