Summary: | Kernel panic due to spin lock held too long | ||
---|---|---|---|
Product: | Base System | Reporter: | Adam Jimerson <vendion> |
Component: | kern | Assignee: | freebsd-bugs (Nobody) <bugs> |
Status: | Open --- | ||
Severity: | Affects Only Me | CC: | douglas, emaste, jah, markj |
Priority: | --- | Keywords: | crash, needs-qa |
Version: | 12.1-RELEASE | ||
Hardware: | amd64 | ||
OS: | Any |
Description
Adam Jimerson
2020-03-18 01:05:55 UTC
Thank you for the report Adam Is there a panic summary and backtrace provided on the console at the time of the crash? The best I can find is the following from /var/log/messages Limiting closed port RST response from 246 to 200 packets/sec spin lock 0xffffffff8200cc50 (smp rendezvous) held by 0xfffff8024555e000 (tid 101747) too long timeout stopping cpus panic: spin lock held too long cpuid = 0 time = 1584487103 KDB: stack backtrace: #0 0xffffffff80c1d2b7 at kdb_backtrace+0x67 #1 0xffffffff80bd05ed at vpanic+0x19d #2 0xffffffff80bd0443 at panic+0x43 #3 0xffffffff80bb09ad at _mtx_lock_indefinite_check+0x6d #4 0xffffffff80bb04f5 at _mtx_lock_spin_cookie+0xd5 #5 0xffffffff80c2d539 at smp_rendezvous_cpus+0x239 #6 0xffffffff80bcb071 at _rm_wlock+0x171 #7 0xffffffff80cf8d48 at rt_foreach_fib_walk+0xa8 #8 0xffffffff80e21a37 at in6_mtutimo+0x47 #9 0xffffffff80bea743 at softclock_call_cc+0x143 #10 0xffffffff80beac09 at softclock+0x79 #11 0xffffffff80b93dd4 at ithread_loop+0x1d4 #12 0xffffffff80b90c43 at fork_exit+0x83 #13 0xffffffff81082a6e at fork_trampoline+0xe Uptime: 7h42m4s Everything before that looks normal to me. Not a lot to go on from the backtrace. One thing that can cause this panic is if a thread calls smp_rendezvous_cpus() with local interrupts disabled: if another thread holds smp_ipi_mtx and is trying to IPI the first thread's CPU, the two threads can livelock. It doesn't look like the timer callout+in6 route expiration path that panicked should ever call smp_rendezvous_cpus() with interrupts disabled, but I think it's also possible that this thread could have been the victim of two other threads that were livelocked. r357308 added a check to HEAD (INVARIANTS kernels only) that panics if a thread enters smp_rendezvous_cpus() with interrupts disabled. If this is at all reproducible, it might be worth trying a kernel with that check backported. Also worth noting: in6_mtutimo() was completely removed from HEAD in 355888. Is the kernel (12.1-p2) still available? Could you post it, so the core can be inspected? I don't have a 12.1-p2 kernel anymore, my current kernel is 12.1-p6 and I have not had any spin locks since reporting this issue. The core.txt and vmcore.o files from when the crash happened are still avaliable though if those will help (see the Google Drive link in my original comment here for the link). (In reply to Adam Jimerson from comment #5) Indeed, we need a copy of the kernel in order to open the vmcore. It doesn't seem to be straightforward to get a copy of the 12.1p2 kernel once additional patches have been released, which is itself a problem but also means that we can't easily debug the crash. I will close the bug for now since the crash doesn't appear reproducible. Please re-open if it happens again. Leaving this closed for now as for some reason the crash dumps either did not get exported/saved so don't know how useful this is but I did see that this happened again yesterday and all I have to go off of is the stacktrace from /var/log/messages: Jun 26 21:38:42 Heimdall syslogd: kernel boot file is /boot/kernel/kernel Jun 26 21:38:42 Heimdall kernel: spin lock 0xffffffff8200cc50 (smp rendezvous) held by 0xfffff80214eb55e0 (tid 101663) too long Jun 26 21:38:42 Heimdall kernel: timeout stopping cpus Jun 26 21:38:42 Heimdall kernel: panic: spin lock held too long Jun 26 21:38:42 Heimdall kernel: cpuid = 0 Jun 26 21:38:42 Heimdall kernel: time = 1593001188 Jun 26 21:38:42 Heimdall kernel: KDB: stack backtrace: Jun 26 21:38:42 Heimdall kernel: #0 0xffffffff80c1d307 at kdb_backtrace+0x67 Jun 26 21:38:42 Heimdall kernel: #1 0xffffffff80bd063d at vpanic+0x19d Jun 26 21:38:42 Heimdall kernel: #2 0xffffffff80bd0493 at panic+0x43 Jun 26 21:38:42 Heimdall kernel: #3 0xffffffff80bb09fd at _mtx_lock_indefinite_check+0x6d Jun 26 21:38:42 Heimdall kernel: #4 0xffffffff80bb0545 at _mtx_lock_spin_cookie+0xd5 Jun 26 21:38:42 Heimdall kernel: #5 0xffffffff8121fb0e at smp_targeted_tlb_shootdown+0x3ce Jun 26 21:38:42 Heimdall kernel: #6 0xffffffff8109c2af at pmap_advise+0x57f Jun 26 21:38:42 Heimdall kernel: #7 0xffffffff80f16214 at vm_map_madvise+0x354 Jun 26 21:38:42 Heimdall kernel: #8 0xffffffff80f1c259 at sys_madvise+0x79 Jun 26 21:38:42 Heimdall kernel: #9 0xffffffff810a8984 at amd64_syscall+0x364 Jun 26 21:38:42 Heimdall kernel: #10 0xffffffff81082520 at fast_syscall_common+0x101 Jun 23 21:42:25 Heimdall pulseaudio[46631]: [(null)] oss-util.c: '/dev/dsp2' doesn't support full duplex Jun 23 21:42:25 Heimdall pulseaudio[46631]: [(null)] oss-util.c: '/dev/dsp3' doesn't support full duplex Jun 23 21:42:25 Heimdall pulseaudio[46631]: [(null)] oss-util.c: '/dev/dsp4' doesn't support full duplex Jun 23 21:42:25 Heimdall pulseaudio[46631]: [(null)] oss-util.c: '/dev/dsp5' doesn't support full duplex Jun 23 21:42:25 Heimdall pulseaudio[46631]: [(null)] oss-util.c: '/dev/dsp8' doesn't support full duplex Jun 26 21:38:42 Heimdall syslogd: kernel boot file is /boot/kernel/kernel Jun 26 21:38:42 Heimdall kernel: spin lock 0xffffffff8200cc50 (smp rendezvous) held by 0xfffff80214eb55e0 (tid 101663) too long Jun 26 21:38:42 Heimdall kernel: timeout stopping cpus Jun 26 21:38:42 Heimdall kernel: panic: spin lock held too long Jun 26 21:38:42 Heimdall kernel: cpuid = 0 Jun 26 21:38:42 Heimdall kernel: time = 1593001188 Jun 26 21:38:42 Heimdall kernel: KDB: stack backtrace: Jun 26 21:38:42 Heimdall kernel: #0 0xffffffff80c1d307 at kdb_backtrace+0x67 Jun 26 21:38:42 Heimdall kernel: #1 0xffffffff80bd063d at vpanic+0x19d Jun 26 21:38:42 Heimdall kernel: #2 0xffffffff80bd0493 at panic+0x43 Jun 26 21:38:42 Heimdall kernel: #3 0xffffffff80bb09fd at _mtx_lock_indefinite_check+0x6d Jun 26 21:38:42 Heimdall kernel: #4 0xffffffff80bb0545 at _mtx_lock_spin_cookie+0xd5 Jun 26 21:38:42 Heimdall kernel: #5 0xffffffff8121fb0e at smp_targeted_tlb_shootdown+0x3ce Jun 26 21:38:42 Heimdall kernel: #6 0xffffffff8109c2af at pmap_advise+0x57f Jun 26 21:38:42 Heimdall kernel: #7 0xffffffff80f16214 at vm_map_madvise+0x354 Jun 26 21:38:42 Heimdall kernel: #8 0xffffffff80f1c259 at sys_madvise+0x79 Jun 26 21:38:42 Heimdall kernel: #9 0xffffffff810a8984 at amd64_syscall+0x364 Jun 26 21:38:42 Heimdall kernel: #10 0xffffffff81082520 at fast_syscall_common+0x101 I can upload the current kernel I'm using (FreeBSD 12.1-p6 GENERIC) if need be, but if it happens again and I get crash dumps for it I'll reopen. Okay so this morning I had another spin lock, and it was able to create the crash dump. The stacktrace info from /var/log/messages, the crash dumps from /var/crash and my current kernel are all available here https://drive.google.com/file/d/1LtXPg3XarwWxAwTZuzD3ug6oY_QGJUT9/view?usp=sharing What is your hardware? (In reply to Andriy Gapon from comment #9) My hardware info is available at https://bsd-hardware.info/?probe=b955e8c3bc that seems to list all my hardware including attached USB devices. I would like to add that this is still happening periodically with 12.2-RELEASE-p2 GENERIC although the latest core dump did not get saved as it was too large for the swap space I have available on my system. (In reply to Adam Jimerson from comment #11) Could you try enabling compression on the core dump? Add -Z to the dumpon(8) invocation, e.g., by adding dumpon_flags="-Z" to /etc/rc.conf. (In reply to Mark Johnston from comment #12) I certainly can enable that and see if that helps. Okay so this happened again last night, I have the contents of /var/crash /boot and the dmesg output all avalable here https://drive.google.com/file/d/1524VJIumcrcGKQNPXmtP5_4g4B0MjHf9/view?usp=sharing (In reply to Adam Jimerson from comment #14) Thanks. Unfortunately I need a copy of /usr/lib/debug/boot/kernel as well. Could you provide a copy? (In reply to Mark Johnston from comment #15) Certanly, here is the contents of that directory https://drive.google.com/file/d/1WyqHAXzOsF9-EELo9f9ovr5Uc1KvQ3GV/view?usp=sharing |