Bug 261198 - bhyve host panics with: spin lock 0xffffffff81eac800 (callout) held too long / panic: spin lock held too long
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: bhyve
Version: CURRENT
Hardware: amd64 Any
Importance: --- Affects Only Me
Assignee: Mark Johnston
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2022-01-14 14:08 UTC by Tom Jones
Modified: 2022-02-14 15:14 UTC
CC: 4 users

See Also:


Attachments
Panic while rebooting guest (307.40 KB, image/jpeg), 2022-01-14 14:08 UTC, Tom Jones
Panic while running disk test (333.96 KB, image/jpeg), 2022-01-14 14:09 UTC, Tom Jones

Description Tom Jones 2022-01-14 14:08:31 UTC
Created attachment 231002 [details]
Panic while rebooting guest

My FreeBSD bhyve host running 1 FreeBSD guest (both 14.0-CURRENT) is sporadically panicking with "spin lock held too long". I have attached photos of the two panic messages I have seen.

The first panic happened while the guest was rebooting; the second while I was running a disk performance measurement in the VM.

I will set up better remote access when I am at the machine tomorrow so I can interact with it once it has panicked.

host: FreeBSD pokeitwithastick 14.0-CURRENT FreeBSD 14.0-CURRENT #8 main-n252356-01cad731921: Tue Jan 11 10:13:27 UTC 2022     tj@pokeitwithastick:/usr/obj/usr/src/amd64.amd64/sys/GENERIC  amd64
Comment 1 Tom Jones 2022-01-14 14:09:06 UTC
Created attachment 231003 [details]
Panic while running disk test
Comment 2 Tom Jones 2022-01-14 14:11:04 UTC
Host is an AMD Ryzen 7 3700X 8-Core Processor; the full dmesg is here:

https://dmesgd.nycbug.org/index.cgi?do=view&id=6354
Comment 3 Vitaliy Gusev 2022-01-14 20:07:43 UTC
Tom, could you find the offending commit from the list below (manually or by using 'git bisect')? How regularly did you run your tests over the last two to three months?


 $ git l sys/amd64/vmm/
 6171e026be11 bhyve: add support for MTRR
 c72e914cf108 vmm: vlapic resume can eat 100% CPU by vlapic_callout_handler
 e2650af157bc Make CPU_SET macros compliant with other implementations
 4c812fe61b7c vlapic: Schedule callouts on the local CPU
 de8554295b47 cpuset(9): Add CPU_FOREACH_IS(SET|CLR) and modify consumers to use it
 bcdc599dc2a1 Revert "cpuset(9): Add CPU_FOREACH_IS(SET|CLR) and modify consumers to use it"
 9068f6ea697b cpuset(9): Add CPU_FOREACH_IS(SET|CLR) and modify consumers to use it
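
A bisect over that range might look like the following (the endpoints are placeholders, not exact revisions; a real run needs the last known-good and first known-bad builds):

 $ git bisect start
 $ git bisect bad HEAD            # a revision that panics
 $ git bisect good <known-good>   # last revision that did not panic
 # git checks out a midpoint: rebuild the kernel, boot, run the workload,
 # then report the outcome and repeat:
 $ git bisect good                # or 'git bisect bad'
 # when git prints the first bad commit, clean up with:
 $ git bisect reset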
Comment 4 Tom Jones 2022-01-16 10:38:08 UTC
Yeah, I can do that, but not until Wednesday: the COM cable I have is 2mm and the headers on the motherboard are 2.54mm.
Comment 5 Mark Johnston 2022-02-05 17:30:14 UTC
I hit what appears to be the same thing.  The system was running a bunch of CPU-intensive FreeBSD guests and a buildworld on the host, somewhat oversubscribed.

One of the per-CPU callout spinlocks was held for too long:

spin lock 0xffffffff81ead8c0 (callout) held by 0xfffffe00806e21e0 (tid 100203) too long
panic: spin lock held too long
cpuid = 17
time = 1644078265
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe01c03184e0
vpanic() at vpanic+0x17f/frame 0xfffffe01c0318530
panic() at panic+0x43/frame 0xfffffe01c0318590
_mtx_lock_indefinite_check() at _mtx_lock_indefinite_check+0x74/frame 0xfffffe01c03185b0
_mtx_lock_spin_cookie() at _mtx_lock_spin_cookie+0xd5/frame 0xfffffe01c0318620
__mtx_lock_spin_flags() at __mtx_lock_spin_flags+0xd5/frame 0xfffffe01c0318660
_callout_stop_safe() at _callout_stop_safe+0x145/frame 0xfffffe01c03186d0
sleepq_remove_thread() at sleepq_remove_thread+0x1ad/frame 0xfffffe01c03186f0
sleepq_resume_thread() at sleepq_resume_thread+0xaa/frame 0xfffffe01c0318730
wakeup_one() at wakeup_one+0x1f/frame 0xfffffe01c0318750
vcpu_notify_event() at vcpu_notify_event+0x4e/frame 0xfffffe01c0318780

If I look at the mutex state, though, the lock is unheld:

(kgdb) frame 5
#5  0xffffffff80be46d4 in _mtx_lock_indefinite_check (m=m@entry=0xffffffff81ead8c0 <cc_cpu+8000>, ldap=ldap@entry=0xfffffe01c03185e0)
    at /usr/home/markj/src/freebsd/sys/kern/kern_mutex.c:1256
1256                    panic("spin lock held too long");
(kgdb) p td
$1 = (struct thread *) 0xfffffe00806e21e0
(kgdb) p m
$2 = (struct mtx *) 0xffffffff81ead8c0 <cc_cpu+8000>
(kgdb) p *m
$3 = {lock_object = {lo_name = 0xffffffff8130481b "callout", lo_flags = 196608, lo_data = 0, lo_witness = 0x0}, mtx_lock = 0}
(kgdb) p td->td_tid
$4 = 100203
(kgdb) tid 100203
(kgdb) bt
#0  cpustop_handler () at /usr/home/markj/src/freebsd/sys/x86/x86/mp_x86.c:1490
#1  0xffffffff810ab739 in ipi_nmi_handler () at /usr/home/markj/src/freebsd/sys/x86/x86/mp_x86.c:1447
#2  0xffffffff810e4dd6 in trap (frame=0xfffffe006baf5f30) at /usr/home/markj/src/freebsd/sys/amd64/amd64/trap.c:240
#3  <signal handler called>
#4  rdtsc () at /usr/home/markj/src/freebsd/sys/amd64/include/cpufunc.h:355
#5  delay_tsc (n=1) at /usr/home/markj/src/freebsd/sys/x86/x86/delay.c:68
#6  cpu_lock_delay () at /usr/home/markj/src/freebsd/sys/x86/x86/delay.c:140
#7  0xffffffff80be46d9 in _mtx_lock_indefinite_check (m=0x22000ee4, m@entry=0xfffffe006baf70c0, ldap=0x9927a00000000, ldap@entry=0xffffffff82d07e70)
    at /usr/home/markj/src/freebsd/sys/kern/kern_mutex.c:1243
#8  0xffffffff80be490a in thread_lock_flags_ (td=0xfffffe00806e21e0, opts=0, file=<optimized out>, line=<optimized out>)
    at /usr/home/markj/src/freebsd/sys/kern/kern_mutex.c:929
#9  0xffffffff80c27226 in softclock_thread (arg=arg@entry=0xffffffff81ead8c0 <cc_cpu+8000>) at /usr/home/markj/src/freebsd/sys/kern/kern_timeout.c:840
#10 0xffffffff80bbfc40 in fork_exit (callout=0xffffffff80c271e0 <softclock_thread>, arg=0xffffffff81ead8c0 <cc_cpu+8000>, frame=0xffffffff82d07f40)
    at /usr/home/markj/src/freebsd/sys/kern/kern_fork.c:1102
#11 <signal handler called>
#12 0x35f3ad8ceff1b8b6 in ?? ()

So this softclock thread is spinning for its own thread lock, which is:

(kgdb) p/x *m@entry
$5 = {lock_object = {lo_name = 0xfffffe006baf7d40, lo_flags = 0x30000, lo_data = 0x0, lo_witness = 0x0}, mtx_lock = 0xfffffe0237c51c80}

and the owning thread is the thread that panicked:

(kgdb) p ((struct thread *)$5.mtx_lock)->td_tid
$6 = 161889

Hmm.  I somewhat doubt this is bhyve-specific; bhyve just exercises the callout subsystem heavily.  This is probably a result of commit 74cf7cae4d2238ae6d1c949b2bbd077e1ab33634 somehow.
Comment 6 Mark Johnston 2022-02-08 02:23:55 UTC
Looks like there is a lock order reversal: sleepq_signal() selects a thread on the sleepqueue, locks it, and then sleepq_remove_thread() stops the td_slpcallout callout, which requires a per-CPU callout lock.  softclock_thread() locks the per-CPU callout lock and then acquires the current thread's lock.

Usually the thread selected by sleepq_signal() will have its lock set to that of the sleepqueue, and no deadlock is possible.  But if the thread is preparing to go to sleep, its lock will still be a scheduler lock.  So it's not very easy to hit this deadlock.
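
Schematically, the two paths take the same pair of locks in opposite orders (a minimal sketch with context elided, not the actual kernel code; the callout_cpu field name is an approximation):

    /* softclock_thread(): per-CPU callout lock, then its own thread lock. */
    mtx_lock_spin(&cc->cc_lock);        /* per-CPU callout lock */
    /* ... no callouts pending, prepare to switch off-CPU ... */
    thread_lock(curthread);             /* thread (scheduler) lock */

    /* wakeup_one() -> sleepq_signal() -> sleepq_remove_thread():
     * thread lock first, then the callout lock inside callout_stop(). */
    thread_lock(td);                    /* a scheduler lock if td is still
                                           preparing to go to sleep */
    callout_stop(&td->td_slpcallout);   /* takes the per-CPU callout lock
                                           internally: the reversal */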
Comment 7 Mark Johnston 2022-02-08 02:41:42 UTC
Hmm, I wonder if we can avoid stopping the callout when td->td_lock != sc->sc_lock.  In that case, the thread is in the process of going to sleep and will abort in sleepq_switch() anyway.  I don't think sleepq_switch() can safely abort the callout, but I believe it should be OK to leave it be.  That would be a rare case; see also the comment above the problematic callout_stop() call.
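
Roughly, that check in sleepq_remove_thread() would look like this (a minimal sketch, not the actual patch; it assumes sc is the sleepqueue chain for td's wait channel):

    if (td->td_lock == &sc->sc_lock) {
            /*
             * td is fully asleep: its thread lock is the sleepqueue
             * chain lock, so callout_stop() cannot deadlock against
             * softclock_thread() here.
             */
            callout_stop(&td->td_slpcallout);
    }
    /*
     * Otherwise td is still going to sleep and will abort in
     * sleepq_switch(); leave td_slpcallout running and tolerate a
     * spurious sleepq_timeout() call.
     */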
Comment 8 Mark Johnston 2022-02-08 16:30:36 UTC
https://reviews.freebsd.org/D34204
Comment 9 commit-hook 2022-02-14 15:08:03 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=893be9d8ac161c4cc96e9f3f12f1260355dd123b

commit 893be9d8ac161c4cc96e9f3f12f1260355dd123b
Author:     Mark Johnston <markj@FreeBSD.org>
AuthorDate: 2022-02-14 14:38:53 +0000
Commit:     Mark Johnston <markj@FreeBSD.org>
CommitDate: 2022-02-14 15:06:47 +0000

    sleepqueue: Address a lock order reversal

    After commit 74cf7cae4d22 ("softclock: Use dedicated ithreads for
    running callouts."), there is a lock order reversal between the per-CPU
    callout lock and the scheduler lock.  softclock_thread() locks callout
    lock then the scheduler lock, when preparing to switch off-CPU, and
    sleepq_remove_thread() stops the timed sleep callout while potentially
    holding a scheduler lock.  In the latter case, it's the thread itself
    that's locked, and if the thread is sleeping then its lock will be a
    sleepqueue lock, but if it's still in the process of going to sleep
    it'll be a scheduler lock.

    We could perhaps change softclock_thread() to try to acquire locks in
    the opposite order, but that'd require dropping and re-acquiring the
    callout lock, which seems expensive for an operation that will happen
    quite frequently.  We can instead perhaps avoid stopping the
    td_slpcallout callout if the thread is still going to sleep, which is
    what this patch does.  This will result in a spurious call to
    sleepq_timeout(), but some counters suggest that this is very rare.

    PR:             261198
    Fixes:          74cf7cae4d22 ("softclock: Use dedicated ithreads for running callouts.")
    Reported and tested by: thj
    Reviewed by:    kib
    Sponsored by:   The FreeBSD Foundation
    Differential Revision:  https://reviews.freebsd.org/D34204

 sys/kern/subr_sleepqueue.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)
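
The spurious wakeup is tolerable because sleepq_timeout() revalidates the thread's state under the thread lock before acting; roughly (a sketch, not the exact code in subr_sleepqueue.c):

    static void
    sleepq_timeout(void *arg)
    {
            struct thread *td = arg;

            thread_lock(td);
            if (td->td_sleeptimo == 0 || td->td_sleeptimo > sbinuptime()) {
                    /* The timeout was cancelled or rescheduled, or this
                     * is a spurious call: do nothing. */
                    thread_unlock(td);
                    return;
            }
            /* ... otherwise mark the sleep as timed out and wake td ... */
    }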