Bug 236917 - emulators/virtualbox-ose: Sleeping thread owns a non-sleepable lock kernel panic
Summary: emulators/virtualbox-ose: Sleeping thread owns a non-sleepable lock kernel panic
Status: New
Alias: None
Product: Ports & Packages
Classification: Unclassified
Component: Individual Port(s)
Version: Latest
Hardware: amd64 Any
Importance: --- Affects Only Me
Assignee: Virtualbox Team (Nobody)
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-03-31 19:42 UTC by Derek Schrock
Modified: 2021-06-06 20:43 UTC (History)
4 users

See Also:
bugzilla: maintainer-feedback? (vbox)


Attachments
Three sets of panic backtraces (5.86 KB, text/plain)
2019-04-13 23:35 UTC, Derek Schrock

Description Derek Schrock 2019-03-31 19:42:59 UTC
On a FreeBSD 12.0-RELEASE-p3 amd64 host running multiple guests (FreeBSD, RHEL, Fedora, Windows) I get the following kernel panic (see below).  It seems most common during host startup, but I've seen it on long-running systems as well.  This host has panicked this way 4 times in the past month.  I don't have a reproducer.

The vbox home directory and data is hosted on a ZFS mirror.

 kernel: Sleeping thread (tid 100909, pid 1118) owns a non-sleepable lock
 kernel: KDB: stack backtrace of thread 100909:
 kernel: #0 0xffffffff80ba6de1 at mi_switch+0xe1
 kernel: #1 0xffffffff80bf5c45 at sleepq_catch_signals+0x405
 kernel: #2 0xffffffff80bf5f24 at sleepq_timedwait_sig+0x14
 kernel: #3 0xffffffff82669ba6 at rtR0SemEventMultiBsdWait+0x2a6
 kernel: #4 0xffffffff82a9784a at __stop_set_sysinit_set+0x9e02
 kernel: #5 0xffffffff8263fa61 at supdrvIOCtlInnerUnrestricted+0x1781
 kernel: #6 0xffffffff8264d3fd at VBoxDrvFreeBSDIOCtl+0xed
 kernel: #7 0xffffffff80a51acd at devfs_ioctl+0xad
 kernel: #8 0xffffffff811f945e at VOP_IOCTL_APV+0x7e
 kernel: #9 0xffffffff80c7a144 at vn_ioctl+0x1a4
 kernel: #10 0xffffffff80a520af at devfs_ioctl_f+0x1f
 kernel: #11 0xffffffff80c04f3d at kern_ioctl+0x26d
 kernel: #12 0xffffffff80c04c5e at sys_ioctl+0x15e
 kernel: #13 0xffffffff81075449 at amd64_syscall+0x369
 kernel: #14 0xffffffff8104fd1d at fast_syscall_common+0x101
 kernel: panic: sleeping thread
 kernel: cpuid = 2
 kernel: time = 1554059485
 kernel: KDB: stack backtrace:
 kernel: #0 0xffffffff80be7977 at kdb_backtrace+0x67
 kernel: #1 0xffffffff80b9b563 at vpanic+0x1a3
 kernel: #2 0xffffffff80b9b3b3 at panic+0x43
 kernel: #3 0xffffffff80bfd412 at propagate_priority+0x2b2
 kernel: #4 0xffffffff80bfe048 at turnstile_wait+0x338
 kernel: #5 0xffffffff80b7ade6 at __mtx_lock_sleep+0x196
 kernel: #6 0xffffffff80f0071e at vm_reserv_alloc_page+0x4ee
 kernel: #7 0xffffffff80ef2c7e at vm_page_alloc_domain_after+0xde
 kernel: #8 0xffffffff80ef2aa4 at vm_page_alloc+0x74
 kernel: #9 0xffffffff80edbd3c at vm_fault_hold+0x13ac
 kernel: #10 0xffffffff80eda940 at vm_fault+0x60
 kernel: #11 0xffffffff81074ae3 at trap_pfault+0x163
 kernel: #12 0xffffffff81074175 at trap+0x425
 kernel: #13 0xffffffff8104f435 at calltrap+0x8
Comment 1 Derek Schrock 2019-04-13 23:35:20 UTC
Created attachment 203654 [details]
Three sets of panic backtraces

After upgrading to virtualbox-ose-5.2.26_4 today, I got three panics in a row with mildly different backtraces.  Disabling vboxheadless at boot and starting vboxheadless by hand did not panic the machine.

All panics were "Sleeping thread (tid X, pid Y) owns a non-sleepable lock"
All referencing VBoxDrvFreeBSDIOCtl and rtR0SemEventWait

Should this be reported upstream?
Comment 2 martin ilavsky 2020-10-22 19:39:50 UTC
Today I updated FreeBSD amd64 12.1 using svn, from r354886 to r366929. The custom kernel is basically DEFAULT + PF and watchdog.
I also updated the virtualbox packages, to these versions:

virtualbox-ose-5.2.44              <   needs updating (index has 5.2.44_4)
virtualbox-ose-kmod-5.2.44         <   needs updating (index has 5.2.44_3)

I hit the issue when the VMs were started.

Unread portion of the kernel message buffer:
Sleeping thread (tid 101652, pid 1456) owns a non-sleepable lock
KDB: stack backtrace of thread 101652:
#0 0xffffffff80be2062 at mi_switch+0xe2
#1 0xffffffff80c31c35 at sleepq_catch_signals+0x425
#2 0xffffffff80c31f24 at sleepq_timedwait_sig+0x14
#3 0xffffffff82c4c726 at rtR0SemEventMultiBsdWait+0x2a6
#4 0xffffffff82c9091b at __stop_set_sysinit_set+0x9ed3
#5 0xffffffff82c22a9c at supdrvIOCtlInnerUnrestricted+0x17bc
#6 0xffffffff82c3046d at VBoxDrvFreeBSDIOCtl+0xed
#7 0xffffffff80a8bf8d at devfs_ioctl+0xad
#8 0xffffffff8127b6fc at VOP_IOCTL_APV+0x7c
#9 0xffffffff80cb6b5a at vn_ioctl+0x16a
#10 0xffffffff80a8c57f at devfs_ioctl_f+0x1f
#11 0xffffffff80c4167e at kern_ioctl+0x2be
#12 0xffffffff80c4134d at sys_ioctl+0x15d
#13 0xffffffff810f0984 at amd64_syscall+0x364
#14 0xffffffff810c9c50 at fast_syscall_common+0x101
panic: sleeping thread
cpuid = 3
time = 1603373869
KDB: stack backtrace:
#0 0xffffffff80c233b7 at kdb_backtrace+0x67
#1 0xffffffff80bd66ed at vpanic+0x19d
#2 0xffffffff80bd6543 at panic+0x43
#3 0xffffffff80c396e1 at propagate_priority+0x291
#4 0xffffffff80c3a34b at turnstile_wait+0x32b
#5 0xffffffff80bb6121 at __mtx_lock_sleep+0x191
#6 0xffffffff80f7ad80 at vm_reserv_reclaim_inactive+0x90
#7 0xffffffff80f6c210 at vm_page_alloc_domain_after+0x1d0
#8 0xffffffff810f1621 at uma_small_alloc+0x61
#9 0xffffffff80f4ef0d at keg_alloc_slab+0xdd
#10 0xffffffff80f51c9d at keg_fetch_slab+0x17d
#11 0xffffffff80f51461 at zone_fetch_slab+0x61
#12 0xffffffff80f51531 at zone_import+0x51
#13 0xffffffff80f4d5a8 at uma_zalloc_arg+0x408
#14 0xffffffff827929a2 at abd_alloc+0x112
#15 0xffffffff82798319 at arc_hdr_alloc_pabd+0x99
#16 0xffffffff82795554 at arc_hdr_alloc+0x124
#17 0xffffffff827951f9 at arc_alloc_buf+0x29
Uptime: 2m35s

Thread in question:
101652 (PID=1456: VBoxHeadless)              0xffffffff80c07f9a in sched_switch ()

I was able to reproduce this every time the VMs were booted.

I had to boot back to the previous boot environment; the issue is not there. I wonder: is this a problem with a locking mechanism introduced in a newer FreeBSD version, or in the logic of the VirtualBox kernel module?
Comment 3 Derek Schrock 2020-10-25 00:18:37 UTC
I didn't think of it at the time, but during the 5.2.44_3 -> 5.2.44_4 upgrade (Oct 14th) the machine panicked after reboot on the first VM start (a Windows 10 VM).  However, after the kernel panic everything started fine.  I've had multiple updates between this bug report and 5.2.44_4 that didn't result in a panic during the initial VM startup.  Hard to tell where the problem is.


Oct 14 09:26:32 <kern.crit> ircbsd kernel: #0 0xffffffff80bdbeb2 at mi_switch+0xe2
Oct 14 09:26:32 <kern.crit> ircbsd kernel: #1 0xffffffff80c2ba85 at sleepq_catch_signals+0x425
Oct 14 09:26:32 <kern.crit> ircbsd kernel: #2 0xffffffff80c2bd74 at sleepq_timedwait_sig+0x14
Oct 14 09:26:32 <kern.crit> ircbsd kernel: #3 0xffffffff8267d726 at rtR0SemEventMultiBsdWait+0x2a6
Oct 14 09:26:32 <kern.crit> ircbsd kernel: #4 0xffffffff82c5f91b at __stop_set_sysinit_set+0x9ed3
Oct 14 09:26:32 <kern.crit> ircbsd kernel: #5 0xffffffff82653a9c at supdrvIOCtlInnerUnrestricted+0x17bc
Oct 14 09:26:32 <kern.crit> ircbsd kernel: #6 0xffffffff8266146d at VBoxDrvFreeBSDIOCtl+0xed
Oct 14 09:26:32 <kern.crit> ircbsd kernel: #7 0xffffffff80a85dbd at devfs_ioctl+0xad
Oct 14 09:26:32 <kern.crit> ircbsd kernel: #8 0xffffffff8122b54c at VOP_IOCTL_APV+0x7c
Oct 14 09:26:32 <kern.crit> ircbsd kernel: #9 0xffffffff80cb09aa at vn_ioctl+0x16a
Oct 14 09:26:32 <kern.crit> ircbsd kernel: #10 0xffffffff80a863af at devfs_ioctl_f+0x1f
Oct 14 09:26:32 <kern.crit> ircbsd kernel: #11 0xffffffff80c3b4ce at kern_ioctl+0x2be
Oct 14 09:26:32 <kern.crit> ircbsd kernel: #12 0xffffffff80c3b19d at sys_ioctl+0x15d
Oct 14 09:26:32 <kern.crit> ircbsd kernel: #13 0xffffffff810a9984 at amd64_syscall+0x364
Oct 14 09:26:32 <kern.crit> ircbsd kernel: #14 0xffffffff81082b40 at fast_syscall_common+0x101
Oct 14 09:26:32 <kern.crit> ircbsd kernel: panic: sleeping thread
Oct 14 09:26:32 <kern.crit> ircbsd kernel: cpuid = 6
Oct 14 09:26:32 <kern.crit> ircbsd kernel: time = 1602681916
Oct 14 09:26:32 <kern.crit> ircbsd kernel: KDB: stack backtrace:
Oct 14 09:26:32 <kern.crit> ircbsd kernel: #0 0xffffffff80c1d207 at kdb_backtrace+0x67
Oct 14 09:26:32 <kern.crit> ircbsd kernel: #1 0xffffffff80bd053d at vpanic+0x19d
Oct 14 09:26:32 <kern.crit> ircbsd kernel: #2 0xffffffff80bd0393 at panic+0x43
Oct 14 09:26:32 <kern.crit> ircbsd kernel: #3 0xffffffff80c33531 at propagate_priority+0x291
Oct 14 09:26:32 <kern.crit> ircbsd kernel: #4 0xffffffff80c3419b at turnstile_wait+0x32b
Oct 14 09:26:32 <kern.crit> ircbsd kernel: #5 0xffffffff80baff71 at __mtx_lock_sleep+0x191
Oct 14 09:26:32 <kern.crit> ircbsd kernel: #6 0xffffffff80f33c70 at vm_reserv_reclaim_inactive+0x90
Oct 14 09:26:32 <kern.crit> ircbsd kernel: #7 0xffffffff80f25100 at vm_page_alloc_domain_after+0x1d0
Oct 14 09:26:32 <kern.crit> ircbsd kernel: #8 0xffffffff80f24e14 at vm_page_alloc+0x74
Oct 14 09:26:32 <kern.crit> ircbsd kernel: #9 0xffffffff80f0c1e1 at vm_fault_hold+0xb41
Oct 14 09:26:32 <kern.crit> ircbsd kernel: #10 0xffffffff80f0b650 at vm_fault+0x60
Oct 14 09:26:32 <kern.crit> ircbsd kernel: #11 0xffffffff810a8f34 at trap_pfault+0x164
Oct 14 09:26:32 <kern.crit> ircbsd kernel: #12 0xffffffff810a859b at trap+0x42b
Oct 14 09:26:32 <kern.crit> ircbsd kernel: #13 0xffffffff8108221c at calltrap+0x8
Comment 4 Graham Perrin 2021-06-02 00:31:50 UTC
Bug 254412 is: 

> emulators/virtualbox-ose-additions - Boot time crash - 
> Sleeping thread owns a non-sleepable lock

– with: 

* a backtrace from opening poster David R. Bergstein at bug 254412 comment 0, 
  I guess that guest additions were 6.1.18 at the time (2021-03-19, around 
  ten days after the first release of 6.1.18)

* a backtrace from me at bug 254412 comment 6 (guest additions 6.1.22, 
  Windows 10 21H1 host)

* observations on reproducibility and, hopefully, a workaround at 
  bug 254412 comment 7.

Whilst limiting the guest to a single processor is less than ideal, I hope that the workaround can be as effective for other users as it is (so far) for me. 

----

A little of what follows may be of interest.  


linprocfs bug 99094 (2006-06-18) – panic: sleeping thread (Sleeping thread ... owns a non-sleepable lock). Analysis by John Baldwin at bug 99094 comment 5. 


Panic with vboxnet drivers (2009-09-22) 
<https://lists.freebsd.org/archives/freebsd-emulation/2009-September/006917.html>

> …
> Sleeping thread (tid 100048, pid 48) owns a non-sleepable lock
> …


Crash dump problem - sleeping thread owns a non-sleepable lock during crash dump write (2010-05-17)
<https://lists.freebsd.org/pipermail/freebsd-stable/2010-May/056826.html>


Bug 185043 (2013-12-20) – Kernel panic: Sleeping thread (tid , pid ) owns a non-sleepable lock from netinet/in_multi.c


FreeNAS (bug) 9126: System reboot when creating VirtualBox jail (2015-04-09)
<https://redmine.ixsystems.com/issues/9126>

> Closed: Third party to resolve

<https://redmine.ixsystems.com/issues/9126#note-9>: 

> …
> Sleeping thread (tid 100596, pid 6762) owns a non-sleepable lock
> …


Remove the mtx_sleep() from the kqueue f_event filter. (2015-08-03)
<https://cgit.freebsd.org/src/commit/?id=9224217213dbf5ffc74bc54b83aff584304f0677>

> This fix the "Sleeping thread owns a non-sleepable lock" panic when the
> userland thread is too busy reading the packets from bpf(4).



Boot time crash - Sleeping thread owns a non-sleepable lock (2021-03-19)
<https://lists.freebsd.org/archives/freebsd-current/2021-March/168819.html>

> This needs to be fixed in the virtualbox kernel module sources.
> 
> --HPS
Comment 5 Graham Perrin 2021-06-04 06:14:25 UTC
(In reply to Graham Perrin from comment #4)

> * a backtrace from me at bug 254412 comment 6 (guest additions 6.1.22, 
>   Windows 10 21H1 host)

Reproducible with a different host: 

% date ; uname -v ; uname -KU
Fri  4 Jun 2021 06:44:38 BST
FreeBSD 14.0-CURRENT #96 main-n247020-e0fa04e257c: Sat May 29 19:39:23 BST 2021     root@mowa219-gjp4-8570p:/usr/obj/usr/src/amd64.amd64/sys/GENERIC-NODEBUG 
1400017 1400017
% 

Guest additions 6.1.22 on slightly outdated FreeBSD 13.0-RELEASE. 

Screen recording: <https://photos.app.goo.gl/u1DLWduqu6CmbBJS6>

Apologies for the length of the recording – more than seven minutes. 

============================================================================

Whilst the recording captures multiple kernel panics – in response to 

kldload vboxguest

– the majority of what's seen may indicate _coincidental_ problems with UFS: 

* please treat the UFS aspect as off-topic

– there's discussion elsewhere, but not yet a bug report. 

Thank you.
Comment 6 John Baldwin freebsd_committer freebsd_triage 2021-06-04 16:17:49 UTC
Note that the "sleeping thread owns a lock" panics are not all the same (just as not all page fault panics are the same).  What makes them unique is the stack trace of the sleeping thread, which is where the actual bug is.  The thread that panics later is not the broken one; it's just noticing the breakage.

In the case of https://bz-attachments.freebsd.org/attachment.cgi?id=225420 the breakage is somewhere in this code:

Sleeping thread (tid 100099, pid 155) owns a non-sleepable lock
KDB: stack backtrace of thread 100099:
#0 0xffffffff80c166f1 at mi_switch+0xc1
#1 0xffffffff8233cf37 at rtR0SemEventMultiBsdWait+0x297
#2 0xffffffff8231d36a at vgdrvHgcmAsyncWaitCallbackWorker+0x14a
#3 0xffffffff8231e49b at VbglR0HGCMInternalConnect+0x11b
#4 0xffffffff8231ad33 at VGDrvCommonIoCtl+0xb53
#5 0xffffffff82319af6 at VGDrvCommonProcessOptionsFromHost+0x146
#6 0xffffffff8231d9f8 at vgdrvFreeBSDAttach+0x1d8
#7 0xffffffff80c4670d at device_attach+0x3dd

I'm guessing that is part of the guest additions driver for virtual box?

The broken code in the traces in this bug report are all in a different place:

 kernel: #3 0xffffffff82669ba6 at rtR0SemEventMultiBsdWait+0x2a6
 kernel: #4 0xffffffff82a9784a at __stop_set_sysinit_set+0x9e02
 kernel: #5 0xffffffff8263fa61 at supdrvIOCtlInnerUnrestricted+0x1781
 kernel: #6 0xffffffff8264d3fd at VBoxDrvFreeBSDIOCtl+0xed
 kernel: #7 0xffffffff80a51acd at devfs_ioctl+0xad

My rough guess is that the virtual box code is written for Linux and that the wrong set of primitives have been used in the port to FreeBSD.  Namely, what Linux calls a spin lock should become a MTX_DEF mutex in FreeBSD, and what Linux calls a mutex, should become an "sx" lock in FreeBSD.  I suspect that getting the latter case wrong (using MTX_DEF mutex in place of Linux mutexes) is probably what has caused both of these issues in the virtual box guest drivers.
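
[Editorial note: to make the mapping above concrete, here is an illustrative kernel-context C sketch. It is not VirtualBox source and all names are hypothetical; it only shows the pattern John describes. On a kernel without WITNESS, sleeping while holding a MTX_DEF mutex is not caught at the sleep itself -- the panic fires later, in propagate_priority(), when another thread contends for the mutex and finds its owner asleep, which matches both backtrace pairs in this report.]

/*
 * Illustrative sketch only -- hypothetical names, not VirtualBox code.
 */
#include <sys/param.h>
#include <sys/systm.h>
#include <sys/lock.h>
#include <sys/mutex.h>
#include <sys/sx.h>

static struct mtx dev_mtx;	/* non-sleepable; analogue of a Linux spinlock */
static struct sx  dev_sx;	/* sleepable; analogue of a Linux `struct mutex` */

static void
locks_init(void)
{
	mtx_init(&dev_mtx, "devmtx", NULL, MTX_DEF);
	sx_init(&dev_sx, "devsx");
}

/* Broken port: waits on an event while holding the mutex. */
static int
broken_wait(void)
{
	int error;

	mtx_lock(&dev_mtx);
	/*
	 * Sleeps with dev_mtx held => "Sleeping thread ... owns a
	 * non-sleepable lock" once another thread blocks on dev_mtx.
	 */
	error = tsleep(&dev_mtx, PWAIT, "vboxwt", hz);
	mtx_unlock(&dev_mtx);
	return (error);
}

/* Correct port: an sx lock may legally be held across a sleep. */
static int
fixed_wait(void)
{
	int error;

	sx_xlock(&dev_sx);
	error = tsleep(&dev_sx, PWAIT, "vboxwt", hz);
	sx_xunlock(&dev_sx);
	return (error);
}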
Comment 7 Graham Perrin 2021-06-05 16:34:05 UTC
Afterthought: strictly speaking, this bug is probably with 

emulators/virtualbox-ose-additions

– not emulators/virtualbox-ose

Derek, if you agree, would you like to edit the summary line above? Thanks.
Comment 8 Derek Schrock 2021-06-06 01:24:37 UTC
(In reply to Graham Perrin from comment #7)

I don't believe it's virtualbox-ose-additions, since I've seen it crash during Windows 10 guest startup.  The FreeBSD port virtualbox-ose-additions isn't in play there.
Comment 9 Graham Perrin 2021-06-06 04:18:06 UTC
Thank you, and my sincere apologies …

(In reply to John Baldwin from comment #6)

> … the case of <https://bz-attachments.freebsd.org/attachment.cgi?id=225420> …

– emulators/virtualbox-ose-additions (non-legacy) currently at 6.1.22

> … The broken code in the traces in this bug report are all in a 
> different place: …

> … My rough guess is that the virtual box code is written for Linux …

(In reply to Derek Schrock from comment #8)

> … I've seen it crash during Windows 10 guest start up. 

In recent days my head has been so full (sometimes to overflowing) of kernel panics in FreeBSD guests that I failed to realise the distinction here. Even the summary line's _absence_ of '-additions' wasn't enough for me to get that this bug 236917 is for panics in FreeBSD _hosts_. 

The first line of your comment 0 was clear. 

I'm embarrassed!
Comment 10 Graham Perrin 2021-06-06 04:56:04 UTC
(In reply to Derek Schrock from comment #0)

FreeBSD 12.0-RELEASE-p3 amd64 host … kernel panic …

(In reply to Derek Schrock from comment #3)

> … 5.2.44_3 -> 5.2.44_4 …

Please: 

* any host panic with 6.1.18 or greater? 


(In reply to Derek Schrock from comment #1)

> … Should this be reported to upsteam?

Good question. 

---

For readers who may be unfamiliar with recent histories: 

a) <https://www.virtualbox.org/wiki/Changelog-5.2> 
   5.2.⋯ is no longer supported by Oracle

b) <https://www.freshports.org/emulators/virtualbox-ose-legacy/#history>
   <https://cgit.freebsd.org/ports/commit/?id=42d8425b62bb66d04dd01c96d516c11451fd86dd>
   
   emulators/virtualbox-ose        became 6.1.18 for AMD64 only
   emulators/virtualbox-ose        now is 6.1.22 for AMD64 only
   
   emulators/virtualbox-ose-legacy now is 5.2.44_5
Comment 11 Derek Schrock 2021-06-06 20:43:49 UTC
No panics with 13.0 and 6.x yet.