Bug 236917 - emulators/virtualbox-ose: Sleeping thread owns a non-sleepable lock kernel panic
Summary: emulators/virtualbox-ose: Sleeping thread owns a non-sleepable lock kernel panic
Status: New
Alias: None
Product: Ports & Packages
Classification: Unclassified
Component: Individual Port(s)
Version: Latest
Hardware: amd64 Any
Importance: --- Affects Only Me
Assignee: vbox (Nobody)
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-03-31 19:42 UTC by Derek Schrock
Modified: 2020-10-25 00:18 UTC
CC List: 1 user

See Also:
bugzilla: maintainer-feedback? (vbox)


Attachments
Three sets of panic backtraces (5.86 KB, text/plain)
2019-04-13 23:35 UTC, Derek Schrock
no flags

Description Derek Schrock 2019-03-31 19:42:59 UTC
On a FreeBSD 12.0-RELEASE-p3 amd64 host running multiple guests (FreeBSD, RHEL, Fedora, Windows) I get the following kernel panic (see below).  It seems most common during host startup, but I've also seen it after long uptimes.  This host has hit the panic 4 times in the past month.  I don't have a reproducer.

The vbox home directory and data is hosted on a ZFS mirror.

 kernel: Sleeping thread (tid 100909, pid 1118) owns a non-sleepable lock
 kernel: KDB: stack backtrace of thread 100909:
 kernel: #0 0xffffffff80ba6de1 at mi_switch+0xe1
 kernel: #1 0xffffffff80bf5c45 at sleepq_catch_signals+0x405
 kernel: #2 0xffffffff80bf5f24 at sleepq_timedwait_sig+0x14
 kernel: #3 0xffffffff82669ba6 at rtR0SemEventMultiBsdWait+0x2a6
 kernel: #4 0xffffffff82a9784a at __stop_set_sysinit_set+0x9e02
 kernel: #5 0xffffffff8263fa61 at supdrvIOCtlInnerUnrestricted+0x1781
 kernel: #6 0xffffffff8264d3fd at VBoxDrvFreeBSDIOCtl+0xed
 kernel: #7 0xffffffff80a51acd at devfs_ioctl+0xad
 kernel: #8 0xffffffff811f945e at VOP_IOCTL_APV+0x7e
 kernel: #9 0xffffffff80c7a144 at vn_ioctl+0x1a4
 kernel: #10 0xffffffff80a520af at devfs_ioctl_f+0x1f
 kernel: #11 0xffffffff80c04f3d at kern_ioctl+0x26d
 kernel: #12 0xffffffff80c04c5e at sys_ioctl+0x15e
 kernel: #13 0xffffffff81075449 at amd64_syscall+0x369
 kernel: #14 0xffffffff8104fd1d at fast_syscall_common+0x101
 kernel: panic: sleeping thread
 kernel: cpuid = 2
 kernel: time = 1554059485
 kernel: KDB: stack backtrace:
 kernel: #0 0xffffffff80be7977 at kdb_backtrace+0x67
 kernel: #1 0xffffffff80b9b563 at vpanic+0x1a3
 kernel: #2 0xffffffff80b9b3b3 at panic+0x43
 kernel: #3 0xffffffff80bfd412 at propagate_priority+0x2b2
 kernel: #4 0xffffffff80bfe048 at turnstile_wait+0x338
 kernel: #5 0xffffffff80b7ade6 at __mtx_lock_sleep+0x196
 kernel: #6 0xffffffff80f0071e at vm_reserv_alloc_page+0x4ee
 kernel: #7 0xffffffff80ef2c7e at vm_page_alloc_domain_after+0xde
 kernel: #8 0xffffffff80ef2aa4 at vm_page_alloc+0x74
 kernel: #9 0xffffffff80edbd3c at vm_fault_hold+0x13ac
 kernel: #10 0xffffffff80eda940 at vm_fault+0x60
 kernel: #11 0xffffffff81074ae3 at trap_pfault+0x163
 kernel: #12 0xffffffff81074175 at trap+0x425
 kernel: #13 0xffffffff8104f435 at calltrap+0x8
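Reading the two traces together: the first is the VirtualBox ioctl thread going to sleep (rtR0SemEventMultiBsdWait -> sleepq_timedwait_sig) while it still owns a non-sleepable mutex; the second is another thread that later contends for that mutex, and propagate_priority() panics when it finds the lock owner asleep.  A minimal sketch of the illegal pattern, with made-up names (this is not the actual VirtualBox code):

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/kernel.h>   /* hz */
#include <sys/lock.h>
#include <sys/mutex.h>

static struct mtx example_mtx;   /* a default (non-sleepable) mutex */
static int example_chan;         /* dummy wait channel */

static void
bad_wait(void)
{
        mtx_init(&example_mtx, "example", NULL, MTX_DEF);

        mtx_lock(&example_mtx);
        /*
         * WRONG: tsleep() puts the thread to sleep while it still owns
         * example_mtx.  The sleep itself can succeed on a non-WITNESS
         * kernel, but the moment another thread blocks on example_mtx,
         * turnstile_wait() -> propagate_priority() finds the owner
         * asleep and panics with "sleeping thread".
         */
        tsleep(&example_chan, PVM, "badslp", hz);
        mtx_unlock(&example_mtx);
}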
Comment 1 Derek Schrock 2019-04-13 23:35:20 UTC
Created attachment 203654
Three sets of panic backtraces

After upgrading to virtualbox-ose-5.2.26_4 today, I hit three panics in a row with mildly different backtraces.  Disabling vboxheadless and then starting vboxheadless by hand did not panic the machine.

All panics were "Sleeping thread (tid X, pid Y) owns a non-sleepable lock".
All reference VBoxDrvFreeBSDIOCtl and rtR0SemEventWait.

Should this be reported upstream?
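
If anyone catches the next one in the debugger: with options DDB in the kernel config (and WITNESS for the lock list), the db> prompt should be able to name the lock the sleeping thread owns, per ddb(4), e.g. for tid 100909 from the panic above:

 db> show thread 100909     (state of the sleeping thread)
 db> show alllocks          (every lock owned by every thread; needs WITNESS)
 db> show allchains         (which threads are blocked on locks owned by whom)

That would pin down whether the owned lock is VirtualBox's own or a kernel VM lock.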
Comment 2 martin ilavsky 2020-10-22 19:39:50 UTC
Today I updated FreeBSD amd64 12.1 using svn, from r354886 to r366929.  The custom kernel is basically DEFAULT plus PF and watchdog.
I also updated the virtualbox packages, to these versions:

virtualbox-ose-5.2.44              <   needs updating (index has 5.2.44_4)
virtualbox-ose-kmod-5.2.44         <   needs updating (index has 5.2.44_3)

I hit the issue when the VMs were started.

Unread portion of the kernel message buffer:
Sleeping thread (tid 101652, pid 1456) owns a non-sleepable lock
KDB: stack backtrace of thread 101652:
#0 0xffffffff80be2062 at mi_switch+0xe2
#1 0xffffffff80c31c35 at sleepq_catch_signals+0x425
#2 0xffffffff80c31f24 at sleepq_timedwait_sig+0x14
#3 0xffffffff82c4c726 at rtR0SemEventMultiBsdWait+0x2a6
#4 0xffffffff82c9091b at __stop_set_sysinit_set+0x9ed3
#5 0xffffffff82c22a9c at supdrvIOCtlInnerUnrestricted+0x17bc
#6 0xffffffff82c3046d at VBoxDrvFreeBSDIOCtl+0xed
#7 0xffffffff80a8bf8d at devfs_ioctl+0xad
#8 0xffffffff8127b6fc at VOP_IOCTL_APV+0x7c
#9 0xffffffff80cb6b5a at vn_ioctl+0x16a
#10 0xffffffff80a8c57f at devfs_ioctl_f+0x1f
#11 0xffffffff80c4167e at kern_ioctl+0x2be
#12 0xffffffff80c4134d at sys_ioctl+0x15d
#13 0xffffffff810f0984 at amd64_syscall+0x364
#14 0xffffffff810c9c50 at fast_syscall_common+0x101
panic: sleeping thread
cpuid = 3
time = 1603373869
KDB: stack backtrace:
#0 0xffffffff80c233b7 at kdb_backtrace+0x67
#1 0xffffffff80bd66ed at vpanic+0x19d
#2 0xffffffff80bd6543 at panic+0x43
#3 0xffffffff80c396e1 at propagate_priority+0x291
#4 0xffffffff80c3a34b at turnstile_wait+0x32b
#5 0xffffffff80bb6121 at __mtx_lock_sleep+0x191
#6 0xffffffff80f7ad80 at vm_reserv_reclaim_inactive+0x90
#7 0xffffffff80f6c210 at vm_page_alloc_domain_after+0x1d0
#8 0xffffffff810f1621 at uma_small_alloc+0x61
#9 0xffffffff80f4ef0d at keg_alloc_slab+0xdd
#10 0xffffffff80f51c9d at keg_fetch_slab+0x17d
#11 0xffffffff80f51461 at zone_fetch_slab+0x61
#12 0xffffffff80f51531 at zone_import+0x51
#13 0xffffffff80f4d5a8 at uma_zalloc_arg+0x408
#14 0xffffffff827929a2 at abd_alloc+0x112
#15 0xffffffff82798319 at arc_hdr_alloc_pabd+0x99
#16 0xffffffff82795554 at arc_hdr_alloc+0x124
#17 0xffffffff827951f9 at arc_alloc_buf+0x29
Uptime: 2m35s

Thread in question:
101652 (PID=1456: VBoxHeadless)              0xffffffff80c07f9a in sched_switch ()

I was able to reproduce this every time the VMs booted.

I had to boot back into the previous boot environment; the issue is not there.  I wonder: is this a problem in the locking changes introduced by the newer FreeBSD version, or in the logic of the VirtualBox kernel module?
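
For what it's worth, the sanctioned FreeBSD pattern for sleeping on an event guarded by a default mutex is msleep(9), which hands the mutex to the sleep queue code instead of keeping it across the sleep.  A sketch with made-up names, continuing the hypothetical example from the description:

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/kernel.h>
#include <sys/lock.h>
#include <sys/mutex.h>

static struct mtx example_mtx;    /* same made-up MTX_DEF mutex as above */
static int example_signaled;      /* event flag protected by example_mtx */

static void
ok_wait(void)
{
        mtx_lock(&example_mtx);
        while (!example_signaled) {
                /*
                 * msleep() atomically drops example_mtx for the duration
                 * of the sleep and reacquires it before returning, so no
                 * other thread can ever block on a mutex whose owner is
                 * asleep.
                 */
                msleep(&example_signaled, &example_mtx, PVM, "okslp", hz);
        }
        mtx_unlock(&example_mtx);
}

Either way, the panic means some path reached the VirtualBox wait with a default mutex still held, whether taken by the module itself or inherited from the kernel; the traces alone don't say which.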
Comment 3 Derek Schrock 2020-10-25 00:18:37 UTC
I didn't think of it at the time, but during the 5.2.44_3 -> 5.2.44_4 update (Oct 14th) I hit the same panic after reboot, on the first VM start (a Windows 10 VM).  However, after the kernel panic everything started fine.  I've had multiple updates between this bug report and pre-5.2.44_4 that didn't result in a panic during initial VM startup after previous upgrades.  Hard to tell where the problem is.


Oct 14 09:26:32 <kern.crit> ircbsd kernel: #0 0xffffffff80bdbeb2 at mi_switch+0xe2
Oct 14 09:26:32 <kern.crit> ircbsd kernel: #1 0xffffffff80c2ba85 at sleepq_catch_signals+0x425
Oct 14 09:26:32 <kern.crit> ircbsd kernel: #2 0xffffffff80c2bd74 at sleepq_timedwait_sig+0x14
Oct 14 09:26:32 <kern.crit> ircbsd kernel: #3 0xffffffff8267d726 at rtR0SemEventMultiBsdWait+0x2a6
Oct 14 09:26:32 <kern.crit> ircbsd kernel: #4 0xffffffff82c5f91b at __stop_set_sysinit_set+0x9ed3
Oct 14 09:26:32 <kern.crit> ircbsd kernel: #5 0xffffffff82653a9c at supdrvIOCtlInnerUnrestricted+0x17bc
Oct 14 09:26:32 <kern.crit> ircbsd kernel: #6 0xffffffff8266146d at VBoxDrvFreeBSDIOCtl+0xed
Oct 14 09:26:32 <kern.crit> ircbsd kernel: #7 0xffffffff80a85dbd at devfs_ioctl+0xad
Oct 14 09:26:32 <kern.crit> ircbsd kernel: #8 0xffffffff8122b54c at VOP_IOCTL_APV+0x7c
Oct 14 09:26:32 <kern.crit> ircbsd kernel: #9 0xffffffff80cb09aa at vn_ioctl+0x16a
Oct 14 09:26:32 <kern.crit> ircbsd kernel: #10 0xffffffff80a863af at devfs_ioctl_f+0x1f
Oct 14 09:26:32 <kern.crit> ircbsd kernel: #11 0xffffffff80c3b4ce at kern_ioctl+0x2be
Oct 14 09:26:32 <kern.crit> ircbsd kernel: #12 0xffffffff80c3b19d at sys_ioctl+0x15d
Oct 14 09:26:32 <kern.crit> ircbsd kernel: #13 0xffffffff810a9984 at amd64_syscall+0x364
Oct 14 09:26:32 <kern.crit> ircbsd kernel: #14 0xffffffff81082b40 at fast_syscall_common+0x101
Oct 14 09:26:32 <kern.crit> ircbsd kernel: panic: sleeping thread
Oct 14 09:26:32 <kern.crit> ircbsd kernel: cpuid = 6
Oct 14 09:26:32 <kern.crit> ircbsd kernel: time = 1602681916
Oct 14 09:26:32 <kern.crit> ircbsd kernel: KDB: stack backtrace:
Oct 14 09:26:32 <kern.crit> ircbsd kernel: #0 0xffffffff80c1d207 at kdb_backtrace+0x67
Oct 14 09:26:32 <kern.crit> ircbsd kernel: #1 0xffffffff80bd053d at vpanic+0x19d
Oct 14 09:26:32 <kern.crit> ircbsd kernel: #2 0xffffffff80bd0393 at panic+0x43
Oct 14 09:26:32 <kern.crit> ircbsd kernel: #3 0xffffffff80c33531 at propagate_priority+0x291
Oct 14 09:26:32 <kern.crit> ircbsd kernel: #4 0xffffffff80c3419b at turnstile_wait+0x32b
Oct 14 09:26:32 <kern.crit> ircbsd kernel: #5 0xffffffff80baff71 at __mtx_lock_sleep+0x191
Oct 14 09:26:32 <kern.crit> ircbsd kernel: #6 0xffffffff80f33c70 at vm_reserv_reclaim_inactive+0x90
Oct 14 09:26:32 <kern.crit> ircbsd kernel: #7 0xffffffff80f25100 at vm_page_alloc_domain_after+0x1d0
Oct 14 09:26:32 <kern.crit> ircbsd kernel: #8 0xffffffff80f24e14 at vm_page_alloc+0x74
Oct 14 09:26:32 <kern.crit> ircbsd kernel: #9 0xffffffff80f0c1e1 at vm_fault_hold+0xb41
Oct 14 09:26:32 <kern.crit> ircbsd kernel: #10 0xffffffff80f0b650 at vm_fault+0x60
Oct 14 09:26:32 <kern.crit> ircbsd kernel: #11 0xffffffff810a8f34 at trap_pfault+0x164
Oct 14 09:26:32 <kern.crit> ircbsd kernel: #12 0xffffffff810a859b at trap+0x42b
Oct 14 09:26:32 <kern.crit> ircbsd kernel: #13 0xffffffff8108221c at calltrap+0x8