Bug 211651 - emulators/virtualbox-ose-kmod 5.0.26_1 with Linux guest crashes 12.0-CURRENT host when # of processors > 1
Status: Closed Overcome By Events
Alias: None
Product: Ports & Packages
Classification: Unclassified
Component: Individual Port(s)
Version: Latest
Hardware: Any Any
Assignee: vbox (Nobody)
Reported: 2016-08-08 00:13 UTC by Don Lewis
Modified: 2018-01-23 17:31 UTC

See Also:
bugzilla: maintainer-feedback? (vbox)


Description Don Lewis freebsd_committer 2016-08-08 00:13:05 UTC
If I attempt to start a Linux guest on a FreeBSD 12.0-CURRENT host I get a kernel panic similar to:

panic: Unregistered use of FPU in kernel
cpuid = 3
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe085a31c030
vpanic() at vpanic+0x182/frame 0xfffffe085a31c0b0
kassert_panic() at kassert_panic+0x126/frame 0xfffffe085a31c120
trap() at trap+0x7ae/frame 0xfffffe085a31c330
calltrap() at calltrap+0x8/frame 0xfffffe085a31c330
--- trap 0x16, rip = 0xffffffff827273a9, rsp = 0xfffffe085a31c408, rbp = 0xfffffe085a31c430 ---
null_bug_bypass() at 0xffffffff827273a9/frame 0xfffffe085a31c430
null_bug_bypass() at 0xffffffff826985c7/frame 0x3
KDB: enter: panic

if the VM is configured with more than one processor.  I've seen this with both CentOS 7 and Ubuntu 12 guests.  The panic appears to occur near the start of the guest kernel boot, after grub has run, shortly after the kernel message about TSC calibration is printed.  The symbols printed by DDB leading up to the trap appear to be somewhat arbitrary.  The trap location seems to be above the topmost BSS section symbol in one of the (last?) loaded .kmod files.

The code at the location that triggers the trap is:
   0xffffffff8272739d:	nop
   0xffffffff8272739e:	nop
   0xffffffff8272739f:	nop
   0xffffffff827273a0:	mov    %rsi,%rdx
   0xffffffff827273a3:	shr    $0x20,%rdx
   0xffffffff827273a7:	mov    %esi,%eax
=> 0xffffffff827273a9:	xrstor (%rdi)
   0xffffffff827273ac:	retq   
   0xffffffff827273ad:	int3   
   0xffffffff827273ae:	int3   
   0xffffffff827273af:	int3   
   0xffffffff827273b0:	int3

It is called from here:
   0xffffffff82667489:	test   %eax,%eax
   0xffffffff8266748b:	jne    0xffffffff826674a1
   0xffffffff8266748d:	movq   $0x3,0x5238(%r15)
   0xffffffff82667498:	mov    %rbx,%rsi
   0xffffffff8266749b:	and    $0xfffffffffffffffc,%rsi
   0xffffffff8266749f:	je     0xffffffff826674ad
   0xffffffff826674a1:	mov    0x5240(%r15),%rdi
   0xffffffff826674a8:	callq  0xffffffff827273a0
=> 0xffffffff826674ad:	or     %rbx,0x5238(%r15)
   0xffffffff826674b4:	mov    %r14d,%eax
   0xffffffff826674b7:	add    $0x8,%rsp

kgdb (from ports) doesn't believe that either of these belongs to any function.
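To make the two listings easier to follow, here is a rough C model of what they appear to do, read directly off the disassembly. It is a sketch, not VirtualBox source: the struct, field, and function names are hypothetical, the meaning of the value tested in %eax is not known from the listing, and the XRSTOR is replaced by a function that only records the EDX:EAX pair it would receive, so the model can run in user space. It shows that on the path where %eax is zero the caller resets the component mask to 0x3 (the x87 and SSE components) and only issues XRSTOR for components beyond those two, and that the callee splits the 64-bit mask into EDX (high half) and EAX (low half) before executing the instruction directly. Nothing in either listing registers the FPU use with the host kernel, which is consistent with the "Unregistered use of FPU in kernel" panic.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model of the two fields the caller touches:
 *   0x5238(%r15) -> xstate_mask (bitmap of XSAVE state components)
 *   0x5240(%r15) -> xsave_area  (pointer loaded into %rdi for XRSTOR)
 * The struct and field names are invented for illustration.
 */
struct guest_fpu_model {
    uint64_t xstate_mask;
    void    *xsave_area;
};

/* Instead of executing XRSTOR, record the EDX:EAX pair it would receive. */
uint32_t last_eax;
uint32_t last_edx;
int      xrstor_calls;

/*
 * Models the callee at 0xffffffff827273a0:
 *   mov %rsi,%rdx ; shr $0x20,%rdx   -> EDX = high 32 bits of the mask
 *   mov %esi,%eax                    -> EAX = low 32 bits of the mask
 *   xrstor (%rdi)                    -> restore the components in EDX:EAX
 */
void model_xrstor(void *area, uint64_t components)
{
    (void)area;                      /* a real XRSTOR reads this save area */
    last_edx = (uint32_t)(components >> 32);
    last_eax = (uint32_t)components;
    xrstor_calls++;
}

/*
 * Models the caller block.  `eax_flag` stands for whatever %eax holds at
 * `test %eax,%eax`, `rsi_in` for the value already in %rsi on the taken
 * branch, and `rbx` for the value held in %rbx.
 */
void model_caller(struct guest_fpu_model *st, uint32_t eax_flag,
                  uint64_t rsi_in, uint64_t rbx)
{
    uint64_t components = rsi_in;            /* jne path keeps %rsi as-is */

    if (eax_flag == 0) {
        st->xstate_mask = 0x3;               /* movq $0x3,0x5238(%r15) */
        components = rbx & ~(uint64_t)0x3;   /* and $0x...fffc,%rsi */
        if (components == 0)                 /* je: nothing beyond x87/SSE */
            goto done;
    }
    model_xrstor(st->xsave_area, components);  /* callq 0xffffffff827273a0 */
done:
    st->xstate_mask |= rbx;                  /* or %rbx,0x5238(%r15) */
}
```

For example, calling `model_caller(&st, 0, 0, 0x7)` on a zeroed model records a single XRSTOR with EAX = 0x4 and EDX = 0, i.e. only component 2 (the AVX/YMM state), and leaves `st.xstate_mask` at 0x7.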
 
The VMs where I first saw the problem were initially created with Virtualbox 4 and the paravirtualization setting is "Legacy", but I can reproduce this panic after creating a new VM which uses the "Default" setting, increasing the number of processors to 4, and booting the CentOS 7 install .iso.

The CPU info is:

CPU: AMD FX-8320E Eight-Core Processor               (3210.84-MHz K8-class CPU)
  Origin="AuthenticAMD"  Id=0x600f20  Family=0x15  Model=0x2  Stepping=0
  Features=0x178bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT>
  Features2=0x3e98320b<SSE3,PCLMULQDQ,MON,SSSE3,FMA,CX16,SSE4.1,SSE4.2,POPCNT,AESNI,XSAVE,OSXSAVE,AVX,F16C>
  AMD Features=0x2e500800<SYSCALL,NX,MMX+,FFXSR,Page1GB,RDTSCP,LM>
  AMD Features2=0x1ebbfff<LAHF,CMP,SVM,ExtAPIC,CR8,ABM,SSE4A,MAS,Prefetch,OSVW,IBS,XOP,SKINIT,WDT,LWP,FMA4,TCE,NodeId,TBM,Topology,PCXC,PNXC>
  Structured Extended Features=0x8<BMI1>
  SVM: NP,NRIP,VClean,AFlush,DAssist,NAsids=65536
  TSC: P-state invariant, performance statistics

Whether or not this problem occurs with Intel CPUs is unknown.

This problem did not occur before the upgrade from Virtualbox 4 to Virtualbox 5.
Comment 1 Don Lewis freebsd_committer 2016-08-08 22:35:28 UTC
I was unable to reproduce this with the CentOS 7 .iso on:

FreeBSD 10.3-STABLE #11 r303852: Mon Aug  8 13:59:38 PDT 2016
    dl@hoover:/usr/obj/usr/src/sys/GENERIC amd64
FreeBSD clang version 3.4.1 (tags/RELEASE_34/dot1-final 208032) 20140512
CPU: AMD FX(tm)-4100 Quad-Core Processor             (3624.26-MHz K8-class CPU)
  Origin="AuthenticAMD"  Id=0x600f12  Family=0x15  Model=0x1  Stepping=2
  Features=0x178bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT>
  Features2=0x1e98220b<SSE3,PCLMULQDQ,MON,SSSE3,CX16,SSE4.1,SSE4.2,POPCNT,AESNI,XSAVE,OSXSAVE,AVX>
  AMD Features=0x2e500800<SYSCALL,NX,MMX+,FFXSR,Page1GB,RDTSCP,LM>
  AMD Features2=0x1c9bfff<LAHF,CMP,SVM,ExtAPIC,CR8,ABM,SSE4A,MAS,Prefetch,OSVW,IBS,XOP,SKINIT,WDT,LWP,FMA4,NodeId,Topology,PCXC,PNXC>
  SVM: NP,NRIP,VClean,AFlush,DAssist,NAsids=65536
  TSC: P-state invariant, performance statistics


I'll try to set up a test on FreeBSD 11.0-BETA 4, but it will take a while.
Comment 2 Don Lewis freebsd_committer 2016-08-08 23:41:07 UTC
Correction.  I can't reproduce this problem with 10.3-STABLE GENERIC kernel, but I can if I enable INVARIANTS.
Comment 3 Don Lewis freebsd_committer 2016-08-08 23:55:20 UTC
Setting paravirtualization to "Minimal" does not solve the problem.
Comment 4 Jung-uk Kim freebsd_committer 2016-08-09 19:00:12 UTC
FYI, this function is ASMXRstor():

https://www.virtualbox.org/browser/vbox/trunk/src/VBox/Runtime/common/asm/ASMXRstor.asm
Comment 5 Jung-uk Kim freebsd_committer 2016-08-09 19:03:45 UTC
It's been used since r55312:

https://www.virtualbox.org/changeset/55312
Comment 6 Jung-uk Kim freebsd_committer 2016-08-09 19:09:42 UTC
Enabled for AMD-V in r55316:

https://www.virtualbox.org/changeset/55316
Comment 7 Jung-uk Kim freebsd_committer 2016-08-09 19:23:40 UTC
Can you please confirm whether disabling AMD-V works around the issue?
Comment 8 Don Lewis freebsd_committer 2016-08-09 22:12:39 UTC
If I try to disable the AMD-V setting, Virtualbox complains "Invalid Settings Detected".  If I disable virtualization in the BIOS, Virtualbox only seems to understand 32-bit guests.

I hadn't gone spelunking in the .kmod source because the stack traces fooled me into thinking that the code was not part of the .kmod.

ASMXRstor() is called from CPUMSetGuestXcr0(PVMCPU pVCpu, uint64_t uNewValue)
here: <https://www.virtualbox.org/browser/vbox/trunk/src/VBox/VMM/VMMAll/CPUMAllRegs.cpp>.  Perhaps it just needs calls to fpu_kern_enter() and fpu_kern_leave() added around it.

Interestingly, I don't see any calls to ASMXSave().
Comment 9 Don Lewis freebsd_committer 2016-08-09 22:41:48 UTC
That code has been present in Virtualbox for a while, but it is not present in version 4.3.38, which was the latest version of our port until the recent upgrade to 5.0.26.  The system panics started after that upgrade.
Comment 10 commit-hook freebsd_committer 2016-08-13 04:06:29 UTC
A commit references this bug:

Author: jkim
Date: Sat Aug 13 04:05:35 UTC 2016
New revision: 420152
URL: https://svnweb.freebsd.org/changeset/ports/420152

Log:
  Temporarily disable AVX support for guest.  It is unstable for FreeBSD.

  PR:		211651

Changes:
  head/emulators/virtualbox-ose/Makefile
  head/emulators/virtualbox-ose/files/patch-src_VBox_VMM_VMMR3_CPUMR3CpuId.cpp
Comment 11 Don Lewis freebsd_committer 2016-08-14 20:14:17 UTC
My existing CentOS 7 VM won't boot with this change.  It gets most of the way through boot, but then (when X is starting?) I get a white screen with a frowny face that says something went wrong and I should contact my administrator ;-(

I rebuilt VirtualBox without the patch and CentOS 7 boots, though I have to restrict it to one processor; otherwise the host panics.

I'll try creating a new CentOS 7 guest with the patched Virtualbox.

My existing CentOS 5 VM booted normally with the patch.
Comment 12 Don Lewis freebsd_committer 2016-08-14 22:13:01 UTC
It looks like something related to Xorg is the culprit.  I was able to reproduce this with a new CentOS 7 VM if I install GNOME.  Possibly some component is assuming that it can use AVX and dies when it can't.

I've seen that screen before on FreeBSD if gdm isn't able to start a session.
Comment 13 Peter Jeremy freebsd_committer 2017-12-11 01:56:18 UTC
The workaround in r420152 has the undesirable side-effect of also disabling AVX and AVX2 extensions, which clang will use by default even on integer code.

Since that workaround was committed, VBox has been updated from 5.0.26 to 5.2.2.  Has anyone done any investigation to determine whether this bug is still present?
Comment 14 Walter Schwarzenfeld freebsd_triage 2018-01-16 15:15:48 UTC
We now have version 5.2.4. Does the problem still exist with the recent version?
Comment 15 Don Lewis freebsd_committer 2018-01-23 06:41:56 UTC
I just built VirtualBox 5.2.6 after removing
files/patch-src_VBox_VMM_VMMR3_CPUMR3CpuId.cpp.

I was able to start CentOS 7 and Ubuntu 12 guests.

It looks like the problem is fixed upstream and we no longer require the patch.
Comment 16 commit-hook freebsd_committer 2018-01-23 17:31:44 UTC
A commit references this bug:

Author: jkim
Date: Tue Jan 23 17:30:50 UTC 2018
New revision: 459789
URL: https://svnweb.freebsd.org/changeset/ports/459789

Log:
  Re-enable AVX/AVX2 support for guest.

  This patch is no longer necessary according to the original reporter.

  PR:		211651

Changes:
  head/emulators/virtualbox-ose/Makefile
  head/emulators/virtualbox-ose/files/patch-src_VBox_VMM_VMMR3_CPUMR3CpuId.cpp