Bug 186051 - [vmware] [panic] FreeBSD 8.4+, 9.x+, 10.0 guest panic with VMWare Server on boot
Status: Closed Not A Bug
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: Unspecified
Hardware: Any Any
Importance: Normal Affects Only Me
Assignee: freebsd-emulation (Nobody)
Reported: 2014-01-23 21:20 UTC by Steven Spence
Modified: 2014-10-02 20:40 UTC
Description Steven Spence 2014-01-23 21:20:00 UTC
I posted to the forums about this about 6 months ago but nobody ever
responded:

https://forums.freebsd.org/viewtopic.php?&t=40223

With the arrival of 10.0 I decided to try it again, with the same results.
I am using VMWare Server 1.0.10 installed on an AMD64 CentOS 5.9
box.  I have a few 8.3 installs of FreeBSD running fine on it but any
attempts to move to newer versions have resulted in a kernel panic on
boot.  It occurs when I move from 8.3 to 8.4 (via a rebuild of
kernel/world), so it appears something changed between those two
versions that is causing it.  So far every version newer than 8.3 has
resulted in the same panic.  To rule out a mistake on my part in
building the kernel or the world I downloaded a 9.2 and a 10.0 install CD
but the results are the same: a kernel panic on boot.  The kernel panic
is always similar (usually with different addresses).

I know VMWare Server is somewhat antiquated but the virtual hardware has
not changed and it does work on FreeBSD <=8.3, as I have upgraded from older
versions in the past with it.

How-To-Repeat: Install VMWare Server 1.0.10 on an AMD64 CentOS 5.9 (not sure if the
arch/OS is relevant) machine and try to install a newer FreeBSD (>8.3).
Comment 1 Mark Linimon freebsd_committer freebsd_triage 2014-04-20 02:48:45 UTC
State Changed
From-To: open->open

Over to maintainer(s). 


Comment 2 Mark Linimon freebsd_committer freebsd_triage 2014-04-20 02:48:45 UTC
Responsible Changed
From-To: freebsd-amd64->freebsd-emulation
Comment 3 John Baldwin freebsd_committer freebsd_triage 2014-04-21 19:37:41 UTC
It appears to be crashing because VMWare is raising a privileged
instruction trap in the OS when it invokes 'hlt'.  That seems like a
bug in VMWare.  There isn't a way to disable 'hlt' from the loader
prompt unfortunately.

Can you show the output of 'sysctl machdep.idle' under your working
kernel?

-- 
John Baldwin
Comment 4 Steven Spence 2014-04-21 20:45:10 UTC
Output of "sysctl machdep.idle"

machdep.idle: amdc1e

This is from a 8.3-RELEASE-p15 box.

Thanks,
Steven
Comment 5 John Baldwin freebsd_committer freebsd_triage 2014-04-28 15:32:23 UTC
On Monday, April 21, 2014 01:45:10 PM Steven Spence wrote:
> Output of "sysctl machdep.idle"
> 
> machdep.idle: amdc1e
> 
> This is from a 8.3-RELEASE-p15 box.

Hummm.  We really shouldn't be doing anything differently.  However, we do a
bit more (including a wrmsr) during idle halt on your machine.  Can you build
a stable/8 kernel with debug symbols in an 8.3 guest and capture the panic
messages from booting that kernel?

-- 
John Baldwin
Comment 6 Steven Spence 2014-04-29 04:04:40 UTC
On 04/28/2014 08:32 AM, John Baldwin wrote:
> Can you build a stable/8 kernel with debug symbols in an 8.3 guest
> and capture the panic messages from booting that kernel?

Here is a capture of the panic from a stable/8 kernel.  Is the only 
debugging option you are looking for in the kernel config 
"makeoptions     DEBUG=-g"?  I still have the 8.3 kernel on there I can 
boot if I need to get in and recompile the stable/8 kernel differently.  
I am not sure how much use the information below will be to you.

kernel trap 1 with interrupts disabled
Fatal trap 1: privileged instruction fault while in kernel mode
cpuid = 0; apic id = 00
instruction pointer     = 0x20:0xffffffff809c342e
stack pointer           = 0x28:0xffffff8000211b40
acd0: CDROM <VMware Virtual IDE CDROM Drive/00000001> at ata1-master UDMA33
frame pointer           = 0x28:0xffffff8000211b60
code segment            = base rx0, limit 0xfffff, type 0x1b
                         = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = resume, IOPL = 0
current process         = 11 (idle: cpu0)
trap number             = 1
panic: privileged instruction fault
cpuid = 0
KDB: stack backtrace:
#0 0xffffffff8067c0b6 at kdb_backtrace+0x66
#1 0xffffffff8064861e at panic+0x1ce
#2 0xffffffff809d3750 at trap_fatal+0x290
#3 0xffffffff809d3ce5 at trap+0x105
#4 0xffffffff809ba944 at calltrap+0x8
#5 0xffffffff8066e08f at sched_idletd+0x11f
#6 0xffffffff8061ceaf at fork_exit+0x11f
#7 0xffffffff809bae8e at fork_trampoline+0xe
Uptime: 1s
Cannot dump. Device not defined or unavailable.
Automatic reboot in 15 seconds - press a key on the console to abort

I have also tried to dump the panic to a swap device but I don't think 
it is getting far enough in the kernel boot to initialize any hard drive 
storage devices.

If there is anything else I can try to get more information out of this 
let me know.

Thanks,
Steven
Comment 7 John Baldwin freebsd_committer freebsd_triage 2014-04-29 20:43:16 UTC
On Monday, April 28, 2014 11:04:40 pm Steven Spence wrote:
> I have also tried to dump the panic to a swap device but I don't think 
> it is getting far enough in the kernel boot to initialize any hard drive 
> storage devices.
> 
> If there is anything else I can try to get more information out of this 
> let me know.

If you have the result of this kernel build, can you find the kernel.debug 
file it generated and run 'gdb kernel.debug' and then 'l *0xffffffff809c342e'?
That will (hopefully) identify the exact line it panic'd on.  It might also
be useful to do 'x/i 0xffffffff809c342e' in gdb as well.

-- 
John Baldwin
Comment 8 Steven Spence 2014-04-30 03:13:20 UTC
On 04/29/2014 01:43 PM, John Baldwin wrote:
> If you have the result of this kernel build, can you find the kernel.debug
> file it generated and run 'gdb kernel.debug' and then 'l *0xffffffff809c342e'?
> That will (hopefully) identify the exact line it panic'd on.  It might also
> be useful to do 'x/i 0xffffffff809c342e' in gdb as well.

Below are the results of the two gdb commands:

(gdb) l *0xffffffff809c342e
0xffffffff809c342e is in cpu_idle_mwait (cpufunc.h:470).
465     }
466
467     static __inline void
468     cpu_monitor(const void *addr, int extensions, int hints)
469     {
470             __asm __volatile("monitor;"
471                 : :"a" (addr), "c" (extensions), "d"(hints));
472     }
473
474     static __inline void

(gdb) x/i 0xffffffff809c342e
0xffffffff809c342e <cpu_idle_mwait+62>: monitor %eax,%ecx,%edx

Thanks,
Steven
Comment 9 John Baldwin freebsd_committer freebsd_triage 2014-04-30 17:09:44 UTC
On Tuesday, April 29, 2014 10:13:20 pm Steven Spence wrote:
> (gdb) x/i 0xffffffff809c342e
> 0xffffffff809c342e <cpu_idle_mwait+62>: monitor %eax,%ecx,%edx

That's interesting.  It's dying on monitor, not hlt.

Can you capture the CPU lines from dmesg from a working kernel?  I want
to see if VMWare is advertising the ability to use monitor via cpuid.

Also, try setting 'machdep.idle_mwait=0' at the loader prompt before
booting to see if that fixes the panic.

-- 
John Baldwin
Comment 10 Steven Spence 2014-04-30 17:47:31 UTC
On 04/30/2014 10:09 AM, John Baldwin wrote:
> Also, try setting 'machdep.idle_mwait=0' at the loader prompt before booting to
> see if that fixes the panic.
>
Here is the requested information:

CPU: Quad-Core AMD Opteron(tm) Processor 2384 (2726.06-MHz K8-class CPU)
   Origin = "AuthenticAMD"  Id = 0x100f42  Family = 10  Model = 4  Stepping = 2
   Features=0x783fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,SSE,SSE2>
   Features2=0x802009<SSE3,MON,CX16,POPCNT>
   AMD Features=0xee500800<SYSCALL,NX,MMX+,FFXSR,Page1GB,RDTSCP,LM,3DNow!+,3DNow!>
   AMD Features2=0x37e9<LAHF,ExtAPIC,ABM,SSE4A,MAS,Prefetch,OSVW,IBS,SKINIT,WDT>
   TSC: P-state invariant

Setting 'machdep.idle_mwait=0' did fix the panic.  It successfully 
booted into 8.4-STABLE with this option set.  I am not sure what (if 
any) ramifications this option causes but if there are little to none I 
am fine with sticking this in my /boot/loader.conf and running with it.  
If you feel there is a deeper/generic problem that still needs to be 
worked out I can try to provide whatever information you need.

Thanks,
Steven
Comment 11 John Baldwin freebsd_committer freebsd_triage 2014-04-30 18:17:18 UTC
On Wednesday, April 30, 2014 12:47:31 pm Steven Spence wrote:
> Here is the requested information:
> 
> CPU: Quad-Core AMD Opteron(tm) Processor 2384 (2726.06-MHz K8-class CPU)
>    Origin = "AuthenticAMD"  Id = 0x100f42  Family = 10  Model = 4  Stepping = 2
>    Features=0x783fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,SSE,SSE2>
>    Features2=0x802009<SSE3,MON,CX16,POPCNT>

Looks like it is telling the guest here it is ok to use monitor ("MON"
feature).

>    AMD Features=0xee500800<SYSCALL,NX,MMX+,FFXSR,Page1GB,RDTSCP,LM,3DNow!+,3DNow!>
>    AMD Features2=0x37e9<LAHF,ExtAPIC,ABM,SSE4A,MAS,Prefetch,OSVW,IBS,SKINIT,WDT>
>    TSC: P-state invariant
> 
> Setting 'machdep.idle_mwait=0' did fix the panic.  It successfully 
> booted into 8.4-STABLE with this option set.  I am not sure what (if 
> any) ramifications this option causes but if there are little to none I 
> am fine with sticking this in my /boot/loader.conf and running with it.  
> If you feel there is a deeper/generic problem that still needs to be 
> worked out I can try to provide whatever information you need.

It should be fine as a workaround.  The remaining issues I can see are:

1) Should we disable monitor automatically for VMWare?

2) This should be reported to the VMWare folks as it is ultimately their
bug.  If they don't support usage of 'monitor' by guest OS's, then they
should hide it from the cpuid information.

Would you be able to handle 2)?  I would like to see what they say before
adventuring too much further down the path of 1).

-- 
John Baldwin
Comment 12 Steven Spence 2014-04-30 18:58:35 UTC
On 04/30/2014 11:17 AM, John Baldwin wrote:
> It should be fine as a workaround.  The remaining issues I can see are:
>
> 1) Should we disable monitor automatically for VMWare?

I am not sure on this one.  Did FreeBSD start using or change how it was 
using this feature with kernels > 8.3?  Everything worked good up to 
that kernel version, even with VMWare falsely advertising that it 
supports the monitor flag.  I went looking at the flags the host (CentOS 
5) reports for the physical CPU and I don't see the 'monitor' flag in 
there either so I am not sure where VMWare is getting the idea it is 
supported.

> 2) This should be reported to the VMWare folks as it is ultimately their
> bug.  If they don't support usage of 'monitor' by guest OS's, then they
> should hide it from the cpuid information.
>
> Would you be able to handle 2)?  I would like to see what they say before
> adventuring too much further down the path of 1).

I don't mind contacting VMWare about it but I am almost positive they 
are going to tell me that is not a product they support any more and 
that I should upgrade to ESX, vSphere, or whatever their latest 
incarnation is.  Newer FreeBSDs appear to work with newer VMWare 
products as I didn't run across anyone else having this problem when I 
first went searching for a solution.  I don't think disabling a feature 
that appears to work for others just because of some old corner case is 
a good idea.  Doubly so since there is an option to bypass the problem 
for people with older VMWare installs like mine.  Let me know if you 
still think contacting VMWare is worth pursuing.

This is probably just the kick in the butt I need to convert the VMs to 
Virtualbox or something more recent and supported.

Thanks,
Steven
Comment 13 John Baldwin freebsd_committer freebsd_triage 2014-04-30 22:34:28 UTC
On Wednesday, April 30, 2014 1:58:35 pm Steven Spence wrote:
> I am not sure on this one.  Did FreeBSD start using or change how it was 
> using this feature with kernels > 8.3?  Everything worked good up to 
> that kernel version, even with VMWare falsely advertising that it 
> supports the monitor flag.  I went looking at the flags the host (CentOS 
> 5) reports for the physical CPU and I don't see the 'monitor' flag in 
> there either so I am not sure where VMWare is getting the idea it is 
> supported.

I think most CPUs support monitor nowadays.  It was added in the Pentium III
IIRC.  I think FreeBSD did not use it by default in 8.3 and earlier.

> I don't mind contacting VMWare about it but I am almost positive they 
> are going to tell me that is not a product they support any more and 
> that I should upgrade to ESX, vSphere, or whatever their latest 
> incarnation is.  Newer FreeBSDs appear to work with newer VMWare 
> products as I didn't run across anyone else having this problem when I 
> first went searching for a solution.  I don't think disabling a feature 
> that appears to work for others just because of some old corner case is 
> a good idea.  Doubly so since there is an option to bypass the problem 
> for people with older VMWare installs like mine.  Let me know if you 
> still think contacting VMWare is worth pursuing.

Ahhh, ok.  So it sounds like it's probably a bug that they might have
already fixed.  I think in that case I agree that it's probably best to
document this in the PR so Google searches can find the workaround. :)

-- 
John Baldwin
Comment 14 John Baldwin freebsd_committer freebsd_triage 2014-10-02 20:40:09 UTC
The issue appears to be a bug in older versions of VMWare that is fixed in newer versions.