I posted to the forums about this about 6 months ago but nobody ever responded:
https://forums.freebsd.org/viewtopic.php?&t=40223

With the arrival of 10.0 I decided to try it again, with the same results. I am using VMWare Server 1.0.10 installed on an AMD64 CentOS 5.9 box. I have a few 8.3 installs of FreeBSD running fine on it, but any attempt to move to a newer version has resulted in a kernel panic on boot. The panic occurs if I move from 8.3 to 8.4 (via a rebuild of kernel/world), so it appears something changed between those two versions that is causing it. So far every version newer than 8.3 has resulted in the same panic.

To rule out the possibility that I messed up building the kernel or the world, I downloaded 9.2 and 10.0 install CDs, but the results are the same: a kernel panic on boot. The kernel panic is always similar (usually with different addresses).

I know VMWare Server is somewhat antiquated, but the virtual hardware has not changed and it does work with FreeBSD up to 8.3, as I have upgraded from older versions in the past with it.

How-To-Repeat:
Install VMWare Server 1.0.10 on an AMD64 CentOS 5.9 machine (not sure if the arch/OS is relevant) and try to install a newer FreeBSD (>8.3).
State Changed From-To: open->open Over to maintainer(s).
Responsible Changed From-To: freebsd-amd64->freebsd-emulation
It appears to be crashing because VMWare is raising a privileged instruction trap in the OS when it invokes 'hlt'. That seems like a bug in VMWare. There isn't a way to disable 'hlt' from the loader prompt unfortunately. Can you show the output of 'sysctl machdep.idle' under your working kernel? -- John Baldwin
Output of "sysctl machdep.idle":

machdep.idle: amdc1e

This is from a 8.3-RELEASE-p15 box.

Thanks,
Steven
On Monday, April 21, 2014 01:45:10 PM Steven Spence wrote:
> Output of "sysctl machdep.idle"
>
> machdep.idle: amdc1e
>
> This is from a 8.3-RELEASE-p15 box.

Hummm. We really shouldn't be doing anything differently. However, we do a bit more (including a wrmsr) during idle halt on your machine. Can you build a stable/8 kernel with debug symbols in an 8.3 guest and capture the panic messages from booting that kernel?

-- John Baldwin
On 04/28/2014 08:32 AM, John Baldwin wrote:
> Can you build a stable/8 kernel with debug symbols in an 8.3 guest
> and capture the panic messages from booting that kernel?

Here is a capture of the panic from a stable/8 kernel. Is the only debugging option you are looking for in the kernel config "makeoptions DEBUG=-g"? I still have the 8.3 kernel on there I can boot if I need to get in and recompile the stable/8 kernel differently. I am not sure how much use the information below will be to you.

kernel trap 1 with interrupts disabled

Fatal trap 1: privileged instruction fault while in kernel mode
cpuid = 0; apic id = 00
instruction pointer = 0x20:0xffffffff809c342e
stack pointer = 0x28:0xffffff8000211b40
acd0: CDROM <VMware Virtual IDE CDROM Drive/00000001> at ata1-master UDMA33
frame pointer = 0x28:0xffffff8000211b60
code segment = base rx0, limit 0xfffff, type 0x1b
             = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = resume, IOPL = 0
current process = 11 (idle: cpu0)
trap number = 1
panic: privileged instruction fault
cpuid = 0
KDB: stack backtrace:
#0 0xffffffff8067c0b6 at kdb_backtrace+0x66
#1 0xffffffff8064861e at panic+0x1ce
#2 0xffffffff809d3750 at trap_fatal+0x290
#3 0xffffffff809d3ce5 at trap+0x105
#4 0xffffffff809ba944 at calltrap+0x8
#5 0xffffffff8066e08f at sched_idletd+0x11f
#6 0xffffffff8061ceaf at fork_exit+0x11f
#7 0xffffffff809bae8e at fork_trampoline+0xe
Uptime: 1s
Cannot dump. Device not defined or unavailable.
Automatic reboot in 15 seconds - press a key on the console to abort

I have also tried to dump the panic to a swap device but I don't think it is getting far enough in the kernel boot to initialize any hard drive storage devices.

If there is anything else I can try to get more information out of this let me know.

Thanks,
Steven
On Monday, April 28, 2014 11:04:40 pm Steven Spence wrote:
> I have also tried to dump the panic to a swap device but I don't think
> it is getting far enough in the kernel boot to initialize any hard drive
> storage devices.
>
> If there is anything else I can try to get more information out of this
> let me know.

If you have the result of this kernel build, can you find the kernel.debug file it generated and run 'gdb kernel.debug' and then 'l *0xffffffff809c342e'? That will (hopefully) identify the exact line it panic'd on. It might also be useful to do 'x/i 0xffffffff809c342e' in gdb as well.

-- John Baldwin
On 04/29/2014 01:43 PM, John Baldwin wrote:
> If you have the result of this kernel build, can you find the kernel.debug
> file it generated and run 'gdb kernel.debug' and then 'l *0xffffffff809c342e'?
> That will (hopefully) identify the exact line it panic'd on. It might also
> be useful to do 'x/i 0xffffffff809c342e' in gdb as well.

Below are the results of the two gdb commands:

(gdb) l *0xffffffff809c342e
0xffffffff809c342e is in cpu_idle_mwait (cpufunc.h:470).
465     }
466
467     static __inline void
468     cpu_monitor(const void *addr, int extensions, int hints)
469     {
470             __asm __volatile("monitor;"
471                 : :"a" (addr), "c" (extensions), "d"(hints));
472     }
473
474     static __inline void

(gdb) x/i 0xffffffff809c342e
0xffffffff809c342e <cpu_idle_mwait+62>: monitor %eax,%ecx,%edx

Thanks,
Steven
On Tuesday, April 29, 2014 10:13:20 pm Steven Spence wrote:
> (gdb) x/i 0xffffffff809c342e
> 0xffffffff809c342e <cpu_idle_mwait+62>: monitor %eax,%ecx,%edx

That's interesting. It's dying on monitor, not hlt. Can you capture the CPU lines from dmesg from a working kernel? I want to see if VMWare is advertising the ability to use monitor via cpuid.

Also, try setting 'machdep.idle_mwait=0' at the loader prompt before booting to see if that fixes the panic.

-- John Baldwin
On 04/30/2014 10:09 AM, John Baldwin wrote:
> Also, try setting 'machdep.idle_mwait=0' at the loader prompt before booting to
> see if that fixes the panic.

Here is the requested information:

CPU: Quad-Core AMD Opteron(tm) Processor 2384 (2726.06-MHz K8-class CPU)
  Origin = "AuthenticAMD"  Id = 0x100f42  Family = 10  Model = 4  Stepping = 2
  Features=0x783fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,SSE,SSE2>
  Features2=0x802009<SSE3,MON,CX16,POPCNT>
  AMD Features=0xee500800<SYSCALL,NX,MMX+,FFXSR,Page1GB,RDTSCP,LM,3DNow!+,3DNow!>
  AMD Features2=0x37e9<LAHF,ExtAPIC,ABM,SSE4A,MAS,Prefetch,OSVW,IBS,SKINIT,WDT>
  TSC: P-state invariant

Setting 'machdep.idle_mwait=0' did fix the panic. It successfully booted into 8.4-STABLE with this option set. I am not sure what (if any) ramifications this option causes but if there are little to none I am fine with sticking this in my /boot/loader.conf and running with it. If you feel there is a deeper/generic problem that still needs to be worked out I can try to provide whatever information you need.

Thanks,
Steven
On Wednesday, April 30, 2014 12:47:31 pm Steven Spence wrote:
> Here is the requested information:
>
> CPU: Quad-Core AMD Opteron(tm) Processor 2384 (2726.06-MHz K8-class CPU)
> Origin = "AuthenticAMD" Id = 0x100f42 Family = 10 Model = 4
> Stepping = 2
> Features=0x783fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,SSE,SSE2>
> Features2=0x802009<SSE3,MON,CX16,POPCNT>

Looks like it is telling the guest here it is ok to use monitor ("MON" feature).

> AMD
> Features=0xee500800<SYSCALL,NX,MMX+,FFXSR,Page1GB,RDTSCP,LM,3DNow!+,3DNow!>
> AMD
> Features2=0x37e9<LAHF,ExtAPIC,ABM,SSE4A,MAS,Prefetch,OSVW,IBS,SKINIT,WDT>
> TSC: P-state invariant
>
> Setting 'machdep.idle_mwait=0' did fix the panic. It successfully
> booted into 8.4-STABLE with this option set. I am not sure what (if
> any) ramifications this option causes but if there are little to none I
> am fine with sticking this in my /boot/loader.conf and running with it.
> If you feel there is a deeper/generic problem that still needs to be
> worked out I can try to provide whatever information you need.

It should be fine as a workaround. The remaining issues I can see are:

1) Should we disable monitor automatically for VMWare?

2) This should be reported to the VMWare folks as it is ultimately their bug. If they don't support usage of 'monitor' by guest OS's, then they should hide it from the cpuid information.

Would you be able to handle 2)? I would like to see what they say before adventuring too much further down the path of 1).

-- John Baldwin
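For readers who want to check the "MON" observation above for themselves, here is a minimal C sketch that decodes the Features2 word from the dmesg output. It assumes Features2 is the ECX value returned by CPUID leaf 1, where bit 3 is the architectural MONITOR/MWAIT flag. The macro and function names are local to this sketch (hypothetical, chosen for illustration) and are not taken from this PR.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Bit positions in CPUID leaf 1, register ECX (shown as "Features2"
 * in dmesg). These names are defined locally for this sketch only.
 */
#define FEAT2_SSE3   (1u << 0)
#define FEAT2_MON    (1u << 3)   /* MONITOR/MWAIT advertised */
#define FEAT2_CX16   (1u << 13)
#define FEAT2_POPCNT (1u << 23)

/* Hypothetical helper: true if the word advertises MONITOR/MWAIT. */
static inline bool
features2_has_monitor(uint32_t features2)
{
	return (features2 & FEAT2_MON) != 0;
}
```

With the value reported in this PR, features2_has_monitor(0x802009) is true: the guest is being told that 'monitor' is usable, even though executing it traps under this VMWare version.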
On 04/30/2014 11:17 AM, John Baldwin wrote:
> It should be fine as a workaround. The remaining issues I can see are:
>
> 1) Should we disable monitor automatically for VMWare?

I am not sure on this one. Did FreeBSD start using or change how it was using this feature with kernels > 8.3? Everything worked good up to that kernel version, even with VMWare falsely advertising that it supports the monitor flag. I went looking at the flags the host (CentOS 5) reports for the physical CPU and I don't see the 'monitor' flag in there either so I am not sure where VMWare is getting the idea it is supported.

> 2) This should be reported to the VMWare folks as it is ultimately their
> bug. If they don't support usage of 'monitor' by guest OS's, then they
> should hide it from the cpuid information.
>
> Would you be able to handle 2)? I would like to see what they say before
> adventuring too much further down the path of 1).

I don't mind contacting VMWare about it but I am almost positive they are going to tell me that is not a product they support any more and that I should upgrade to ESX, vSphere, or whatever their latest incarnation is. Newer FreeBSDs appear to work with newer VMWare products as I didn't run across anyone else having this problem when I first went searching for a solution. I don't think disabling a feature that appears to work for others just because of some old corner case is a good idea. Doubly so since there is an option to bypass the problem for people with older VMWare installs like mine. Let me know if you still think contacting VMWare is worth pursuing.

This is just probably the kick in the butt I need to convert the VMs to Virtualbox or something more recent and supported.

Thanks,
Steven
On Wednesday, April 30, 2014 1:58:35 pm Steven Spence wrote:
> I am not sure on this one. Did FreeBSD start using or change how it was
> using this feature with kernels > 8.3? Everything worked good up to
> that kernel version, even with VMWare falsely advertising that it
> supports the monitor flag. I went looking at the flags the host (CentOS
> 5) reports for the physical CPU and I don't see the 'monitor' flag in
> there either so I am not sure where VMWare is getting the idea it is
> supported.

I think most CPUs support monitor nowadays. It was added in the Pentium III IIRC. I think FreeBSD did not use it by default in 8.3 and earlier.

> I don't mind contacting VMWare about it but I am almost positive they
> are going to tell me that is not a product they support any more and
> that I should upgrade to ESX, vSphere, or whatever their latest
> incarnation is. Newer FreeBSDs appear to work with newer VMWare
> products as I didn't run across anyone else having this problem when I
> first went searching for a solution. I don't think disabling a feature
> that appears to work for others just because of some old corner case is
> a good idea. Doubly so since there is an option to bypass the problem
> for people with older VMWare installs like mine. Let me know if you
> still think contacting VMWare is worth pursuing.

Ahhh, ok. So it sounds like it's probably a bug that they might have already fixed. I think in that case I agree that it's probably best to document this in the PR so Google searches can find the workaround. :)

-- John Baldwin
The issue appears to be a bug in older versions of VMWare that is fixed in newer versions.
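For anyone arriving here from a search, the workaround reported in this PR can be summarized as follows. This is only a sketch based on the setting named in the thread. The 'set' and 'boot' commands assume the standard FreeBSD loader prompt, and it has only been reported to work here for 8.4-STABLE under VMWare Server 1.0.10.

To try it once, at the loader prompt before booting:

```
set machdep.idle_mwait=0
boot
```

To make it persistent, add this line to /boot/loader.conf:

```
machdep.idle_mwait="0"
```

With the tunable set to 0 the kernel should no longer pick the monitor/mwait idle routine (cpu_idle_mwait in the backtrace above), which is where the privileged instruction fault was raised.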