Bug 168416 - [hang] OS hangs when guest on VMWare ESX
Summary: [hang] OS hangs when guest on VMWare ESX
Status: Closed Not Enough Information
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: Unspecified
Hardware: Any Any
Importance: Normal Affects Only Me
Assignee: freebsd-emulation (Nobody)
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-05-28 19:40 UTC by Mark Felder
Modified: 2015-03-17 13:38 UTC
CC List: 0 users

See Also:



Description Mark Felder 2012-05-28 19:40:01 UTC
This problem has been discussed in depth on the freebsd-hackers@ and freebsd-questions@ mailing lists under the subject "Please help me diagnose this crazy VMWare/FreeBSD 8.x crash". Q3 2012 will mark 2 years since we started trying to track down this bug.

Symptoms:
FreeBSD hangs with what appears to be I/O starvation. Anything already in memory keeps working, but any process that needs disk access hangs in a blocked state. VMWare's performance data shows the VM's CPU usage spiking to 100% at the time of the issue. There is no panic even if you leave the VM running for long periods. Pausing the VM does not fix it, and migrating it to another host in the hope of kicking I/O back to life does not work either. The only recourse is a hard reboot.
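When it happens, a quick way to see the blocked processes (my suggestion using standard FreeBSD tooling, not something from the original report):

ps -axo pid,state,wchan,comm | awk '$2 ~ /^D/'   # processes in disk wait and what they are sleeping on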

Things We Have Done (everything imaginable, really):
- Rebuilt crashing VMs and VM templates from scratch with verified media multiple times, even changing OS versions -- 8.0 through 8.3, i386 and amd64. We have very few 9.0 servers and haven't seen a crash there yet, but Dane Foster has confirmed it on 9.0.
- Changed every VM setting imaginable -- including undocumented things suggested by VMWare Support
- Ruled out specific software; the only common denominator so far is FreeBSD itself
- Replaced ESX hardware
- Replaced iSCSI switches
- Replaced SANs
- Verified that it crashes on local disk -- a SAN is not required for this crash to happen
- Changed ESX versions (can replicate this on ESXi 4.0 - 5.0)


Possibly Valuable Info:
On one machine that started to crash regularly I built a full debugging kernel and managed to drop it into DDB when it finally crashed. The results can be seen here, though I am not sure how valuable they are: http://feld.me/pub/freebsd/esx_crash/

After further discussion on the lists it was suggested that the problem might be interrupt-related. This helped us narrow down the issue, and we think we are on the right track.

A trend I immediately noticed:

VMs that are known to crash have em0 and mpt0 sharing an IRQ:

$ vmstat -i
interrupt                          total       rate
irq1: atkbd0                         378          0
irq6: fdc0                             9          0
irq15: ata1                           34          0
irq16: em1                        687237          1
irq18: em0 mpt0                319094024        539
cpu0: timer                    236770821        400
Total                          556552503        940

VMs that we have never seen crash do not share an IRQ:

$ vmstat -i
interrupt                          total       rate
irq1: atkbd0                          38          0
irq6: fdc0                             9          0
irq15: ata1                           34          0
irq16: em1                          2811         15
irq17: em2                             5          0
cpu0: timer                        71013        398
irq256: mpt0                       12163         68
Total                              86073        483

It was suggested that we could use hint.mpt.0.msi_enable="1" to prevent that sharing and possibly prevent the crash. So far its effectiveness is unconfirmed.
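For reference, this hint is a loader tunable, so (assuming the usual mechanism) it goes in /boot/loader.conf and takes effect on the next boot:

# /boot/loader.conf
hint.mpt.0.msi_enable="1"   # request MSI for mpt0 so it no longer shares an IRQ with em0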

Dane Foster can no longer reproduce the crash on demand when he applies the following settings on FreeBSD 8.x (they do not work on 9.x) and leaves the em0 NIC unused (disconnected in VMWare -- no link, as if it were unplugged):

hw.pci.enable_msi="0"
hw.pci.enable_msix="0"
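These are likewise loader tunables; a minimal sketch, assuming they are set in /boot/loader.conf and the VM is rebooted afterwards:

# /boot/loader.conf
hw.pci.enable_msi="0"    # disable MSI allocation system-wide
hw.pci.enable_msix="0"   # disable MSI-X allocation system-wide

After the reboot, vmstat -i should show only legacy irqN lines and no irq256-style MSI vectors.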

His results are as follows:

samael:~:% vmstat -i
interrupt                          total       rate
irq1: atkbd0                           6          0
irq18: em0 mpt0                  3061100         15
irq19: em1                       6891706         35
cpu0: timer                    166383735        868
cpu1: timer                    166382123        868
cpu3: timer                    166382123        868
cpu2: timer                    166382121        868
Total                          675482914       3525



I hope this is enough information. If any other details are required please let me know. I believe both Dane and I are ready and willing to test any suggested workarounds or patches that are made available.

Fix: 

FreeBSD 7.x is unaffected, which is our fix for machines we have declared too valuable to crash -- reverting them to 7.4, or cloning the server off to its own hardware. Unfortunately 7.4 is getting quite old and its support will end early next year, so a solution is desperately needed.
How-To-Repeat: At this moment I do not have a way to reliably reproduce it with our workload. Some of our VMs can go nearly 90 days without a crash; some will begin to crash multiple times a week and then mysteriously stop. It is very unpredictable for us.

Dane Foster can reproduce it at will with his workload (HandBrake video encoding), and we will make VM images available and provide detailed instructions on how to reproduce it.
Comment 1 Mark Felder 2012-06-07 19:42:33 UTC
I have wonderful news: we can now reproduce the crash on demand. We discovered that if we stress em and mpt at the same time by doing I/O on a HAST device, we can easily reproduce this issue.

I also have a coredump, taken by breaking into DDB and running "dump", and a picture of the backtrace:

http://feld.me/pub/freebsd/esx_crash/bt.png
http://feld.me/pub/freebsd/esx_crash/vmcore.0.gz


Requirements:

- VMWare ESXi 5: 1 GB RAM, 1 CPU
- FreeBSD 9 (-RELEASE or -STABLE; this coredump was produced on -STABLE from Jun 3rd)
- HAST
- iozone or bonnie++ (iozone seems to crash it faster and more consistently)

Disk layout:

/                40GB  UFS+SUJ
/dev/hast/hast0   8GB  UFS+SUJ (mounted on /mnt)
swap              2GB  (I put swap on a separate disk as well, to help get a successful dump.)

In this environment I have two servers (node1 and node2) so that HAST is set up properly and actually tries to transfer changes to the secondary. The secondary is merely there to receive the data; it is not otherwise involved in this test.

hast.conf:
# global section
timeout 5          # connection timeout, in seconds
compression hole   # compress runs of zeroes in the replicated data

resource hast0 {
	on node1 {
		local /dev/da1
		remote 192.168.44.2
	}
	on node2 {
		local /dev/da1
		remote 192.168.44.1
	}
}
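For anyone rebuilding this, a rough sketch of bringing the resource up; the exact sequence below is my reconstruction from hastctl(8) and newfs(8), not part of the original report:

# on both nodes: initialize HAST metadata on the local provider, then start hastd
hastctl create hast0
service hastd onestart

# on node2:
hastctl role secondary hast0

# on node1:
hastctl role primary hast0
newfs -j /dev/hast/hast0    # UFS with soft updates journaling (SUJ)
mount /dev/hast/hast0 /mnt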
	

Kernel config "DEBUG" I used for getting this coredump:

include GENERIC
makeoptions     DEBUG=-g
options         INVARIANTS
options         INVARIANT_SUPPORT
options         WITNESS
options         DEBUG_LOCKS
options         DEBUG_VFS_LOCKS
options         DIAGNOSTIC
options         KDB
options         DDB
options         BREAK_TO_DEBUGGER
options         ALT_BREAK_TO_DEBUGGER
options         KTR
options         KTR_ENTRIES=1024
options         KTR_COMPILE=(KTR|KTR_PROC)
options         KTR_MASK=KTR_SCHED
options         KTR_CPUMASK=("0x3")
options         KTR_VERBOSE
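For completeness, building and installing this config follows the standard FreeBSD procedure; the config file location is an assumption on my part (e.g. /usr/src/sys/amd64/conf/DEBUG):

cd /usr/src
make buildkernel KERNCONF=DEBUG
make installkernel KERNCONF=DEBUG
shutdown -r now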


And the iozone command that works quite consistently (I ran it in a loop just in case it wouldn't crash the first time):

iozone -M -e -+u -T -t 8 -r 128k -s 40960 -i 0 -i 1 -i 2 -i 8 -+p 70 -C -F /mnt/io.1 /mnt/io.2 /mnt/io.3 /mnt/io.4 /mnt/io.5 /mnt/io.6 /mnt/io.7 /mnt/io.8

A bonnie++ command that sometimes causes the crash:
bonnie++ -u root -d /mnt/ -s 3552M -n 10:102400:1024:1024

The only other tip I have, if you want to rebuild this entire environment, is to change your hotkey binding. You can't break into the debugger on VMWare with CTRL+ALT+ESC because CTRL+ALT drops your focus on the VM, so you have to override it; I tend to use CTRL+ALT+SHIFT. The following is where you change that:

XP: C:\Documents and Settings\<username>\Application Data\VMware\preferences.ini
Vista/7: C:\Users\<username>\AppData\Roaming\VMware\preferences.ini

pref.hotkey.shift = "true"
pref.hotkey.control = "true"
pref.hotkey.alt = "true"


Please let me know if there is anything I can do to help resolve this issue.
Comment 2 Mark Felder 2012-06-07 21:57:22 UTC
I've been informed that without the exact kernel used, the vmcore.0 is of no value. I've updated the vmcore.0.gz file (md5: e2d2fb3d3f4601d6a4055a939d547dbd) and have also uploaded the kernel, which was built with the same specs as previously defined but is based on RELENG_9_0:

http://feld.me/pub/freebsd/esx_crash/vmcore.0.gz
http://feld.me/pub/freebsd/esx_crash/kernel.tar.gz
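For anyone examining the dump, the usual approach is kgdb against the matching kernel; the file names inside the tarball are an assumption on my part:

tar xzf kernel.tar.gz
gunzip vmcore.0.gz
kgdb kernel/kernel vmcore.0   # then e.g. bt, info threads, thread apply all bt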

I guess the backtrace screenshot was useless because it was just from entering DDB; that makes sense, since this doesn't actually produce a real crash/panic. I've provided a couple of others that might be useful, though:

http://feld.me/pub/freebsd/esx_crash/chains.png
http://feld.me/pub/freebsd/esx_crash/showthreads.png

Hopefully this will be more useful.
Comment 3 Mark Felder 2012-07-24 17:56:11 UTC
Just wanted to follow up with news that I have repeated this crash on FreeBSD 10 / HEAD from 7-24-12 as well.
Comment 4 Mark Linimon 2015-03-12 04:33:42 UTC
To submitter: is this PR still relevant?
Comment 5 Mark Felder 2015-03-17 13:38:44 UTC
I am no longer running a VMWare ESXi environment. I also haven't seen anyone complain about this in a couple years. I suspect VMWare solved this issue in later releases of their ESXi product.

It's probably safe to close.