Bug 221220

Summary: panic when running as PVHVM under Xen with 4 cores and 4+ network interfaces
Product: Base System
Reporter: John <john>
Component: kern
Assignee: freebsd-xen mailing list <xen>
Status: New ---
Severity: Affects Many People
CC: royger
Priority: ---    
Version: 11.0-RELEASE   
Hardware: amd64   
OS: Any   
Attachments:
Description                    Flags
serial console log             none
textdump.tar from /var/crash   none
minfree from /var/crash        none
info.0 from /var/crash         none
bounds from /var/crash         none
xen dmesg                      none
XenStore output                none

Description John 2017-08-04 13:16:47 UTC
Created attachment 185015 [details]
serial console log

When running FreeBSD 11.0 in a Xen virtual machine with PVHVM support on, 4 CPU cores and 4+ VIF network interfaces, the system always kernel panics. Using fewer CPU cores or fewer interfaces never panics. (using 'always' and 'never' in the context of this bug)

Attached is a boot log with 5 virtual interfaces and 4 CPU cores. At the db> prompt I typed bt, as that's about as much as I can do regarding FreeBSD kernel debugging.
Comment 1 John 2017-08-04 14:36:00 UTC
Created attachment 185017 [details]
textdump.tar from /var/crash
Comment 2 John 2017-08-04 14:36:27 UTC
Created attachment 185018 [details]
minfree from /var/crash
Comment 3 John 2017-08-04 14:36:47 UTC
Created attachment 185019 [details]
info.0 from /var/crash
Comment 4 John 2017-08-04 14:37:06 UTC
Created attachment 185020 [details]
bounds from /var/crash
Comment 5 Roger Pau Monné freebsd_committer 2017-08-07 11:50:23 UTC
I've been trying to reproduce this, but so far I'm unable to do so. Have you checked whether removing bios=ovmf solves the problem?

Also, do you know anything about the underlying host? Number of physical CPUs and number of vCPUs assigned to Dom0?
Comment 6 Roger Pau Monné freebsd_committer 2017-08-07 11:52:44 UTC
Also, if you have access to Dom0 can you paste the output of `xl dmesg`? (would be good if this was done on a hypervisor built with debug=y)
Comment 7 John 2017-08-07 12:14:12 UTC
The VM is booting in EFI mode so removing ovmf wouldn't work. I haven't tried this issue in BIOS mode, so it may be EFI-only. I can access the dom0, but it's not built with debug=y. I do have serial console, so I can access the Xen debug menu. I have attached the xl dmesg.

Regarding the underlying hardware: it's a Dell R730xd with 2x E5-2630 v4 (total of 40 threads IIRC). It is a test setup with nothing else running on it, only the dom0 and FreeBSD 11.0 (pfSense 2.4). There is no CPU or vCPU pinning. The reason I'm using OVMF is to have the EFI console available over the virtual serial console (as well as the bootloader and of course the first tty). Underlying disk is a ZVOL for the domU, there is 192GB RAM, so it shouldn't be resource constrained.
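
For anyone trying to reproduce this, the setup described so far (HVM guest with PV drivers, OVMF firmware, 4 vCPUs, 5 VIFs, ZVOL-backed disk, serial console) corresponds roughly to an xl domain config like the one below. This is only a sketch: the guest name, memory size, bridge name and ZVOL path are placeholders that are not taken from this report, and the real config may contain other options.

```
# Hypothetical xl domain config approximating the reported setup.
name    = "freebsd-domu"      # placeholder name
builder = "hvm"               # HVM guest; FreeBSD attaches PV drivers on top (PVHVM)
bios    = "ovmf"              # UEFI firmware, as discussed in this thread
vcpus   = 4                   # the panic is reported with 4 cores
memory  = 2048                # placeholder, size not stated in the report
serial  = "pty"               # virtual serial console

# Placeholder ZVOL path; the report only says the disk is a ZVOL.
disk = [ "/dev/zvol/tank/freebsd-domu,raw,xvda,rw" ]

# Five VIFs on a placeholder bridge; the panic is reported with 4 or more.
vif = [ "bridge=xenbr0", "bridge=xenbr0", "bridge=xenbr0",
        "bridge=xenbr0", "bridge=xenbr0" ]
```

Reducing `vcpus` below 4 or trimming the `vif` list below four entries would match the configurations reported as not panicking.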
Comment 8 John 2017-08-07 12:14:32 UTC
Created attachment 185126 [details]
xen dmesg
Comment 9 Roger Pau Monné freebsd_committer 2017-08-07 13:00:51 UTC
Is it a FreeBSD Dom0 or a Linux Dom0?

Can you paste the output of `xenstore-ls -fp` when the DomU panics?

Thanks, Roger.
Comment 10 John 2017-08-07 13:22:06 UTC
(In reply to Roger Pau Monné from comment #9)

It is a Linux Dom0 (Debian 9). I'll have it panic and get the xenstore.
Comment 11 John 2017-08-07 13:34:01 UTC
Created attachment 185128 [details]
XenStore output
Comment 12 Roger Pau Monné freebsd_committer 2017-08-07 14:20:44 UTC
Can you reproduce the same issue using one of the vanilla FreeBSD images?

I've tried to reproduce this with both the upstream FreeBSD images and the pfSense install iso, and so far I'm unable to reproduce it.

Can you also paste the output of `xl info` in Dom0?

Thanks, Roger.
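
The Dom0 diagnostics requested in this thread (`xl dmesg`, `xenstore-ls -fp`, `xl info`) can be captured in one pass right after the DomU panics. A minimal sketch, assuming it is run as root in Dom0; the `collect` helper, the output directory and the file names are made up for illustration and are not part of any Xen tooling:

```python
import shutil
import subprocess
from pathlib import Path

# The three commands requested in this thread.
# The output file names are hypothetical.
COMMANDS = {
    "xl-dmesg.txt": ["xl", "dmesg"],
    "xenstore.txt": ["xenstore-ls", "-fp"],
    "xl-info.txt": ["xl", "info"],
}


def collect(outdir="xen-diag"):
    """Run each command, save its output, and return {filename: status}."""
    out = Path(outdir)
    out.mkdir(parents=True, exist_ok=True)
    status = {}
    for filename, argv in COMMANDS.items():
        # Skip tools that are not installed instead of raising.
        if shutil.which(argv[0]) is None:
            status[filename] = "skipped (command not found)"
            continue
        result = subprocess.run(argv, capture_output=True, text=True)
        # Keep stderr too, since permission errors are reported there.
        (out / filename).write_text(result.stdout + result.stderr)
        status[filename] = f"exit {result.returncode}"
    return status
```

Calling `collect()` writes the three files into `xen-diag/` and returns a per-file status, so the files can be attached to the bug as they are.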
Comment 13 John 2017-08-07 15:06:38 UTC
I can't boot pfSense 2.3 in UEFI mode, that's why I'm using their 2.4 beta.
XL Info:

host                   : xen-1-prod
release                : 4.9.0-3-amd64
version                : #1 SMP Debian 4.9.30-2+deb9u2 (2017-06-26)
machine                : x86_64
nr_cpus                : 40
max_cpu_id             : 191
nr_nodes               : 2
cores_per_socket       : 10
threads_per_core       : 2
cpu_mhz                : 2200
hw_caps                : b7ebfbff:77fef3ff:2c100800:00000121:00000001:001cbfbb:00000000:00000100
virt_caps              : hvm hvm_directio
total_memory           : 130976
free_memory            : 2045
sharing_freed_memory   : 0
sharing_used_memory    : 0
outstanding_claims     : 0
free_cpus              : 0
xen_major              : 4
xen_minor              : 8
xen_extra              : .1
xen_version            : 4.8.1
xen_caps               : xen-3.0-x86_64 xen-3.0-x86_32p hvm-3.0-x86_32 hvm-3.0-x86_32p hvm-3.0-x86_64 
xen_scheduler          : credit
xen_pagesize           : 4096
platform_params        : virt_start=0xffff800000000000
xen_changeset          : 
xen_commandline        : console=vga,com1 com1=115200 loglvl=all
cc_compiler            : gcc (Debian 6.3.0-18) 6.3.0 20170516
cc_compile_by          : ian.jackson
cc_compile_domain      : eu.citrix.com
cc_compile_date        : Sat Jul 29 13:47:03 CEST 2017
build_id               : db76baddf8aca42d05cff58718878d37
xend_config_format     : 4
Comment 14 Roger Pau Monné freebsd_committer 2017-08-07 15:38:33 UTC
Can you check whether the same happens with a plain vanilla FreeBSD 11.0 image?

You can get them from:

ftp://ftp.nl.freebsd.org/pub/FreeBSD/releases/VM-IMAGES/11.0-RELEASE/amd64/Latest/
Comment 15 Roger Pau Monné freebsd_committer 2017-08-08 08:12:26 UTC
I've installed pfSense-CE-2.4.0-BETA-amd64 on an OVMF VM with 2GB of RAM, 4 vCPUs and 5 network interfaces, and I'm still unable to reproduce.

Can you get a hypervisor with debug enabled? You can pick the sources of the debian package and rebuild the hypervisor only with debug=y. Without that I'm afraid it's going to be quite difficult to figure out what's going on.

Thanks, Roger.
Comment 16 Roger Pau Monné freebsd_committer 2017-08-15 10:47:19 UTC
Is there any news on the issue?
Comment 17 John 2017-08-28 12:02:26 UTC
Sorry for the delay, the issue still exists. I reproduced it with https://download.freebsd.org/ftp/releases/VM-IMAGES/11.1-RELEASE/amd64/Latest/FreeBSD-11.1-RELEASE-amd64.raw.xz but I still have no idea how to debug this.
Comment 18 John 2017-08-28 12:18:04 UTC
Oh no, strike that. I booted that one without OVMF and the problem was a different one (it was the Xen-pf bug where all network access is dropped when checksum offloading is on). I jumped to conclusions because it wouldn't come up on the network once it was booted, but later on I connected to the VNC console of the VM and it was there.

I'll try a clean install from the installer disk to a virtual drive. I'll see if I can get a Xen debug build as well.