When running FreeBSD under Xen as a DomU guest, a PVHVM-based FreeBSD machine cannot route traffic for other PV-based DomU guests on the same Xen Dom0.
To work around the problem, either:
- Replace the DomU router machine with a Linux guest (not ideal!)
- Drop the DomU router machine into HVM mode (i.e. xn0 etc. get replaced by rl0 et al.)
- Drop the other DomU guests from PV/PVHVM mode down to HVM mode (this also appears to fix the problem!)
- Move the DomU router machine to a different XenServer, even if it's in the same pool (the problem only happens if the DomU router machine and the DomU guest trying to use it as a gateway are on the same physical Xen Dom0 host).
None of these solutions is ideal: they basically preclude running a 'gateway' machine on XenServer unless it is either sited in its own pool or run inefficiently (i.e. HVM mode only), which in turn makes it non-agile.
How-To-Repeat: Install XenServer 6.2.
Install FreeBSD 9.2 / 10.0 as a DomU guest in PVHVM mode (so you end up with a NIC called 'xn0' etc.).
Set this first machine up with (for example) 'gateway_enable="YES"' etc. and configure it to route or NAT traffic to the Internet.
Install another DomU guest (e.g. FreeBSD again, or Windows) on the same XenServer.
Set the default gateway of the second DomU to the IP address of the first DomU.
Even though the first DomU can fetch data/route traffic to/from "The Internet", the second DomU cannot use it as a gateway. Pings work and TCP sessions initially 'connect', but they cannot exchange any traffic.
If you replace the 'router' DomU with, say, a Linux box (or Windows box), it works as expected. Only FreeBSD in PVHVM mode fails as the gateway.
Over to maintainer(s).
Having set up a test system with FreeBSD 9.2-STABLE, 10-STABLE, 11-CURRENT etc., I can confirm this bug still exists on all of them, regardless of version.
For a 'Client' (i.e. a guest VM trying to route traffic through the other FreeBSD 'router' machine) you can do:
ifconfig xn0 -txcsum
And it will fix that single client. No amount of option fiddling (other than restarting in HVM mode) will fix the 'router' machine, i.e. there is no change you can make on the 'router' alone that lets clients work without a fix of their own.
I've been unable to test disabling txcsum on Windows clients running on the same XenServer as I can't see where I can do that.
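For background on the client-side workaround: with txcsum enabled, the guest leaves TCP/UDP checksum computation to the (virtual) NIC, while `-txcsum` makes the FreeBSD stack compute the full checksum in software before the packet is handed to netfront. The Python sketch below shows roughly what that software computation amounts to for TCP over IPv4: the RFC 1071 one's-complement sum over the IPv4 pseudo-header plus the segment. It is illustrative only. The helper names are hypothetical, and nothing here is taken from the FreeBSD or Xen sources.

```python
import socket
import struct


def internet_checksum(data: bytes) -> int:
    """RFC 1071: one's-complement of the one's-complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a trailing zero byte
    total = sum(word for (word,) in struct.iter_unpack("!H", data))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def tcp_checksum(src_ip: str, dst_ip: str, segment: bytes) -> int:
    """Checksum over the IPv4 pseudo-header followed by the TCP header and payload."""
    pseudo_header = struct.pack(
        "!4s4sBBH",
        socket.inet_aton(src_ip),
        socket.inet_aton(dst_ip),
        0,                   # reserved
        socket.IPPROTO_TCP,  # protocol number 6
        len(segment),        # TCP length (header + data)
    )
    return internet_checksum(pseudo_header + segment)


def fill_tcp_checksum(src_ip: str, dst_ip: str, segment: bytes) -> bytes:
    """Roughly what a stack does in software when txcsum is off: compute and store th_sum."""
    seg = bytearray(segment)
    seg[16:18] = b"\x00\x00"  # the checksum field (offset 16) must be zero while summing
    seg[16:18] = struct.pack("!H", tcp_checksum(src_ip, dst_ip, bytes(seg)))
    return bytes(seg)


def tcp_checksum_ok(src_ip: str, dst_ip: str, segment: bytes) -> bool:
    """Receiver-side check: re-summing a correct segment, checksum included, gives 0."""
    return tcp_checksum(src_ip, dst_ip, segment) == 0
```

A receiver validates a segment by re-summing it with the checksum field in place. Any segment whose checksum was never completed fails `tcp_checksum_ok`. This sketch does not establish that an incomplete checksum is what goes wrong in this report.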
I just re-tested this with:
- XenServer 6.5
- FreeBSD 10.1 amd64
With FreeBSD installed in PVHVM mode (i.e. with an 'xn0' NIC etc.), the problem still exists (in case anyone else runs into it). At least a couple of other people have run into this issue setting up VMs for routing etc.
I use an HVM-based VM in the RootBSD cloud. Recently, on -current, I had to disable rxcsum and txcsum on my VM interfaces to make it "happy" with PF:
ifconfig_xn0="inet XXX.XXX.XXX.XXX netmask 0xfffffffc -rxcsum -txcsum"
Maybe try that?
Issues with checksums on XENHVM kernels and the ability to route traffic between XENHVM guests are separate. RootBSD appears to use Cisco switches - at least if the MAC address of the gateway for my RootBSD guest is to be believed. You wouldn't run the gateway for an entire cloud infrastructure off a FreeBSD VM regardless.
For the record, I've been using the OSS Xen releases for years and have never been able to get a PVM (XENHVM) domU to work as a gateway; I've either had to use HVM or set up a separate box as the router. This has been the case since at least FreeBSD 8, I think, or whenever XENHVM became an option.
(In reply to Sean Bruno from comment #4)
Following Sean's idea, I played with the PV network frontend options and got one FreeBSD 10.1-RELEASE guest to have its traffic routed by another FreeBSD 10.1-RELEASE guest, both on the same XenServer 6.5 host and both with xn1 on the same host VLAN.
router0# ifconfig xn1 -txcsum -tso4 -lro
vm0# ifconfig xn1 -txcsum -tso4 -lro
Without this configuration on both domUs, I cannot successfully run:
# fetch http://www.google.com/
(Note: on success, the result is stored in fetch.out; it needs two Ctrl-C presses to stop.)
But with the indicated configuration, the above command works like a charm. I could even SSH into vm0 from the Internet.
(In reply to raitech from comment #6)
I've tested those options here - and they do work *for FreeBSD* boxes.
However - if you set '-txcsum -tso4 -lro' on the FreeBSD box acting as a router - Windows machines still cannot pass traffic through it as a router :( [not tested Linux - but I'd guess from past experience a Linux PV instance will be the same]
So whilst this is a workaround (of sorts) for FreeBSD boxes using another FreeBSD box as a router, it's not usable on mixed platforms.
There's obviously some weird interaction still going on with PV to PV network traffic involving FreeBSD when it's 'routing' things.
I would really like to reproduce this, but sadly my FreeBSD network knowledge is very limited, so please bear with me. When you say:
"Set this first machine up with (for example) 'gateway_enable="YES"' etc. and configure it to route or NAT traffic to the Internet."
Can you please provide examples of how to route or NAT traffic to the Internet? A very simple (reduced) use case that reproduces this issue would help me a lot.
In its most basic form, without NAT and without pf loaded, the problem shows up in, e.g., the following setup:
(PC1) 10.0.1.2/24 on xn0 <--> 10.0.1.1/24 on xn0 (FreeBSD DomU Router) 10.0.2.1/24 on xn1 <--> (PC2) 10.0.2.2/24 on xn0.
- Install FreeBSD on DomU Router with 2 paravirtual network interfaces in different subnets
- sysctl net.inet.ip.forwarding=1 on Router
- Set up PC1/PC2 with FreeBSD/Linux/Windows, using the FreeBSD Router as their gateway
- On PC1: dd if=/dev/zero bs=1M | nc -l 5001
- On PC2: nc 10.0.1.2 5001 | dd of=/dev/zero bs=1M
- Or any other TCP connection, e.g. ssh, for that matter.
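The nc/dd test above only shows the "connects but no data" symptom by hanging. Below is a minimal Python stand-in for the PC2 side that reports the outcome explicitly and gives up after a timeout. It assumes Python 3 is available on PC2, and `probe_transfer` is a hypothetical helper, not part of any existing tool. Point it at the `dd | nc -l 5001` listener on PC1.

```python
import socket


def probe_transfer(host, port, want_bytes=1 << 20, timeout=5.0):
    """Classify a TCP transfer through the router under test.

    Returns "connect-failed" if the handshake fails, "stalled" if the
    connection opens but fewer than want_bytes arrive (the peer closed,
    a single recv waited longer than timeout, or the connection was reset),
    and "ok" otherwise.
    """
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        return "connect-failed"
    received = 0
    with sock:
        try:
            while received < want_bytes:
                chunk = sock.recv(min(65536, want_bytes - received))
                if not chunk:
                    break  # peer closed before sending everything
                received += len(chunk)
        except OSError:
            pass  # timeout or reset after connecting: data stopped flowing
    return "ok" if received >= want_bytes else "stalled"
```

For example, `print(probe_transfer("10.0.1.2", 5001))` on PC2. Because the listener streams zeros indefinitely, `want_bytes` must be finite. The 1 MiB default is an arbitrary choice.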
(In reply to Sydney Meyer from comment #9)
Just as a side note, I haven't been able to reproduce this using a FreeBSD Dom0; I will now try with a Linux Dom0 (I guess Linux is more picky about checksums?).
It is a problem with DomUs, not Dom0.
(In reply to Andreas Pflug from comment #11)
I know, I'm just pointing out that this doesn't happen with a FreeBSD Dom0 running FreeBSD DomUs.
I guess you are always using a Linux Dom0?
Yes, only Linux Dom0.
I've only seen one anomaly with offloading, which was fixed in Windows PVM drivers a long time ago, so I assume that there's something in FreeBSD Dom0 that's disguising the bug.
I've recently committed a bunch of netfront fixes that I think should help solve this issue. ATM, the only reliable way to do packet forwarding on a FreeBSD DomU is to disable all the hardware offload features on both NICs (rxcsum, txcsum, tso and lro). Could someone give it a try?
I'm also working on making forwarding work _without_ having to disable all those features, so that we can get optimal performance, however those patches have not yet been reviewed:
(In reply to Roger Pau Monné from comment #14)
Is this a typo, or are there actually three patches?
(In reply to Sydney Meyer from comment #15)
To clarify this, the current code in HEAD should work when doing packet forwarding if rxcsum, txcsum, tso and lro are disabled.
Then, the 3 patches that you mention should allow packet forwarding to work _without_ disabling those features, and yes, you need all 3.
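This workaround depends on rxcsum, txcsum, tso and lro all being off on both of the router's interfaces, so it helps to check what is actually enabled. The Python sketch below assumes FreeBSD's usual `ifconfig` output, where enabled capabilities are listed as `options=<hex><NAME,...>`. `enabled_offloads` and `leftover_offloads` are hypothetical helpers.

```python
import re
import subprocess

# Offload capabilities relevant to this report, as FreeBSD ifconfig names them.
OFFLOADS = {"RXCSUM", "TXCSUM", "RXCSUM_IPV6", "TXCSUM_IPV6", "TSO4", "TSO6", "LRO"}
OPTIONS_RE = re.compile(r"options=[0-9a-fA-F]+<([^>]*)>")


def enabled_offloads(ifconfig_text: str) -> set:
    """Offload capabilities still enabled according to one interface's ifconfig output."""
    match = OPTIONS_RE.search(ifconfig_text)
    if match is None:
        return set()  # no options line: treat as nothing enabled
    return {flag for flag in match.group(1).split(",") if flag in OFFLOADS}


def leftover_offloads(interfaces):
    """Run `ifconfig <name>` (FreeBSD) for each interface and map name -> enabled offloads."""
    report = {}
    for name in interfaces:
        out = subprocess.run(
            ["ifconfig", name], capture_output=True, text=True, check=True
        ).stdout
        report[name] = enabled_offloads(out)
    return report
```

Run `leftover_offloads(["xn0", "xn1"])` on the router. An empty set for every interface means none of the listed offloads is still enabled.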
(In reply to Roger Pau Monné from comment #16)
I've tested these patches now. I installed a HEAD snapshot, and updated it to r301543M.
The VM can route traffic with '-rxcsum -txcsum -tso -lro' applied to xn0 / xn1.
I applied the patches, rebuilt/installed the kernel and rebooted, and sadly it still cannot forward traffic unless I again set the above ifconfig options on both interfaces.
The patches only affect the kernel, right? I don't need to buildworld again?
I don't know if I missed something, but I see only two diffs on Phabricator, D6656 and D6612, and the latter is linked twice.
(In reply to Sydney Meyer from comment #18)
Sorry, I failed to post the three links, and also failed to understand what you were trying to tell me. Here are the proper links:
Karl, could you please try again with all three patches applied? (a buildworld is indeed not needed, only the kernel needs to be updated).
(In reply to Roger Pau Monné from comment #19)
I applied the previously missing D6611 patch (so I have D6611, D6612 and D6656 applied), but it still doesn't work :(
If I live migrate the test VM router FreeBSD domU to another node in the pool - it starts working.
Migrate it back to the same node as the VM's I'm using to test traffic through it - and it fails again.
As before, I can 'ping' through the FreeBSD VM domU, and 'telnet test-website 80' through it connects, but manually requesting a page or using 'fetch' just hangs/fails as before (if the other VM is on the same node).
I can get tcpdumps and stuff (or anything else you need) if you let me know.
(In reply to kpielorz from comment #20)
Yes, I think I know what the issue is. What OS are the other DomUs on the same host using?
If you can provide me with complete tcpdump traces on both interfaces (xn0/xn1), that would help me quite a lot, the following rune should get you the traces:
# tcpdump -n -i <if> -s0 -w <output>.pcap
The resulting pcap files are going to be quite big, so you will probably have to upload them somewhere. Just 10s of capture while trying to route traffic is probably fine.
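To inspect such a capture without Wireshark, below is a minimal Python sketch that walks a classic libpcap file (the format `tcpdump -w` writes) and lists Ethernet frames longer than the MTU plus the 14-byte Ethernet header. Frames that large match what a later comment in this report describes as coalesced packets "way over 1500 bytes". The function name and the 1500-byte default are assumptions. pcapng files are not handled, and 802.1Q-tagged frames (4 extra bytes) are not accounted for.

```python
import struct

ETHER_HEADER_LEN = 14  # untagged Ethernet header; 802.1Q tags add 4 more bytes

# Classic libpcap magic numbers as they appear on disk (microsecond and nanosecond variants).
LITTLE_ENDIAN_MAGIC = (b"\xd4\xc3\xb2\xa1", b"\x4d\x3c\xb2\xa1")
BIG_ENDIAN_MAGIC = (b"\xa1\xb2\xc3\xd4", b"\xa1\xb2\x3c\x4d")


def oversized_frames(pcap_bytes: bytes, mtu: int = 1500):
    """Return (frame_index, original_length) for frames longer than mtu + Ethernet header."""
    magic = pcap_bytes[:4]
    if magic in LITTLE_ENDIAN_MAGIC:
        endian = "<"
    elif magic in BIG_ENDIAN_MAGIC:
        endian = ">"
    else:
        raise ValueError("not a classic libpcap file (pcapng is not handled)")
    (linktype,) = struct.unpack_from(endian + "I", pcap_bytes, 20)
    if linktype != 1:  # 1 = LINKTYPE_ETHERNET
        raise ValueError(f"expected Ethernet captures, got linktype {linktype}")

    limit = mtu + ETHER_HEADER_LEN
    found = []
    offset, index = 24, 0  # records start after the 24-byte global header
    while offset + 16 <= len(pcap_bytes):
        _sec, _frac, incl_len, orig_len = struct.unpack_from(endian + "IIII", pcap_bytes, offset)
        if orig_len > limit:
            found.append((index, orig_len))
        offset += 16 + incl_len  # skip the record header plus the captured bytes
        index += 1
    return found
```

For example, read `xn0.pcap` in binary mode and pass its contents to `oversized_frames`. A non-empty result on the router's interfaces would point towards LRO/TSO-style aggregation. An empty list does not rule anything out.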
(In reply to Roger Pau Monné from comment #21)
pcaps are at (for a short time) http://www.pielorz.com/xn_pcaps.tar.gz
These are xn0.pcap.san and xn1.pcap.san ('.san' because I sanitised them to remove local LAN/switch chatter etc.)
This shows a Windows 7 host on the same XenServer node trying to get to 'news.bbc.co.uk' in Internet Explorer via the FreeBSD domU.
When testing I tried:
Windows 7 <--> FreeBSD Current w/patches <--> Internet
FreeBSD 10.3-R p3 <--> FreeBSD Current w/patches <--> Internet
In both cases, if I move the FreeBSD Current domU off that XenServer node (to a different one in the pool) it works, but not if all the domUs are on the same XenServer node (i.e. as before).
If you need anything else, let me know.
(In reply to kpielorz from comment #22)
Thanks for the traces, I will try to prepare a debug patch for you either this afternoon or tomorrow morning.
FWIW, I have applied the three patches to r301515M, and on a dom0 running Xen 4.4.1 with Linux 4.5.1, as well as on a Xen 4.5.3 / NetBSD 7.0.1 host, I was able to ping, connect via ssh, scp a file and nc some data between two FreeBSD VMs connected through a third VM, all running this revision, on the same Linux or NetBSD dom0 host respectively.
Could you please apply the following patch on top of the other three and report the results back? It should print some messages on the console (dmesg) when doing packet forwarding.
(In reply to Roger Pau Monné from comment #25)
I have applied your patch on top of the others, and there seems to be no output related to packet forwarding when running the tests described above.
Here's the full dmesg output:
Copyright (c) 1992-2016 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 11.0-ALPHA2 #0 r301752M: Thu Jun 9 23:05:34 CEST 2016
FreeBSD clang version 3.8.0 (tags/RELEASE_380/final 262564) (based on LLVM 3.8.0)
WARNING: WITNESS option enabled, expect reduced performance.
can't re-use a leaf (ixl_rx_miss_bufs)!
VT(vga): text 80x25
XEN: Hypervisor version 4.4 detected.
CPU: Intel(R) Core(TM) i3-4160 CPU @ 3.60GHz (3600.07-MHz K8-class CPU)
Origin="GenuineIntel" Id=0x306c3 Family=0x6 Model=0x3c Stepping=3
Structured Extended Features=0x72a<TSCADJ,BMI1,AVX2,BMI2,ERMS,INVPCID>
Hypervisor: Origin = "XenVMMXenVMM"
real memory = 528482304 (504 MB)
avail memory = 465608704 (444 MB)
Event timer "LAPIC" quality 400
ACPI APIC Table: <Xen HVM>
random: unblocking device.
ioapic0: Changing APIC ID to 1
MADT: Forcing active-low polarity and level trigger for SCI
ioapic0 <Version 1.1> irqs 0-47 on motherboard
random: entropy device external interface
kbd1 at kbdmux0
netmap: loaded module
module_register_init: MOD_LOAD (vesa, 0xffffffff80f2eb70, 0) error 19
random: registering fast source Intel Secure Key RNG
random: fast provider: "Intel Secure Key RNG"
vtvga0: <VT VGA driver> on motherboard
cryptosoft0: <software crypto> on motherboard
acpi0: <Xen> on motherboard
acpi0: Power Button (fixed)
acpi0: Sleep Button (fixed)
cpu0: <ACPI CPU> on acpi0
hpet0: <High Precision Event Timer> iomem 0xfed00000-0xfed003ff on acpi0
Timecounter "HPET" frequency 62500000 Hz quality 950
attimer0: <AT timer> port 0x40-0x43 irq 0 on acpi0
Timecounter "i8254" frequency 1193182 Hz quality 0
Event timer "i8254" frequency 1193182 Hz quality 100
atrtc0: <AT realtime clock> port 0x70-0x71 irq 8 on acpi0
Event timer "RTC" frequency 32768 Hz quality 0
Timecounter "ACPI-fast" frequency 3579545 Hz quality 900
acpi_timer0: <32-bit timer at 3.579545MHz> port 0xb008-0xb00b on acpi0
pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
pci0: <ACPI PCI bus> on pcib0
isab0: <PCI-ISA bridge> at device 1.0 on pci0
isa0: <ISA bus> on isab0
atapci0: <Intel PIIX3 WDMA2 controller> port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xc400-0xc40f at device 1.1 on pci0
ata0: <ATA channel> at channel 0 on atapci0
ata1: <ATA channel> at channel 1 on atapci0
pci0: <bridge> at device 1.3 (no driver attached)
xenpci0: <Xen Platform Device> port 0xc000-0xc0ff mem 0xf2000000-0xf2ffffff irq 24 at device 2.0 on pci0
vgapci0: <VGA-compatible display> mem 0xf0000000-0xf1ffffff,0xf30d0000-0xf30d0fff at device 3.0 on pci0
vgapci0: Boot video device
atkbdc0: <Keyboard controller (i8042)> port 0x60,0x64 irq 1 on acpi0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
kbd0 at atkbd0
psm0: <PS/2 Mouse> irq 12 on atkbdc0
psm0: model IntelliMouse Explorer, device ID 4
fdc0: <floppy drive controller> port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on acpi0
fdc0: does not respond
device_attach: fdc0 attach returned 6
uart0: <16550 or compatible> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
uart0: console (9600,n,8,1)
xenpv0: <Xen PV bus> on motherboard
granttable0: <Xen Grant-table Device> on xenpv0
xen_et0: <Xen PV Clock> on xenpv0
Event timer "XENTIMER" frequency 1000000000 Hz quality 950
Timecounter "XENTIMER" frequency 1000000000 Hz quality 950
xenstore0: <XenStore> on xenpv0
evtchn0: <Xen event channel user-space device> on xenpv0
privcmd0: <Xen privileged interface user-space device> on xenpv0
debug0: <Xen debug handler> on xenpv0
orm0: <ISA Option ROM> at iomem 0xef000-0xeffff on isa0
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
fdc0: No FDOUT register!
ppc0: cannot reserve I/O port range
Timecounters tick every 1.000 msec
xenballoon0: <Xen Balloon Device> on xenstore0
xctrl0: <Xen Control Device> on xenstore0
xs_dev0: <Xenstore user-space device> on xenstore0
xenbusb_front0: <Xen Frontend Devices> on xenstore0
xenbusb_add_device: Device device/suspend/event-channel ignored. State 6
xn0: <Virtual Network Interface> at device/vif/0 on xenbusb_front0
xn0: Ethernet address: 00:16:3e:f1:04:bd
xn1: <Virtual Network Interface> at device/vif/1 on xenbusb_front0
xn1: Ethernet address: 00:16:3e:f1:14:bd
xn0: backend features: feature-sg feature-gso-tcp4
xn2: <Virtual Network Interface> at device/vif/2 on xenbusb_front0
xn1: backend features: feature-sg feature-gso-tcp4
xn2: Ethernet address: 00:16:3e:f1:24:bd
xenbusb_back0: <Xen Backend Devices> on xenstore0
xn2: backend features: feature-sg feature-gso-tcp4
xbd0: 3072MB <Virtual Block Device> at device/vbd/51712 on xenbusb_front0
xbd0: features: flush, write_barrier
xbd0: synchronize cache commands enabled.
taskqgroup_adjust failed cnt: 1 stride: 1 mp_ncpus: 1 smp_started: 0
taskqgroup_adjust failed cnt: 1 stride: 1 mp_ncpus: 1 smp_started: 0
Timecounter "TSC-low" frequency 1800037036 Hz quality 800
WARNING: WITNESS option enabled, expect reduced performance.
Trying to mount root from ufs:/dev/xbd0p2 [rw]...
xn0: link state changed to DOWN
xn0: link state changed to UP
xn1: link state changed to DOWN
xn1: link state changed to UP
xn2: link state changed to DOWN
xn2: link state changed to UP
(In reply to Sydney Meyer from comment #26)
Thanks, since in your case the patches seem to solve the issue, I'm waiting for the feedback from Karl with the debug patch applied.
(In reply to Roger Pau Monné from comment #27)
I've applied the patch and rebuilt/re-installed the kernel. Sadly I too get no output to the console / dmesg or /var/log/messages while packets are being forwarded. If I move the VM to another node (so it works) - I get nothing logged, and if I move it back to the same node as the other test VM's - it stops working, and I still get nothing logged - sorry!
(In reply to kpielorz from comment #28)
Can you provide a little more info about your router configuration? In my tests I've only enabled net.inet.ip.forwarding=1 in the VM acting as a router and configured different subnets in two interfaces, and exchanged packets between them. Are you doing NAT/Filtering/...?
(In reply to Roger Pau Monné from comment #29)
This machine is running natd [this was briefly mentioned in the original ticket way-back-when] - so in /etc/rc.conf I have:
ifconfig_xn1="DHCP" # This is our 'Internet' feed via ADSL modem
ifconfig_xn0="inet x.x.x.x netmask 255.255.255.0" # Local LAN IP
'ipfw show' gives:
00050 1226 417933 divert 8668 ip4 from any to any via xn1
65000 7324 1573607 allow ip from any to any
65535 0 0 deny ip from any to any
I can try and setup a non natd case again - but that will involve quite a few changes, as I have no easily routed networks at our office.
(In reply to kpielorz from comment #30)
Oh right, this is rather different from my test setup, which could explain why it works in my case but not in yours. Do you see any checksum errors? 'netstat -s -f inet' should tell you if there are any, at least from FreeBSD's point of view.
It would also be interesting to do the same on the Dom0 itself, but I'm not aware of the rune to obtain that information from Linux.
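To make the 'netstat -s -f inet' check less error-prone, below is a Python sketch that pulls out every counter mentioning a checksum and diffs two snapshots taken before and after a routing attempt. It assumes the usual `netstat -s` layout: unindented `proto:` section headers, then indented lines that start with a count. Counter wording varies between releases, so it simply matches "checksum". The helper names are hypothetical, and this covers only the FreeBSD side, not Dom0.

```python
import re
import subprocess

COUNTER_RE = re.compile(r"^\s*(\d+)\s+(.*\S)\s*$")


def checksum_counters(netstat_text: str) -> dict:
    """Map 'section: description' -> count for every counter mentioning a checksum."""
    counters = {}
    section = "?"
    for line in netstat_text.splitlines():
        if line and not line[0].isspace() and line.rstrip().endswith(":"):
            section = line.rstrip()[:-1]  # e.g. "tcp", "udp", "ip"
            continue
        match = COUNTER_RE.match(line)
        if match and "checksum" in match.group(2):
            counters[f"{section}: {match.group(2)}"] = int(match.group(1))
    return counters


def checksum_delta(before: dict, after: dict) -> dict:
    """Counters that changed between two snapshots, with the amount of change."""
    return {k: after[k] - before.get(k, 0) for k in after if after[k] != before.get(k, 0)}


def snapshot() -> dict:
    """Take one snapshot on the FreeBSD guest."""
    out = subprocess.run(
        ["netstat", "-s", "-f", "inet"], capture_output=True, text=True, check=True
    ).stdout
    return checksum_counters(out)
```

Take `before = snapshot()`, try to route traffic for a few seconds, then print `checksum_delta(before, snapshot())`. An empty result means no FreeBSD-side checksum counter moved.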
(In reply to Roger Pau Monné from comment #31)
I can't see any obvious checksum errors recorded, and like yourself, I don't know how to get similar information from dom0.
For bugs that match the following
- Status Is In progress
- Untouched since 2018-01-01.
- Affects Base System OR Documentation
Reset to open status.
I did a quick pass but if you are getting this email it might be worthwhile to double check to see if this bug ought to be closed.
(In reply to Eitan Adler from comment #33)
Hi - this issue still exists, I've just re-tested in 10.4 and 11.1. I'm not able to test 12.x at the moment, but I have no reason to believe it's been fixed in current or anything.
It affects anything working with 'low level' packets, so NAT, OpenVPN, DHCP et al. For example, with OpenVPN it seems packet coalescing 'behind the scenes' ends up presenting packets way over 1500 bytes to OpenVPN, which it point blank refuses to handle.
The workaround we're using here is to set 'hw.xen.disable_pv_nics=1' in /boot/loader.conf on FreeBSD, together with a small mod to 'qemu-dm-wrapper' on the XenServer and a custom field added to affected VMs in XenCenter that the wrapper 'keys off'. This turns xn0 into vtnet0 for these hosts. These do work with the above applications, are still live-migratable, and appear to perform better than re0 NICs.
Does this still happen if you disable LRO/TSO? (packets with size > 1500)
(In reply to Roger Pau Monné from comment #35)
Disabling LRO/TSO doesn't make any difference - I think we'd tried that previously as a possible fix.
(In reply to karl from comment #36)
Yes, I assumed so. I'm currently quite busy, so I don't think I will have time to look into this ATM.
One thing I remember about reproducing this issue is that it takes a non-trivial amount of time to set up (last time I tried I had to set up a forwarding VM).
Could you perhaps document the fastest way to reproduce it, ideally one that involves the least setup? That would be helpful (for me at least) and maybe for others who look into the issue.