When trying to get a DHCP lease on the Xen PV network device xn0, DHCP offers from the DHCP server are ignored and the following error message is shown:

[root@hostname ~]# dhclient xn0
DHCPDISCOVER on xn0 to 255.255.255.255 port 67 interval 3
DHCPDISCOVER on xn0 to 255.255.255.255 port 67 interval 3
DHCPDISCOVER on xn0 to 255.255.255.255 port 67 interval 6
DHCPDISCOVER on xn0 to 255.255.255.255 port 67 interval 14
DHCPDISCOVER on xn0 to 255.255.255.255 port 67 interval 14
5 bad udp checksums in 5 packets
DHCPDISCOVER on xn0 to 255.255.255.255 port 67 interval 20
No DHCPOFFERS received.
No working leases in persistent database - sleeping.

When taking a tcpdump while dhclient is running, the DHCP offers from the DHCP server are visible, but dhclient appears to ignore them because their UDP checksum is reported as bad, for a reason I could not determine. Since a Google search turned up many Linux bugs caused by UDP checksum offload, I tried disabling all NIC features both in the FreeBSD guest and in the dom0, without any success. See the tcpdump output below:

14:07:26.709519 IP (tos 0x10, ttl 128, id 0, offset 0, flags [none], proto UDP (17), length 328)
    0.0.0.0.68 > 255.255.255.255.67: [udp sum ok] BOOTP/DHCP, Request from 00:16:3e:19:5e:75, length 300, xid 0x738d8779, secs 4, Flags [none] (0x0000)
          Client-Ethernet-Address 00:16:3e:19:5e:75
          Vendor-rfc1048 Extensions
            Magic Cookie 0x63825363
            DHCP-Message Option 53, length 1: Discover
            Client-ID Option 61, length 7: ether 00:16:3e:19:5e:75
            Hostname Option 12, length 7: "hostname"
            Parameter-Request Option 55, length 9:
              Subnet-Mask, BR, Time-Zone, Classless-Static-Route
              Default-Gateway, Domain-Name, Domain-Name-Server, Hostname
              Option 119
            END Option 255, length 0
            PAD Option 0, length 0, occurs 27
14:07:26.710162 IP (tos 0xc0, ttl 64, id 13977, offset 0, flags [none], proto UDP (17), length 345)
    10.1.0.1.67 > 10.1.0.15.68: [bad udp cksum 0x1568 -> 0x37a9!] BOOTP/DHCP, Reply, length 317, xid 0x738d8779, secs 4, Flags [none] (0x0000)
          Your-IP 10.1.0.15
          Server-IP 10.1.0.1
          Client-Ethernet-Address 00:16:3e:19:5e:75
          Vendor-rfc1048 Extensions
            Magic Cookie 0x63825363
            DHCP-Message Option 53, length 1: Offer
            Server-ID Option 54, length 4: 10.1.0.1
            Lease-Time Option 51, length 4: 43200
            RN Option 58, length 4: 21600
            RB Option 59, length 4: 37800
            Subnet-Mask Option 1, length 4: 255.255.255.0
            BR Option 28, length 4: 10.1.0.255
            Default-Gateway Option 3, length 4: 10.1.0.1
            Domain-Name-Server Option 6, length 4: 10.1.0.1
            Domain-Name Option 15, length 14: "localdomain"
            Hostname Option 12, length 7: "hostname"
            END Option 255, length 0

How-To-Repeat: Execute dhclient xn0 on the guest.
Responsible Changed From-To: freebsd-amd64->freebsd-xen reclassify.
Hello, Could you provide more info about the Dom0 kernel version? Roger.
Hi Roger,

The Dom0 is running Arch Linux with the following kernel:

Linux central 3.14.1-1-ARCH #1 SMP PREEMPT Mon Apr 14 20:40:47 CEST 2014 x86_64 GNU/Linux

The Xen version is 4.4.0. The DHCP service is provided by another DomU, running Arch Linux with the same kernel and dnsmasq 2.69.

Regards,
Thomas
Could you try to move the DHCP server to another host? There have been reports of failures on if_xn when sending/receiving data to/from other guests on the same host. I know this is not a valid long-term solution, but it could help in identifying the problem, which is probably related to PR 188261. Roger.
Hi Roger,

I have just tried. When using a different host for DHCP, everything seems to work fine. Any ideas why this might be? If the problem is related to PR 188261, is there any way that I might help narrow the problem down?

Thomas
Created attachment 180307 [details] [PATCH] dhclient: skip UDP checksum check when running in a Xen VM Proposed patch.
I'm hitting the same issue, as are other colleagues who have tried various versions of FreeBSD from 8.4 to 10.0 to 11.0 as a domU guest on Xen. The issue has to do with checksum offloaded packets arriving from a DHCP server running in dom-0 on Xen, and dhclient getting fooled by those. I'm attaching a proposed patch that fixes this issue. I've tested this in a FreeBSD 11.0 VM on Xen 4.4.4 with a Linux 4.1.12-based dom-0 kernel.
Comment on attachment 180307 [details] [PATCH] dhclient: skip UDP checksum check when running in a Xen VM This should be fixed in netfront, not hacked like this in every tool that can be run on Xen.
(In reply to Bhavesh Davda from comment #7) Can you try to disable txcsum and rxcsum inside of the guest and see if that solves the issue? (this will only work on FreeBSD 11.0).
Created attachment 180335 [details] Fix netfront inboud CSUM offload flags Can you please also try the following patch and report back? Thanks, Roger.
(In reply to Roger Pau Monné from comment #9)
Yes, in a FreeBSD 11.0 VM, I had already verified that this workaround works. In /etc/rc.conf:

ifconfig_xn0="DHCP -rxcsum"

Note that you only need to disable Rx checksum offload, as the issue is with UDP BOOTP packets received with an offloaded but "invalid" UDP checksum. However, I expect this workaround to have an unnecessary performance impact on receive packet throughput over xn0; basically you're using a 'big hammer' to completely disable an offload only because one application (dhclient) can't deal with a couple of UDP packets. More comments in a separate update...
(In reply to Roger Pau Monné from comment #8)
Yes, I know this change to dhclient seems unfortunately 'hackish', but it is necessary because dhclient relies on an interface [bpf(4)] to send and receive packets, and that interface has no way to indicate that certain packet processing (e.g. UDP checksum validation) should be skipped because it has been offloaded to the NIC.

Besides, if you look at upstream mainline ISC DHCP, there is similar code for Linux guests as well (see the commit excerpt at the end of this comment). My proposed patch took the approach of making the bare minimum fix to the mainline FreeBSD dhclient to achieve the same end result.

From: Thomas Markwalder <tmark@isc.org>
Date: Thu, 18 Sep 2014 15:25:02 -0400
Subject: [PATCH 1/1] [master] Checkum handling fixes

    Merges in rt22806
...
--- a/RELNOTES
+++ b/RELNOTES
@@ -95,6 +95,25 @@ by Eric Young (eay@cryptsoft.com).
   are within the declared. Thanks to Jiri Popelka at Red Hat for the
   bug report and patch.
   [ISC-Bugs #32453]
+  [ISC-Bugs #17766]
+  [ISC-Bugs #18510]
+  [ISC-Bugs #23698]
+  [ISC-Bugs #28883]
+
+- Addressed checksum issues:
+  Added checksum readiness check to Linux packet filtering which eliminates
+  invalid packet drops due to checksum errors when checksum offloading is
+  in use. Based on dhcp-4.2.2-xen-checksum.patch made to the Fedora project.
+  [ISC-Bugs #22806]
+  [ISC-Bugs #15902]
+  [ISC-Bugs #17739]
+  [ISC-Bugs #18010]
+  [ISC-Bugs #22556]
+  [ISC-Bugs #29769]
+  Inbound packets with UPD checksums of 0xffff now validate correctly rather
+  than being dropped.
+  [ISC-Bus #24216]
+  [ISC-Bus #25587]
(In reply to Bhavesh Davda from comment #12) IMHO, then the correct fix is to accept 0xffff as a valid checksum (which is the value set by netfront and other drivers when the checksum is offloaded). Or else fix bpf to generate the correct checksum for offloaded packets that are passed on that interface if there's no way to tell the users if the checksum has been offloaded or not. Can you give a spin to the patch I've attached also? Thanks, Roger.
(In reply to Roger Pau Monné from comment #8)
I looked at the proposed patch to the netfront driver, and think this introduces a semantic mismatch between the meaning of the 'NETRXF_data_validated' rx->flag between the netback driver in dom-0 and the netfront driver in the FreeBSD dom-U. Basically 'NETRXF_data_validated' says (from the equivalent CHECKSUM_UNNECESSARY Linux skbuff flag that the Linux dom-0 netback driver uses for this):

 * CHECKSUM_UNNECESSARY:
 *
 *   The hardware you're dealing with doesn't calculate the full checksum
 *   (as in CHECKSUM_COMPLETE), but it does parse headers and verify checksums
 *   for specific protocols. For such packets it will set CHECKSUM_UNNECESSARY
 *   if their checksums are okay.
(In reply to Bhavesh Davda from comment #14) And the mbuf(9) man page says: "If a particular network interface just indicates success or failure of TCP or UDP checksum validation without returning the exact value of the checksum to the host CPU, its driver can mark CSUM_DATA_VALID and CSUM_PSEUDO_HDR in csum_flags, and set csum_data to 0xFFFF hexadecimal to indicate a valid checksum." I think that's what maps best to NETRXF_data_validated/CHECKSUM_UNNECESSARY (in fact it's the only combination of outbound flags that make sense for the use case here AFAICT), but I'm not a network expert. I think setting CSUM_IP_CHECKED and CSUM_IP_VALID in the netfront driver for incoming packets is wrong, because netfront is also setting the header to 0xffff, and that's only valid with (CSUM_DATA_VALID | CSUM_PSEUDO_HDR).
(In reply to Roger Pau Monné from comment #15) Yes, your comment #15 convinces me that returning only CSUM_DATA_VALID and CSUM_PSEUDO_HDR with csum_data set to 0xffff is the "right" way to fix this, along with a corresponding change to dhclient to skip UDP checksum validation if it finds the checksum in the UDP header to be 0xffff. I'm attaching the associated patch required to dhclient. Please review. Unfortunately I'm not a FreeBSD expert to know if there is a quick and easy way to rebuild just the netfront driver in my FreeBSD 11.0 VM with your proposed patch, but hopefully you are, and can test your patch along with my new dhclient patch, at a minimum to verify this doesn't regress anything. Thanks!
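For readers following the discussion, the check proposed above can be sketched roughly as follows. This is an illustrative Python model, not the dhclient C code from the attached patch; the function names (ones_complement_sum, udp_checksum, udp_checksum_acceptable) are hypothetical. Treating an on-wire value of 0xffff as "already validated by offload" is the proposal from this thread. Note that 0xffff is also a legitimate on-wire value (RFC 768 transmits a computed checksum of zero as all ones), so this check cannot tell the two cases apart.

```python
import socket
import struct


def ones_complement_sum(data: bytes) -> int:
    """16-bit one's-complement sum of data, padded to an even length."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    # Fold the carries back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def udp_checksum(src_ip: str, dst_ip: str, udp_segment: bytes) -> int:
    """UDP checksum over the IPv4 pseudo-header plus the UDP segment.

    The checksum field inside udp_segment must be zero when calling this.
    """
    pseudo = (
        socket.inet_aton(src_ip)
        + socket.inet_aton(dst_ip)
        + struct.pack("!BBH", 0, 17, len(udp_segment))  # zero, proto UDP, length
    )
    csum = ~ones_complement_sum(pseudo + udp_segment) & 0xFFFF
    return csum or 0xFFFF  # a computed zero is sent as all ones


def udp_checksum_acceptable(src_ip: str, dst_ip: str, udp_segment: bytes) -> bool:
    """Decide whether a received UDP segment should be accepted."""
    (wire,) = struct.unpack_from("!H", udp_segment, 6)
    if wire == 0:
        return True  # IPv4 sender did not compute a checksum
    if wire == 0xFFFF:
        return True  # assumption from this thread: offloaded and validated
    zeroed = udp_segment[:6] + b"\x00\x00" + udp_segment[8:]
    return udp_checksum(src_ip, dst_ip, zeroed) == wire
```

With this model, a segment whose checksum field holds a stale value is rejected, while the same segment with 0xffff in the field is accepted without recomputing anything.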
Created attachment 180347 [details] dhclient: skip UDP checksum validation if Rx checksum offload in effect
(In reply to Bhavesh Davda from comment #16) Sadly dhclient is not my area of expertise, so we will have to wait for someone to review it. I've created a differential revision based on your patch, and assigned it to the networking people. Hopefully we will get an answer soon: https://reviews.freebsd.org/D9832 Feel free to register, and I can assign the patch to you (and then you will be able to provide further changes if required). Thanks!
Another solution would be adding a mangle rule to iptables, something like:

# iptables -A POSTROUTING -t mangle -p udp --dport 68 -j CHECKSUM --checksum-fill

This approach is used in OpenStack:
https://github.com/openstack/nova/blob/master/nova/network/linux_net.py#L896
(In reply to Roger Pau Monné from comment #18) Roger, thanks a bunch for creating a review for the dhclient change! I don't know the code review process in this community: do you have to wait for a "ship it!" from a mandatory reviewer or something? I've tried to answer the one question posed on the review from Adrian Chadd. Thanks again.
(In reply to Alexander Nusov from comment #19) Hi Alexander, IMHO just like the referenced bug https://bugzilla.redhat.com/show_bug.cgi?id=910619#c6 in that openstack nova function states, it's better to fix this in dhclient (ala the referenced dhcp-4.2.2-xen-checksum.patch) than add a workaround with iptables mangle. While I'm not an openstack nova person and can't do anything about their choice of using this workaround, I'm hoping that for FreeBSD at least we can teach dhclient about checksum offloaded packets.
(In reply to Bhavesh Davda from comment #20) In this case I would prefer so. I don't know much about net, much less about dhclient, so I would like someone that knows to review that patch. Let's give it some time (let's wait until middle of next week), and then I will start putting more pressure if no-one has reviewed it.
Hi Roger, it's been quite some time since that last update and am wondering if this slipped through the cracks. Thanks.
Hello, Maybe the dhclient fix is more appropriate? I'm quite lost, so I would recommend that you create a differential review with what you consider better and add hrs, adrian and the network group as reviewers. Thanks!
(In reply to Roger Pau Monné from comment #24)
I spent some time diagnosing this issue. From reading how the kernel processes inbound UDP packets and how BPF works, I conclude that the issue is caused by a combination of the following:

1. The Dom0 virtual NIC does not actually calculate the checksum of outbound UDP packets. If TXCSUM is enabled, the Dom0 kernel is led to skip the software checksum calculation; only some *flags* are set on the internal representation of the packet, and the on-wire checksum field is not filled with the correct value.
2. RXCSUM on the DomU virtual NIC does not matter. If the internal representation of a UDP packet carries those *flags*, the packet is marked valid. If it does not, it makes no difference whether the checksum is recalculated in the driver or in the kernel.
3. The DomU guest (FreeBSD) kernel checks inbound UDP packets and drops those with an invalid checksum.
4. BPF is hooked in at the driver level. It does not look at packet checksums and passes packets as-is to consumers such as the DHCP client, so BPF consumers have to recalculate the checksum themselves.
5. Unfortunately, a BPF consumer (the DHCP client) will therefore always see a wrong checksum, and that is not the consumer's fault.

So this looks like a design defect in BPF, which has no way to pass checksum (offload) information to its consumers.

@Roger Pau Monné I bet disabling the RXCSUM feature on the DomU virtual NIC will not help.

I think working around the problem in the BPF consumers' code is more of a hack. The right approach would be to disable TXCSUM offload on the Dom0 virtual NIC (large impact), or to use firewall rules on the Dom0 side to calculate and fill in the UDP checksum of outbound packets from the DHCP server application.
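The analysis above can be illustrated with a small, simplified model. This is only a sketch in Python, not kernel or dhclient code: RxPacket, kernel_udp_input and bpf_consumer_input are hypothetical names, and the data_validated field is an assumed stand-in for an out-of-band flag such as NETRXF_data_validated. It shows why the same packet can be accepted by the normal receive path yet rejected by a BPF consumer that only sees the raw bytes.

```python
import socket
import struct
from dataclasses import dataclass


def udp_checksum(src_ip: str, dst_ip: str, segment: bytes) -> int:
    """UDP checksum over IPv4 pseudo-header + segment (checksum field zeroed)."""
    data = (
        socket.inet_aton(src_ip)
        + socket.inet_aton(dst_ip)
        + struct.pack("!BBH", 0, 17, len(segment))
        + segment
    )
    if len(data) % 2:
        data += b"\x00"
    total = sum(word for (word,) in struct.iter_unpack("!H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return (~total & 0xFFFF) or 0xFFFF


@dataclass
class RxPacket:
    src_ip: str
    dst_ip: str
    segment: bytes        # UDP header + payload as delivered by the backend
    data_validated: bool  # hypothetical out-of-band "checksum already verified" flag


def wire_checksum_ok(pkt: RxPacket) -> bool:
    """Recompute the checksum from the bytes alone and compare with the field."""
    (wire,) = struct.unpack_from("!H", pkt.segment, 6)
    zeroed = pkt.segment[:6] + b"\x00\x00" + pkt.segment[8:]
    return udp_checksum(pkt.src_ip, pkt.dst_ip, zeroed) == wire


def kernel_udp_input(pkt: RxPacket) -> bool:
    """Simplified kernel path: trust the driver's flag, otherwise verify."""
    return pkt.data_validated or wire_checksum_ok(pkt)


def bpf_consumer_input(pkt: RxPacket) -> bool:
    """Simplified BPF path: the flag is not visible, only the bytes are."""
    return wire_checksum_ok(pkt)


if __name__ == "__main__":
    payload = b"offer"
    # Checksum field left at an arbitrary, unfinished value by the sender.
    segment = struct.pack("!HHHH", 67, 68, 8 + len(payload), 0x1568) + payload
    pkt = RxPacket("10.1.0.1", "10.1.0.15", segment, data_validated=True)
    print("kernel accepts:", kernel_udp_input(pkt))
    print("BPF consumer accepts:", bpf_consumer_input(pkt))
```

In this model the only difference between the two paths is whether the flag travels with the packet, which is the gap described in points 4 and 5.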
(In reply to Zhenlei Huang from comment #25) Thanks for the analysis. I think there either needs to be a way to signal that the packet checksum should be ignored (and thus assumed to be correct), or there must be a point in the software stack where a checksum is calculated for packets that don't have one, if further processing requires such a checksum. Thanks, Roger.
Hello. What did you do to get a fully working FreeBSD 10 installation today? It went EOL many years ago!
I assume the issue is still present in current FreeBSD versions, as I'm not aware of anyone fixing it. For the specific question about how to get FreeBSD 10, there's an archive with old images: http://ftp-archive.freebsd.org/pub/FreeBSD-Archive/old-releases/VM-IMAGES/10.4-RELEASE/amd64/Latest/
I know about these old images. The problem is that a very old image is not usable: packages can't be installed and ports are broken. How did you fix that?
(In reply to mario felicioni from comment #29) Can we please keep the bug report in context? This is not the right forum to ask about how to use old images. Please ask on freebsd-hackers or a better suited mailing list of your choice: https://lists.freebsd.org/. Thanks.
Actually, my question is related to Xen. As you know, I'm working on making Xen work on ARM, and I'm thinking of using an old FreeBSD image to see whether I get the same problems. But I've found that it's hard to use an old image. Furthermore, I've launched Linux as a domU under Xen, and currently I'm not able to get a network connection in the VM. I don't know whether there is a bug or whether I'm configuring the network badly. I can confirm the presence of some error in the default script used by Xen to give connectivity to Linux as a domU.
(In reply to mario felicioni from comment #31) No. This bug report is about a very specific issue with FreeBSD dhclient when running on Xen. Unless you are hitting this same exact issue with dhclient I would please request that you raise any issues as either separate bug-reports, or send them as emails to freebsd-xen mailing list: https://lists.freebsd.org/subscription/freebsd-xen Otherwise you are just adding noise.
This is the configuration that I use when I boot the Debian VM under Xen:

name="debian"
kernel = '/mnt/zroot2/zroot2/OS/Chromebook/linux-xen/domU-linux/zImage-6.1.59-stb-xen-cbe+'
memory=512
vcpus=1
autoballon="on"
disk = [ 'debian.img,raw,xvda,w' ]
vfb = [ 'type=vnc,vnclisten=0.0.0.0,vncdisplay=1' ]
vif = [ 'type=vif,mac=00:16:3e:xx:xx:xx,script=vif-route-local,ip=192.168.1.14' ]
extra = 'console=hvc0 root=/dev/xvda rw init=/sbin/init xen-fbfront.video=24,1024,768'

This is the script that's invoked when I launch the VM:

#!/bin/bash
#============================================================================
# ${XEN_SCRIPT_DIR}/vif-route
#
# Script for configuring a vif in routed mode.
#
# Usage:
# vif-route (add|remove|online|offline)
#
# Environment vars:
# dev         vif interface name (required).
# XENBUS_PATH path to this device's details in the XenStore (required).
#
# Read from the store:
# ip      list of IP networks for the vif, space-separated (default given in
#         this script).
#============================================================================

dir=$(dirname "$0")
. "${dir}/vif-common.sh"

netdev=enx8cae4cd6c871
main_ip=$(dom0_ip)

case "${command}" in
    add|online)
        echo $dev
        echo $ip
        echo $main_ip
        ifconfig ${dev} ${main_ip} netmask 255.255.255.255 up
        echo 1 >/proc/sys/net/ipv4/conf/${dev}/proxy_arp
        echo 1 >/proc/sys/net/ipv4/conf/${dev}/forwarding
        echo 1 >/proc/sys/net/ipv4/conf/enx8cae4cd6c871/forwarding
        echo 1 >/proc/sys/net/ipv4/conf/enx8cae4cd6c871/proxy_arp
        /usr/sbin/arp -i enx8cae4cd6c871 -Ds $ip enx8cae4cd6c871 pub
        echo "/usr/sbin/arp -i enx8cae4cd6c871 -Ds $ip enx8cae4cd6c871 pub"
        ipcmd='add'
        cmdprefix=''
        ;;
    remove|offline)
        do_without_error ifdown ${dev}
        ipcmd='del'
        cmdprefix='do_without_error'
        ;;
esac

case "${type_if}" in
    tap)
        metric=1
        ;;
    vif)
        metric=2
        ;;
    *)
        fatal "Unrecognised interface type ${type_if}"
        ;;
esac

# If we've been given a list of IP addresses, then add routes from dom0 to
# the guest using those addresses.
for addr in ${ip} ; do
    ${cmdprefix} ip route ${ipcmd} ${addr} dev ${dev} src ${main_ip} metric ${metric}
done

handle_iptable

call_hooks vif post

log debug "Successful vif-route ${command} for ${dev}."
if [ "${command}" = "online" ]
then
    success
fi

This line:

echo "/usr/sbin/arp -i enx8cae4cd6c871 -Ds $ip enx8cae4cd6c871 pub"

seems wrong, but I don't understand how to fix it. $ip should be 192.168.1.14, and it is taken from here:

vif = [ 'type=vif,mac=00:16:3e:xx:xx:xx,script=vif-route-local,ip=192.168.1.14' ]

So basically it assigns the IP 192.168.1.14 to the network interface called "enx8cae4cd6c871" (which is on the host OS, Devuan); it's a shame that this does not happen. There is something wrong. I think 192.168.1.14 should be assigned to the vif network interface? If it is meant to work like this, it does not happen, because both interfaces came up with IP 192.168.1.3:

enx8cae4cd6c871: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.1.3  netmask 255.255.255.0  broadcast 192.168.1.255
        inet6 fe80::e475:9ea5:4deb:e9e6  prefixlen 64  scopeid 0x20<link>
        ether 8c:ae:4c:d6:c8:71  txqueuelen 1000  (Ethernet)
        RX packets 349834  bytes 476244626 (454.1 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 81291  bytes 22926191 (21.8 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 4307  bytes 1590600 (1.5 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 4307  bytes 1590600 (1.5 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

mlan0: flags=-28669<UP,BROADCAST,MULTICAST,DYNAMIC>  mtu 1500
        ether 06:7e:6c:db:b5:f6  txqueuelen 1000  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

vif9.0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.1.3  netmask 255.255.255.255  broadcast 192.168.1.255
        ether fe:ff:ff:ff:ff:ff  txqueuelen 1000  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 2  bytes 286 (286.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
Sorry if this is not pertinent to this bug. I really couldn't know whether it was until I had shown you the problem. Now I have, and if you tell me it's not pertinent, OK, I will stop here.