I have encountered a bug in FreeBSD 10.2 (and also -CURRENT) when using NAT with either pf or ipfw. My setup for the gateway host is:
* Microsoft Client Hyper-V on Windows 10 host machine
* FreeBSD 10.2 Release (no upgrades or updates)
* Two network interfaces, hn0 (the LAN "private switch") and hn1 (the gateway "external switch")
* A simple pf.conf:
nat on hn1 inet from hn0:network to any -> (hn1)
I tried the equivalent for ipfw, i.e., setting firewall_type to "open" and the NAT interface to hn1.
Both configurations work fine in FreeBSD 10.1 Release, using the exact same Hyper-V setup. On FreeBSD 10.2 (and -CURRENT), connections to the Internet from the gateway itself work, but connections from other VMs on the LAN that are forwarded through the gateway with NAT do not. I have done some basic investigation, including disabling the checksum and TSO offloading options (via ifconfig) that were added to the netvsc driver for 10.2 (in r285236), but that didn't help.
Whatever it is, it is in a common code path shared by pf and ipfw, or perhaps the netvsc driver. In looking around the Internet, I saw a few unanswered posts (which predate 10.2) about pf mysteriously dropping state and TCP connections entering the SYN_SENT:CLOSED state immediately. That is the symptom I see in 10.2. The outbound NAT translation is successful, and tcpdump shows the packets being sent out of the external interface. But then nothing else happens (no response from the server seems to come back), and the state is dropped. This problem is easy for me to reproduce; it happens on any new Hyper-V VM I create with 10.2 Release, and likewise it always works fine with 10.1 Release.
Try to add to loader.conf and reboot
It helped me with a similar configuration (10.2) under Windows Server 2012 R2.
Those are tunables for the vtnet driver, the virtio-based virtual network driver. Hyper-V has its own driver (netvsc) and doesn't use vtnet. In some VM programs like VirtualBox, you have a choice to use virtio if you want it, but I don't see that option in Client Hyper-V. Are you using Hyper-V with vtnet somehow? I have never used Windows Server Hyper-V, but I know it offers different options than the Hyper-V that comes with Windows Professional Edition, some of which relate to networking. I tried adding the tunable anyway, but it didn't do anything. Do you have a vtnet0 interface?
Sorry for my mistake. You are right: the VM I meant is running on KVM. I have some VMs running on Hyper-V, but without NAT. I can test it later.
Somebody else also reported a similar issue on 10.2. Unfortunately I cannot find a way to reproduce it in house. Can you provide detailed steps for me to reproduce it, such as the pf.conf file and the NAT configuration?
Also, are you using VLANs? Thanks.
I encounter the same problem in a slightly different configuration:
- Hyper-V 2012, hosting:
-- FreeBSD 10.2 x64 with MPD acting as a PPTP server
-- Windows 7
- Clients on the LAN (misc OS), including an old FreeBSD 6.1
- Clients connected to FreeBSD 10.2 VM by PPTP (MPD5) using misc OS as well
Firewalls turned off.
* TCP, UDP, ICMP work:
- between all PPTP clients
- between a PPTP client, FreeBSD and the Windows 7 virtual machine
- between FreeBSD and local machines on the LAN
* TCP doesn't work between a PPTP client and machines on the LAN.
I tried to investigate by opening a TCP connection on port 80 from a PPTP client to the old FreeBSD 6.1 on the LAN. Here is part of the tcpdump output I captured on that old FreeBSD 6.1 machine:
IP (tos 0x0, ttl 127, id 20111, offset 0, flags [DF], proto: TCP (6), length: 48) pc-ed.lan.domain.fr.56026 > srv-mandy.lan.domain.fr.http: S, cksum 0x84d6 (incorrect (-> 0x7c54), 2429810306:2429810306(0) win 8192 <mss 1354,nop,nop,sackOK>
I have read that having an incorrect checksum was normal, so I guess the problem doesn't come from that.
From what I saw, the problem occurs whenever the FreeBSD 10.2 VM is used as a gateway (NAT or not).
Let me know if I can do other tests to help you investigate the problem. Any help would be appreciated.
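For reference on the "incorrect" checksum reported by tcpdump above: the value it checks is the standard Internet checksum, a 16-bit one's-complement sum. The following is only a minimal sketch of that arithmetic in the style of RFC 1071. The function name inet_checksum is made up for illustration, and the sketch leaves out the TCP pseudo-header that a real TCP checksum also covers, so it will not reproduce the 0x7c54 value from the capture by itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Illustrative only: 16-bit one's-complement Internet checksum over a
 * byte buffer, treating the data as big-endian 16-bit words.
 * Intended for packet-sized buffers (up to 64 KB).
 */
static uint16_t
inet_checksum(const uint8_t *data, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	assert(data != NULL || len == 0);

	/* Add up all complete 16-bit words. */
	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)data[i] << 8) | data[i + 1];

	/* An odd trailing byte is padded with a zero byte on the right. */
	if (i < len)
		sum += (uint32_t)data[i] << 8;

	/* Fold the carries back into the low 16 bits. */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);

	/* The checksum is the one's complement of the folded sum. */
	return (uint16_t)~sum;
}
```

A receiver that runs the same sum over the data with the checksum field included gets 0 when the packet is intact, which is the check tcpdump performs before printing "incorrect".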
Created attachment 162011 [details]
Revert TSO and checksum offloading patch r285236 in Netvsc driver
If you have a test environment and can try something, can you apply the attached patch on the 10.2 server and see if the problem still occurs? The patch is a revert of r285236, which I suspect may be the culprit, but I don't have an environment in which to reproduce the problem.
Everything seems to work with the patch.
This is what I did:
- Create a clean new VM with FreeBSD 10.2 on the Hyper-V server.
- Activated IP forwarding: sysctl net.inet.ip.forwarding=1
- On another computer (same LAN, running Windows 10): set the default gateway to the new FreeBSD test VM. Ping/tracert to the internet work. TCP doesn't work.
- Patch netvsc with the r285236 file you provided, in the /usr/src/sys/dev/hyperv/netvsc/ folder (patch -i r285236)
- Rebuild and install the kernel, then reboot.
- TCP works from the LAN machines.
Please note that I couldn't test it in a PPTP or NAT configuration.
Now I wait for the patch to be included in the next FreeBSD update (since I usually don't build custom kernel).
(In reply to Eddy from comment #8)
> This is what I did:
> - Create a clean new VM with FreeBSD 10.2 on the Hyper-V server.
> - Activated IP forwarding: sysctl net.inet.ip.forwarding=1
> - On another computer (same LAN, running Windows 10): set the default gateway to the new FreeBSD test VM. Ping/tracert to the internet work. TCP doesn't work.
In the above setup, how can pinging from the Windows 10 machine to the Internet work? A machine on the Internet has no route by which to send the reply back to the Windows 10 client, which is inside the LAN.
Are you using NAT on the FreeBSD 10.2 server? When I enabled NAT, everything seemed to work with 10.2 as a gateway.
So overall, I think r285236 is the cause of the problem. However, since I still cannot reproduce it and r285236 is a big change, I cannot narrow it down to a smaller part with certainty.
We have come up with a suspect code path. Attached is another patch which you can test for us. Please apply it directly to clean 10.2 code (not on top of the patch I attached earlier). This new patch just disables the checksum offloading. See if it helps solve the issue you are seeing.
Created attachment 162111 [details]
Only disable checksum offloading on 10.2
I have a separate NAT router between the VM and the Internet, but not on the FreeBSD 10.2 server:
PC-LAN-WIN10 <------> FREEBSD 10.2 VM <------> NAT_ROUTER <------> INTERNET
I added the NAT router as a default route on the FreeBSD test VM before doing the tests:
# route add default 192.168.1.254
I just tried to build a new kernel with the last "disable_csum_20151016.patch" you provided but I am stuck with an error:
/usr/src/sys/dev/hyperv/netvsc/hv_rndis_filter.c:828:11 error: unused variable `dev` [-Werror,-Wunused-variable]
device_t dev = device->device;
(In reply to Eddy from comment #11)
> I just tried to build a new kernel with the last "disable_csum_20151016.patch" you provided but I am stuck with an error:
> /usr/src/sys/dev/hyperv/netvsc/hv_rndis_filter.c:828:11 error: unused variable `dev` [-Werror,-Wunused-variable]
> device_t dev = device->device;
You can just comment out this line, since the variable is not used after applying the patch. Let me know how it goes.
(In reply to Wei Hu from comment #12)
After some tests, "disable_csum_20151016.patch" doesn't solve the issue for me. The last r285236 patch worked.
Do I have to first apply the r285236 patch and then the disable_csum_20151016.patch? I only applied the new patch on clean sources.
Created attachment 162763 [details]
Fix a checksum offloading bug in Hyper-V netvsc driver.
Do not calculate TCP checksum when the receiving bits in csum_flags are set.
Sorry for the late response. We still cannot reproduce the issue, but another customer reported the same issue and found a bug in the Hyper-V checksum path. Attached is a patch to fix this issue. Please apply on a clean 10.2 kernel, rebuild and see if this fixes the problem you are seeing. Let us know if this works or not.
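To illustrate the idea described above ("do not calculate TCP checksum when the receiving bits in csum_flags are set"), here is a minimal, self-contained sketch. It is not the attached patch or the actual netvsc code. The flag names and values (X_CSUM_*) and the function tx_offload_request are hypothetical and chosen only for this example; the real flags live in the kernel's mbuf headers and differ.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag values for illustration; not the kernel's definitions. */
#define X_CSUM_TX_IP          0x0001u /* stack asks for IP header checksum offload */
#define X_CSUM_TX_TCP         0x0002u /* stack asks for TCP checksum offload */
#define X_CSUM_TX_UDP         0x0004u /* stack asks for UDP checksum offload */
#define X_CSUM_RX_IP_CHECKED  0x0100u /* receive side: IP checksum was examined */
#define X_CSUM_RX_IP_VALID    0x0200u /* receive side: IP checksum was good */
#define X_CSUM_RX_DATA_VALID  0x0400u /* receive side: TCP/UDP checksum was good */

#define X_CSUM_TX_MASK (X_CSUM_TX_IP | X_CSUM_TX_TCP | X_CSUM_TX_UDP)
#define X_CSUM_RX_MASK \
	(X_CSUM_RX_IP_CHECKED | X_CSUM_RX_IP_VALID | X_CSUM_RX_DATA_VALID)

static_assert((X_CSUM_TX_MASK & X_CSUM_RX_MASK) == 0,
    "transmit and receive flag sets must not overlap");

/*
 * Decide which checksum offloads to request for an outgoing packet.
 * A forwarded packet can still carry the flags set on the receive side.
 * Those flags are not a request to compute anything, so when any of them
 * is present no transmit checksum offload is requested.
 */
static uint32_t
tx_offload_request(uint32_t csum_flags)
{
	if (csum_flags & X_CSUM_RX_MASK)
		return 0;
	return csum_flags & X_CSUM_TX_MASK;
}
```

With this logic a locally generated TCP segment still gets its checksum offloaded, while a packet that arrived on one interface and is being forwarded out of another is sent with the checksum it already has. That matches the symptom in this report, where traffic from the gateway itself worked and forwarded traffic did not.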
Hi and thank you for the patch.
I just applied it on a clean 10.2 kernel and made the same tests as before.
It seems to solve the issue!
I will wait for it to be included in the next official update.
The fix went into Head as r291156. I will merge to 10 stable branch in a week.
Is the patch merged to stable branch?
(In reply to Eddy from comment #18)
Not yet. I will try to do it this week.
(In reply to Eddy from comment #18)
@Eddy / Wei
When commit logs contain a line of the form "PR: <issueid>[, <issueid>]", the commit will be referenced as a comment against those issue IDs.
Also, committers will set the mfc-stable* flag to + when done/committed, or to - with a comment as to why an MFC to that branch is not necessary/invalid.
I think Wei has merged the fix (r285785 in Head) to both stable/10 and releng/10.2:
I think we can close the bug.
Assign to committer that resolved.
@Wei, if this is not relevant to stable/9, please set mfc-stable9 to - with comment
@Dexuan Cui, I'm not sure the patch is merged. The two revisions you mentioned were made on the 28th and 30th of July 2015, whereas the working patch was provided by Wei on the 4th of November.
(In reply to Eddy from comment #23)
Hi Eddy, hmm, I am sorry -- I was looking at the wrong patch...
I'll ask the committers to help to merge the correct patch to stable/10.
A commit references this bug:
Date: Fri Dec 18 14:56:49 UTC 2015
New revision: 292439
Ignore the inbound checksum flags when doing packet forwarding in netvsc
Sponsored by: Microsoft OSTC
Hi Eddy, royger has merged the fix to stable/10.
(In reply to Dexuan Cui from comment #26)
Thank you Dexuan and royger!
We've run into this too over at OPNsense. This is a harsh regression from 10.1 to 10.2. It needs an errata for 10.2.
The issue was fixed with patch r291156. I tested it on a clean FreeBSD install by recompiling the kernel in a test environment and it worked.
It was merged to the STABLE 10 branch (Fri Dec 18 14:56:49 UTC 2015). I assumed that the latest build includes the fix; however, I'm running 10.2-RELEASE-p12 on my production server and the problem still occurs.
(In reply to Franco Fichtner from comment #28)
@Franco, thanks for the reminder! We're trying to contact the FreeBSD releasing team and make an errata for 10.2 as you suggested.
(In reply to Eddy from comment #29)
@Eddy, it turns out the fix is only in the Head and stable/10, but not in the releng/10.2. :-(
Now we're working with the FreeBSD releasing team on this too.
BTW, whu has left our team, but we can always be reached at the two email addresses listed on the page https://wiki.freebsd.org/HyperV.
Pending request/response to merge this to the releng/10.2 branch for 10.3-RELEASE
(In reply to Dexuan Cui from comment #30)
Hi Franco and Eddy,
The errata is out here:
The fix for releng/10.2 is on the branch:
So you can update to 10.2 RELEASE-p14 to get the fix.
Brilliant, thank you. All looks fine. :)
The bug is fixed in 10.2 RELEASE-p14, 10.3 and 11-CURRENT.
The bug doesn't exist in 10.1.
I think we can close the bug now.