A massive number of 'ICMP unreachable - fragmentation needed' packets are observed on lo0 when the pf(4) 'reply-to' option is used for policy routing, and they severely degrade the overall performance of the system.

I have a web server with two NICs connected to different upstream networks. Each network has a spoof filter, so replies must go back out through the interface on which the connection arrived.

+-----+
|     | em0 (IP1.IP1.IP1.IP1) -- ISP1 (GW1.GW1.GW1.GW1)
|     |
|     | em1 (IP2.IP2.IP2.IP2) -- ISP2 (GW2.GW2.GW2.GW2)
+-----+

So I use pf(4)'s 'reply-to' rule, and that is when I noticed the symptom. A simplified pf.conf that shows the symptom is as follows (IP addresses are masked):

-------------
if_isp1="em0"
isp1_router="GW1.GW1.GW1.GW1"
if_isp2="em1"
isp2_router="GW2.GW2.GW2.GW2"

pass in all
pass in reply-to ( $if_isp1 $isp1_router ) from any to $if_isp1
pass in reply-to ( $if_isp2 $isp2_router ) from any to $if_isp2
pass out all
-------------

Then access the web server on IP1 from a client (SIP.SIP.SIP.SIP) and retrieve a large file such as a picture. While the transfer is running, tcpdump -n -i lo0 shows a massive number of ICMP packets like these:

# tcpdump -n -i lo0
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on lo0, link-type NULL (BSD loopback), capture size 96 bytes
12:53:59.784441 IP 127.0.0.1 > IP1.IP1.IP1.IP1: ICMP SIP.SIP.SIP.SIP unreachable - need to frag (mtu 1500), length 48
12:53:59.784772 IP 127.0.0.1 > IP1.IP1.IP1.IP1: ICMP SIP.SIP.SIP.SIP unreachable - need to frag (mtu 1500), length 48
12:53:59.785001 IP 127.0.0.1 > IP1.IP1.IP1.IP1: ICMP SIP.SIP.SIP.SIP unreachable - need to frag (mtu 1500), length 48
12:53:59.785288 IP 127.0.0.1 > IP1.IP1.IP1.IP1: ICMP SIP.SIP.SIP.SIP unreachable - need to frag (mtu 1500), length 48
12:53:59.785482 IP 127.0.0.1 > IP1.IP1.IP1.IP1: ICMP SIP.SIP.SIP.SIP unreachable - need to frag (mtu 1500), length 48
.....(omit)

The SIP host can retrieve the file, but the throughput is very poor.
netstat(1) also shows abnormal packet counts (irrelevant lines removed):

% netstat -ni
Name  Mtu    Network        Address            Ipkts       Ierrs  Idrop  Opkts       Oerrs  Coll
em0   1500   <Link#1>       00:1c:c0:fa:c4:6a  79142       0      0      80093       0      0
em0   1500   IP1.IP1.IP1.9  IP1.IP1.IP1.IP1    2090652887  -      -      472         -      -
em1   1500   <Link#2>       00:1b:21:52:52:60  141017      0      0      59392       0      0
em1   1500   IP2.IP2.IP2.0  IP2.IP2.IP2.IP2    83355       -      -      58112       -      -
lo0   16384  <Link#6>                          2090617974  0      0      2090617950  0      0
lo0   16384  127.0.0.0/8    127.0.0.1          35119       -      -      2090610857  -      -

Some hardware combinations do not seem to exhibit the symptom. In fact, the problem started only after I recently replaced the server. I examined the old server and found that I could reproduce the symptom there as well when I changed the default route.

The old system runs FreeBSD 8.0-RELEASE-p1 amd64:

FreeBSD elf2.nc.kyushu-u.ac.jp 8.0-RELEASE-p1 FreeBSD 8.0-RELEASE-p1 #4: Wed Dec 16 15:49:14 JST 2009 root@elvenbow.cc.kyushu-u.ac.jp:/usr/obj/usr/src/sys/GENERIC amd64

On the old system, msk(4) and vge(4) are used for the ISP connections. With the default route via msk(4) everything is fine, but changing it to vge(4) triggers the problem. Swapping the NICs between ISP1 and ISP2 makes no difference, so I guess the problem is related more to the hardware (or driver?) than to the network configuration.

Fix:
Unknown. I don't understand where these ICMP packets come from or why they are generated.

How-To-Repeat:
Explained in the Description section.
I changed the rules to use 'route-to' instead of 'reply-to', and the ICMP storm stopped.

----------
if_isp1="em0"
isp1_router="GW1.GW1.GW1.GW1"
if_isp2="em1"
isp2_router="GW2.GW2.GW2.GW2"

pass in all no state
pass out all
pass out route-to ( $if_isp1 $isp1_router ) from $if_isp1
pass out route-to ( $if_isp2 $isp2_router ) from $if_isp2
----------

I'm not sure how 'reply-to' and 'route-to' differ in their implementation.
I'm sorry, my previous follow-up was incorrect. It seems that I need 'no state' on both 'route-to' lines; otherwise they are ignored. After adding it, packets are redirected to the correct interfaces, but the ICMP storm comes back as well.
Responsible Changed From-To: freebsd-bugs->freebsd-pf Over to maintainer(s).
I found a workaround for the problem. The problem does not occur when I remove TSO support from the interface used for the default route. On my old server, only msk(4) supports TSO, so the problem only happened when msk(4) carried the default route. (My original post was a bit confused and incorrect, sorry.) I guess something is wrong in the TSO-related code (ip_output.c or tcp_output.c?), but it is beyond me to understand it...
Can you please test the attached patch (by Pyun YongHyeon) and let us know if this fixes the situation for you? Thanks, Max Laier
I applied the patch to 8-STABLE (fetched today), rebuilt and installed the kernel, and rebooted, but the problem still occurred.

--
Yoshiaki Kasahara
Research Institute for Information Technology, Kyushu University
kasahara@nc.kyushu-u.ac.jp
I found a blog reporting a similar symptom using ipfw(4)+divert(4)+natd(8) on 8.0R and 7.2R (less severe on 7.2R):

http://www.bsddiary.net/d/201002.html#09

It is written in Japanese, but I guess you can still read the configuration used for the test in the article. The author later reported that disabling TSO worked as a workaround. He used em(4) and fxp(4). I hope this helps locate the problem...
Hey, have you retried with an updated kernel and this patch? It seems to help other people. Could you give us an update?

/bz
--
Bjoern A. Zeeb
This signature is about you not me.
I'm sorry for my late reply. I updated my test machine to 8.1-STABLE on Sep. 2nd and applied the patch, but the problem persists. --Y.Kasahara
Responsible Changed
From-To: freebsd-pf->bz

Taking it to commit the patch, which is half of the fix. Ermal also has the other half, which I'll need to review and follow up on.
Author: bz
Date: Fri Sep 10 00:00:06 2010
New Revision: 212403
URL: http://svn.freebsd.org/changeset/base/212403

Log:
  When using pf routing options, properly handle IP fragmentation
  for interfaces with TSO enabled, otherwise one would see an extra
  ICMP unreach, frag needed per matching packet on lo0.

  This syncs pf code to ip_output.c r162084.

  PR:            kern/144311
  Submitted by:  yongari via mlaier
  Reviewed by:   eri
  Tested by:     kib
  MFC after:     8 days

Modified:
  head/sys/contrib/pf/net/pf.c

Modified: head/sys/contrib/pf/net/pf.c
==============================================================================
--- head/sys/contrib/pf/net/pf.c	Thu Sep 9 23:45:59 2010	(r212402)
+++ head/sys/contrib/pf/net/pf.c	Fri Sep 10 00:00:06 2010	(r212403)
@@ -6375,6 +6375,7 @@ pf_route(struct mbuf **m, struct pf_rule
 	m0->m_pkthdr.csum_flags &= ifp->if_hwassist;
 
 	if (ntohs(ip->ip_len) <= ifp->if_mtu ||
+	    (m0->m_pkthdr.csum_flags & ifp->if_hwassist & CSUM_TSO) != 0 ||
 	    (ifp->if_hwassist & CSUM_FRAGMENT &&
 		((ip->ip_off & htons(IP_DF)) == 0))) {
 		/*
@@ -6449,7 +6450,7 @@ pf_route(struct mbuf **m, struct pf_rule
 	 * Too large for interface; fragment if possible.
 	 * Must be able to put at least 8 bytes per fragment.
 	 */
-	if (ip->ip_off & htons(IP_DF)) {
+	if (ip->ip_off & htons(IP_DF) || (m0->m_pkthdr.csum_flags & CSUM_TSO)) {
 		KMOD_IPSTAT_INC(ips_cantfrag);
 		if (r->rt != PF_DUPTO) {
 #ifdef __FreeBSD__
_______________________________________________
svn-src-all@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-all
To unsubscribe, send any mail to "svn-src-all-unsubscribe@freebsd.org"
Hey, as stated when taking the ticket, the suggested patch (which has now been committed to HEAD) is only half of the fix. I'll try to review Ermal's patch in the next few days and follow up with it for testing. Just for your information, so you don't wonder why this is going in even though it doesn't help you.

/bz
--
Bjoern A. Zeeb
Welcome a new stage of life.
Author: bz
Date: Mon Sep 20 17:03:10 2010
New Revision: 212905
URL: http://svn.freebsd.org/changeset/base/212905

Log:
  MFC r212403:

    When using pf routing options, properly handle IP fragmentation
    for interfaces with TSO enabled, otherwise one would see an extra
    ICMP unreach, frag needed per matching packet on lo0.

    This syncs pf code to ip_output.c r162084.

  Submitted by:  yongari via mlaier
  Reviewed by:   eri
  Tested by:     kib
  PR:            kern/144311

Modified:
  stable/8/sys/contrib/pf/net/pf.c
Directory Properties:
  stable/8/sys/   (props changed)
  stable/8/sys/amd64/include/xen/   (props changed)
  stable/8/sys/cddl/contrib/opensolaris/   (props changed)
  stable/8/sys/contrib/dev/acpica/   (props changed)
  stable/8/sys/contrib/pf/   (props changed)
  stable/8/sys/dev/xen/xenpci/   (props changed)

Modified: stable/8/sys/contrib/pf/net/pf.c
==============================================================================
--- stable/8/sys/contrib/pf/net/pf.c	Mon Sep 20 16:43:17 2010	(r212904)
+++ stable/8/sys/contrib/pf/net/pf.c	Mon Sep 20 17:03:10 2010	(r212905)
@@ -6375,6 +6375,7 @@ pf_route(struct mbuf **m, struct pf_rule
 	m0->m_pkthdr.csum_flags &= ifp->if_hwassist;
 
 	if (ntohs(ip->ip_len) <= ifp->if_mtu ||
+	    (m0->m_pkthdr.csum_flags & ifp->if_hwassist & CSUM_TSO) != 0 ||
 	    (ifp->if_hwassist & CSUM_FRAGMENT &&
 		((ip->ip_off & htons(IP_DF)) == 0))) {
 		/*
@@ -6449,7 +6450,7 @@ pf_route(struct mbuf **m, struct pf_rule
 	 * Too large for interface; fragment if possible.
 	 * Must be able to put at least 8 bytes per fragment.
 	 */
-	if (ip->ip_off & htons(IP_DF)) {
+	if (ip->ip_off & htons(IP_DF) || (m0->m_pkthdr.csum_flags & CSUM_TSO)) {
 		KMOD_IPSTAT_INC(ips_cantfrag);
 		if (r->rt != PF_DUPTO) {
 #ifdef __FreeBSD__
Author: bz
Date: Mon Sep 20 18:26:37 2010
New Revision: 212910
URL: http://svn.freebsd.org/changeset/base/212910

Log:
  MFC r212403:

    When using pf routing options, properly handle IP fragmentation
    for interfaces with TSO enabled, otherwise one would see an extra
    ICMP unreach, frag needed per matching packet on lo0.

    This syncs pf code to ip_output.c r162084.

  Submitted by:  yongari via mlaier
  PR:            kern/144311

Modified:
  stable/7/sys/contrib/pf/net/pf.c
Directory Properties:
  stable/7/sys/   (props changed)
  stable/7/sys/cddl/contrib/opensolaris/   (props changed)
  stable/7/sys/contrib/dev/acpica/   (props changed)
  stable/7/sys/contrib/pf/   (props changed)

Modified: stable/7/sys/contrib/pf/net/pf.c
==============================================================================
--- stable/7/sys/contrib/pf/net/pf.c	Mon Sep 20 18:20:35 2010	(r212909)
+++ stable/7/sys/contrib/pf/net/pf.c	Mon Sep 20 18:26:37 2010	(r212910)
@@ -6376,6 +6376,7 @@ pf_route(struct mbuf **m, struct pf_rule
 	m0->m_pkthdr.csum_flags &= ifp->if_hwassist;
 
 	if (ntohs(ip->ip_len) <= ifp->if_mtu ||
+	    (m0->m_pkthdr.csum_flags & ifp->if_hwassist & CSUM_TSO) != 0 ||
 	    (ifp->if_hwassist & CSUM_FRAGMENT &&
 		((ip->ip_off & htons(IP_DF)) == 0))) {
 		/*
@@ -6450,7 +6451,7 @@ pf_route(struct mbuf **m, struct pf_rule
 	 * Too large for interface; fragment if possible.
 	 * Must be able to put at least 8 bytes per fragment.
 	 */
-	if (ip->ip_off & htons(IP_DF)) {
+	if (ip->ip_off & htons(IP_DF) || (m0->m_pkthdr.csum_flags & CSUM_TSO)) {
 		ipstat.ips_cantfrag++;
 		if (r->rt != PF_DUPTO) {
 #ifdef __FreeBSD__
I'm still seeing this issue on:

9.1-STABLE FreeBSD 9.1-STABLE #0 r252282
Network adapter: <Intel(R) PRO/1000 Network Connection 7.3.7>

Disabling TSO4 or removing the route-to entry from pf solves the problem.
Responsible Changed
From-To: bz->gnn

I shall not use bugzilla (at least until we have a CLI).
I am still seeing this issue on 10.0-RELEASE, and I can test patches. Interestingly, this ONLY occurs for me with 'local' IPs: when the destination IP address is a locally bound address, things break; otherwise (when the packet is actually routed beyond the firewall) things seem to work fine.
Hi! I found the same problem on both 9.3 and 10.1.

uname -rp
9.3-RELEASE-p7 i386

wan1 adapter properties:
em1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=4219b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4,WOL_MAGIC,VLAN_HWTSO>

wan2 adapter properties:
re0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=8209b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,WOL_MAGIC,LINKSTATE>

pf.conf includes:

set skip on { lo0 }
set state-policy if-bound
scrub in all fragment reassemble
pass in quick on $wan1 reply-to ($wan1 $gw_wan1) inet proto tcp from any to $wan1 port ssh
pass in quick on $wan2 reply-to ($wan2 $gw_wan2) inet proto tcp from any to $wan2 port ssh
pass out quick route-to ($wan1 $gw_wan1) inet proto tcp from $wan1 port ssh to any
pass out quick route-to ($wan2 $gw_wan2) inet proto tcp from $wan2 port ssh to any

The second machine has:

uname -rp
10.1-RELEASE-p8 i386

em0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=4219b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4,WOL_MAGIC,VLAN_HWTSO>
em1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=209b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,WOL_MAGIC>

and the same rule listing. (I also have the same pf.conf on more than 10 machines and they work well, so it seems the cause is TSO4.)
Additional information for my last post:

cat /var/run/dmesg.boot | grep em1
em1: <Intel(R) PRO/1000 Network Connection 7.3.8> port 0xc000-0xc01f mem 0xfe4c0000-0xfe4dffff,0xfe400000-0xfe47ffff,0xfe4e0000-0xfe4e3fff irq 17 at device 0.0 on pci4
em1: Using MSIX interrupts with 3 vectors

cat /var/run/dmesg.boot | grep em0
em0: <Intel(R) PRO/1000 Network Connection 7.4.2> port 0xf020-0xf03f mem 0xfe500000-0xfe51ffff,0xfe527000-0xfe527fff irq 20 at device 25.0 on pci0
em0: Using an MSI interrupt

All of these adapters are integrated into the motherboard.
batch change:

For bugs that match the following
- Status Is In progress
AND
- Untouched since 2018-01-01.
AND
- Affects Base System OR Documentation

DO:
Reset to open status.

Note:
I did a quick pass but if you are getting this email it might be worthwhile to double check to see if this bug ought to be closed.