Bug 144311 - [pf] [icmp] massive ICMP storm on lo0 occurs when using pf(4) 'reply-to'
Status: In Progress
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 8.0-STABLE
Hardware: Any Any
Importance: Normal Affects Only Me
Assignee: George V. Neville-Neil
Reported: 2010-02-26 06:00 UTC by kasahara
Modified: 2015-07-21 02:38 UTC (History)

Attachments
pf.routeto.patch (789 bytes, patch)
2010-03-19 13:35 UTC, max

Description kasahara 2010-02-26 06:00:09 UTC
A massive number of 'ICMP unreachable - fragmentation needed' packets
is observed on lo0 when pf(4) 'reply-to' is used for policy routing,
which severely degrades the overall performance of the system.

I have a web server with two NICs connected to different outgoing
networks.  Each network has a spoof filter, so I need to send replies
out through the I/F on which the connection arrived.

+-----+
|    em0(IP1.IP1.IP1.IP1) -- ISP1(GW1.GW1.GW1.GW1)
|     |
|    em1(IP2.IP2.IP2.IP2) -- ISP2(GW2.GW2.GW2.GW2)
+-----+

So I used pf(4)'s 'reply-to' rule and noticed the symptom.

A simplified pf.conf that shows the symptom is as follows (IP
addresses are masked):

-------------
if_isp1="em0"
isp1_router="GW1.GW1.GW1.GW1"
if_isp2="em1"
isp2_router="GW2.GW2.GW2.GW2"

pass in all
pass in reply-to ( $if_isp1 $isp1_router ) from any to $if_isp1
pass in reply-to ( $if_isp2 $isp2_router ) from any to $if_isp2
pass out all
-------------

Then access the web server on IP1 from a client (SIP.SIP.SIP.SIP) and
retrieve a large file such as a picture. While doing so, tcpdump -n -i
lo0 shows a massive amount of ICMP packets flowing like this:

# tcpdump -n -i lo0
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on lo0, link-type NULL (BSD loopback), capture size 96 bytes
12:53:59.784441 IP 127.0.0.1 > IP1.IP1.IP1.IP1: ICMP SIP.SIP.SIP.SIP unreachable - need to frag (mtu 1500), length 48
12:53:59.784772 IP 127.0.0.1 > IP1.IP1.IP1.IP1: ICMP SIP.SIP.SIP.SIP unreachable - need to frag (mtu 1500), length 48
12:53:59.785001 IP 127.0.0.1 > IP1.IP1.IP1.IP1: ICMP SIP.SIP.SIP.SIP unreachable - need to frag (mtu 1500), length 48
12:53:59.785288 IP 127.0.0.1 > IP1.IP1.IP1.IP1: ICMP SIP.SIP.SIP.SIP unreachable - need to frag (mtu 1500), length 48
12:53:59.785482 IP 127.0.0.1 > IP1.IP1.IP1.IP1: ICMP SIP.SIP.SIP.SIP unreachable - need to frag (mtu 1500), length 48
.....(omit)

The SIP host can retrieve the file, but the throughput is very
poor.
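
The message in the capture above is ICMP type 3 (destination
unreachable), code 4 (fragmentation needed and DF set), which carries
the next-hop MTU in the last two bytes of its 8-byte header (RFC 1191).
A minimal sketch in C of recognising such a header follows.  The
function and macro names are hypothetical, and it looks only at the
ICMP header, not at the quoted IP header that follows it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names; the values are the standard ICMP type and code. */
#define SK_ICMP_UNREACH          3  /* destination unreachable */
#define SK_ICMP_UNREACH_NEEDFRAG 4  /* fragmentation needed and DF set */

/*
 * Return true and store the next-hop MTU in *mtu if buf starts with an
 * ICMP "need to frag" header.  Layout: type, code, checksum (2 bytes),
 * unused (2 bytes), next-hop MTU (2 bytes, network byte order).
 */
bool
sk_parse_needfrag(const uint8_t *buf, size_t len, uint16_t *mtu)
{
    assert(buf != NULL && mtu != NULL);

    if (len < 8)
        return false;   /* too short to hold an ICMP header */
    if (buf[0] != SK_ICMP_UNREACH || buf[1] != SK_ICMP_UNREACH_NEEDFRAG)
        return false;   /* some other ICMP message */

    *mtu = (uint16_t)((buf[6] << 8) | buf[7]);
    return true;
}
```

For a header whose last two bytes are 0x05 0xdc, the function reports
an MTU of 1500, the value tcpdump prints in the lines above.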

netstat(1) also shows abnormal packet counts (irrelevant
lines removed).

% netstat -ni
Name    Mtu Network       Address              Ipkts Ierrs Idrop    Opkts Oerrs  Coll
em0    1500 <Link#1>      00:1c:c0:fa:c4:6a    79142     0     0    80093     0     0
em0    1500 IP1.IP1.IP1.9 IP1.IP1.IP1.IP1   2090652887     -     -      472     -     -
em1    1500 <Link#2>      00:1b:21:52:52:60   141017     0     0    59392     0     0
em1    1500 IP2.IP2.IP2.0 IP2.IP2.IP2.IP2      83355     -     -    58112     -     -
lo0   16384 <Link#6>                        2090617974     0     0 2090617950     0     0
lo0   16384 127.0.0.0/8   127.0.0.1            35119     -     - 2090610857     -     -

Some hardware combinations did not seem to exhibit the symptom.
Actually, I recently replaced the server and the problem suddenly
started to occur.  I examined the old server and noticed that I could
also reproduce the symptom there when I changed the
default route.  The old system runs FreeBSD 8.0R-p1 amd64.

FreeBSD elf2.nc.kyushu-u.ac.jp 8.0-RELEASE-p1 FreeBSD 8.0-RELEASE-p1 #4: Wed Dec 16 15:49:14 JST 2009     root@elvenbow.cc.kyushu-u.ac.jp:/usr/obj/usr/src/sys/GENERIC  amd64

On the old system, msk(4) and vge(4) are used for ISP connections.
With the default route on msk(4) everything is fine, but changing it to
vge(4) exhibits the problem. Swapping the NICs used for ISP1 and ISP2
makes no difference, so I guess it is related more to the hardware
(driver?) than to the network configuration.

Fix: 

Unknown.  I don't understand where these ICMP packets come from
or why they are generated.
How-To-Repeat: 
Explained in the Description section.
Comment 1 kasahara 2010-02-26 08:33:42 UTC
I changed the rule to use 'route-to' instead of 'reply-to' and the
ICMP storm stopped.

----------
if_isp1="em0"
isp1_router="GW1.GW1.GW1.GW1"
if_isp2="em1"
isp2_router="GW2.GW2.GW2.GW2"

pass in all no state
pass out all
pass out route-to ( $if_isp1 $isp1_router ) from $if_isp1
pass out route-to ( $if_isp2 $isp2_router ) from $if_isp2
----------

I'm not sure about the implementation difference between 'reply-to' and
'route-to'.
Comment 2 kasahara 2010-02-26 09:27:17 UTC
I'm sorry that my previous follow-up was incorrect.

It seems that I need 'no state' for both 'route-to' lines or they are
ignored.  After that, packets are redirected to the correct interfaces,
and the ICMP storm also returned.
Comment 3 Mark Linimon freebsd_committer 2010-02-26 11:23:27 UTC
Responsible Changed
From-To: freebsd-bugs->freebsd-pf

Over to maintainer(s).
Comment 4 kasahara 2010-03-19 07:09:18 UTC
I found a workaround for the problem.

The problem does not occur when I disable TSO support on the interface
that is used for the default route.

On my old server, only msk(4) has TSO support, so the problem only
happened when I used msk(4) for the default route.  (My original post
was a bit confused and incorrect, sorry).

I guess there is something wrong with the TSO-related code (ip_output.c
or tcp_output.c?), but it is too much for me to understand....
Comment 5 max 2010-03-19 13:35:05 UTC
Can you please test the attached patch (by Pyun YongHyeon) and let us know if 
this fixes the situation for you?

Thanks,
  Max Laier
Comment 6 kasahara 2010-03-23 05:39:35 UTC
I applied the patch on 8-STABLE (fetched today), rebuilt and installed
the kernel and rebooted, but the problem still occurred.

-- 
Yoshiaki Kasahara
Research Institute for Information Technology, Kyushu University
kasahara@nc.kyushu-u.ac.jp
Comment 7 kasahara 2010-03-24 08:55:18 UTC
I found a blog reporting a similar symptom using
ipfw(4)+divert(4)+natd(8) on 8.0R and 7.2R (less severe on 7.2R).

http://www.bsddiary.net/d/201002.html#09

It is written in Japanese, but I guess you can still read the
configuration used for the test in the article.

He also reported later that disabling TSO worked as a workaround.
He used em(4) and fxp(4).

I wonder if it might help in locating the problem...
Comment 8 Bjoern A. Zeeb freebsd_committer 2010-08-21 13:47:15 UTC
Hey,

have you re-tried with an updated kernel and this patch again?
It seems to help other people. Could you give us an update?

/bz

-- 
Bjoern A. Zeeb                       This signature is about you not me.
Comment 9 kasahara 2010-09-06 04:33:56 UTC
I'm sorry for my late reply.

I updated my test machine to 8.1-STABLE on Sep. 2nd and applied the
patch, but the problem persists.

--Y.Kasahara
Comment 10 Bjoern A. Zeeb freebsd_committer 2010-09-09 16:29:52 UTC
Responsible Changed
From-To: freebsd-pf->bz

Taking this to commit the patch, which will be half of the fix. 
Ermal has the other half, which I'll need to review and 
follow up on.
Comment 11 dfilter freebsd_committer 2010-09-10 01:00:26 UTC
Author: bz
Date: Fri Sep 10 00:00:06 2010
New Revision: 212403
URL: http://svn.freebsd.org/changeset/base/212403

Log:
  When using pf routing options, properly handle IP fragmentation
  for interfaces with TSO enabled, otherwise one would see an extra
  ICMP unreach, frag needed pre matching packet on lo0.
  This syncs pf code to ip_output.c r162084.
  
  PR:		kern/144311
  Submitted by:	yongari via mlaier
  Reviewed by:	eri
  Tested by:	kib
  MFC after:	8 days

Modified:
  head/sys/contrib/pf/net/pf.c

Modified: head/sys/contrib/pf/net/pf.c
==============================================================================
--- head/sys/contrib/pf/net/pf.c	Thu Sep  9 23:45:59 2010	(r212402)
+++ head/sys/contrib/pf/net/pf.c	Fri Sep 10 00:00:06 2010	(r212403)
@@ -6375,6 +6375,7 @@ pf_route(struct mbuf **m, struct pf_rule
 	m0->m_pkthdr.csum_flags &= ifp->if_hwassist;
 
 	if (ntohs(ip->ip_len) <= ifp->if_mtu ||
+	    (m0->m_pkthdr.csum_flags & ifp->if_hwassist & CSUM_TSO) != 0 ||
 	    (ifp->if_hwassist & CSUM_FRAGMENT &&
 		((ip->ip_off & htons(IP_DF)) == 0))) {
 		/*
@@ -6449,7 +6450,7 @@ pf_route(struct mbuf **m, struct pf_rule
 	 * Too large for interface; fragment if possible.
 	 * Must be able to put at least 8 bytes per fragment.
 	 */
-	if (ip->ip_off & htons(IP_DF)) {
+	if (ip->ip_off & htons(IP_DF) || (m0->m_pkthdr.csum_flags & CSUM_TSO)) {
 		KMOD_IPSTAT_INC(ips_cantfrag);
 		if (r->rt != PF_DUPTO) {
 #ifdef __FreeBSD__
_______________________________________________
svn-src-all@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-all
To unsubscribe, send any mail to "svn-src-all-unsubscribe@freebsd.org"
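
As an illustration of the two conditions touched by r212403, here is a
minimal userland sketch in C.  It is not the kernel code: the struct
and function names (sk_pkt, sk_ifnet, sk_route_verdict) are
hypothetical, the flag values are placeholders for the real definitions
in the kernel headers, csum_flags is assumed to be passed in as it
stands after the "csum_flags &= ifp->if_hwassist" context line of the
diff, and the PF_DUPTO exception is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder flag values (assumption); the kernel headers define the real ones. */
#define SK_CSUM_FRAGMENT 0x0010u
#define SK_CSUM_TSO      0x0020u

/* Hypothetical stand-ins for the packet and interface fields the diff consults. */
struct sk_pkt {
    uint32_t ip_len;      /* total IP length, host byte order */
    bool     df;          /* IP_DF set in ip_off */
    uint32_t csum_flags;  /* m_pkthdr.csum_flags, already masked */
};

struct sk_ifnet {
    uint32_t if_mtu;
    uint32_t if_hwassist;
};

enum sk_verdict { SK_SEND_DIRECT, SK_FRAGMENT, SK_ICMP_NEEDFRAG };

/*
 * Model the size check in pf_route().  patched == false follows the
 * removed lines of the diff, patched == true follows the added lines.
 */
enum sk_verdict
sk_route_verdict(const struct sk_pkt *p, const struct sk_ifnet *ifp,
    bool patched)
{
    assert(p != NULL && ifp != NULL);

    /* Added by the patch: the interface will segment a TSO packet itself. */
    bool tso_ok = patched &&
        (p->csum_flags & ifp->if_hwassist & SK_CSUM_TSO) != 0;
    /* Unchanged: hardware fragmentation is usable only without DF. */
    bool hw_frag = (ifp->if_hwassist & SK_CSUM_FRAGMENT) != 0 && !p->df;

    if (p->ip_len <= ifp->if_mtu || tso_ok || hw_frag)
        return SK_SEND_DIRECT;

    /* Too large for the interface: DF (or, once patched, TSO) means "cannot fragment". */
    if (p->df || (patched && (p->csum_flags & SK_CSUM_TSO) != 0))
        return SK_ICMP_NEEDFRAG;

    return SK_FRAGMENT;
}
```

Under these assumptions, a 32000-byte DF packet flagged for TSO on a
TSO-capable interface with MTU 1500 yields SK_ICMP_NEEDFRAG in the
unpatched variant and SK_SEND_DIRECT in the patched one, which is
consistent with what the log message describes.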
Comment 12 Bjoern A. Zeeb freebsd_committer 2010-09-10 01:03:42 UTC
Hey,

as stated when taking the ticket, the suggested patch (which has now
been committed to HEAD) is only half of the fix.  I'll try to review
Ermal's patch in the next few days and follow up with that for testing.
This is just for your information, so you don't wonder why this is
going in even though it doesn't help you.

/bz

-- 
Bjoern A. Zeeb                              Welcome a new stage of life.
Comment 13 dfilter freebsd_committer 2010-09-20 18:03:17 UTC
Author: bz
Date: Mon Sep 20 17:03:10 2010
New Revision: 212905
URL: http://svn.freebsd.org/changeset/base/212905

Log:
  MFC r212403:
  
    When using pf routing options, properly handle IP fragmentation
    for interfaces with TSO enabled, otherwise one would see an extra
    ICMP unreach, frag needed pre matching packet on lo0.
    This syncs pf code to ip_output.c r162084.
  
    Submitted by: yongari via mlaier
    Reviewed by:  eri
    Tested by:    kib
  PR:           kern/144311

Modified:
  stable/8/sys/contrib/pf/net/pf.c
Directory Properties:
  stable/8/sys/   (props changed)
  stable/8/sys/amd64/include/xen/   (props changed)
  stable/8/sys/cddl/contrib/opensolaris/   (props changed)
  stable/8/sys/contrib/dev/acpica/   (props changed)
  stable/8/sys/contrib/pf/   (props changed)
  stable/8/sys/dev/xen/xenpci/   (props changed)

Modified: stable/8/sys/contrib/pf/net/pf.c
==============================================================================
--- stable/8/sys/contrib/pf/net/pf.c	Mon Sep 20 16:43:17 2010	(r212904)
+++ stable/8/sys/contrib/pf/net/pf.c	Mon Sep 20 17:03:10 2010	(r212905)
@@ -6375,6 +6375,7 @@ pf_route(struct mbuf **m, struct pf_rule
 	m0->m_pkthdr.csum_flags &= ifp->if_hwassist;
 
 	if (ntohs(ip->ip_len) <= ifp->if_mtu ||
+	    (m0->m_pkthdr.csum_flags & ifp->if_hwassist & CSUM_TSO) != 0 ||
 	    (ifp->if_hwassist & CSUM_FRAGMENT &&
 		((ip->ip_off & htons(IP_DF)) == 0))) {
 		/*
@@ -6449,7 +6450,7 @@ pf_route(struct mbuf **m, struct pf_rule
 	 * Too large for interface; fragment if possible.
 	 * Must be able to put at least 8 bytes per fragment.
 	 */
-	if (ip->ip_off & htons(IP_DF)) {
+	if (ip->ip_off & htons(IP_DF) || (m0->m_pkthdr.csum_flags & CSUM_TSO)) {
 		KMOD_IPSTAT_INC(ips_cantfrag);
 		if (r->rt != PF_DUPTO) {
 #ifdef __FreeBSD__
Comment 14 dfilter freebsd_committer 2010-09-20 19:26:42 UTC
Author: bz
Date: Mon Sep 20 18:26:37 2010
New Revision: 212910
URL: http://svn.freebsd.org/changeset/base/212910

Log:
  MFC r212403:
  
    When using pf routing options, properly handle IP fragmentation
    for interfaces with TSO enabled, otherwise one would see an extra
    ICMP unreach, frag needed pre matching packet on lo0.
    This syncs pf code to ip_output.c r162084.
  
    Submitted by: yongari via mlaier
  
  PR:	kern/144311

Modified:
  stable/7/sys/contrib/pf/net/pf.c
Directory Properties:
  stable/7/sys/   (props changed)
  stable/7/sys/cddl/contrib/opensolaris/   (props changed)
  stable/7/sys/contrib/dev/acpica/   (props changed)
  stable/7/sys/contrib/pf/   (props changed)

Modified: stable/7/sys/contrib/pf/net/pf.c
==============================================================================
--- stable/7/sys/contrib/pf/net/pf.c	Mon Sep 20 18:20:35 2010	(r212909)
+++ stable/7/sys/contrib/pf/net/pf.c	Mon Sep 20 18:26:37 2010	(r212910)
@@ -6376,6 +6376,7 @@ pf_route(struct mbuf **m, struct pf_rule
 	m0->m_pkthdr.csum_flags &= ifp->if_hwassist;
 
 	if (ntohs(ip->ip_len) <= ifp->if_mtu ||
+	    (m0->m_pkthdr.csum_flags & ifp->if_hwassist & CSUM_TSO) != 0 ||
 	    (ifp->if_hwassist & CSUM_FRAGMENT &&
 		((ip->ip_off & htons(IP_DF)) == 0))) {
 		/*
@@ -6450,7 +6451,7 @@ pf_route(struct mbuf **m, struct pf_rule
 	 * Too large for interface; fragment if possible.
 	 * Must be able to put at least 8 bytes per fragment.
 	 */
-	if (ip->ip_off & htons(IP_DF)) {
+	if (ip->ip_off & htons(IP_DF) || (m0->m_pkthdr.csum_flags & CSUM_TSO)) {
 		ipstat.ips_cantfrag++;
 		if (r->rt != PF_DUPTO) {
 #ifdef __FreeBSD__
Comment 15 mark 2013-08-29 16:01:47 UTC
Just to add that I'm still seeing this issue. 

 9.1-STABLE FreeBSD 9.1-STABLE #0 r252282

Network adapter:

<Intel(R) PRO/1000 Network Connection 7.3.7>

Disabling TSO4, or removing the route-to entry from pf solves the problem.



Comment 16 Bjoern A. Zeeb freebsd_committer 2014-05-18 06:01:19 UTC
Responsible Changed
From-To: bz->gnn

I shall not use bugzilla (at least until we will have a CLI).
Comment 17 Adam Jacob Muller 2014-09-19 02:06:43 UTC
I am still seeing this issue on 10.0-RELEASE.

I can test patches. Interestingly, this ONLY occurs for me on 'local' IPs.

E.g., when the destination IP address is a locally bound address, things break; otherwise they seem to work fine (when the resulting IP is actually routed beyond the firewall).
Comment 18 Aleksandr 2015-04-10 12:57:25 UTC
Hi!

I found the same problem on both 9.3 and 10.1.

uname -rp
9.3-RELEASE-p7 i386

wan1 adapter properties
em1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=4219b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4,WOL_MAGIC,VLAN_HWTSO>

wan2 adapter properties
re0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=8209b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,WOL_MAGIC,LINKSTATE>

pf.conf includes 

set skip on { lo0 }
set state-policy if-bound
scrub in all fragment reassemble

pass in quick on $wan1 reply-to ($wan1 $gw_wan1) inet proto tcp from any to $wan1 port ssh
pass in quick on $wan2 reply-to ($wan2 $gw_wan2) inet proto tcp from any to $wan2 port ssh
pass out quick route-to ($wan1 $gw_wan1) inet proto tcp from $wan1 port ssh to any
pass out quick route-to ($wan2 $gw_wan2) inet proto tcp from $wan2 port ssh to any 


Second machine has
uname -rp
10.1-RELEASE-p8 i386


em0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=4219b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4,WOL_MAGIC,VLAN_HWTSO>

em1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=209b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,WOL_MAGIC>


and the same rule listing. (I also have the same pf.conf on over 10 machines and they work well, so the cause seems to be TSO4.)
Comment 19 Aleksandr 2015-04-10 13:01:04 UTC
Additional information for my last post

cat /var/run/dmesg.boot | grep em1
em1: <Intel(R) PRO/1000 Network Connection 7.3.8> port 0xc000-0xc01f mem 0xfe4c0000-0xfe4dffff,0xfe400000-0xfe47ffff,0xfe4e0000-0xfe4e3fff irq 17 at device 0.0 on pci4
em1: Using MSIX interrupts with 3 vectors


cat /var/run/dmesg.boot | grep em0
em0: <Intel(R) PRO/1000 Network Connection 7.4.2> port 0xf020-0xf03f mem 0xfe500000-0xfe51ffff,0xfe527000-0xfe527fff irq 20 at device 25.0 on pci0
em0: Using an MSI interrupt

All of these adapters are integrated on the motherboard.