FreeBSD: FreeBSD moscow-alpha 10.2-STABLE FreeBSD 10.2-STABLE #0 r286954: Fri Aug 21 08:33:14 MSK 2015 emz@moscow-alpha:/usr/obj/usr/src/sys/MOSCOW amd64 Network scheme: (FreeBSD A) <---(gre inside ipsec)---> (FreeBSD B) <---gre inside ipsec---> (FreeBSD C) (uname taken from B) Issue: PF is on A pings B with icmp packets < gre MTU = everything is OK A pings C with icmp packets < gre MTU = everything is OK A pings B with icmp packets > gre MTU = everything is OK A pings C with icmp packets > gre MTU = got two answers, a normal ICMP reply from C, and an ICMP unreach from B: [emz@big-cherkiz5-1:~]# ping -s 4096 192.168.7.127 PING 192.168.7.127 (192.168.7.127): 4096 data bytes 36 bytes from 172.16.5.214: Destination Host Unreachable Vr HL TOS Len ID Flg off TTL Pro cks Src Dst 4 5 00 055c a28a 0 0000 40 01 3908 172.16.5.215 192.168.7.127 4104 bytes from 192.168.7.127: icmp_seq=0 ttl=61 time=62.119 ms ^C --- 192.168.7.127 ping statistics --- 1 packets transmitted, 1 packets received, 0.0% packet loss round-trip min/avg/max/stddev = 62.119/62.119/62.119/0.000 ms Workaround: disable pf on B. With pf disabled on B, situation resolves back to normal. The issue was first seen somewhere on 10-STABLE, didn't resolve so far. I've talk with tough guys, like ae@, he told me to report it, since it cannot be explained by configuration errors.
It doesn't look like a very simple setup, so ideally I'd like you to try to simplify it a bit. For example, is the ipsec bit required, or does it happen with just the GRE tunnels as well? It'd also be interesting to know if it happens both with and without 'scrub fragment reassemble'. Do you have anything 'special' in your pf.conf? (That is, route-to, dup-to, set state-policy, ... basically anything not pass or block.) (PS: I seem to be unable to send e-mail to you. Your mail server keeps telling me '452 4.7.1 Try again later')
Sorry it took that long (I was kinda overwhelmed by the amount of work). So, same setup: A <---gre/ipsec---> B <---gre/ipsec---> C. 1) ipsec removed between A and B. The issue persists. 2) pf disabled on B. The issue is no more. 3) ipsec added on B, pf still disabled. The issue is no more. 4) ipsec still on, pf enabled on B. The issue is back. 5) ipsec enabled, pf enabled, the following line removed from pf on B: scrub on $oif from !<voippbxes> fragment reassemble The issue persists. 6) Line from previous point added back, removed the line scrub on gre0 max-mss 1360 where gre0 is the B <---> C tunnel and the issue is gone. But I don't understand how the MSS enforcing can affect the ICMP packets, while it should only affect TCP.
I have reproduced the problem. I think we shouldn't use scrub rule without "in" option. I.e. rule should be scrub *in* on gre0 ... Without "in" this rule is triggered twice ("B" <--> "C"): for outgoing *fragmented* echo request and for incoming fragmented echo reply. As a result, the length of the received echo request exceeds the MTU on "C" box. I think it is not good. PF.CONF(5): "Traffic normalization is used to sanitize packet content in such a way that there are no ambiguities in packet interpretation on the receiving side. The normalizer does IP fragment reassembly to prevent attacks that confuse intrusion detection systems by sending overlapping IP fragments." Do we really need "max-mss 1360" on outgoing flow? However, appearance of "Destination Host Unreachable" remains unclear to me. It is routing stuff. Need to do some research.
It seems that scrubbing on both tunnels eliminates the problem. scrub on gre0 max-mss 1360 scrub on gre1 max-mss 1360
(In reply to Max from comment #3) Scrubbing in both directions should be safe, even with fragment reassemble. In IPv4 it's OK for a frame to not fit in the MTU. The router will fragment. (There's special casing in pf to handle the IPv6 scenario, but that doesn't seem to be relevant here.) It's also very strange that the mss setting has an influence on ICMP packets. I'd only expect that to affect TCP streams. It'd be interesting to get packet captures here (tcpdump -n -i <interface> -s0 -w output.pcap) of both the ICMP echo request and the ICMP error packets. Ideally capture on an interface outside the GRE tunnel (so we get the GRE headers too).
Created attachment 170591 [details] dumps (In reply to Kristof Provost from comment #5) Host "A" config: cloned_interfaces="gre0" ifconfig_em0="inet 192.168.10.1/24" defaultrouter="192.168.10.254" ifconfig_gre0="inet 10.10.1.1 10.10.2.1 tunnel 192.168.10.1 192.168.10.254" static_routes="rb rc" route_rb="10.10.2.0/24 10.10.2.1" route_rc="10.10.3.0/24 10.10.2.1" pf disabled. Host "C" config: cloned_interfaces="gre0" ifconfig_em0="inet 192.168.30.1/24" defaultrouter="192.168.30.254" ifconfig_gre0="inet 10.10.3.1 10.10.2.1 tunnel 192.168.30.1 192.168.30.254" static_routes="ra rb" route_ra="10.10.1.0/24 10.10.2.1" route_rb="10.10.2.0/24 10.10.2.1" pf disabled. Host "B" config: cloned_interfaces="gre0 gre1" ifconfig_em0="inet 192.168.10.254/24" ifconfig_em2="inet 192.168.30.254/24" ifconfig_gre0="inet 10.10.2.1 10.10.1.1 tunnel 192.168.10.254 192.168.10.1" ifconfig_gre1="inet 10.10.2.1 10.10.3.1 tunnel 192.168.30.254 192.168.30.1" static_routes="ra rc" route_ra="10.10.1.0/24 10.10.1.1" route_rc="10.10.3.0/24 10.10.3.1" pf.conf: set skip on lo #scrub on gre0 max-mss 1360 scrub on gre1 max-mss 1360 pass all pfctl -x misc gre MTU is 1476. So, 1476-28=1448 bytes sholud fit MTU. First, on host "A": ping -s 1450 -c 1 10.10.3.1 Then, on host "C": ping -s 1450 -c 1 10.10.1.1 Kernel log on host "B": May 24 10:57:38 isp kernel: em0: promiscuous mode enabled May 24 10:57:39 isp kernel: em2: promiscuous mode enabled May 24 10:58:13 isp kernel: pf_normalize_ip: reass frag 56321 @ 0-1456 May 24 10:58:13 isp kernel: pf_fillup_fragment: reass frag 56321 @ 0-1456pf_normalize_ip: reass frag 56321 @ 1456-1458 May 24 10:58:13 isp kernel: pf_fillup_fragment: reass frag 56321 @ 1456-1458pf_isfull_fragment: 1458 < 1458?pf_reassemble: complete: 0xfffff8001f4aa300(1478) May 24 10:58:13 isp kernel: pf_normalize_ip: reass frag 30208 @ 0-1456 May 24 10:58:13 isp kernel: pf_fillup_fragment: reass frag 30208 @ 0-1456pf_normalize_ip: reass frag 30208 @ 1456-1458 May 24 10:58:13 isp kernel: pf_fillup_fragment: reass frag 30208 @ 1456-1458pf_isfull_fragment: 1458 < 1458?pf_reassemble: complete: 0xfffff8001f662000(1478) May 24 10:58:39 isp kernel: pf_normalize_ip: reass frag 30464 @ 0-1456 May 24 10:58:39 isp kernel: pf_fillup_fragment: reass frag 30464 @ 0-1456pf_normalize_ip: reass frag 30464 @ 1456-1458 May 24 10:58:39 isp kernel: pf_fillup_fragment: reass frag 30464 @ 1456-1458pf_isfull_fragment: 1458 < 1458?pf_reassemble: complete: 0xfffff8001f661d00(1478) May 24 10:58:39 isp kernel: pf_normalize_ip: reass frag 57601 @ 0-1456 May 24 10:58:39 isp kernel: pf_fillup_fragment: reass frag 57601 @ 0-1456pf_normalize_ip: reass frag 57601 @ 1456-1458 May 24 10:58:39 isp kernel: pf_fillup_fragment: reass frag 57601 @ 1456-1458pf_isfull_fragment: 1458 < 1458?pf_reassemble: complete: 0xfffff8001f4aa100(1478) May 24 10:58:56 isp kernel: em0: promiscuous mode disabled May 24 10:58:57 isp kernel: em2: promiscuous mode disabled
I confirm, adding "in" on scrubbing for TCP MSS fixes the issue. Although the relation between TCP MSS fixing and the ICMP still seems bogus to me.
(In reply to emz from comment #7) fragment reassemble Using scrub rules, fragments can be reassembled by normalization. In this case, fragments are buffered until they form a complete packet, and only the completed packet is passed on to the filter. ... This is the default behavior of a scrub rule if no fragmentation modifier is supplied. Thus the rule 1. reassembles and fixes mss of tcp packets 2. reassembles other packets I think "scrub in proto tcp ..." should affect tcp packets only.
pflog output: 21:20:52.521232 rule 0..16777216/0(match): pass in on em0: 192.168.10.1 > 192.168.10.254: GREv0, proto IPv4 (0x0800), length 1480: 10.10.1.1 > 10.10.3.1: ICMP echo request, id 24585, seq 0, length 1456 21:20:52.521241 rule 0..16777216/0(match): pass in on gre0: 10.10.1.1 > 10.10.3.1: ICMP echo request, id 24585, seq 0, length 1456 21:20:52.521256 rule 0..16777216/0(match): pass out on gre0: 10.10.2.1 > 10.10.1.1: ICMP host 10.10.3.1 unreachable, length 36 21:20:52.521262 rule 0..16777216/0(match): pass out on em0: 192.168.10.254 > 192.168.10.1: GREv0, proto IPv4 (0x0800), length 60: 10.10.2.1 > 10.10.1.1: ICMP host 10.10.3.1 unreachable, length 36 21:20:52.521282 rule 0..16777216/0(match): pass in on em0: 192.168.10.1 > 192.168.10.254: GREv0, proto IPv4 (0x0800), length 26: 10.10.1.1 > 10.10.3.1: ip-proto-1 21:20:52.521286 rule 0..16777216/0(match): pass in on gre0: 10.10.1.1 > 10.10.3.1: ip-proto-1 21:20:52.521288 rule 0..16777216/0(match): pass out on gre1: 10.10.1.1 > 10.10.3.1: ICMP echo request, id 24585, seq 0, length 1458 21:20:52.521316 rule 0..16777216/0(match): pass out on em2: 192.168.30.254 > 192.168.30.1: GREv0, proto IPv4 (0x0800), length 1482: 10.10.1.1 > 10.10.3.1: ICMP echo request, id 24585, seq 0, length 1458 21:20:52.521598 rule 0..16777216/0(match): pass in on em2: 192.168.30.1 > 192.168.30.254: GREv0, proto IPv4 (0x0800), length 1480: 10.10.3.1 > 10.10.1.1: ICMP echo reply, id 24585, seq 0, length 1456 21:20:52.521614 rule 0..16777216/0(match): pass in on em2: 192.168.30.1 > 192.168.30.254: GREv0, proto IPv4 (0x0800), length 26: 10.10.3.1 > 10.10.1.1: ip-proto-1 21:20:52.521619 rule 0..16777216/0(match): pass in on gre1: 10.10.3.1 > 10.10.1.1: ICMP echo reply, id 24585, seq 0, length 1458 21:20:52.521624 rule 0..16777216/0(match): pass out on gre0: 10.10.3.1 > 10.10.1.1: ICMP echo reply, id 24585, seq 0, length 1458 21:20:52.521630 rule 0..16777216/0(match): pass out on em0: 192.168.10.254 > 192.168.10.1: GREv0, proto IPv4 (0x0800), length 1480: 10.10.3.1 > 10.10.1.1: ICMP echo reply, id 24585, seq 0, length 1456 21:20:52.521646 rule 0..16777216/0(match): pass out on em0: 192.168.10.254 > 192.168.10.1: GREv0, proto IPv4 (0x0800), length 26: 10.10.3.1 > 10.10.1.1: ip-proto-1
scrub on gre1 proto tcp max-mss 1360 (there is no "host unreachable" message). 21:28:54.220629 rule 0..16777216/0(match): pass in on em0: 192.168.10.1 > 192.168.10.254: GREv0, proto IPv4 (0x0800), length 1480: 10.10.1.1 > 10.10.3.1: ICMP echo request, id 30473, seq 0, length 1456 21:28:54.220641 rule 0..16777216/0(match): pass in on gre0: 10.10.1.1 > 10.10.3.1: ICMP echo request, id 30473, seq 0, length 1456 21:28:54.220650 rule 0..16777216/0(match): pass out on gre1: 10.10.1.1 > 10.10.3.1: ICMP echo request, id 30473, seq 0, length 1456 21:28:54.220656 rule 0..16777216/0(match): pass out on em2: 192.168.30.254 > 192.168.30.1: GREv0, proto IPv4 (0x0800), length 1480: 10.10.1.1 > 10.10.3.1: ICMP echo request, id 30473, seq 0, length 1456 21:28:54.220700 rule 0..16777216/0(match): pass in on em0: 192.168.10.1 > 192.168.10.254: GREv0, proto IPv4 (0x0800), length 26: 10.10.1.1 > 10.10.3.1: ip-proto-1 21:28:54.220704 rule 0..16777216/0(match): pass in on gre0: 10.10.1.1 > 10.10.3.1: ip-proto-1 21:28:54.220710 rule 0..16777216/0(match): pass out on gre1: 10.10.1.1 > 10.10.3.1: ip-proto-1 21:28:54.220716 rule 0..16777216/0(match): pass out on em2: 192.168.30.254 > 192.168.30.1: GREv0, proto IPv4 (0x0800), length 26: 10.10.1.1 > 10.10.3.1: ip-proto-1 21:28:54.220824 rule 0..16777216/0(match): pass in on em2: 192.168.30.1 > 192.168.30.254: GREv0, proto IPv4 (0x0800), length 1480: 10.10.3.1 > 10.10.1.1: ICMP echo reply, id 30473, seq 0, length 1456 21:28:54.220829 rule 0..16777216/0(match): pass in on gre1: 10.10.3.1 > 10.10.1.1: ICMP echo reply, id 30473, seq 0, length 1456 21:28:54.220835 rule 0..16777216/0(match): pass out on gre0: 10.10.3.1 > 10.10.1.1: ICMP echo reply, id 30473, seq 0, length 1456 21:28:54.220840 rule 0..16777216/0(match): pass out on em0: 192.168.10.254 > 192.168.10.1: GREv0, proto IPv4 (0x0800), length 1480: 10.10.3.1 > 10.10.1.1: ICMP echo reply, id 30473, seq 0, length 1456 21:28:54.220880 rule 0..16777216/0(match): pass in on em2: 192.168.30.1 > 192.168.30.254: GREv0, proto IPv4 (0x0800), length 26: 10.10.3.1 > 10.10.1.1: ip-proto-1 21:28:54.220886 rule 0..16777216/0(match): pass in on gre1: 10.10.3.1 > 10.10.1.1: ip-proto-1 21:28:54.220892 rule 0..16777216/0(match): pass out on gre0: 10.10.3.1 > 10.10.1.1: ip-proto-1 21:28:54.220899 rule 0..16777216/0(match): pass out on em0: 192.168.10.254 > 192.168.10.1: GREv0, proto IPv4 (0x0800), length 26: 10.10.3.1 > 10.10.1.1: ip-proto-1
Could it be gre issue (when packets dropped by kernel)?
And the last one... I'm sorry for these listings... scrub on gre0 max-mss 1360 scrub on gre1 max-mss 1360 (there is no "host unreachable") 22:22:56.728057 rule 0..16777216/0(match): pass in on em0: 192.168.10.1 > 192.168.10.254: GREv0, proto IPv4 (0x0800), length 1480: 10.10.1.1 > 10.10.3.1: ICMP echo request, id 60681, seq 0, length 1456 22:22:56.728069 rule 0..16777216/0(match): pass in on em0: 192.168.10.1 > 192.168.10.254: GREv0, proto IPv4 (0x0800), length 26: 10.10.1.1 > 10.10.3.1: ip-proto-1 22:22:56.728078 rule 0..16777216/0(match): pass in on gre0: 10.10.1.1 > 10.10.3.1: ICMP echo request, id 60681, seq 0, length 1458 22:22:56.728089 rule 0..16777216/0(match): pass out on gre1: 10.10.1.1 > 10.10.3.1: ICMP echo request, id 60681, seq 0, length 1458 22:22:56.728097 rule 0..16777216/0(match): pass out on em2: 192.168.30.254 > 192.168.30.1: GREv0, proto IPv4 (0x0800), length 1480: 10.10.1.1 > 10.10.3.1: ICMP echo request, id 60681, seq 0, length 1456 22:22:56.728140 rule 0..16777216/0(match): pass out on em2: 192.168.30.254 > 192.168.30.1: GREv0, proto IPv4 (0x0800), length 26: 10.10.1.1 > 10.10.3.1: ip-proto-1 22:22:56.728266 rule 0..16777216/0(match): pass in on em2: 192.168.30.1 > 192.168.30.254: GREv0, proto IPv4 (0x0800), length 1480: 10.10.3.1 > 10.10.1.1: ICMP echo reply, id 60681, seq 0, length 1456 22:22:56.728273 rule 0..16777216/0(match): pass in on em2: 192.168.30.1 > 192.168.30.254: GREv0, proto IPv4 (0x0800), length 26: 10.10.3.1 > 10.10.1.1: ip-proto-1 22:22:56.728278 rule 0..16777216/0(match): pass in on gre1: 10.10.3.1 > 10.10.1.1: ICMP echo reply, id 60681, seq 0, length 1458 22:22:56.728283 rule 0..16777216/0(match): pass out on gre0: 10.10.3.1 > 10.10.1.1: ICMP echo reply, id 60681, seq 0, length 1458 22:22:56.728288 rule 0..16777216/0(match): pass out on em0: 192.168.10.254 > 192.168.10.1: GREv0, proto IPv4 (0x0800), length 1480: 10.10.3.1 > 10.10.1.1: ICMP echo reply, id 60681, seq 0, length 1456 22:22:56.728325 rule 0..16777216/0(match): pass out on em0: 192.168.10.254 > 192.168.10.1: GREv0, proto IPv4 (0x0800), length 26: 10.10.3.1 > 10.10.1.1: ip-proto-1
I'm wondering if the 'max-mss' thing isn't a red herring. Can you try 'scrub on gre0', 'scrub on gre1' (so without the max-mss)?
scrub on gre1 14:35:43.641169 rule 0..16777216/0(match): pass in on em0: 192.168.10.1 > 192.168.10.254: GREv0, proto IPv4 (0x0800), length 1480: 10.10.1.1 > 10.10.3.1: ICMP echo request, id 44806, seq 0, length 1456 14:35:43.641178 rule 0..16777216/0(match): pass in on gre0: 10.10.1.1 > 10.10.3.1: ICMP echo request, id 44806, seq 0, length 1456 14:35:43.641194 rule 0..16777216/0(match): pass out on gre0: 10.10.2.1 > 10.10.1.1: ICMP host 10.10.3.1 unreachable, length 36 14:35:43.641200 rule 0..16777216/0(match): pass out on em0: 192.168.10.254 > 192.168.10.1: GREv0, proto IPv4 (0x0800), length 60: 10.10.2.1 > 10.10.1.1: ICMP host 10.10.3.1 unreachable, length 36 14:35:43.641218 rule 0..16777216/0(match): pass in on em0: 192.168.10.1 > 192.168.10.254: GREv0, proto IPv4 (0x0800), length 26: 10.10.1.1 > 10.10.3.1: ip-proto-1 14:35:43.641223 rule 0..16777216/0(match): pass in on gre0: 10.10.1.1 > 10.10.3.1: ip-proto-1 14:35:43.641230 rule 0..16777216/0(match): pass out on gre1: 10.10.1.1 > 10.10.3.1: ICMP echo request, id 44806, seq 0, length 1458 14:35:43.641237 rule 0..16777216/0(match): pass out on em2: 192.168.30.254 > 192.168.30.1: GREv0, proto IPv4 (0x0800), length 1482: 10.10.1.1 > 10.10.3.1: ICMP echo request, id 44806, seq 0, length 1458 14:35:43.641421 rule 0..16777216/0(match): pass in on em2: 192.168.30.1 > 192.168.30.254: GREv0, proto IPv4 (0x0800), length 1480: 10.10.3.1 > 10.10.1.1: ICMP echo reply, id 44806, seq 0, length 1456 14:35:43.641428 rule 0..16777216/0(match): pass in on em2: 192.168.30.1 > 192.168.30.254: GREv0, proto IPv4 (0x0800), length 26: 10.10.3.1 > 10.10.1.1: ip-proto-1 14:35:43.641434 rule 0..16777216/0(match): pass in on gre1: 10.10.3.1 > 10.10.1.1: ICMP echo reply, id 44806, seq 0, length 1458 14:35:43.641439 rule 0..16777216/0(match): pass out on gre0: 10.10.3.1 > 10.10.1.1: ICMP echo reply, id 44806, seq 0, length 1458 14:35:43.641479 rule 0..16777216/0(match): pass out on em0: 192.168.10.254 > 192.168.10.1: GREv0, proto IPv4 (0x0800), length 1480: 10.10.3.1 > 10.10.1.1: ICMP echo reply, id 44806, seq 0, length 1456 14:35:43.641497 rule 0..16777216/0(match): pass out on em0: 192.168.10.254 > 192.168.10.1: GREv0, proto IPv4 (0x0800), length 26: 10.10.3.1 > 10.10.1.1: ip-proto-1
scrub on gre0 scrub on gre1 14:39:39.127649 rule 0..16777216/0(match): pass in on em0: 192.168.10.1 > 192.168.10.254: GREv0, proto IPv4 (0x0800), length 1480: 10.10.1.1 > 10.10.3.1: ICMP echo request, id 47622, seq 0, length 1456 14:39:39.127666 rule 0..16777216/0(match): pass in on em0: 192.168.10.1 > 192.168.10.254: GREv0, proto IPv4 (0x0800), length 26: 10.10.1.1 > 10.10.3.1: ip-proto-1 14:39:39.127672 rule 0..16777216/0(match): pass in on gre0: 10.10.1.1 > 10.10.3.1: ICMP echo request, id 47622, seq 0, length 1458 14:39:39.127681 rule 0..16777216/0(match): pass out on gre1: 10.10.1.1 > 10.10.3.1: ICMP echo request, id 47622, seq 0, length 1458 14:39:39.127689 rule 0..16777216/0(match): pass out on em2: 192.168.30.254 > 192.168.30.1: GREv0, proto IPv4 (0x0800), length 1480: 10.10.1.1 > 10.10.3.1: ICMP echo request, id 47622, seq 0, length 1456 14:39:39.127734 rule 0..16777216/0(match): pass out on em2: 192.168.30.254 > 192.168.30.1: GREv0, proto IPv4 (0x0800), length 26: 10.10.1.1 > 10.10.3.1: ip-proto-1 14:39:39.127857 rule 0..16777216/0(match): pass in on em2: 192.168.30.1 > 192.168.30.254: GREv0, proto IPv4 (0x0800), length 1480: 10.10.3.1 > 10.10.1.1: ICMP echo reply, id 47622, seq 0, length 1456 14:39:39.127865 rule 0..16777216/0(match): pass in on em2: 192.168.30.1 > 192.168.30.254: GREv0, proto IPv4 (0x0800), length 26: 10.10.3.1 > 10.10.1.1: ip-proto-1 14:39:39.127869 rule 0..16777216/0(match): pass in on gre1: 10.10.3.1 > 10.10.1.1: ICMP echo reply, id 47622, seq 0, length 1458 14:39:39.127875 rule 0..16777216/0(match): pass out on gre0: 10.10.3.1 > 10.10.1.1: ICMP echo reply, id 47622, seq 0, length 1458 14:39:39.127880 rule 0..16777216/0(match): pass out on em0: 192.168.10.254 > 192.168.10.1: GREv0, proto IPv4 (0x0800), length 1480: 10.10.3.1 > 10.10.1.1: ICMP echo reply, id 47622, seq 0, length 1456 14:39:39.127917 rule 0..16777216/0(match): pass out on em0: 192.168.10.254 > 192.168.10.1: GREv0, proto IPv4 (0x0800), length 26: 10.10.3.1 > 10.10.1.1: ip-proto-1
So if I understand this correctly the problem is still there with only 'scrub on gre1' (so without the MSS clamping), but it's not there if scrubbing is done on both interfaces?
(In reply to Kristof Provost from comment #16) Absolutely. ICMP-unreach generated when the first fragment of echo request is dropped by pf, I think.
no scrub on gre1 proto icmp scrub on gre1 There is no "host unreachable". 14:58:34.741461 rule 0..16777216/0(match): pass in on em0: 192.168.10.1 > 192.168.10.254: GREv0, proto IPv4 (0x0800), length 1480: 10.10.1.1 > 10.10.3.1: ICMP echo request, id 59142, seq 0, length 1456 14:58:34.741471 rule 0..16777216/0(match): pass in on gre0: 10.10.1.1 > 10.10.3.1: ICMP echo request, id 59142, seq 0, length 1456 14:58:34.741479 rule 0..16777216/0(match): pass out on gre1: 10.10.1.1 > 10.10.3.1: ICMP echo request, id 59142, seq 0, length 1456 14:58:34.741486 rule 0..16777216/0(match): pass out on em2: 192.168.30.254 > 192.168.30.1: GREv0, proto IPv4 (0x0800), length 1480: 10.10.1.1 > 10.10.3.1: ICMP echo request, id 59142, seq 0, length 1456 14:58:34.741542 rule 0..16777216/0(match): pass in on em0: 192.168.10.1 > 192.168.10.254: GREv0, proto IPv4 (0x0800), length 26: 10.10.1.1 > 10.10.3.1: ip-proto-1 14:58:34.741571 rule 0..16777216/0(match): pass in on gre0: 10.10.1.1 > 10.10.3.1: ip-proto-1 14:58:34.741576 rule 0..16777216/0(match): pass out on gre1: 10.10.1.1 > 10.10.3.1: ip-proto-1 14:58:34.741580 rule 0..16777216/0(match): pass out on em2: 192.168.30.254 > 192.168.30.1: GREv0, proto IPv4 (0x0800), length 26: 10.10.1.1 > 10.10.3.1: ip-proto-1 14:58:34.741648 rule 0..16777216/0(match): pass in on em2: 192.168.30.1 > 192.168.30.254: GREv0, proto IPv4 (0x0800), length 1480: 10.10.3.1 > 10.10.1.1: ICMP echo reply, id 59142, seq 0, length 1456 14:58:34.741654 rule 0..16777216/0(match): pass in on gre1: 10.10.3.1 > 10.10.1.1: ICMP echo reply, id 59142, seq 0, length 1456 14:58:34.741659 rule 0..16777216/0(match): pass out on gre0: 10.10.3.1 > 10.10.1.1: ICMP echo reply, id 59142, seq 0, length 1456 14:58:34.741665 rule 0..16777216/0(match): pass out on em0: 192.168.10.254 > 192.168.10.1: GREv0, proto IPv4 (0x0800), length 1480: 10.10.3.1 > 10.10.1.1: ICMP echo reply, id 59142, seq 0, length 1456 14:58:34.741682 rule 0..16777216/0(match): pass in on em2: 192.168.30.1 > 192.168.30.254: GREv0, proto IPv4 (0x0800), length 26: 10.10.3.1 > 10.10.1.1: ip-proto-1 14:58:34.741686 rule 0..16777216/0(match): pass in on gre1: 10.10.3.1 > 10.10.1.1: ip-proto-1 14:58:34.741691 rule 0..16777216/0(match): pass out on gre0: 10.10.3.1 > 10.10.1.1: ip-proto-1 14:58:34.741696 rule 0..16777216/0(match): pass out on em0: 192.168.10.254 > 192.168.10.1: GREv0, proto IPv4 (0x0800), length 26: 10.10.3.1 > 10.10.1.1: ip-proto-1
I've never read FreeBSD sources, except pf's last week... probably I'm wrong. ip_input()->ip_forward()->ip_output()->ip_output_pfil()->pfil_run_hooks()->pf_test(). If ip_output() returns any error, then in ip_forward(): error = ip_output(...); ... switch (error) { case 0: /* forwarded, but need redirect */ /* type, code set above */ break; ... default: type = ICMP_UNREACH; code = ICMP_UNREACH_HOST; break; ... icmp_error(...); So, we have incoming fragment of echo request. There are two options: 1. pf returns PF_PASS -> ip_output() returns 0 -> everything is OK 2. pf returns PF_DROP -> ip_output() returns nonzero value -> we have icmp-unreach message. pf returns PF_DROP when we have (implicit) "scrub out on...". Please, correct me if I missing something.
Created attachment 170747 [details] pf_frag_pass patch (In reply to Max from comment #19) You may be on to something there. pf_reassemble() actually returns PF_PASS, but it's turned back into PF_DROP later on. It actually looks like this'd be a problem for IPv6 too. Can you give the attached patch a try? I'm not completely happy with it, but it should fix the problem.
(In reply to Kristof Provost from comment #20) Hello, Kristof. This patch works. But there is another problem: pass log (all) all block drop out log (all) on gre1 proto icmp In this case icmp-unreach still presents.
(In reply to Max from comment #21) Yeah, I guess that makes sense. After all, the rules tell PF to drop the ICMP packet, which it does. It tells the network stack that the packet was dropped, so it generates an 'ICMP destination unreachable' error. In this case that's correct, because the destination really is unreachable. Arguably that error should be under the control of the firewall, but I'm not sure this is really wrong.
(In reply to Kristof Provost from comment #22) I agree. But should we send any ICMP if we have "block drop ..." rule, not "block return ..."?
(In reply to Max from comment #23) Yeah, that's certainly a valid point. Arguably the network stack shouldn't send errors if the firewall drops a packet, instead leaving it to the firewall to send an error. Or perhaps we should extend the netpfil interface to support both scenarios. Either way, this change will affect more than just pf, so it'd have to be done very carefully.
(In reply to Kristof Provost from comment #24) That's why I've reviewed ip_input.c and ip_output.c. It's not just a routing issue... Can you discuss this problem with responsible developers?
Does it look reasonable? We should use consistent return values in pf_reassemble(), I think. --- pf_norm.c.orig 2016-05-28 23:40:52.171196000 +0300 +++ pf_norm.c 2016-05-28 23:50:39.912093000 +0300 @@ -623,7 +623,7 @@ pf_reassemble(struct mbuf **m0, struct i m = *m0 = NULL; if (!pf_isfull_fragment(frag)) - return (PF_PASS); /* drop because *m0 is NULL, no error */ + return (PF_DROP); /* We have all the data */ frent = TAILQ_FIRST(&frag->fr_queue); @@ -1284,8 +1284,6 @@ pf_normalize_ip(struct mbuf **m0, int di return (PF_DROP); m = *m0; - if (m == NULL) - return (PF_DROP); /* use mtag from concatenated mbuf chain */ pd->pf_mtag = pf_find_mtag(m); IPv6 versions should be fixed too.
(In reply to Max from comment #26) I think what we need to do is very carefully go through all the return paths in pf. There's basically three scenarios: * Accept packet (modified or not) => return PF_PASS *m0 is the mbuf * Reject the packet (i.e tell the stack it couldn't be sent) => return PF_DROP * Drop the packet (i.e. it vanishes, do not tell the stack it couldn't be sent) => return PF_PASS, *m0 is NULL. The pf_isfull_fragment() check needs to return PF_PASS (with *m0 == NULL), because the packet was just buffered until we have the full (reassembled) packet. It's not been rejected. The if (m == NULL) check is really required, because we do set *m0 to NULL during reassembly.
(In reply to Kristof Provost from comment #27) Hello, Kristof. Thank you for your reply. I understand the logic of current implementation of pf_reassemble(). But it does not return a value directly to network stack. I think it could return PF_PASS only in single case: the packet is fully reassembled. Instead, pf_normalize_ip() does it: immediately returns PF_DROP if pf_reassemble() == PF_PASS && *m0 == NULL. I think, it is confusing a bit... In any way, this is just a suggestion. (:
Created attachment 171268 [details] pf error returns Hmm. I might be making this harder than it needs to be. If the netpfil hook returns EACCESS ip_forward() won't actually generate an ICMP error message. The problem is that PF returns PF_PASS, PF_DROP, ... instead of the error codes the stack expects. Can you test this patch? It's interesting that this doesn't seem to be as big a problem on CURRENT, because the fast forwarding code (ip_tryforward()) doesn't generate ICMP errors for netpfil() errors.
(In reply to Kristof Provost from comment #29) It seems the patch works, except one thing: "block return ..." does not generate ICMP message.
(In reply to Max from comment #30) But it looks like this is not a patch problem.
(In reply to Max from comment #31) Okay, let's track that in a new bug. Can you create one?
(In reply to Max from comment #30) Was this with TCP/UDP or ICMP? Note that pf doesn't generate ICMP error messages for anything other than UDP (and TCP RST for TCP): return A TCP RST is returned for blocked TCP packets, an ICMP UNREACHABLE is returned for blocked UDP packets, and all other packets are silently dropped.
(In reply to Kristof Provost from comment #33) Yeah, that's my fault... It is ICMP. But man pf.conf says return This causes a TCP RST to be returned for tcp(4) packets and an ICMP UNREACHABLE for UDP and other packets. (In reply to Kristof Provost from comment #32) I'm trying to understand what's happening... Without the patch: ruleset 1: scrub on gre1 pass log (all) all block return out log (all) on gre1 proto icmp ICMP-unreach exists. ruleset 2: scrub on gre1 pass log (all) all block return in log (all) on gre0 proto icmp ICMP-unreach doesn't exist. Should it? ruleset 3: scrub on gre0 scrub on gre1 pass log (all) all block return in log (all) on gre0 proto icmp ICMP-unreach doesn't exist. I've rebuilt the kernel... again... the patched version. There is no ICMP-unreach at all. So, the first case is relevant to patch, I think...
(In reply to Kristof Provost from comment #33) Now I see that... } else if (pd->proto != IPPROTO_ICMP && af == AF_INET && r->return_icmp) pf_send_icmp(m, r->return_icmp >> 8, r->return_icmp & 255, af, r); else if (pd->proto != IPPROTO_ICMPV6 && af == AF_INET6 && r->return_icmp6) pf_send_icmp(m, r->return_icmp6 >> 8, r->return_icmp6 & 255, af, r); block return ... proto tcp works as expected.
(In reply to Max from comment #35) Yeah, we may want to clarify the man page somewhat, and explicitly state that there are no ICMP error messages for ICMP packets.
A commit references this bug: Author: kp Date: Sat Jul 9 12:17:01 UTC 2016 New revision: 302497 URL: https://svnweb.freebsd.org/changeset/base/302497 Log: pf: Map hook returns onto the correct error values pf returns PF_PASS, PF_DROP, ... in the netpfil hooks, but the hook callers expect to get E<foo> error codes. Map the returns values. A pass is 0 (everything is OK), anything else means pf ate the packet, so return EACCES, which tells the stack not to emit an ICMP error message. PR: 207598 Changes: head/sys/netpfil/pf/pf_ioctl.c
A commit references this bug: Author: kp Date: Wed Aug 17 09:23:41 UTC 2016 New revision: 304282 URL: https://svnweb.freebsd.org/changeset/base/304282 Log: MFC r302497: pf: Map hook returns onto the correct error values pf returns PF_PASS, PF_DROP, ... in the netpfil hooks, but the hook callers expect to get E<foo> error codes. Map the returns values. A pass is 0 (everything is OK), anything else means pf ate the packet, so return EACCES, which tells the stack not to emit an ICMP error message. PR: 207598 Changes: _U stable/11/ stable/11/sys/netpfil/pf/pf_ioctl.c
A commit references this bug: Author: kp Date: Wed Aug 17 09:24:46 UTC 2016 New revision: 304283 URL: https://svnweb.freebsd.org/changeset/base/304283 Log: MFC r302497: pf: Map hook returns onto the correct error values pf returns PF_PASS, PF_DROP, ... in the netpfil hooks, but the hook callers expect to get E<foo> error codes. Map the returns values. A pass is 0 (everything is OK), anything else means pf ate the packet, so return EACCES, which tells the stack not to emit an ICMP error message. PR: 207598 Changes: _U stable/10/ stable/10/sys/netpfil/pf/pf_ioctl.c