Hello.

I have the following setup with two VMs:

    <public net> --- [ FreeBSD 13.0 RC4 GW_VM + NAT ] --- <private net> --- [Linux VM]

GW_VM:

    Interfaces:
        vtnet1 <public ip>
        vtnet2 192.168.1.1/24

    net.inet.ip.forwarding=1

    NAT pf.conf:
        nat on vtnet1 from 192.168.1.0/24 to any -> vtnet1

Linux VM:

    enp0s2 192.168.1

When I run iperf3 from the Linux VM to a public host:

    [  4] local 192.168.1.4 port 49412 connected to <PUBLIC_HOST> port 5201
    [ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
    [  4]   0.00-1.01   sec   263 KBytes  2.14 Mbits/sec   45   5.66 KBytes
    [  4]   1.01-2.00   sec   156 KBytes  1.28 Mbits/sec   32   5.66 KBytes
    [  4]   2.00-3.00   sec   156 KBytes  1.27 Mbits/sec   26   5.66 KBytes

The low upload speed is expected, because the virtio-net offloads are enabled. What I did not expect was the absence of the needfrag ICMP packet.

I set up 12.2-RELEASE with the same configuration, and there I see:

    root@edge-12:~ # tcpdump -i vtnet2 proto ICMP
    tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
    listening on vtnet2, link-type EN10MB (Ethernet), capture size 262144 bytes
    14:07:09.803538 IP 192.168.1.1 > 192.168.1.4: ICMP 10.78.28.17 unreachable - need to frag (mtu 1500), length 176
    14:07:09.803581 IP 192.168.1.1 > 192.168.1.4: ICMP 10.78.28.17 unreachable - need to frag (mtu 1500), length 176
    14:07:09.803605 IP 192.168.1.1 > 192.168.1.4: ICMP 10.78.28.17 unreachable - need to frag (mtu 1500), length 176
    14:07:09.806829 IP 192.168.1.1 > 192.168.1.4: ICMP 10.78.28.17 unreachable - need to frag (mtu 1500), length 176
    14:07:09.806856 IP 192.168.1.1 > 192.168.1.4: ICMP 10.78.28.17 unreachable - need to frag (mtu 1500), length 176
    14:07:09.810143 IP 192.168.1.1 > 192.168.1.4: ICMP 10.78.28.17 unreachable - need to frag (mtu 1500), length 176
    14:07:09.810172 IP 192.168.1.1 > 192.168.1.4: ICMP 10.78.28.17 unreachable - need to frag (mtu 1500), length 176

Using the following DTrace script:

    dtrace -n 'fbt:kernel:icmp_error:entry { stack(); printf("type: %d code: %d", arg1, arg2);}'

12.2-RELEASE works as expected:
ip_forward() calls ip_output(), which returns EMSGSIZE -> an ICMP unreach needfrag is generated.

      0  53981  icmp_error:entry
                  kernel`ip_forward+0x5c4
                  kernel`ip_input+0x7a7
                  kernel`netisr_dispatch_src+0xca
                  kernel`ether_demux+0x138
                  kernel`ether_nh_input+0x33b
                  kernel`netisr_dispatch_src+0xca
                  kernel`ether_input+0x4b
                  kernel`vtnet_rxq_eof+0x7a5
                  kernel`vtnet_rx_vq_process+0xb7
                  kernel`ithread_loop+0x23c
                  kernel`fork_exit+0x7e
                  kernel`0xffffffff81067f6e
    type: 3 code: 4

      0  53981  icmp_error:entry
                  kernel`ip_forward+0x5c4
                  kernel`ip_input+0x7a7
                  kernel`netisr_dispatch_src+0xca
                  kernel`ether_demux+0x138
                  kernel`ether_nh_input+0x33b
                  kernel`netisr_dispatch_src+0xca
                  kernel`ether_input+0x4b
                  kernel`vtnet_rxq_eof+0x7a5
                  kernel`vtnet_rx_vq_process+0xb7
                  kernel`ithread_loop+0x23c
                  kernel`fork_exit+0x7e
                  kernel`0xffffffff81067f6e
    type: 3 code: 4

13-RC4:

      0  54326  icmp_error:entry
                  kernel`ip_tryforward+0x730
                  kernel`ip_input+0x356
                  kernel`netisr_dispatch_src+0xca
                  kernel`ether_demux+0x148
                  kernel`ether_nh_input+0x34c
                  kernel`netisr_dispatch_src+0xca
                  kernel`ether_input+0x69
                  kernel`vtnet_rxq_eof+0x7d4
                  kernel`vtnet_rx_vq_process+0xb7
                  kernel`ithread_loop+0x24d
                  kernel`fork_exit+0x7e
                  kernel`0xffffffff810625ae
    type: 3 code: 4

      1  54326  icmp_error:entry
                  kernel`ip_forward+0x9c
                  kernel`ip_input+0x6cc
                  kernel`swi_net+0x12b
                  kernel`ithread_loop+0x24d
                  kernel`fork_exit+0x7e
                  kernel`0xffffffff810625ae
    type: 3 code: 1

So, as I understand it, ip_tryforward() tries to generate an ICMP needfrag, but after that an ICMP_UNREACH_HOST is generated instead.
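For readers who do not have the ICMP numbers memorized, the "type: N code: M" lines printed by the DTrace script can be decoded with a small helper. This is only an illustrative sketch: the constant names follow FreeBSD's <netinet/ip_icmp.h>, while the helper names describe() and decode_line() are made up for this example.

```python
import re

# ICMP destination-unreachable codes (RFC 792 / RFC 1191), using the
# constant names from FreeBSD's <netinet/ip_icmp.h>.
ICMP_UNREACH = 3
UNREACH_CODES = {
    0: "ICMP_UNREACH_NET",
    1: "ICMP_UNREACH_HOST",
    2: "ICMP_UNREACH_PROTOCOL",
    3: "ICMP_UNREACH_PORT",
    4: "ICMP_UNREACH_NEEDFRAG",
}

# Matches the printf() format used in the DTrace one-liner above.
_LINE = re.compile(r"type:\s*(\d+)\s+code:\s*(\d+)")


def describe(icmp_type, code):
    """Return a symbolic name for an ICMP type/code pair (hypothetical helper)."""
    if icmp_type != ICMP_UNREACH:
        return f"type {icmp_type} code {code}"
    return UNREACH_CODES.get(code, f"ICMP_UNREACH code {code}")


def decode_line(line):
    """Decode one 'type: N code: M' line; return None for other lines."""
    m = _LINE.search(line)
    if m is None:
        return None
    return describe(int(m.group(1)), int(m.group(2)))
```

With this, "type: 3 code: 4" maps to ICMP_UNREACH_NEEDFRAG (the message seen on 12.2-RELEASE) and "type: 3 code: 1" maps to ICMP_UNREACH_HOST (the second trace on 13-RC4).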
This is very funny:

    root@GW_13RC4:~ # tcpdump -i lo0
    tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
    listening on lo0, link-type NULL (BSD loopback), capture size 262144 bytes
    15:32:30.655851 IP localhost > <GW_13RC4 public IP>: ICMP <remote public host> unreachable - need to frag (mtu 1500), length 576
    15:32:30.693492 IP localhost > <GW_13RC4 public IP>: ICMP <remote public host> unreachable - need to frag (mtu 1500), length 576
    15:32:30.713231 IP localhost > <GW_13RC4 public IP>: ICMP <remote public host> unreachable - need to frag (mtu 1500), length 576

So the ICMP packets were sent, but from localhost to the gateway itself over lo0. It seems that 12.2-RELEASE checks the packet size before NAT, while 13-RC4 checks it after.
It looks like PF's behaviour has changed with regard to loopback interfaces. Could this observation[1] be relevant to the breakage reported in this PR?

[1] https://lists.freebsd.org/pipermail/freebsd-pf/2021-February/009390.html
For context, we have switched fastforwarding on by default:
https://cgit.freebsd.org/src/commit/?id=8ad114c082a159c0dde95aa35d2e3e108aa30a75

In 12.2 the code path was ip_input() -> ip_forward() -> ip_output(), where ip_forward() created an mbuf copy for the purpose of generating various ICMP messages. The fastforward code currently doesn't do this, for performance reasons, except for the redirect use case. As a result, we use the (possibly altered) packet to generate the redirect.
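The effect of not keeping a copy can be illustrated with a toy model. This is only a sketch of the reasoning in this thread, not the kernel logic: the Packet type, the function names, and the GW_PUBLIC and REMOTE addresses are hypothetical stand-ins, and it assumes that NAT has already rewritten the source address by the time the size check fails.

```python
from dataclasses import dataclass, replace

MTU = 1500
LAN_HOST = "192.168.1.4"    # the Linux VM from the report
GW_PUBLIC = "203.0.113.1"   # hypothetical stand-in for the gateway's public IP
REMOTE = "198.51.100.7"     # hypothetical stand-in for the remote public host


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str
    length: int
    df: bool = True  # Don't Fragment bit


def nat(pkt):
    """Model of 'nat on vtnet1 ... -> vtnet1': rewrite the source address."""
    return replace(pkt, src=GW_PUBLIC)


def needfrag_target_with_copy(pkt):
    """12.2-style path: a copy of the original packet is saved first,
    so the ICMP error is addressed to the original sender."""
    saved = pkt               # models the mbuf copy made by ip_forward()
    out = nat(pkt)            # output path, including NAT
    if out.df and out.length > MTU:
        return saved.src
    return None


def needfrag_target_without_copy(pkt):
    """Fastforward-style path: no copy is kept, so the ICMP error is
    built from the (possibly altered) packet."""
    out = nat(pkt)
    if out.df and out.length > MTU:
        return out.src        # already rewritten to the gateway's own address
    return None


big = Packet(src=LAN_HOST, dst=REMOTE, length=2948)
print(needfrag_target_with_copy(big))     # 192.168.1.4
print(needfrag_target_without_copy(big))  # 203.0.113.1
```

In this model the second path addresses the needfrag message to the gateway's own public address, which would match the lo0 capture above, where the ICMP packets never reach the Linux VM.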