Scenario: I have 2 lagg interfaces: one with mtu 1500 another with mtu 9000. Both have several VLANs. Fragmented UDP packets are received by VLAN of first interface, reassembled by pf ("scrub in all fragment reassemble"), and then sent out as jumbo frame by VLAN of second interface. All networks cards are mellanox ("mce"). All H/W offload features but LRO are enabled. Before 13.0 everything worked fine. Since 13.0, the resulting jumbo frame is sent out with invalid IP checksum and thus discarded by receiver. Disabling TXCSUM on outgoing VLAN interface doesn't change anything. I tried to disable VLAN_HWCSUM on outgoing LAGG interface, but this seems to be impossible without reboot - ifconfig still shows this feature as enabled. As this is a production system, rebooting for further tests is not an option. But: if I decrease mtu of outgoing VLAN interface to 1500, the packet gets fragmented again and IP checksums are correct. I know that there are (cheap) network cards with broken/limited H/W offload support and thus you should disable all these features. But AFAIK mellanox cards are not affected by this and actually they worked until 13.0
* Does this affect only UDP or also ICMP and TCP? * Please attach a capture of a single fragmented UDP packet, captured both inbound and outbound (ideally on a remote host, to account for checksum offload), demonstrating the problem.
Please give me a minute: as this is a production system, I need to create a test setup to avoid service outage.
Executing ping -4 -s 2000 ns1 Captured with tshark -V -ni vtnet0 -o ip.check_checksum:TRUE On sender (mtu 1500): Frame 1: 1514 bytes on wire (12112 bits), 1514 bytes captured (12112 bits) on interface vtnet0, id 0 <snip> Internet Protocol Version 4, Src: 10.1.11.2, Dst: 10.1.2.5 0100 .... = Version: 4 .... 0101 = Header Length: 20 bytes (5) Differentiated Services Field: 0x00 (DSCP: CS0, ECN: Not-ECT) 0000 00.. = Differentiated Services Codepoint: Default (0) .... ..00 = Explicit Congestion Notification: Not ECN-Capable Transport (0) Total Length: 1500 Identification: 0x7bed (31725) Flags: 0x20, More fragments 0... .... = Reserved bit: Not set .0.. .... = Don't fragment: Not set ..1. .... = More fragments: Set Fragment Offset: 0 Time to Live: 64 Protocol: ICMP (1) Header Checksum: 0xb82b [correct] [Header checksum status: Good] [Calculated Checksum: 0xb82b] Source Address: 10.1.11.2 Destination Address: 10.1.2.5 Data (1480 bytes) <snip> Data: 0800f36266e6000000101623267d28ee08090a0b0c0d0e0f101112131415161718191a1b… [Length: 1480] Frame 2: 562 bytes on wire (4496 bits), 562 bytes captured (4496 bits) on interface vtnet0, id 0 <snip> Internet Protocol Version 4, Src: 10.1.11.2, Dst: 10.1.2.5 0100 .... = Version: 4 .... 0101 = Header Length: 20 bytes (5) Differentiated Services Field: 0x00 (DSCP: CS0, ECN: Not-ECT) 0000 00.. = Differentiated Services Codepoint: Default (0) .... ..00 = Explicit Congestion Notification: Not ECN-Capable Transport (0) Total Length: 548 Identification: 0x7bed (31725) Flags: 0x00 0... .... = Reserved bit: Not set .0.. .... = Don't fragment: Not set ..0. .... = More fragments: Not set Fragment Offset: 1480 Time to Live: 64 Protocol: ICMP (1) Header Checksum: 0xdb2a [correct] [Header checksum status: Good] [Calculated Checksum: 0xdb2a] Source Address: 10.1.11.2 Destination Address: 10.1.2.5 [2 IPv4 Fragments (2008 bytes): #1(1480), #2(528)] [Frame: 1, payload: 0-1479 (1480 bytes)] [Frame: 2, payload: 1480-2007 (528 bytes)] [Fragment count: 2] [Reassembled IPv4 length: 2008] [Reassembled IPv4 data: 0800f36266e6000000101623267d28ee08090a0b0c0d0e0f101112131415161718191a1b…] Internet Control Message Protocol Type: 8 (Echo (ping) request) Code: 0 Checksum: 0xf362 [correct] [Checksum Status: Good] Identifier (BE): 26342 (0x66e6) Identifier (LE): 58982 (0xe666) Sequence Number (BE): 0 (0x0000) Sequence Number (LE): 0 (0x0000) Data (2000 bytes) Data: 00101623267d28ee08090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f20212223… [Length: 2000] On receiver (mtu 9000): Frame 1: 2042 bytes on wire (16336 bits), 2042 bytes captured (16336 bits) on interface e0a_ns1, id 0 Internet Protocol Version 4, Src: 10.1.11.2, Dst: 10.1.2.5 0100 .... = Version: 4 .... 0101 = Header Length: 20 bytes (5) Differentiated Services Field: 0x00 (DSCP: CS0, ECN: Not-ECT) 0000 00.. = Differentiated Services Codepoint: Default (0) .... ..00 = Explicit Congestion Notification: Not ECN-Capable Transport (0) Total Length: 2028 Identification: 0x7bed (31725) Flags: 0x00 0... .... = Reserved bit: Not set .0.. .... = Don't fragment: Not set ..0. .... = More fragments: Not set Fragment Offset: 0 Time to Live: 63 Protocol: ICMP (1) Header Checksum: 0xb92b incorrect, should be 0xd71b(may be caused by "IP checksum offload"?) [Expert Info (Error/Checksum): Bad checksum [should be 0xd71b]] [Bad checksum [should be 0xd71b]] [Severity level: Error] [Group: Checksum] [Header checksum status: Bad] [Calculated Checksum: 0xd71b] Source Address: 10.1.11.2 Destination Address: 10.1.2.5 Internet Control Message Protocol Type: 8 (Echo (ping) request) Code: 0 Checksum: 0xf362 [correct] [Checksum Status: Good] Identifier (BE): 26342 (0x66e6) Identifier (LE): 58982 (0xe666) Sequence Number (BE): 0 (0x0000) Sequence Number (LE): 0 (0x0000) Data (2000 bytes) <snip> Data: 00101623267d28ee08090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f20212223… [Length: 2000] You see: packet is reassembled and IP checksum is broken. Interestingly, if I increase packets size to 20000 (i.e. it needs to be fragmented again), checksum is correct.
Can you attach the pcaps (with full-length packet captures)? Those are generally required for checksum issues, so we can compute the expected checksum and compare to the actual one. Sometimes that's a valuable hint as to where the corruption comes from. I'll try to reproduce this on my own hardware in the next few days. I can only remember one checksum-related commit that'd be new in 13, and that's https://cgit.freebsd.org/src/commit/?id=c3f69af03ae7acc167cc1151f0c1ecc5e014ce4e I wouldn't expect it to result in this, but I'll have to reproduce it first to debug it anyway.
Created attachment 224456 [details] pcap on sender
Created attachment 224457 [details] pcap on receiver
full pcap for ping -4 -s 2000
I was going to ask you to confirm that it's really the pf scrub code that does this, but I've got a test case using epair that appears to demonstrate the problem as well. So the good news is that it is trivially reproducible, and that we'll have a test case to prevent regressions when we fix this. Test case: https://reviews.freebsd.org/D30013 I'm hopeful that I'll be able to dig into the actual cause tomorrow.
Great! Having a simple test case is the best place to start. I keep fingers crossed that you will find the code to blame.
Fix in https://reviews.freebsd.org/D30026 I'm somewhat surprised this setup ever worked.
A commit in branch main references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=388c0cde10293d9a3434e99146bf391aec6878a3 commit 388c0cde10293d9a3434e99146bf391aec6878a3 Author: Kristof Provost <kp@FreeBSD.org> AuthorDate: 2021-04-27 16:46:03 +0000 Commit: Kristof Provost <kp@FreeBSD.org> CommitDate: 2021-04-30 06:19:47 +0000 pf tests: Test scrub fragment reassemble on interfaces with different MTU There's a problem with pf's reassembly code where it produces incorrect checksums when reassembling across interfaces with different MTUs. Test this. PR: 255432 Reviewed by: donner MFC after: 1 week Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D30013 tests/sys/netpfil/pf/fragmentation.sh | 49 +++++++++++++++++++++++++++++++++++ 1 file changed, 49 insertions(+)
A commit in branch main references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=055c55abefbe19fe46a56894595af9c9dad7678c commit 055c55abefbe19fe46a56894595af9c9dad7678c Author: Kristof Provost <kp@FreeBSD.org> AuthorDate: 2021-04-28 10:56:06 +0000 Commit: Kristof Provost <kp@FreeBSD.org> CommitDate: 2021-04-30 06:19:46 +0000 pf: Fix IP checksum on reassembly If we reassemble a packet we modify the IP header (to set the length and remove the fragment offset information), but we failed to update the checksum. On certain setups (mostly where we did not re-fragment again afterwards) this could lead to us sending out packets with incorrect checksums. PR: 255432 MFC after: 1 week Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D30026 sys/netpfil/pf/pf_norm.c | 4 ++++ 1 file changed, 4 insertions(+)
A commit in branch stable/13 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=3f9591da1863acb55d3b11b7d0619a3aa70fcb7c commit 3f9591da1863acb55d3b11b7d0619a3aa70fcb7c Author: Kristof Provost <kp@FreeBSD.org> AuthorDate: 2021-04-27 16:46:03 +0000 Commit: Kristof Provost <kp@FreeBSD.org> CommitDate: 2021-05-07 08:16:23 +0000 pf tests: Test scrub fragment reassemble on interfaces with different MTU There's a problem with pf's reassembly code where it produces incorrect checksums when reassembling across interfaces with different MTUs. Test this. PR: 255432 Reviewed by: donner MFC after: 1 week Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D30013 (cherry picked from commit 388c0cde10293d9a3434e99146bf391aec6878a3) tests/sys/netpfil/pf/fragmentation.sh | 49 +++++++++++++++++++++++++++++++++++ 1 file changed, 49 insertions(+)
A commit in branch stable/13 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=70e8fe5eee7c35a2c7ce988d402d84a7a9901818 commit 70e8fe5eee7c35a2c7ce988d402d84a7a9901818 Author: Kristof Provost <kp@FreeBSD.org> AuthorDate: 2021-04-28 10:56:06 +0000 Commit: Kristof Provost <kp@FreeBSD.org> CommitDate: 2021-05-07 08:16:02 +0000 pf: Fix IP checksum on reassembly If we reassemble a packet we modify the IP header (to set the length and remove the fragment offset information), but we failed to update the checksum. On certain setups (mostly where we did not re-fragment again afterwards) this could lead to us sending out packets with incorrect checksums. PR: 255432 MFC after: 1 week Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D30026 (cherry picked from commit 055c55abefbe19fe46a56894595af9c9dad7678c) sys/netpfil/pf/pf_norm.c | 4 ++++ 1 file changed, 4 insertions(+)
A commit in branch stable/12 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=af1b05bb32b5440dd999853bd7c01a5b8c0d73f4 commit af1b05bb32b5440dd999853bd7c01a5b8c0d73f4 Author: Kristof Provost <kp@FreeBSD.org> AuthorDate: 2021-04-28 10:56:06 +0000 Commit: Kristof Provost <kp@FreeBSD.org> CommitDate: 2021-05-07 08:17:50 +0000 pf: Fix IP checksum on reassembly If we reassemble a packet we modify the IP header (to set the length and remove the fragment offset information), but we failed to update the checksum. On certain setups (mostly where we did not re-fragment again afterwards) this could lead to us sending out packets with incorrect checksums. PR: 255432 MFC after: 1 week Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D30026 (cherry picked from commit 055c55abefbe19fe46a56894595af9c9dad7678c) sys/netpfil/pf/pf_norm.c | 4 ++++ 1 file changed, 4 insertions(+)
A commit in branch stable/12 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=becdc0c0dc7604389dd06b682ab11ed4dc43afea commit becdc0c0dc7604389dd06b682ab11ed4dc43afea Author: Kristof Provost <kp@FreeBSD.org> AuthorDate: 2021-04-27 16:46:03 +0000 Commit: Kristof Provost <kp@FreeBSD.org> CommitDate: 2021-05-07 08:19:05 +0000 pf tests: Test scrub fragment reassemble on interfaces with different MTU There's a problem with pf's reassembly code where it produces incorrect checksums when reassembling across interfaces with different MTUs. Test this. PR: 255432 Reviewed by: donner MFC after: 1 week Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D30013 (cherry picked from commit 388c0cde10293d9a3434e99146bf391aec6878a3) tests/sys/netpfil/pf/fragmentation.sh | 49 +++++++++++++++++++++++++++++++++++ 1 file changed, 49 insertions(+)