Bug 255432 - pf fragment reassembly leads to invalid IP checksum since 13.0-RELEASE
Summary: pf fragment reassembly leads to invalid IP checksum since 13.0-RELEASE
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern (show other bugs)
Version: 13.0-RELEASE
Hardware: amd64 Any
: --- Affects Some People
Assignee: Kristof Provost
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-04-26 19:48 UTC by topical
Modified: 2021-05-07 16:08 UTC (History)
1 user (show)

See Also:


Attachments
pcap on sender (2.46 KB, application/octet-stream)
2021-04-26 20:42 UTC, topical
no flags Details
pcap on receiver (2.40 KB, application/octet-stream)
2021-04-26 20:42 UTC, topical
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description topical 2021-04-26 19:48:25 UTC
Scenario: I have 2 lagg interfaces: one with mtu 1500 another with mtu 9000. Both have several VLANs.

Fragmented UDP packets are received by VLAN of first interface, reassembled by pf ("scrub in all fragment reassemble"), and then sent out as jumbo frame by VLAN of second interface.

All networks cards are mellanox ("mce"). All H/W offload features but LRO are enabled.

Before 13.0 everything worked fine. Since 13.0, the resulting jumbo frame is sent out with invalid IP checksum and thus discarded by receiver.

Disabling TXCSUM on outgoing VLAN interface doesn't change anything.

I tried to disable VLAN_HWCSUM on outgoing LAGG interface, but this seems to be impossible without reboot - ifconfig still shows this feature as enabled. As this is a production system, rebooting for further tests is not an option. 

But: if I decrease mtu of outgoing VLAN interface to 1500, the packet gets fragmented again and IP checksums are correct. 

I know that there are (cheap) network cards with broken/limited H/W offload support and thus you should disable all these features. But AFAIK mellanox cards are not affected by this and actually they worked until 13.0
Comment 1 Kristof Provost freebsd_committer 2021-04-26 19:57:32 UTC
 * Does this affect only UDP or also ICMP and TCP?
 * Please attach a capture of a single fragmented UDP packet, captured both inbound and outbound (ideally on a remote host, to account for checksum offload), demonstrating the problem.
Comment 2 topical 2021-04-26 20:00:47 UTC
Please give me a minute: as this is a production system, I need to create a test setup to avoid service outage.
Comment 3 topical 2021-04-26 20:28:10 UTC
Executing

  ping -4 -s 2000 ns1

Captured with

  tshark -V -ni vtnet0 -o ip.check_checksum:TRUE

On sender (mtu 1500):

Frame 1: 1514 bytes on wire (12112 bits), 1514 bytes captured (12112 bits) on interface vtnet0, id 0
<snip>
Internet Protocol Version 4, Src: 10.1.11.2, Dst: 10.1.2.5
    0100 .... = Version: 4
    .... 0101 = Header Length: 20 bytes (5)
    Differentiated Services Field: 0x00 (DSCP: CS0, ECN: Not-ECT)
        0000 00.. = Differentiated Services Codepoint: Default (0)
        .... ..00 = Explicit Congestion Notification: Not ECN-Capable Transport (0)
    Total Length: 1500
    Identification: 0x7bed (31725)
    Flags: 0x20, More fragments
        0... .... = Reserved bit: Not set
        .0.. .... = Don't fragment: Not set
        ..1. .... = More fragments: Set
    Fragment Offset: 0
    Time to Live: 64
    Protocol: ICMP (1)
    Header Checksum: 0xb82b [correct]
    [Header checksum status: Good]
    [Calculated Checksum: 0xb82b]
    Source Address: 10.1.11.2
    Destination Address: 10.1.2.5
Data (1480 bytes)
<snip>
    Data: 0800f36266e6000000101623267d28ee08090a0b0c0d0e0f101112131415161718191a1b…
    [Length: 1480]

Frame 2: 562 bytes on wire (4496 bits), 562 bytes captured (4496 bits) on interface vtnet0, id 0
<snip>
Internet Protocol Version 4, Src: 10.1.11.2, Dst: 10.1.2.5
    0100 .... = Version: 4
    .... 0101 = Header Length: 20 bytes (5)
    Differentiated Services Field: 0x00 (DSCP: CS0, ECN: Not-ECT)
        0000 00.. = Differentiated Services Codepoint: Default (0)
        .... ..00 = Explicit Congestion Notification: Not ECN-Capable Transport (0)
    Total Length: 548
    Identification: 0x7bed (31725)
    Flags: 0x00
        0... .... = Reserved bit: Not set
        .0.. .... = Don't fragment: Not set
        ..0. .... = More fragments: Not set
    Fragment Offset: 1480
    Time to Live: 64
    Protocol: ICMP (1)
    Header Checksum: 0xdb2a [correct]
    [Header checksum status: Good]
    [Calculated Checksum: 0xdb2a]
    Source Address: 10.1.11.2
    Destination Address: 10.1.2.5
    [2 IPv4 Fragments (2008 bytes): #1(1480), #2(528)]
        [Frame: 1, payload: 0-1479 (1480 bytes)]
        [Frame: 2, payload: 1480-2007 (528 bytes)]
        [Fragment count: 2]
        [Reassembled IPv4 length: 2008]
        [Reassembled IPv4 data: 0800f36266e6000000101623267d28ee08090a0b0c0d0e0f101112131415161718191a1b…]
Internet Control Message Protocol
    Type: 8 (Echo (ping) request)
    Code: 0
    Checksum: 0xf362 [correct]
    [Checksum Status: Good]
    Identifier (BE): 26342 (0x66e6)
    Identifier (LE): 58982 (0xe666)
    Sequence Number (BE): 0 (0x0000)
    Sequence Number (LE): 0 (0x0000)
    Data (2000 bytes)
        Data: 00101623267d28ee08090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f20212223…
        [Length: 2000]

On receiver (mtu 9000):

Frame 1: 2042 bytes on wire (16336 bits), 2042 bytes captured (16336 bits) on interface e0a_ns1, id 0
Internet Protocol Version 4, Src: 10.1.11.2, Dst: 10.1.2.5
    0100 .... = Version: 4
    .... 0101 = Header Length: 20 bytes (5)
    Differentiated Services Field: 0x00 (DSCP: CS0, ECN: Not-ECT)
        0000 00.. = Differentiated Services Codepoint: Default (0)
        .... ..00 = Explicit Congestion Notification: Not ECN-Capable Transport (0)
    Total Length: 2028
    Identification: 0x7bed (31725)
    Flags: 0x00
        0... .... = Reserved bit: Not set
        .0.. .... = Don't fragment: Not set
        ..0. .... = More fragments: Not set
    Fragment Offset: 0
    Time to Live: 63
    Protocol: ICMP (1)
    Header Checksum: 0xb92b incorrect, should be 0xd71b(may be caused by "IP checksum offload"?)
        [Expert Info (Error/Checksum): Bad checksum [should be 0xd71b]]
            [Bad checksum [should be 0xd71b]]
            [Severity level: Error]
            [Group: Checksum]
    [Header checksum status: Bad]
    [Calculated Checksum: 0xd71b]
    Source Address: 10.1.11.2
    Destination Address: 10.1.2.5
Internet Control Message Protocol
    Type: 8 (Echo (ping) request)
    Code: 0
    Checksum: 0xf362 [correct]
    [Checksum Status: Good]
    Identifier (BE): 26342 (0x66e6)
    Identifier (LE): 58982 (0xe666)
    Sequence Number (BE): 0 (0x0000)
    Sequence Number (LE): 0 (0x0000)
    Data (2000 bytes)
<snip>
        Data: 00101623267d28ee08090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f20212223…
        [Length: 2000]

You see: packet is reassembled and IP checksum is broken.

Interestingly, if I increase packets size to 20000 (i.e. it needs to be fragmented again), checksum is correct.
Comment 4 Kristof Provost freebsd_committer 2021-04-26 20:33:54 UTC
Can you attach the pcaps (with full-length packet captures)? Those are generally required for checksum issues, so we can compute the expected checksum and compare to the actual one. Sometimes that's a valuable hint as to where the corruption comes from.

I'll try to reproduce this on my own hardware in the next few days. 

I can only remember one checksum-related commit that'd be new in 13, and that's https://cgit.freebsd.org/src/commit/?id=c3f69af03ae7acc167cc1151f0c1ecc5e014ce4e
I wouldn't expect it to result in this, but I'll have to reproduce it first to debug it anyway.
Comment 5 topical 2021-04-26 20:42:09 UTC
Created attachment 224456 [details]
pcap on sender
Comment 6 topical 2021-04-26 20:42:31 UTC
Created attachment 224457 [details]
pcap on receiver
Comment 7 topical 2021-04-26 20:43:17 UTC
full pcap for ping -4 -s 2000
Comment 8 Kristof Provost freebsd_committer 2021-04-27 18:51:08 UTC
I was going to ask you to confirm that it's really the pf scrub code that does this, but I've got a test case using epair that appears to demonstrate the problem as well.

So the good news is that it is trivially reproducible, and that we'll have a test case to prevent regressions when we fix this.

Test case: https://reviews.freebsd.org/D30013

I'm hopeful that I'll be able to dig into the actual cause tomorrow.
Comment 9 topical 2021-04-28 10:22:49 UTC
Great! Having a simple test case is the best place to start.

I keep fingers crossed that you will find the code to blame.
Comment 10 Kristof Provost freebsd_committer 2021-04-28 12:59:22 UTC
Fix in https://reviews.freebsd.org/D30026

I'm somewhat surprised this setup ever worked.
Comment 11 commit-hook freebsd_committer 2021-04-30 08:46:02 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=388c0cde10293d9a3434e99146bf391aec6878a3

commit 388c0cde10293d9a3434e99146bf391aec6878a3
Author:     Kristof Provost <kp@FreeBSD.org>
AuthorDate: 2021-04-27 16:46:03 +0000
Commit:     Kristof Provost <kp@FreeBSD.org>
CommitDate: 2021-04-30 06:19:47 +0000

    pf tests: Test scrub fragment reassemble on interfaces with different MTU

    There's a problem with pf's reassembly code where it produces incorrect
    checksums when reassembling across interfaces with different MTUs.
    Test this.

    PR:             255432
    Reviewed by:    donner
    MFC after:      1 week
    Sponsored by:   Rubicon Communications, LLC ("Netgate")
    Differential Revision:  https://reviews.freebsd.org/D30013

 tests/sys/netpfil/pf/fragmentation.sh | 49 +++++++++++++++++++++++++++++++++++
 1 file changed, 49 insertions(+)
Comment 12 commit-hook freebsd_committer 2021-04-30 08:46:03 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=055c55abefbe19fe46a56894595af9c9dad7678c

commit 055c55abefbe19fe46a56894595af9c9dad7678c
Author:     Kristof Provost <kp@FreeBSD.org>
AuthorDate: 2021-04-28 10:56:06 +0000
Commit:     Kristof Provost <kp@FreeBSD.org>
CommitDate: 2021-04-30 06:19:46 +0000

    pf: Fix IP checksum on reassembly

    If we reassemble a packet we modify the IP header (to set the length and
    remove the fragment offset information), but we failed to update the
    checksum. On certain setups (mostly where we did not re-fragment again
    afterwards) this could lead to us sending out packets with incorrect
    checksums.

    PR:             255432
    MFC after:      1 week
    Sponsored by:   Rubicon Communications, LLC ("Netgate")
    Differential Revision:  https://reviews.freebsd.org/D30026

 sys/netpfil/pf/pf_norm.c | 4 ++++
 1 file changed, 4 insertions(+)
Comment 13 commit-hook freebsd_committer 2021-05-07 15:26:55 UTC
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=3f9591da1863acb55d3b11b7d0619a3aa70fcb7c

commit 3f9591da1863acb55d3b11b7d0619a3aa70fcb7c
Author:     Kristof Provost <kp@FreeBSD.org>
AuthorDate: 2021-04-27 16:46:03 +0000
Commit:     Kristof Provost <kp@FreeBSD.org>
CommitDate: 2021-05-07 08:16:23 +0000

    pf tests: Test scrub fragment reassemble on interfaces with different MTU

    There's a problem with pf's reassembly code where it produces incorrect
    checksums when reassembling across interfaces with different MTUs.
    Test this.

    PR:             255432
    Reviewed by:    donner
    MFC after:      1 week
    Sponsored by:   Rubicon Communications, LLC ("Netgate")
    Differential Revision:  https://reviews.freebsd.org/D30013

    (cherry picked from commit 388c0cde10293d9a3434e99146bf391aec6878a3)

 tests/sys/netpfil/pf/fragmentation.sh | 49 +++++++++++++++++++++++++++++++++++
 1 file changed, 49 insertions(+)
Comment 14 commit-hook freebsd_committer 2021-05-07 15:26:56 UTC
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=70e8fe5eee7c35a2c7ce988d402d84a7a9901818

commit 70e8fe5eee7c35a2c7ce988d402d84a7a9901818
Author:     Kristof Provost <kp@FreeBSD.org>
AuthorDate: 2021-04-28 10:56:06 +0000
Commit:     Kristof Provost <kp@FreeBSD.org>
CommitDate: 2021-05-07 08:16:02 +0000

    pf: Fix IP checksum on reassembly

    If we reassemble a packet we modify the IP header (to set the length and
    remove the fragment offset information), but we failed to update the
    checksum. On certain setups (mostly where we did not re-fragment again
    afterwards) this could lead to us sending out packets with incorrect
    checksums.

    PR:             255432
    MFC after:      1 week
    Sponsored by:   Rubicon Communications, LLC ("Netgate")
    Differential Revision:  https://reviews.freebsd.org/D30026

    (cherry picked from commit 055c55abefbe19fe46a56894595af9c9dad7678c)

 sys/netpfil/pf/pf_norm.c | 4 ++++
 1 file changed, 4 insertions(+)
Comment 15 commit-hook freebsd_committer 2021-05-07 15:26:56 UTC
A commit in branch stable/12 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=af1b05bb32b5440dd999853bd7c01a5b8c0d73f4

commit af1b05bb32b5440dd999853bd7c01a5b8c0d73f4
Author:     Kristof Provost <kp@FreeBSD.org>
AuthorDate: 2021-04-28 10:56:06 +0000
Commit:     Kristof Provost <kp@FreeBSD.org>
CommitDate: 2021-05-07 08:17:50 +0000

    pf: Fix IP checksum on reassembly

    If we reassemble a packet we modify the IP header (to set the length and
    remove the fragment offset information), but we failed to update the
    checksum. On certain setups (mostly where we did not re-fragment again
    afterwards) this could lead to us sending out packets with incorrect
    checksums.

    PR:             255432
    MFC after:      1 week
    Sponsored by:   Rubicon Communications, LLC ("Netgate")
    Differential Revision:  https://reviews.freebsd.org/D30026

    (cherry picked from commit 055c55abefbe19fe46a56894595af9c9dad7678c)

 sys/netpfil/pf/pf_norm.c | 4 ++++
 1 file changed, 4 insertions(+)
Comment 16 commit-hook freebsd_committer 2021-05-07 15:26:57 UTC
A commit in branch stable/12 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=becdc0c0dc7604389dd06b682ab11ed4dc43afea

commit becdc0c0dc7604389dd06b682ab11ed4dc43afea
Author:     Kristof Provost <kp@FreeBSD.org>
AuthorDate: 2021-04-27 16:46:03 +0000
Commit:     Kristof Provost <kp@FreeBSD.org>
CommitDate: 2021-05-07 08:19:05 +0000

    pf tests: Test scrub fragment reassemble on interfaces with different MTU

    There's a problem with pf's reassembly code where it produces incorrect
    checksums when reassembling across interfaces with different MTUs.
    Test this.

    PR:             255432
    Reviewed by:    donner
    MFC after:      1 week
    Sponsored by:   Rubicon Communications, LLC ("Netgate")
    Differential Revision:  https://reviews.freebsd.org/D30013

    (cherry picked from commit 388c0cde10293d9a3434e99146bf391aec6878a3)

 tests/sys/netpfil/pf/fragmentation.sh | 49 +++++++++++++++++++++++++++++++++++
 1 file changed, 49 insertions(+)