Bug 282605 - panic: tcp_do_segment: sent too much
Summary: panic: tcp_do_segment: sent too much
Status: In Progress
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 15.0-CURRENT
Hardware: amd64 Any
Importance: --- Affects Only Me
Assignee: freebsd-transport mailing list
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2024-11-07 12:27 UTC by Daniel
Modified: 2024-11-14 18:00 UTC (History)

See Also:


Attachments
packetdrill reproducer (3.27 KB, text/plain)
2024-11-07 18:45 UTC, Michael Tuexen

Description Daniel 2024-11-07 12:27:20 UTC
The panic occurs while sending TCP stress traffic with an offloaded NAT-T configuration:

panic: tcp_do_segment: sent too much

Backtrace:
#0  __curthread () at /usr/kernel_git/sys/amd64/include/pcpu_aux.h:57
#1  doadump (textdump=textdump@entry=1) at /usr/kernel_git/sys/kern/kern_shutdown.c:404
#2  0xffffffff80b3b7a0 in kern_reboot (howto=260) at /usr/kernel_git/sys/kern/kern_shutdown.c:524
#3  0xffffffff80b3bcd7 in vpanic (fmt=0xffffffff81172d6e "%s: sent too much", ap=ap@entry=0xfffffe010a82db20) at /usr/kernel_git/sys/kern/kern_shutdown.c:979
#4  0xffffffff80b3bb03 in panic (fmt=<unavailable>) at /usr/kernel_git/sys/kern/kern_shutdown.c:892
#5  0xffffffff80d34372 in tcp_do_segment (tp=0xfffff803402baa80, tp@entry=<error reading variable: value is not available>, m=<optimized out>, 
    m@entry=<error reading variable: value is not available>, th=0xfffff802fd295e84, th@entry=<error reading variable: value is not available>, drop_hdrlen=64, 
    drop_hdrlen@entry=<error reading variable: value is not available>, tlen=0, tlen@entry=<error reading variable: value is not available>, iptos=<unavailable>, 
    iptos@entry=<error reading variable: value is not available>) at /usr/kernel_git/sys/netinet/tcp_input.c:1548
#6  0xffffffff80d30d98 in tcp_input_with_port (mp=<optimized out>, offp=<optimized out>, proto=<optimized out>, port=port@entry=0) at /usr/kernel_git/sys/netinet/tcp_input.c:1158
#7  0xffffffff80d31a5b in tcp_input (mp=<unavailable>, offp=<unavailable>, proto=<unavailable>) at /usr/kernel_git/sys/netinet/tcp_input.c:1502
#8  0xffffffff80d1e3af in ip_input (m=0x0, m@entry=<error reading variable: value is not available>) at /usr/kernel_git/sys/netinet/ip_input.c:857
#9  0xffffffff80c98b7b in netisr_process_workstream_proto (nwsp=0xffffffff823c0a00, proto=1) at /usr/kernel_git/sys/net/netisr.c:927
#10 swi_net (arg=0xffffffff823c0a00) at /usr/kernel_git/sys/net/netisr.c:974
#11 0xffffffff80af3b56 in intr_event_execute_handlers (ie=0xfffff800035ee600, p=<optimized out>) at /usr/kernel_git/sys/kern/kern_intr.c:1183
#12 ithread_execute_handlers (ie=0xfffff800035ee600, p=<optimized out>) at /usr/kernel_git/sys/kern/kern_intr.c:1196
#13 ithread_loop (arg=arg@entry=0xfffff800033cdf60) at /usr/kernel_git/sys/kern/kern_intr.c:1289
#14 0xffffffff80aeff52 in fork_exit (callout=0xffffffff80af38f0 <ithread_loop>, arg=0xfffff800033cdf60, frame=0xfffffe010a82df40) at /usr/kernel_git/sys/kern/kern_fork.c:1151
#15 <signal handler called>

SHA ID of the kernel:
2ce493e1693b55a330079ac5fce8beb66e26ddeb
Comment 1 Mark Johnston freebsd_committer freebsd_triage 2024-11-07 18:43:06 UTC
There have been a couple of TCP commits since your revision - is it possible that this is already fixed?
Comment 2 Michael Tuexen freebsd_committer freebsd_triage 2024-11-07 18:45:46 UTC
Created attachment 255012
packetdrill reproducer
Comment 3 Michael Tuexen freebsd_committer freebsd_triage 2024-11-07 18:48:31 UTC
(In reply to Mark Johnston from comment #1)
It is not yet fixed. I attached a packetdrill reproducer. I am currently testing review D47474, which so far seems to avoid triggering the relevant KASSERT. The reproducer also no longer triggers the KASSERT with review D47474 applied. However, it is not yet fully understood what is going on.
Comment 4 Richard Scheffenegger freebsd_committer freebsd_triage 2024-11-08 22:09:56 UTC
Agreed. D47474 masks the problem, which appears to be an interaction between the new TCP transmission selection code and an RTO followed in close succession by a SACK loss recovery.
Comment 5 Alexander Leidinger freebsd_committer freebsd_triage 2024-11-11 20:34:45 UTC
I just ran into this panic, I would guess on the IPv6 side (current as of 2024-10-30-120714):
---snip---
[365136] panic: tcp_do_segment: sent too much
[365136] cpuid = 1
[365136] time = 1731354815
[365136] KDB: stack backtrace:
[365136] db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe04314f7790
[365136] vpanic() at vpanic+0x136/frame 0xfffffe04314f78c0
[365136] panic() at panic+0x43/frame 0xfffffe04314f7920
[365136] tcp_do_segment() at tcp_do_segment+0x2852/frame 0xfffffe04314f79f0
[365136] tcp_input_with_port() at tcp_input_with_port+0x10e2/frame 0xfffffe04314f7b40
[365136] tcp6_input_with_port() at tcp6_input_with_port+0x6a/frame 0xfffffe04314f7b70
[365136] tcp6_input() at tcp6_input+0xb/frame 0xfffffe04314f7b80
[365136] ip6_input() at ip6_input+0xc76/frame 0xfffffe04314f7c60
[365136] netisr_dispatch_src() at netisr_dispatch_src+0xb9/frame 0xfffffe04314f7cc0
[365136] ether_demux() at ether_demux+0x16a/frame 0xfffffe04314f7cf0
[365136] ether_nh_input() at ether_nh_input+0x3cf/frame 0xfffffe04314f7d40
[365136] netisr_dispatch_src() at netisr_dispatch_src+0xb9/frame 0xfffffe04314f7da0
[365136] ether_input() at ether_input+0xd5/frame 0xfffffe04314f7e00
[365136] epair_tx_start_deferred() at epair_tx_start_deferred+0xd4/frame 0xfffffe04314f7e40
[365136] taskqueue_run_locked() at taskqueue_run_locked+0x1c7/frame 0xfffffe04314f7ec0
[365136] taskqueue_thread_loop() at taskqueue_thread_loop+0xd3/frame 0xfffffe04314f7ef0
[365136] fork_exit() at fork_exit+0x87/frame 0xfffffe04314f7f30
[365136] fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe04314f7f30
[365136] --- trap 0x3de64570, rip = 0, rsp = 0, rbp = 0 ---
[365136] Uptime: 4d5h25m36s
[365136] Dumping 50824 out of 73621 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%
---snip---

Parts of the crashdump output (full output available on request):
---snip---
#5  0xffffffff806b6b72 in tcp_do_segment (tp=0xfffff80da7abfa80,
    m=<optimized out>, th=0xfffff804c5190a96, drop_hdrlen=72, tlen=0,
    iptos=<optimized out>)
    at /space/system/usr_src/sys/netinet/tcp_input.c:1548
        to = {to_flags = 128, to_tsval = 4294965249, to_tsecr = 4546,
          to_sacks = 0xfffff804c5190aae "\aa\217\345\aa\225}\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255", <incomplete sequence \336>,
          to_signature = 0x4 <error: Cannot access memory at address 0x4>,
          to_tfo_cookie = 0xfffffe04314f79c0 "\340yO1", to_mss = 22215,
          to_wscale = 111 'o', to_nsacks = 1 '\001', to_tfo_len = 255 '\377',
          to_spare = 2154633376}
        maxseg = 1432
        inp = 0xfffff80da7abfa80
        needoutput = 0
        incforsyn = <optimized out>
        so = 0xfffff8051a0ffc00
        inc = <optimized out>
        thflags = <optimized out>
        sack_changed = <optimized out>
        nsegs = 1
        s = <optimized out>
        tiwin = <optimized out>
        rstreason = <optimized out>
        todrop = <optimized out>
        acked = <optimized out>
        tfo_syn = <optimized out>
        mfree = <optimized out>
        ourfinisacked = <optimized out>
        win = <optimized out>
        close = <optimized out>
#6  0xffffffff806b3602 in tcp_input_with_port (mp=mp@entry=0xfffffe04314f7bc8,
    offp=offp@entry=0xfffffe04314f7bc0, proto=<optimized out>, port=0)
    at /space/system/usr_src/sys/netinet/tcp_input.c:1158
        so = 0xfffff8051a0ffc00
        to = {to_flags = 0, to_tsval = 0, to_tsecr = 719386432,
          to_sacks = 0xfffff80727e1a038 "\001",
          to_signature = 0xfffff80727e1a084 "",
          to_tfo_cookie = 0x5bbafa8e00000073 <error: Cannot access memory at address 0x5bbafa8e00000073>, to_mss = 37456, to_wscale = 5 '\005',
          to_nsacks = 0 '\000', to_tfo_len = 0 '\000', to_spare = 590855}
        m = 0xfffff804c5190a00
        th = 0xfffff804c5190a96
        ip = 0x0
        inp = <optimized out>
        tp = <unavailable>
        optp = 0xfffff804c5190aaa "\001\001\005\n\aa\217\345\aa\225}\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255", <incomplete sequence \336>
        optlen = 12
        tlen = <optimized out>
        rstreason = <optimized out>
        fwd_tag = 0x0
        ip6 = 0xfffff804c5190a6e
        s = 0x0
        off0 = <optimized out>
        iptos = 0 '\000'
        off = <optimized out>
        len = <optimized out>
        ipttl = <optimized out>
        thflags = <optimized out>
        drop_hdrlen = 72
        lookupflag = <optimized out>
        isipv6 = <optimized out>
#7  0xffffffff806b247a in tcp6_input_with_port (mp=0xfffffe04314f7bc8,
    offp=0xfffffe04314f7bc0, proto=<optimized out>, port=port@entry=0)
    at /space/system/usr_src/sys/netinet/tcp_input.c:594
        m = 0xfffff804c5190a00
        ip6 = <optimized out>
        ia6 = <unavailable>
#8  0xffffffff806b3d5b in tcp6_input (mp=<unavailable>, offp=<unavailable>,
    proto=<unavailable>) at /space/system/usr_src/sys/netinet/tcp_input.c:601
No locals.
---snip---
Comment 6 commit-hook freebsd_committer freebsd_triage 2024-11-14 18:00:43 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=8f5a2e216f4cb955150c8f88ab21eaecc5adc8b9

commit 8f5a2e216f4cb955150c8f88ab21eaecc5adc8b9
Author:     Richard Scheffenegger <rscheff@FreeBSD.org>
AuthorDate: 2024-11-14 08:19:34 +0000
Commit:     Richard Scheffenegger <rscheff@FreeBSD.org>
CommitDate: 2024-11-14 08:19:49 +0000

    tcp: fix cwnd recalculation during limited transmit

    Properly calculate the expected flight size (cwnd) during
    limited transmit. Exclude the SACK scoreboard from
    consideration when still in limited transmit.

    PR: 282605
    Reviewed By: tuexen, #transport
    Sponsored by: NetApp, Inc.
    Differential Revision: https://reviews.freebsd.org/D47541

 sys/netinet/tcp_input.c  | 2 +-
 sys/netinet/tcp_output.c | 3 ++-
 2 files changed, 3 insertions(+), 2 deletions(-)
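
For readers unfamiliar with limited transmit (RFC 3042), the following is a minimal sketch of the accounting that the "sent too much" assertion guards. It is not the code from sys/netinet/tcp_input.c or from D47541: the struct lt_state and the helpers lt_cwnd() and lt_sent_ok() are hypothetical names made up for illustration, and the real KASSERT may allow further exceptions (for example around a FIN) that are left out here. The assumption is that on each of the first two duplicate ACKs the sender may send roughly one new segment, so the temporary window is the flight size plus one segment per duplicate ACK not yet used.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the TCP control block fields involved. */
struct lt_state {
	uint32_t flight;	/* bytes outstanding, i.e. snd_nxt - snd_una */
	uint32_t maxseg;	/* maximum segment size in bytes */
	int	 dupacks;	/* duplicate ACKs seen so far (1 or 2 here) */
	int	 snd_limited;	/* segments already sent under limited transmit */
};

/*
 * Temporary congestion window while in limited transmit (assumed model):
 * the current flight size plus one segment for every duplicate ACK that
 * has not yet been answered with a new segment.
 */
static uint32_t
lt_cwnd(const struct lt_state *s)
{
	return (s->flight + (uint32_t)(s->dupacks - s->snd_limited) * s->maxseg);
}

/*
 * Simplified version of the invariant: sending more than one segment in
 * a single step is only acceptable on the second duplicate ACK when
 * nothing was sent for the first one.
 */
static bool
lt_sent_ok(const struct lt_state *s, uint32_t sent)
{
	if (sent <= s->maxseg)
		return (true);
	return (s->dupacks == 2 && s->snd_limited == 0);
}
```

In this model, with maxseg = 1432 (the value seen in the crash dump above) and a flight of ten segments, the first duplicate ACK yields a window of eleven segments, and a step that sends two full segments at that point violates the invariant. If the flight size fed into such a calculation were underestimated, for instance by also subtracting bytes recorded in the SACK scoreboard, the output path could be allowed to send more than the check expects. Whether that is exactly what happened in this PR is not established by the sketch.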