Sending stress TCP with offloaded NAT-T configuration

panic: tcp_do_segment: sent too much

BT
#0  __curthread () at /usr/kernel_git/sys/amd64/include/pcpu_aux.h:57
#1  doadump (textdump=textdump@entry=1) at /usr/kernel_git/sys/kern/kern_shutdown.c:404
#2  0xffffffff80b3b7a0 in kern_reboot (howto=260) at /usr/kernel_git/sys/kern/kern_shutdown.c:524
#3  0xffffffff80b3bcd7 in vpanic (fmt=0xffffffff81172d6e "%s: sent too much", ap=ap@entry=0xfffffe010a82db20) at /usr/kernel_git/sys/kern/kern_shutdown.c:979
#4  0xffffffff80b3bb03 in panic (fmt=<unavailable>) at /usr/kernel_git/sys/kern/kern_shutdown.c:892
#5  0xffffffff80d34372 in tcp_do_segment (tp=0xfffff803402baa80, tp@entry=<error reading variable: value is not available>, m=<optimized out>, m@entry=<error reading variable: value is not available>, th=0xfffff802fd295e84, th@entry=<error reading variable: value is not available>, drop_hdrlen=64, drop_hdrlen@entry=<error reading variable: value is not available>, tlen=0, tlen@entry=<error reading variable: value is not available>, iptos=<unavailable>, iptos@entry=<error reading variable: value is not available>) at /usr/kernel_git/sys/netinet/tcp_input.c:1548
#6  0xffffffff80d30d98 in tcp_input_with_port (mp=<optimized out>, offp=<optimized out>, proto=<optimized out>, port=port@entry=0) at /usr/kernel_git/sys/netinet/tcp_input.c:1158
#7  0xffffffff80d31a5b in tcp_input (mp=<unavailable>, offp=<unavailable>, proto=<unavailable>) at /usr/kernel_git/sys/netinet/tcp_input.c:1502
#8  0xffffffff80d1e3af in ip_input (m=0x0, m@entry=<error reading variable: value is not available>) at /usr/kernel_git/sys/netinet/ip_input.c:857
#9  0xffffffff80c98b7b in netisr_process_workstream_proto (nwsp=0xffffffff823c0a00, proto=1) at /usr/kernel_git/sys/net/netisr.c:927
#10 swi_net (arg=0xffffffff823c0a00) at /usr/kernel_git/sys/net/netisr.c:974
#11 0xffffffff80af3b56 in intr_event_execute_handlers (ie=0xfffff800035ee600, p=<optimized out>) at /usr/kernel_git/sys/kern/kern_intr.c:1183
#12 ithread_execute_handlers (ie=0xfffff800035ee600, p=<optimized out>) at /usr/kernel_git/sys/kern/kern_intr.c:1196
#13 ithread_loop (arg=arg@entry=0xfffff800033cdf60) at /usr/kernel_git/sys/kern/kern_intr.c:1289
#14 0xffffffff80aeff52 in fork_exit (callout=0xffffffff80af38f0 <ithread_loop>, arg=0xfffff800033cdf60, frame=0xfffffe010a82df40) at /usr/kernel_git/sys/kern/kern_fork.c:1151
#15 <signal handler called>

SHA ID of the kernel: 2ce493e1693b55a330079ac5fce8beb66e26ddeb
There have been a couple of TCP commits since your revision - is it possible that this is already fixed?
Created attachment 255012
packetdrill reproducer
(In reply to Mark Johnston from comment #1)
It is not yet fixed. I attached a packetdrill reproducer. I am currently testing review D47474, which so far seems to avoid triggering the relevant KASSERT. With D47474 applied, the reproducer no longer triggers the KASSERT either. However, it is not yet fully understood what is going on.
Agreed. D47474 masks the problem, which appears to be an interaction between the new TCP transmission selection code and an RTO followed in close succession by a SACK loss recovery.
I just ran into this panic, on the IPv6 side I would guess (current as of 2024-10-30-120714):
---snip---
[365136] panic: tcp_do_segment: sent too much
[365136] cpuid = 1
[365136] time = 1731354815
[365136] KDB: stack backtrace:
[365136] db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe04314f7790
[365136] vpanic() at vpanic+0x136/frame 0xfffffe04314f78c0
[365136] panic() at panic+0x43/frame 0xfffffe04314f7920
[365136] tcp_do_segment() at tcp_do_segment+0x2852/frame 0xfffffe04314f79f0
[365136] tcp_input_with_port() at tcp_input_with_port+0x10e2/frame 0xfffffe04314f7b40
[365136] tcp6_input_with_port() at tcp6_input_with_port+0x6a/frame 0xfffffe04314f7b70
[365136] tcp6_input() at tcp6_input+0xb/frame 0xfffffe04314f7b80
[365136] ip6_input() at ip6_input+0xc76/frame 0xfffffe04314f7c60
[365136] netisr_dispatch_src() at netisr_dispatch_src+0xb9/frame 0xfffffe04314f7cc0
[365136] ether_demux() at ether_demux+0x16a/frame 0xfffffe04314f7cf0
[365136] ether_nh_input() at ether_nh_input+0x3cf/frame 0xfffffe04314f7d40
[365136] netisr_dispatch_src() at netisr_dispatch_src+0xb9/frame 0xfffffe04314f7da0
[365136] ether_input() at ether_input+0xd5/frame 0xfffffe04314f7e00
[365136] epair_tx_start_deferred() at epair_tx_start_deferred+0xd4/frame 0xfffffe04314f7e40
[365136] taskqueue_run_locked() at taskqueue_run_locked+0x1c7/frame 0xfffffe04314f7ec0
[365136] taskqueue_thread_loop() at taskqueue_thread_loop+0xd3/frame 0xfffffe04314f7ef0
[365136] fork_exit() at fork_exit+0x87/frame 0xfffffe04314f7f30
[365136] fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe04314f7f30
[365136] --- trap 0x3de64570, rip = 0, rsp = 0, rbp = 0 ---
[365136] Uptime: 4d5h25m36s
[365136] Dumping 50824 out of 73621 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%
---snip---

Parts of the crashdump output (full output available on request):
---snip---
#5  0xffffffff806b6b72 in tcp_do_segment (tp=0xfffff80da7abfa80, m=<optimized out>, th=0xfffff804c5190a96, drop_hdrlen=72, tlen=0, iptos=<optimized out>) at /space/system/usr_src/sys/netinet/tcp_input.c:1548
        to = {to_flags = 128, to_tsval = 4294965249, to_tsecr = 4546, to_sacks = 0xfffff804c5190aae "\aa\217\345\aa\225}\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255", <incomplete sequence \336>, to_signature = 0x4 <error: Cannot access memory at address 0x4>, to_tfo_cookie = 0xfffffe04314f79c0 "\340yO1", to_mss = 22215, to_wscale = 111 'o', to_nsacks = 1 '\001', to_tfo_len = 255 '\377', to_spare = 2154633376}
        maxseg = 1432
        inp = 0xfffff80da7abfa80
        needoutput = 0
        incforsyn = <optimized out>
        so = 0xfffff8051a0ffc00
        inc = <optimized out>
        thflags = <optimized out>
        sack_changed = <optimized out>
        nsegs = 1
        s = <optimized out>
        tiwin = <optimized out>
        rstreason = <optimized out>
        todrop = <optimized out>
        acked = <optimized out>
        tfo_syn = <optimized out>
        mfree = <optimized out>
        ourfinisacked = <optimized out>
        win = <optimized out>
        close = <optimized out>
#6  0xffffffff806b3602 in tcp_input_with_port (mp=mp@entry=0xfffffe04314f7bc8, offp=offp@entry=0xfffffe04314f7bc0, proto=<optimized out>, port=0) at /space/system/usr_src/sys/netinet/tcp_input.c:1158
        so = 0xfffff8051a0ffc00
        to = {to_flags = 0, to_tsval = 0, to_tsecr = 719386432, to_sacks = 0xfffff80727e1a038 "\001", to_signature = 0xfffff80727e1a084 "", to_tfo_cookie = 0x5bbafa8e00000073 <error: Cannot access memory at address 0x5bbafa8e00000073>, to_mss = 37456, to_wscale = 5 '\005', to_nsacks = 0 '\000', to_tfo_len = 0 '\000', to_spare = 590855}
        m = 0xfffff804c5190a00
        th = 0xfffff804c5190a96
        ip = 0x0
        inp = <optimized out>
        tp = <unavailable>
        optp = 0xfffff804c5190aaa "\001\001\005\n\aa\217\345\aa\225}\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255", <incomplete sequence \336>
        optlen = 12
        tlen = <optimized out>
        rstreason = <optimized out>
        fwd_tag = 0x0
        ip6 = 0xfffff804c5190a6e
        s = 0x0
        off0 = <optimized out>
        iptos = 0 '\000'
        off = <optimized out>
        len = <optimized out>
        ipttl = <optimized out>
        thflags = <optimized out>
        drop_hdrlen = 72
        lookupflag = <optimized out>
        isipv6 = <optimized out>
#7  0xffffffff806b247a in tcp6_input_with_port (mp=0xfffffe04314f7bc8, offp=0xfffffe04314f7bc0, proto=<optimized out>, port=port@entry=0) at /space/system/usr_src/sys/netinet/tcp_input.c:594
        m = 0xfffff804c5190a00
        ip6 = <optimized out>
        ia6 = <unavailable>
#8  0xffffffff806b3d5b in tcp6_input (mp=<unavailable>, offp=<unavailable>, proto=<unavailable>) at /space/system/usr_src/sys/netinet/tcp_input.c:601
No locals.
---snip---
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=8f5a2e216f4cb955150c8f88ab21eaecc5adc8b9

commit 8f5a2e216f4cb955150c8f88ab21eaecc5adc8b9
Author:     Richard Scheffenegger <rscheff@FreeBSD.org>
AuthorDate: 2024-11-14 08:19:34 +0000
Commit:     Richard Scheffenegger <rscheff@FreeBSD.org>
CommitDate: 2024-11-14 08:19:49 +0000

    tcp: fix cwnd recalculation during limited transmit

    Properly calculate the expected flight size (cwnd) during
    limited transmit. Exclude the SACK scoreboard from consideration
    when still in limited transmit.

    PR:                    282605
    Reviewed By:           tuexen, #transport
    Sponsored by:          NetApp, Inc.
    Differential Revision: https://reviews.freebsd.org/D47541

 sys/netinet/tcp_input.c  | 2 +-
 sys/netinet/tcp_output.c | 3 ++-
 2 files changed, 3 insertions(+), 2 deletions(-)
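
For readers following the thread, the limited-transmit accounting that the commit message refers to can be sketched roughly as below. This is a simplified model written for illustration, not the kernel code: struct lt_state and the lt_* helpers are hypothetical names, and the kernel's real check has additional exceptions that are omitted here. The sketch assumes, based on the commit message, that the temporary window is derived from the plain flight size (snd_nxt - snd_una) without consulting the SACK scoreboard, and, following RFC 3042, that each of the first two duplicate ACKs permits at most one new segment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified sender state; NOT the kernel's struct tcpcb. */
struct lt_state {
	uint32_t snd_una;     /* oldest unacknowledged sequence number */
	uint32_t snd_nxt;     /* next sequence number to be sent */
	uint32_t maxseg;      /* maximum segment size in bytes */
	unsigned dupacks;     /* duplicate ACKs received so far (1 or 2) */
	unsigned snd_limited; /* segments already sent via limited transmit */
};

/*
 * Temporary congestion window for one limited-transmit send: the data
 * currently in flight plus one segment per duplicate ACK that has not
 * yet been used. The SACK scoreboard is deliberately not consulted
 * (an assumption taken from the commit message).
 */
static uint32_t
lt_temp_cwnd(const struct lt_state *s)
{
	assert(s->dupacks == 1 || s->dupacks == 2);
	assert(s->snd_limited <= s->dupacks);
	/* Unsigned subtraction stays correct across sequence wraparound. */
	uint32_t flight = s->snd_nxt - s->snd_una;
	return (flight + (s->dupacks - s->snd_limited) * s->maxseg);
}

/* Bytes of new data the temporary window allows beyond the flight size. */
static uint32_t
lt_allowed_new_bytes(const struct lt_state *s)
{
	return (lt_temp_cwnd(s) - (s->snd_nxt - s->snd_una));
}

/* Simplified form of the invariant: did we send more than allowed? */
static bool
lt_sent_too_much(const struct lt_state *s, uint32_t sent)
{
	return (sent > lt_allowed_new_bytes(s));
}
```

With ten full segments of 1432 bytes in flight and one duplicate ACK received, this model yields a temporary window of 15752 bytes, so exactly one new segment may be sent; sending two would violate the invariant. If the temporary window were inflated by some other term, more than one segment could leave per duplicate ACK, which is the kind of condition the "sent too much" assertion is meant to catch.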