282605 – panic: tcp_do_segment: sent too much

Bug 282605 - panic: tcp_do_segment: sent too much

Summary: panic: tcp_do_segment: sent too much

Status:	Open

Alias:	None

Product:	Base System
Classification:	Unclassified
Component:	kern (show other bugs)
Version:	15.0-CURRENT
Hardware:	amd64 Any

Importance:	--- Affects Some People
Assignee:	Richard Scheffenegger

URL:
Keywords:

Duplicates (1):	284588 (view as bug list)
Depends on:
Blocks:

Reported:	2024-11-07 12:27 UTC by Daniel
Modified:	2025-05-13 15:04 UTC (History)
CC List:	24 users (show)

See Also:

Flags:	linimon: mfc-stable14? linimon: mfc-stable13?

Attachments
packetdrill reproducer (3.27 KB, text/plain) 2024-11-07 18:45 UTC, Michael Tuexen	no flags	Details
packetdrill reproducer for stable/14 (4.11 KB, text/plain) 2025-01-21 15:28 UTC, Michael Tuexen	no flags	Details
patch for testing (561 bytes, patch) 2025-03-16 07:25 UTC, Michael Tuexen	no flags	Details \| Diff
Another reproducer (3.51 KB, text/plain) 2025-04-19 21:32 UTC, Michael Tuexen	no flags	Details
View All Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Daniel 2024-11-07 12:27:20 UTC

Sending stress TCP with offloaded NAT-T configuration 

panic: tcp_do_segment: sent too much

BT
#0  __curthread () at /usr/kernel_git/sys/amd64/include/pcpu_aux.h:57
#1  doadump (textdump=textdump@entry=1) at /usr/kernel_git/sys/kern/kern_shutdown.c:404
#2  0xffffffff80b3b7a0 in kern_reboot (howto=260) at /usr/kernel_git/sys/kern/kern_shutdown.c:524
#3  0xffffffff80b3bcd7 in vpanic (fmt=0xffffffff81172d6e "%s: sent too much", ap=ap@entry=0xfffffe010a82db20) at /usr/kernel_git/sys/kern/kern_shutdown.c:979
#4  0xffffffff80b3bb03 in panic (fmt=<unavailable>) at /usr/kernel_git/sys/kern/kern_shutdown.c:892
#5  0xffffffff80d34372 in tcp_do_segment (tp=0xfffff803402baa80, tp@entry=<error reading variable: value is not available>, m=<optimized out>, 
    m@entry=<error reading variable: value is not available>, th=0xfffff802fd295e84, th@entry=<error reading variable: value is not available>, drop_hdrlen=64, 
    drop_hdrlen@entry=<error reading variable: value is not available>, tlen=0, tlen@entry=<error reading variable: value is not available>, iptos=<unavailable>, 
    iptos@entry=<error reading variable: value is not available>) at /usr/kernel_git/sys/netinet/tcp_input.c:1548
#6  0xffffffff80d30d98 in tcp_input_with_port (mp=<optimized out>, offp=<optimized out>, proto=<optimized out>, port=port@entry=0) at /usr/kernel_git/sys/netinet/tcp_input.c:1158
#7  0xffffffff80d31a5b in tcp_input (mp=<unavailable>, offp=<unavailable>, proto=<unavailable>) at /usr/kernel_git/sys/netinet/tcp_input.c:1502
#8  0xffffffff80d1e3af in ip_input (m=0x0, m@entry=<error reading variable: value is not available>) at /usr/kernel_git/sys/netinet/ip_input.c:857
#9  0xffffffff80c98b7b in netisr_process_workstream_proto (nwsp=0xffffffff823c0a00, proto=1) at /usr/kernel_git/sys/net/netisr.c:927
#10 swi_net (arg=0xffffffff823c0a00) at /usr/kernel_git/sys/net/netisr.c:974
#11 0xffffffff80af3b56 in intr_event_execute_handlers (ie=0xfffff800035ee600, p=<optimized out>) at /usr/kernel_git/sys/kern/kern_intr.c:1183
#12 ithread_execute_handlers (ie=0xfffff800035ee600, p=<optimized out>) at /usr/kernel_git/sys/kern/kern_intr.c:1196
#13 ithread_loop (arg=arg@entry=0xfffff800033cdf60) at /usr/kernel_git/sys/kern/kern_intr.c:1289
#14 0xffffffff80aeff52 in fork_exit (callout=0xffffffff80af38f0 <ithread_loop>, arg=0xfffff800033cdf60, frame=0xfffffe010a82df40) at /usr/kernel_git/sys/kern/kern_fork.c:1151
#15 <signal handler called>

SHA ID of the kernel:
2ce493e1693b55a330079ac5fce8beb66e26ddeb

Comment 1 Mark Johnston freebsd_committer

2024-11-07 18:43:06 UTC

There have been a couple of TCP commits since your revision - is it possible that this is already fixed?

Comment 2 Michael Tuexen freebsd_committer

2024-11-07 18:45:46 UTC

Created attachment 255012 [details]
packetdrill reproducer

Comment 3 Michael Tuexen freebsd_committer

2024-11-07 18:48:31 UTC

(In reply to Mark Johnston from comment #1)
It is not yet fixed. I attached a packetdrill reproducer. I am right now testing review D47474 which seems to avoid triggering the relevant KASSERT so far. The reproducer also does not trigger the KASSERT anymore with review D47474. But it is not fully understood, what is going on.

Comment 4 Richard Scheffenegger freebsd_committer

2024-11-08 22:09:56 UTC

Agree, D47474 masks the problem, which appears to be an interaction between the new tcp transmission selection code, and the combination of RTO followed by a SACK loss recovery in close succession.

Comment 5 Alexander Leidinger freebsd_committer

2024-11-11 20:34:45 UTC

I just run into this panic, on the ipv6 side I would guess (current as of 2024-10-30-120714):
---snip---
[365136] panic: tcp_do_segment: sent too much
[365136] cpuid = 1
[365136] time = 1731354815
[365136] KDB: stack backtrace:
[365136] db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe04314f7790
[365136] vpanic() at vpanic+0x136/frame 0xfffffe04314f78c0
[365136] panic() at panic+0x43/frame 0xfffffe04314f7920
[365136] tcp_do_segment() at tcp_do_segment+0x2852/frame 0xfffffe04314f79f0
[365136] tcp_input_with_port() at tcp_input_with_port+0x10e2/frame 0xfffffe04314f7b40
[365136] tcp6_input_with_port() at tcp6_input_with_port+0x6a/frame 0xfffffe04314f7b70
[365136] tcp6_input() at tcp6_input+0xb/frame 0xfffffe04314f7b80
[365136] ip6_input() at ip6_input+0xc76/frame 0xfffffe04314f7c60
[365136] netisr_dispatch_src() at netisr_dispatch_src+0xb9/frame 0xfffffe04314f7cc0
[365136] ether_demux() at ether_demux+0x16a/frame 0xfffffe04314f7cf0
[365136] ether_nh_input() at ether_nh_input+0x3cf/frame 0xfffffe04314f7d40
[365136] netisr_dispatch_src() at netisr_dispatch_src+0xb9/frame 0xfffffe04314f7da0
[365136] ether_input() at ether_input+0xd5/frame 0xfffffe04314f7e00
[365136] epair_tx_start_deferred() at epair_tx_start_deferred+0xd4/frame 0xfffffe04314f7e40
[365136] taskqueue_run_locked() at taskqueue_run_locked+0x1c7/frame 0xfffffe04314f7ec0
[365136] taskqueue_thread_loop() at taskqueue_thread_loop+0xd3/frame 0xfffffe04314f7ef0
[365136] fork_exit() at fork_exit+0x87/frame 0xfffffe04314f7f30
[365136] fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe04314f7f30
[365136] --- trap 0x3de64570, rip = 0, rsp = 0, rbp = 0 ---
[365136] Uptime: 4d5h25m36s
[365136] Dumping 50824 out of 73621 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%
---snip--

Parts of the crashdump output (full output available on request):
---snip---
#5  0xffffffff806b6b72 in tcp_do_segment (tp=0xfffff80da7abfa80,
    m=<optimized out>, th=0xfffff804c5190a96, drop_hdrlen=72, tlen=0,
    iptos=<optimized out>)
    at /space/system/usr_src/sys/netinet/tcp_input.c:1548
        to = {to_flags = 128, to_tsval = 4294965249, to_tsecr = 4546,
          to_sacks = 0xfffff804c5190aae "\aa\217\345\aa\225}\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255", <incomplete sequence \336>,
          to_signature = 0x4 <error: Cannot access memory at address 0x4>,
          to_tfo_cookie = 0xfffffe04314f79c0 "\340yO1", to_mss = 22215,
          to_wscale = 111 'o', to_nsacks = 1 '\001', to_tfo_len = 255 '\377',
          to_spare = 2154633376}
        maxseg = 1432
        inp = 0xfffff80da7abfa80
        needoutput = 0
        incforsyn = <optimized out>
        so = 0xfffff8051a0ffc00
        inc = <optimized out>
        thflags = <optimized out>
        sack_changed = <optimized out>
        nsegs = 1
        s = <optimized out>
        tiwin = <optimized out>
        rstreason = <optimized out>
        todrop = <optimized out>
        acked = <optimized out>
        tfo_syn = <optimized out>
        mfree = <optimized out>
        ourfinisacked = <optimized out>
        win = <optimized out>
        close = <optimized out>
#6  0xffffffff806b3602 in tcp_input_with_port (mp=mp@entry=0xfffffe04314f7bc8,
    offp=offp@entry=0xfffffe04314f7bc0, proto=<optimized out>, port=0)
    at /space/system/usr_src/sys/netinet/tcp_input.c:1158
        so = 0xfffff8051a0ffc00
        to = {to_flags = 0, to_tsval = 0, to_tsecr = 719386432,
          to_sacks = 0xfffff80727e1a038 "\001",
          to_signature = 0xfffff80727e1a084 "",
          to_tfo_cookie = 0x5bbafa8e00000073 <error: Cannot access memory at address 0x5bbafa8e00000073>, to_mss = 37456, to_wscale = 5 '\005',
          to_nsacks = 0 '\000', to_tfo_len = 0 '\000', to_spare = 590855}
        m = 0xfffff804c5190a00
        th = 0xfffff804c5190a96
        ip = 0x0
        inp = <optimized out>
        tp = <unavailable>
        optp = 0xfffff804c5190aaa "\001\001\005\n\aa\217\345\aa\225}\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255", <incomplete sequence \336>
        optlen = 12
        tlen = <optimized out>
        rstreason = <optimized out>
        fwd_tag = 0x0
        ip6 = 0xfffff804c5190a6e
        s = 0x0
        off0 = <optimized out>
        iptos = 0 '\000'
        off = <optimized out>
        len = <optimized out>
        ipttl = <optimized out>
        thflags = <optimized out>
        drop_hdrlen = 72
        lookupflag = <optimized out>
        isipv6 = <optimized out>
#7  0xffffffff806b247a in tcp6_input_with_port (mp=0xfffffe04314f7bc8,
    offp=0xfffffe04314f7bc0, proto=<optimized out>, port=port@entry=0)
    at /space/system/usr_src/sys/netinet/tcp_input.c:594
        m = 0xfffff804c5190a00
        ip6 = <optimized out>
        ia6 = <unavailable>
#8  0xffffffff806b3d5b in tcp6_input (mp=<unavailable>, offp=<unavailable>,
    proto=<unavailable>) at /space/system/usr_src/sys/netinet/tcp_input.c:601
No locals.
---snip---

Comment 6 commit-hook freebsd_committer

2024-11-14 18:00:43 UTC

A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=8f5a2e216f4cb955150c8f88ab21eaecc5adc8b9

commit 8f5a2e216f4cb955150c8f88ab21eaecc5adc8b9
Author:     Richard Scheffenegger <rscheff@FreeBSD.org>
AuthorDate: 2024-11-14 08:19:34 +0000
Commit:     Richard Scheffenegger <rscheff@FreeBSD.org>
CommitDate: 2024-11-14 08:19:49 +0000

    tcp: fix cwnd recalculation during limited transmit

    Properly calculate the expected flight size (cwnd) during
    limited transmit. Exclude the SACK scoreboard from
    consideration when still in limited transmit.

    PR: 282605
    Reviewed By: tuexen, #transport
    Sponsored by: NetApp, Inc.
    Differential Revision: https://reviews.freebsd.org/D47541

 sys/netinet/tcp_input.c  | 2 +-
 sys/netinet/tcp_output.c | 3 ++-
 2 files changed, 3 insertions(+), 2 deletions(-)

Comment 7 Richard Scheffenegger freebsd_committer

2024-11-25 15:43:12 UTC

stable-13 is unaffected.

stable-14 may have some exposure:
- D43470 was not put in stable-14, which is the main trigger
- D43355 (which was backported) does some tweaking of cwnd in the limited transmit branch, where this KASSERT triggers, but I think this has not led to any panics in stable-14. All panics observed where after D43470 was committed to MAIN afaik.

Comment 8 Mark Johnston freebsd_committer

2024-11-25 17:02:39 UTC

(In reply to Richard Scheffenegger from comment #7)
Note that on stable/14, you won't see assertion failures unless running the GENERIC-DEBUG kernel config.  In particular, assertions are disabled by default on stable branches.

Comment 9 Michael Tuexen freebsd_committer

2024-11-25 17:35:32 UTC

(In reply to Richard Scheffenegger from comment #7)
Let me run the reproducer on stable/14 and report back.

Comment 10 Michael Tuexen freebsd_committer

2024-11-25 17:49:09 UTC

(In reply to Michael Tuexen from comment #9)
The reproducer does NOT trigger the panic when running on stable/14 with a GENERIC-DEBUG config.

Comment 11 Peter Jeremy freebsd_committer

2024-11-30 07:39:29 UTC

(In reply to Richard Scheffenegger from comment #7)
I have several panics from stable-14 but I haven't dug into them.

Comment 12 Richard Scheffenegger freebsd_committer

2024-11-30 11:19:14 UTC

(In reply to Peter Jeremy from comment #11)

Well, without the backtrace and possibly the core, it will be hard to understand why your environment would be excessively affected. The reproducer of this panic from earlier reports was fixed by D47541 (and masked by D47474).

Are you on stable-14 or main or some mixture?

Comment 13 Peter Jeremy freebsd_committer

2024-12-01 06:15:48 UTC

This is 14-STABLE from 2024-11-18 (df1b835333ff).  I have some local patches but nothing that should be impacting TCP.  It crashed twice in reasonably quick succession but I haven't seen it since then (and I have both cores).  Both times, it looked like below.  I had a look at applying D47541 but it didn't apply cleanly to 14-stable - I've just had a closer look and I think that was just "patch" getting confused by the context.

db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe01d43a8770
vpanic() at vpanic+0x131/frame 0xfffffe01d43a88a0
panic() at panic+0x43/frame 0xfffffe01d43a8900
tcp_do_segment() at tcp_do_segment+0x29fd/frame 0xfffffe01d43a89e0
tcp_input_with_port() at tcp_input_with_port+0xf94/frame 0xfffffe01d43a8b20
tcp_input() at tcp_input+0xb/frame 0xfffffe01d43a8b30
ip_input() at ip_input+0x287/frame 0xfffffe01d43a8b80
netisr_dispatch_src() at netisr_dispatch_src+0xad/frame 0xfffffe01d43a8be0
ether_demux() at ether_demux+0x17a/frame 0xfffffe01d43a8c10
ether_nh_input() at ether_nh_input+0x3da/frame 0xfffffe01d43a8c60
netisr_dispatch_src() at netisr_dispatch_src+0xad/frame 0xfffffe01d43a8cc0
ether_input() at ether_input+0xe5/frame 0xfffffe01d43a8d20
_task_fn_rx() at _task_fn_rx+0xea4/frame 0xfffffe01d43a8e40
gtaskqueue_run_locked() at gtaskqueue_run_locked+0x18e/frame 0xfffffe01d43a8ec0
gtaskqueue_thread_loop() at gtaskqueue_thread_loop+0xd3/frame 0xfffffe01d43a8ef0
fork_exit() at fork_exit+0x82/frame 0xfffffe01d43a8f30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe01d43a8f30

I've made the full core.txt available at https://home.rulingia.com/~peter/private/core.txt.51

Comment 14 Gleb Smirnoff freebsd_committer

2024-12-07 20:03:04 UTC

I got the panic again on the stabweek revision ff4c19bb5427, which
includes 8f5a2e216f4c aka D47541.  Re-opening the bug.

Richard, I got a core and can send you any debugging info.  Please
email me in private what do I need to get from the core for you.

Comment 15 Michael Tuexen freebsd_committer

2024-12-07 22:28:10 UTC

(In reply to Gleb Smirnoff from comment #14)
Could you enable BlackBox logging using

sysctl net.inet.tcp.bb.log_auto_all=1
sysctl net.inet.tcp.bb.log_auto_ratio=1
sysctl net.inet.tcp.bb.log_auto_mode=1

Also have the tcplog_dumper running. You can use

tcplog_dumper_enable="YES"

in /etc/rc.conf

Using these settings (mode = TCP_LOG_STATE_TAIL), keeps the trailing events of all TCP connections, but does not dump anything to disk. If the kernel panics, we can get the BBLog information from the core. I might be able to construct a packetdrill reproducer from the BBLog file. At least I was able to do it last time.

Comment 16 Peter Jeremy freebsd_committer

2024-12-30 11:05:16 UTC

I'm now seeing this every couple of days on 14-stable (src-c291194-g30bcbf8a2fef) with basically no TCP load.  I've had a look at the suggestions in #15 but can't find tcplog_dumper.  I have core dumps if that's of any interest.

Comment 17 Michael Tuexen freebsd_committer

2024-12-30 11:12:41 UTC

(In reply to Peter Jeremy from comment #16)
You should be able to install the tcplog_dumper by using
pkg install tcplog_dumper
and then follow the instructions in #15.

Once you have then a core file with BBLog information, we can extract it.

Comment 18 Shawn Webb 2025-01-21 00:38:55 UTC

I suspect the problematic commit might be the one pasted below. HardenedBSD has multiple users that have reported frequent crashes, except when the below-referenced commit is reverted.

The affected HardenedBSD users are not able to produce a crash dump. Please let me know if/how I can be of any assistance. I can have the affected users test candidate patches. Note that it might take several days to get testing results.

commit 6b2977c5978b58718d9cddc0a8c31e1bd9aa0361
Author: Richard Scheffenegger <rscheff@FreeBSD.org>
AuthorDate: Thu Oct 10 12:50:22 2024 +0200
Commit: Richard Scheffenegger <rscheff@FreeBSD.org>
CommitDate: Fri Oct 18 09:51:38 2024 +0200

tcp: fix duplicate retransmissions when RTO happens during SACK loss recovery

When snd_nxt doesn't track snd_max, partial SACK ACKs may elicit
unexpected duplicate retransmissions. This is usually masked by
LRO not necessarily ACKing every individual segment, and prior
to RFC6675 SACK loss recovery, harder to trigger even when an
RTO happens while SACK loss recovery is ongoing.

Address this by improving the logic when to start a SACK loss recovery
and how to deal with a RTO, as well as improvements to the adjusted
congestion window during transmission selection.

Reviewed By: tuexen, cc, #transport
Sponsored by: NetApp, Inc.
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D43355

(cherry picked from commit 440f4ba18e3ab7be912858bbcb96a419fcf14809)

Comment 19 Michael Tuexen freebsd_committer

2025-01-21 15:28:55 UTC

Created attachment 256878 [details]
packetdrill reproducer for stable/14

Comment 20 Michael Tuexen freebsd_committer

2025-01-21 15:29:21 UTC

(In reply to Shawn Webb from comment #18)
Is this for stable/14 or for CURRENT?

Comment 21 Shawn Webb 2025-01-21 16:56:42 UTC

(In reply to Michael Tuexen from comment #20)
This is in 14-stable.

Comment 22 Michael Tuexen freebsd_committer

2025-01-21 17:08:03 UTC

(In reply to Shawn Webb from comment #21)
Thanks for the confirmation. I am looking into it.

Comment 23 Gleb Smirnoff freebsd_committer

2025-02-05 16:49:03 UTC

*** Bug 284588 has been marked as a duplicate of this bug. ***

Comment 24 Wei Hu 2025-02-25 03:36:54 UTC

I just hit this on the latest main branch 2025/2/25:

panic: tcp_do_segment: sent too much
cpuid = 3
time = 1740452298
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00dc30b6e0
vpanic() at vpanic+0x136/frame 0xfffffe00dc30b810
panic() at panic+0x43/frame 0xfffffe00dc30b870
tcp_do_segment() at tcp_do_segment+0x2998/frame 0xfffffe00dc30b950
tcp_input_with_port() at tcp_input_with_port+0x1140/frame 0xfffffe00dc30baa0
tcp_input() at tcp_input+0xb/frame 0xfffffe00dc30bab0
ip_input() at ip_input+0x28d/frame 0xfffffe00dc30bb10
netisr_dispatch_src() at netisr_dispatch_src+0xb4/frame 0xfffffe00dc30bb70
ether_demux() at ether_demux+0x16a/frame 0xfffffe00dc30bba0
ether_nh_input() at ether_nh_input+0x3cd/frame 0xfffffe00dc30bbf0
netisr_dispatch_src() at netisr_dispatch_src+0xb4/frame 0xfffffe00dc30bc50
ether_input() at ether_input+0xd5/frame 0xfffffe00dc30bcb0
hn_xpnt_vf_input() at hn_xpnt_vf_input+0x126/frame 0xfffffe00dc30bd20
tcp_lro_flush() at tcp_lro_flush+0x34f/frame 0xfffffe00dc30bd50
tcp_lro_flush_all() at tcp_lro_flush_all+0x1f3/frame 0xfffffe00dc30bda0
mana_poll() at mana_poll+0x9db/frame 0xfffffe00dc30be40
taskqueue_run_locked() at taskqueue_run_locked+0x1c2/frame 0xfffffe00dc30bec0
taskqueue_thread_loop() at taskqueue_thread_loop+0xd3/frame 0xfffffe00dc30bef0
fork_exit() at fork_exit+0x82/frame 0xfffffe00dc30bf30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00dc30bf30
--- trap 0x17ee7fe0, rip = 0x3fe908dbc388, rsp = 0x17ee7cb800000001, rbp = 0 ---
KDB: enter: panic
[ thread pid 0 tid 100194 ]
Stopped at      kdb_enter+0x33: movq    $0,0x1046992(%rip)

Comment 25 Colin Percival freebsd_committer

2025-02-25 20:33:52 UTC

I just tripped over this in EC2 with a kernel built from 12 days ago (FreeBSD 15.0-CURRENT #0 main-n275409-6156da866e7d: Thu Feb 13 07:15:24 UTC 2025):

tcp_do_segment: sent too much
cpuid = 0
time = 1740512855
KDB: stack backtrace:
db_trace_self() at db_trace_self
db_trace_self_wrapper() at db_trace_self_wrapper+0x38
vpanic() at vpanic+0x1a0
panic() at panic+0x48
tcp_do_segment() at tcp_do_segment+0x2778
tcp_input_with_port() at tcp_input_with_port+0xfd4
tcp_input() at tcp_input+0x10
ip_input() at ip_input+0x354
netisr_dispatch_src() at netisr_dispatch_src+0xd8
ether_demux() at ether_demux+0x174
ether_nh_input() at ether_nh_input+0x374
netisr_dispatch_src() at netisr_dispatch_src+0xd8
ether_input() at ether_input+0xdc
ena_cleanup() at ena_cleanup+0x4fc
taskqueue_run_locked() at taskqueue_run_locked+0x17c
taskqueue_thread_loop() at taskqueue_thread_loop+0xc0
fork_exit() at fork_exit+0x78
fork_trampoline() at fork_trampoline+0x18
KDB: enter: panic
[ thread pid 0 tid 100211 ]
Stopped at      kdb_enter+0x48: str     xzr, [x19, #2048]

I can leave the system running for a while and have a serial console open if it would be useful to poke at anything before rebooting.  I'll kill the instance in a few days if I don't hear from anyone.

Comment 26 Michael Tuexen freebsd_committer

2025-02-25 20:49:19 UTC

Will put up a fix for review shortly.

Comment 27 Lexi Winter freebsd_committer

2025-02-28 18:38:50 UTC

ran into this today on recent 15.0, screenshot of stack trace: https://i.imgur.com/gmTDD7q.png

i'm happy to test a patch once there is one.

Comment 28 Dave Cottlehuber freebsd_committer

2025-03-15 14:17:33 UTC

I can repro this in ~ 5 minutes on my arm64 router, while doing some web loadgen
testing through it using wrk2 (not in ports yet).

$ wrk -d 120s -c 200 -t 8 -L -R 800 https://whiskey.skunkwerks.at/

- 15.0-CURRENT GENERIC arm64
- built from 20250315 at d2a55e6a9348 (upstream/main) openssl: update ASM and version info for 3.0.16 import

(kgdb) bt
#0  0xffff0000004b8ce8 in doadump (textdump=textdump@entry=1) at /usr/src/sys/kern/kern_shutdown.c:404
#1  0xffff0000004b8aa8 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:524
#2  0xffff0000004b9028 in vpanic (fmt=<optimized out>, ap=...) at /usr/src/sys/kern/kern_shutdown.c:979
#3  0xffff0000004b8dcc in panic (fmt=0xffff000000ce7000 <M_PLIMIT+64> "") at /usr/src/sys/kern/kern_shutdown.c:892
#4  0xffff000000666234 in tcp_do_segment (tp=0xffffa085c2f52000, m=<optimized out>, th=0xffff0001631630e2, drop_hdrlen=<optimized out>,
    tlen=0, iptos=<optimized out>) at /usr/src/sys/netinet/tcp_input.c:1545
#5  0xffff000000662d88 in tcp_input_with_port (mp=<optimized out>, offp=<optimized out>, proto=<optimized out>, port=0, port@entry=42192)
    at /usr/src/sys/netinet/tcp_input.c:816
#6  0xffff000000663aa8 in tcp_input (mp=0xffff000000ce7000 <M_PLIMIT+64>, offp=0xffff000000db9000 <sdt_vfs_vop_vop_spare5_entry+96>,
    proto=1) at /usr/src/sys/netinet/tcp_input.c:1499
#7  0xffff00000064e2ac in ip_input (m=0x0) at /usr/src/sys/netinet/ip_input.c:861
#8  0xffff000000612344 in netisr_dispatch_src (proto=proto@entry=1, source=0, source@entry=18446462603272168832, m=0xffffa0804716a800)
    at /usr/src/sys/net/netisr.c:1151
#9  0xffff000000612728 in netisr_dispatch (proto=13529088, proto@entry=1, m=0x1) at /usr/src/sys/net/netisr.c:1242
#10 0xffff0000005f4214 in ether_demux (ifp=ifp@entry=0xffffa0800440d000, m=0xffffa0804716a800) at /usr/src/sys/net/if_ethersubr.c:942
#11 0xffff0000005f58c0 in ether_input_internal (ifp=0xffffa0800440d000, m=0xffffa0804716a800) at /usr/src/sys/net/if_ethersubr.c:706
#12 ether_nh_input (m=<optimized out>) at /usr/src/sys/net/if_ethersubr.c:736
#13 0xffff000000612344 in netisr_dispatch_src (proto=proto@entry=5, source=0, source@entry=18446462603272169024, m=0xffffa0804716a800)
    at /usr/src/sys/net/netisr.c:1151
#14 0xffff000000612728 in netisr_dispatch (proto=13529088, proto@entry=5, m=0x1) at /usr/src/sys/net/netisr.c:1242
#15 0xffff0000005f4664 in ether_input (ifp=0xffffa0800440d000, m=0xffff000000db9000 <sdt_vfs_vop_vop_spare5_entry+96>)
    at /usr/src/sys/net/if_ethersubr.c:847
#16 0xffff0000008be654 in dpaa2_ni_rx (ch=ch@entry=0xffff000113aaf000, fq=0xffff000113aaf180, fd=<optimized out>,
    ctx=ctx@entry=0xffff00010e90a788) at /usr/src/sys/dev/dpaa2/dpaa2_ni.c:3179
#17 0xffff0000008bdc34 in dpaa2_ni_consume_frames (chan=0xffff000113aaf000, src=<optimized out>, consumed=<optimized out>)
    at /usr/src/sys/dev/dpaa2/dpaa2_ni.c:3051
#18 dpaa2_ni_rx_cleanup (ch=0xffff000113aaf000) at /usr/src/sys/dev/dpaa2/dpaa2_ni.c:2868
#19 dpaa2_ni_cleanup_task (arg=<optimized out>, count=<optimized out>) at /usr/src/sys/dev/dpaa2/dpaa2_ni.c:2827
#20 0xffff000000521de4 in taskqueue_run_locked (queue=queue@entry=0xffffa08004720480) at /usr/src/sys/kern/subr_taskqueue.c:517
#21 0xffff000000522d1c in taskqueue_thread_loop (arg=arg@entry=0xffff000113ab1e80) at /usr/src/sys/kern/subr_taskqueue.c:829
#22 0xffff000000468ef4 in fork_exit (callout=0xffff000000522c58 <taskqueue_thread_loop>, arg=0xffff000113ab1e80, frame=0xffff00010e90aa00)
    at /usr/src/sys/kern/kern_fork.c:1152
#23 <signal handler called>

Comment 29 Michael Tuexen freebsd_committer

2025-03-16 07:25:32 UTC

Created attachment 258715 [details]
patch for testing

Comment 30 Michael Tuexen freebsd_committer

2025-03-16 07:27:18 UTC

Can you please test the patch I added to the attachments and report the result?
It is not the patch I will commit, but it is the part which is intended to fix the panic.

Comment 31 Richard Scheffenegger freebsd_committer

2025-03-16 08:56:59 UTC

Alternatively, you can also apply https://reviews.freebsd.org/D48652 - which does not fix the issue of incorrectly unwinding an RTO, but once an RTO was declared spurious, fixes the subsequent cwnd calculation by resetting the state more completely.

Comment 32 Michael Tuexen freebsd_committer

2025-03-16 08:59:35 UTC

(In reply to Richard Scheffenegger from comment #31)
But to fix the root cause, I need feedback on the patch I provided.

Comment 33 Andrey V. Elsukov freebsd_committer

2025-03-17 08:43:25 UTC

(In reply to Alexander Leidinger from comment #5)
I also found similar panic with ipv6 on 15.0 based on 2024-10-13:

(kgdb) bt
#0  __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:57
#1  doadump (textdump=textdump@entry=1) at /usr/src/sys/kern/kern_shutdown.c:404
#2  0xffffffff80b534c0 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:524
#3  0xffffffff80b539dc in vpanic (fmt=0xffffffff811fbf6d "%s: sent too much", ap=ap@entry=0xfffffe01060526b0) at /usr/src/sys/kern/kern_shutdown.c:979
#4  0xffffffff80b53823 in panic (fmt=<unavailable>) at /usr/src/sys/kern/kern_shutdown.c:892
#5  0xffffffff80d4b102 in tcp_do_segment (tp=0xfffff80399e5fa80, tp@entry=<error reading variable: value is not available>, m=<optimized out>,
    m@entry=<error reading variable: value is not available>, th=0xfffff8004426169a, th@entry=<error reading variable: value is not available>, drop_hdrlen=72,
    drop_hdrlen@entry=<error reading variable: value is not available>, tlen=0, tlen@entry=<error reading variable: value is not available>, iptos=<unavailable>,
    iptos@entry=<error reading variable: value is not available>) at /usr/src/sys/netinet/tcp_input.c:1548
#6  0xffffffff80d47aa5 in tcp_input_with_port (mp=mp@entry=0xfffffe0106052988, offp=offp@entry=0xfffffe0106052980, proto=<optimized out>, port=0)
    at /usr/src/sys/netinet/tcp_input.c:1158
#7  0xffffffff80d4691a in tcp6_input_with_port (mp=0xfffffe0106052988, offp=0xfffffe0106052980, proto=<optimized out>, port=port@entry=0) at /usr/src/sys/netinet/tcp_input.c:594
#8  0xffffffff80d481eb in tcp6_input (mp=<unavailable>, offp=<unavailable>, proto=<unavailable>) at /usr/src/sys/netinet/tcp_input.c:601
#9  0xffffffff80d929c0 in ip6_input (m=0x0, m@entry=<error reading variable: value is not available>) at /usr/src/sys/netinet6/ip6_input.c:930
#10 0xffffffff80caf29d in netisr_dispatch_src (proto=proto@entry=6, source=source@entry=0, m=0xfffff80044261600) at /usr/src/sys/net/netisr.c:1152
#11 0xffffffff80caf61f in netisr_dispatch (proto=<unavailable>, proto@entry=6, m=<unavailable>) at /usr/src/sys/net/netisr.c:1243
#12 0xffffffff80c9180a in ether_demux (ifp=ifp@entry=0xfffff80014bd1000, m=0xfffff80044261600) at /usr/src/sys/net/if_ethersubr.c:946
#13 0xffffffff80c92efd in ether_input_internal (ifp=0xfffff80014bd1000, m=0xfffff80044261600) at /usr/src/sys/net/if_ethersubr.c:710
#14 ether_nh_input (m=<unavailable>, m@entry=<error reading variable: value is not available>) at /usr/src/sys/net/if_ethersubr.c:740
#15 0xffffffff80caf29d in netisr_dispatch_src (proto=proto@entry=5, source=source@entry=0, m=0xfffff80044261600) at /usr/src/sys/net/netisr.c:1152
#16 0xffffffff80caf61f in netisr_dispatch (proto=<unavailable>, proto@entry=5, m=<unavailable>) at /usr/src/sys/net/netisr.c:1243
#17 0xffffffff80c91c65 in ether_input (ifp=0xfffff80014bd1000, ifp@entry=<error reading variable: value is not available>, m=<unavailable>,
    m@entry=<error reading variable: value is not available>) at /usr/src/sys/net/if_ethersubr.c:851
#18 0xffffffff80c9175b in ether_demux (ifp=ifp@entry=0xfffff80004619000, m=0xfffff80044261600) at /usr/src/sys/net/if_ethersubr.c:897
#19 0xffffffff80c92efd in ether_input_internal (ifp=0xfffff80004619000, m=0xfffff80044261600) at /usr/src/sys/net/if_ethersubr.c:710
#20 ether_nh_input (m=<unavailable>, m@entry=<error reading variable: value is not available>) at /usr/src/sys/net/if_ethersubr.c:740
#21 0xffffffff80caf29d in netisr_dispatch_src (proto=proto@entry=5, source=source@entry=0, m=0xfffff80044261600) at /usr/src/sys/net/netisr.c:1152
#22 0xffffffff80caf61f in netisr_dispatch (proto=<unavailable>, proto@entry=5, m=<unavailable>) at /usr/src/sys/net/netisr.c:1243
#23 0xffffffff80c91c65 in ether_input (ifp=0xfffff80004619000, m=<unavailable>) at /usr/src/sys/net/if_ethersubr.c:851
#24 0xffffffff80cab203 in iflib_rxeof (rxq=rxq@entry=0xfffff80004e73680, budget=<optimized out>) at /usr/src/sys/net/iflib.c:3081
#25 0xffffffff80ca53aa in _task_fn_rx (context=0xfffff80004e73680) at /usr/src/sys/net/iflib.c:4155
#26 0xffffffff80ba0eee in gtaskqueue_run_locked (queue=queue@entry=0xfffff800042d0600) at /usr/src/sys/kern/subr_gtaskqueue.c:368
#27 0xffffffff80ba0c23 in gtaskqueue_thread_loop (arg=arg@entry=0xfffffe003c109038) at /usr/src/sys/kern/subr_gtaskqueue.c:544
#28 0xffffffff80b080e2 in fork_exit (callout=0xffffffff80ba0b50 <gtaskqueue_thread_loop>, arg=0xfffffe003c109038, frame=0xfffffe0106052f40) at /usr/src/sys/kern/kern_fork.c:1151
#29 <signal handler called>
(kgdb) f 5
#5  0xffffffff80d4b102 in tcp_do_segment (tp=0xfffff80399e5fa80, tp@entry=<error reading variable: value is not available>, m=<optimized out>,
    m@entry=<error reading variable: value is not available>, th=0xfffff8004426169a, th@entry=<error reading variable: value is not available>, drop_hdrlen=72,
    drop_hdrlen@entry=<error reading variable: value is not available>, tlen=0, tlen@entry=<error reading variable: value is not available>, iptos=<unavailable>,
    iptos@entry=<error reading variable: value is not available>) at /usr/src/sys/netinet/tcp_input.c:1548
1548		KASSERT(tp->t_state != TCPS_TIME_WAIT, ("%s: TCPS_TIME_WAIT",
(kgdb) i lo
to = {to_flags = 16, to_tsval = 2267638854, to_tsecr = 249601034, to_sacks = 0xfffff80399e5fa80 "", to_signature = 0x4 <error: Cannot access memory at address 0x4>,
  to_tfo_cookie = 0xfffffe0106052780 "\004", to_mss = 55127, to_wscale = 216 '\330', to_nsacks = 128 '\200', to_tfo_len = 255 '\377', to_spare = 0}
maxseg = 1328
inp = 0xfffff80399e5fa80
needoutput = 0
incforsyn = <optimized out>
so = 0xfffff80399781800
inc = <optimized out>
thflags = <optimized out>
sack_changed = <optimized out>
nsegs = 1
s = <optimized out>
rstreason = <optimized out>
tiwin = <optimized out>
todrop = <optimized out>
acked = <optimized out>
tfo_syn = <optimized out>
mfree = <optimized out>
ourfinisacked = <optimized out>
win = <optimized out>
close = <optimized out>

Comment 34 Dave Cottlehuber freebsd_committer

2025-03-17 09:16:53 UTC

hi Michael after that patch, I ran wrk 3x and no panics.
I'll leave it in a loop for a while.

Comment 35 Michael Tuexen freebsd_committer

2025-03-17 10:58:12 UTC

(In reply to Andrey V. Elsukov from comment #33)
I don't expect the problem to be specific to IPv4 or IPv6... Just a matter of which network layer you use.

Comment 36 Michael Tuexen freebsd_committer

2025-03-17 10:59:05 UTC

(In reply to Dave Cottlehuber from comment #34)
Perfect, thank you.

I do know that the code I removed is a problem, but I want to make sure that there are not more problems...

Comment 37 Gleb Smirnoff freebsd_committer

2025-03-18 19:12:54 UTC

Michael, Richard, do we have any chances for this panic to be fixed
before March stabweek (next week)?  It is already over 4 months since
the panic was discovered and it affects many people.

Comment 38 Shawn Webb 2025-03-18 19:20:36 UTC

(In reply to Michael Tuexen from comment #30)
I'll coordinate testing a new build with the HardenedBSD community members being impacted by this bug. I'll try to get them a new test build this weekend.

Comment 39 Michael Tuexen freebsd_committer

2025-03-18 19:29:58 UTC

(In reply to Gleb Smirnoff from comment #37)
I can provide a patch tomorrow. Need to look at RACK and BBR...

Comment 40 Don Lewis freebsd_committer

2025-03-19 03:53:58 UTC

(In reply to Michael Tuexen from comment #29)

Before this patch, I was getting the panic about once every few weeks.  I've been running the patch for the last day.  No problems so far, so I know that it didn't make things substantially worse.

Comment 41 Michael Tuexen freebsd_committer

2025-03-19 22:06:00 UTC

A patch is under review D49414.

Comment 42 commit-hook freebsd_committer

2025-03-20 19:36:14 UTC

A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=fbcf3b74e8f2c0c5ba37f1839bfe9395eb2fd0b1

commit fbcf3b74e8f2c0c5ba37f1839bfe9395eb2fd0b1
Author:     Michael Tuexen <tuexen@FreeBSD.org>
AuthorDate: 2025-03-20 15:17:40 +0000
Commit:     Michael Tuexen <tuexen@FreeBSD.org>
CommitDate: 2025-03-20 15:17:40 +0000

    tcp: fix detection of bad RTOs

    If timestamps are enabled, the actions performed by a retransmission
    timeout were rolled back, when they should not.
    It is needed to make sure the incoming segment advances SND.UNA.
    To do this, remove the incorrect upfront check and extend the check in
    the fast path to handle also the case of timestamps.

    PR:                     282605
    Reviewed by:            cc, rscheff, Peter Lei
    MFC after:              3 days
    Sponsored by:           Netflix, Inc.
    Differential Revision:  https://reviews.freebsd.org/D49414

 sys/netinet/tcp_input.c | 17 +++++++----------
 1 file changed, 7 insertions(+), 10 deletions(-)

Comment 43 commit-hook freebsd_committer

2025-03-21 05:25:36 UTC

A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=6f6c07813b38ab04d8b1b2bb87c0291dbae25a25

commit 6f6c07813b38ab04d8b1b2bb87c0291dbae25a25
Author:     Richard Scheffenegger <rscheff@FreeBSD.org>
AuthorDate: 2025-03-21 04:23:30 +0000
Commit:     Richard Scheffenegger <rscheff@FreeBSD.org>
CommitDate: 2025-03-21 04:28:31 +0000

    tcp: fix reverting of spurious timeouts (RTO)

    One variable that became critical to correctly calculate
    the cwnd during limited transmit was not properly reverted
    on detection of spurious timeouts.

    PR:                     282605
    Reviewed By:            cc, tuexen, #transport
    MFC after:              3 days
    Sponsored by:           NetApp, Inc.
    Differential Revision:  https://reviews.freebsd.org/D48652

 sys/netinet/tcp_input.c | 1 +
 1 file changed, 1 insertion(+)

Comment 44 commit-hook freebsd_committer

2025-03-24 05:57:50 UTC

A commit in branch stable/14 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=4a328b8049962976d13b8fb9abe2d9a488f2a894

commit 4a328b8049962976d13b8fb9abe2d9a488f2a894
Author:     Richard Scheffenegger <rscheff@FreeBSD.org>
AuthorDate: 2025-03-21 04:23:30 +0000
Commit:     Richard Scheffenegger <rscheff@FreeBSD.org>
CommitDate: 2025-03-24 05:35:43 +0000

    tcp: fix reverting of spurious timeouts (RTO)

    One variable that became critical to correctly calculate
    the cwnd during limited transmit was not properly reverted
    on detection of spurious timeouts.

    PR:                     282605
    Reviewed By:            cc, tuexen, #transport
    MFC after:              3 days
    Sponsored by:           NetApp, Inc.
    Differential Revision:  https://reviews.freebsd.org/D48652

    (cherry picked from commit 6f6c07813b38ab04d8b1b2bb87c0291dbae25a25)

 sys/netinet/tcp_input.c | 1 +
 1 file changed, 1 insertion(+)

Comment 45 Maxim Filimonov 2025-03-25 18:04:27 UTC

It's starting to look like the issue has been fixed in the recent commits. At the very least, I haven't had a kernel panic in two days.

Comment 46 Kyle Evans freebsd_committer

2025-03-26 13:50:43 UTC

I just hit a case this morning on a kernel at 67c1c4dfd1, which has both of these included:

fbcf3b74e8f2c0c5 ("tcp: fix detection of bad RTOs")
6f6c07813b38ab04 ("tcp: fix reverting of spurious timeouts (RTO)")

panic: tcp_do_segment: sent too much                                            
cpuid = 1                                                                       
time = 1742996303                                                               
KDB: stack backtrace:                                                           
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe015b6b55b0  
vpanic() at vpanic+0x136/frame 0xfffffe015b6b56e0                               
panic() at panic+0x43/frame 0xfffffe015b6b5740                                  
tcp_do_segment() at tcp_do_segment+0x28ed/frame 0xfffffe015b6b5820              
tcp_input_with_port() at tcp_input_with_port+0x1140/frame 0xfffffe015b6b5970    
tcp6_input_with_port() at tcp6_input_with_port+0x6a/frame 0xfffffe015b6b59a0    
tcp6_input() at tcp6_input+0xb/frame 0xfffffe015b6b59b0                         
ip6_input() at ip6_input+0xc98/frame 0xfffffe015b6b5a90                         
netisr_dispatch_src() at netisr_dispatch_src+0xb4/frame 0xfffffe015b6b5af0      
ether_demux() at ether_demux+0x16a/frame 0xfffffe015b6b5b20                     
ether_nh_input() at ether_nh_input+0x3cd/frame 0xfffffe015b6b5b70               
netisr_dispatch_src() at netisr_dispatch_src+0xb4/frame 0xfffffe015b6b5bd0      
ether_input() at ether_input+0xd5/frame 0xfffffe015b6b5c30                      
ieee80211_deliver_data() at ieee80211_deliver_data+0xeb/frame 0xfffffe015b6b5c80
sta_input() at sta_input+0xcda/frame 0xfffffe015b6b5d40                         
ieee80211_input_mimo() at ieee80211_input_mimo+0x20c/frame 0xfffffe015b6b5df0   
lkpi_80211_lhw_rxq_task() at lkpi_80211_lhw_rxq_task+0x143/frame 0xfffffe015b6b5e40
taskqueue_run_locked() at taskqueue_run_locked+0x1c2/frame 0xfffffe015b6b5ec0   
taskqueue_thread_loop() at taskqueue_thread_loop+0xd3/frame 0xfffffe015b6b5ef0  
fork_exit() at fork_exit+0x82/frame 0xfffffe015b6b5f30                          
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe015b6b5f30               
--- trap 0xd229bf4a, rip = 0x79b36f1efc3c4ce3, rsp = 0xd7e96fa6591692e4, rbp = 0xa979fdd81e679dee ---

Comment 47 Michael Tuexen freebsd_committer

2025-03-26 14:22:39 UTC

(In reply to Kyle Evans from comment #46)
Was BBLoging enabled?

Comment 48 Kyle Evans freebsd_committer

2025-03-26 14:34:06 UTC

(In reply to Michael Tuexen from comment #47)

Yes, allegedly- not sure how to confirm/extract anything useful from it, though.

Comment 49 Michael Tuexen freebsd_committer

2025-03-26 15:07:57 UTC

(In reply to Kyle Evans from comment #48)
Please do the following:

1. Download tcplog.py from review D48705.
2. Start kgdb with the relevant core.
3. source tcplog.py
4. go up the stack until you have the tp as a variable
5. tcplog_dump tp
6. quit kgdb.Now you should have a *.bin file. Please send this to tuexen@freebsd.org.

If this does not work, please report.

Comment 50 Andriy Gapon freebsd_committer

2025-03-27 05:33:55 UTC

Just got this crash as well, just on the older stable/14, so probably no new useful info from  me.
I have the vmcore, just in case.

panic: tcp_do_segment: sent too much
cpuid = 3
time = 1743052191
KDB: stack backtrace:
db_trace_self_wrapper() at 0xffffffff8061eadb = db_trace_self_wrapper+0x2b/frame 0xfffffe0120fda760
kdb_backtrace() at 0xffffffff80956a57 = kdb_backtrace+0x37/frame 0xfffffe0120fda810
vpanic() at 0xffffffff80907629 = vpanic+0x169/frame 0xfffffe0120fda950
panic() at 0xffffffff80907403 = panic+0x43/frame 0xfffffe0120fda9b0
tcp_do_segment() at 0xffffffff80a82864 = tcp_do_segment+0x1c24/frame 0xfffffe0120fdaa90
tcp_input_with_port() at 0xffffffff80a8066d = tcp_input_with_port+0x12bd/frame 0xfffffe0120fdabc0
tcp_input() at 0xffffffff80a80c2b = tcp_input+0xb/frame 0xfffffe0120fdabd0
ip_input() at 0xffffffff80a735ae = ip_input+0x25e/frame 0xfffffe0120fdac10
netisr_dispatch_src() at 0xffffffff80a436ff = netisr_dispatch_src+0x8f/frame 0xfffffe0120fdac70
netisr_dispatch() at 0xffffffff80a43a6e = netisr_dispatch+0xe/frame 0xfffffe0120fdac80
ether_demux() at 0xffffffff80a39cc9 = ether_demux+0x169/frame 0xfffffe0120fdacb0
ether_nh_input() at 0xffffffff80a3afe0 = ether_nh_input+0x2c0/frame 0xfffffe0120fdace0
netisr_dispatch_src() at 0xffffffff80a436ff = netisr_dispatch_src+0x8f/frame 0xfffffe0120fdad40
netisr_dispatch() at 0xffffffff80a43a6e = netisr_dispatch+0xe/frame 0xfffffe0120fdad50
ether_input() at 0xffffffff80a3a115 = ether_input+0xa5/frame 0xfffffe0120fdada0
if_input() at 0xffffffff80a2fa2a = if_input+0xa/frame 0xfffffe0120fdadb0
re_rxeof() at 0xffffffff806ff873 = re_rxeof+0x463/frame 0xfffffe0120fdae20
re_intr_msi() at 0xffffffff806fd747 = re_intr_msi+0xc7/frame 0xfffffe0120fdae60
ithread_loop() at 0xffffffff808c67ca = ithread_loop+0x2aa/frame 0xfffffe0120fdaef0
fork_exit() at 0xffffffff808c2b47 = fork_exit+0xc7/frame 0xfffffe0120fdaf30
fork_trampoline() at 0xffffffff80c5034e = fork_trampoline+0xe/frame 0xfffffe0120fdaf30

Comment 51 Gleb Smirnoff freebsd_committer

2025-03-30 16:56:22 UTC

syzkaller just hit the bug on most recent CURRENT:

https://syzkaller.appspot.com/bug?extid=e4418498cc95b1cf5c16

Comment 52 Michael Tuexen freebsd_committer

2025-03-30 17:16:58 UTC

(In reply to Gleb Smirnoff from comment #51)
Yepp. However, syzkaller does not help here. However, I do have a BBLog file from Kyle which I will use to try to get a reproducer using packetdrill. There is a least one more codepath triggering the assertion...

Comment 53 Mike 2025-04-04 08:26:34 UTC

I run HARDENEDBSD-14-STABLE and have been seeing this.

I grabbed source for stable-14 and compiled a kernel on 26 March, validating that tcp_input.c was post  4a328b8049962976d13b8fb9abe2d9a488f2a894.

The crashes seem decidedly fewer, but i got this one yesterday:

[1692] panic: tcp_do_segment: sent too much
[1692] cpuid = 2
[1692] time = 1742987244
[1692] __HardenedBSD_version = 1400007 __FreeBSD_version = 1402501
[1692] version = FreeBSD 14.2-STABLE-HBSD  HARDENEDBSD-14-STABLE
[1692] KDB: stack backtrace:
[1692] #0 0xffffffff80bb8aeb at kdb_backtrace+0x7b
[1692] #1 0xffffffff80b69af6 at vpanic+0x166
[1692] #2 0xffffffff80b698e3 at panic+0x43
[1692] #3 0xffffffff80d56ec1 at tcp_do_segment+0x2b91
[1692] #4 0xffffffff80d537f0 at tcp_input_with_port+0x11f0
[1692] #5 0xffffffff80d5431b at tcp_input+0xb
[1692] #6 0xffffffff80d414f3 at ip_input+0x2c3
[1692] #7 0xffffffff80cbd4c4 at netisr_dispatch_src+0xc4
[1692] #8 0xffffffff80c9f47a at ether_demux+0x17a
[1692] #9 0xffffffff80ca0c27 at ether_nh_input+0x417
[1692] #10 0xffffffff80cbd4c4 at netisr_dispatch_src+0xc4
[1692] #11 0xffffffff80c9f8e9 at ether_input+0xe9
[1692] #12 0xffffffff80d5db4f at tcp_lro_flush+0x35f
[1692] #13 0xffffffff80d5e027 at tcp_lro_rx_done+0x37
[1692] #14 0xffffffff80d5dd48 at tcp_lro_flush_all+0x178
[1692] #15 0xffffffff80cb92ad at iflib_rxeof+0xead
[1692] #16 0xffffffff80cb2ff5 at _task_fn_rx+0x85
[1692] #17 0xffffffff80bb73c3 at gtaskqueue_run_locked+0x1a3

I have applied the BlackBox Logging sysctls and am running tcplog_dumper, and will report back if (when) another crash occurs.

Comment 54 commit-hook freebsd_committer

2025-04-04 09:06:10 UTC

A commit in branch stable/14 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=e754d89bba8c526fd1ff792616ff4cbf9c6e093e

commit e754d89bba8c526fd1ff792616ff4cbf9c6e093e
Author:     Michael Tuexen <tuexen@FreeBSD.org>
AuthorDate: 2025-03-20 15:17:40 +0000
Commit:     Michael Tuexen <tuexen@FreeBSD.org>
CommitDate: 2025-04-04 09:05:10 +0000

    tcp: fix detection of bad RTOs

    If timestamps are enabled, the actions performed by a retransmission
    timeout were rolled back, when they should not.
    It is needed to make sure the incoming segment advances SND.UNA.
    To do this, remove the incorrect upfront check and extend the check in
    the fast path to handle also the case of timestamps.

    PR:                     282605
    Reviewed by:            cc, rscheff, Peter Lei
    Sponsored by:           Netflix, Inc.
    Differential Revision:  https://reviews.freebsd.org/D49414

    (cherry picked from commit fbcf3b74e8f2c0c5ba37f1839bfe9395eb2fd0b1)

 sys/netinet/tcp_input.c | 17 +++++++----------
 1 file changed, 7 insertions(+), 10 deletions(-)

Comment 55 Michael Tuexen freebsd_committer

2025-04-04 09:08:51 UTC

(In reply to Mike from comment #53)
Please update again to include the fix I just MFCed to stable/14. Please continue testing, because we are aware of one other path triggering the KASSERT. However, we have only reports for CURRENT, not stable/14. I would be great to know, if stable/14 is also affected.

Comment 56 Mike 2025-04-05 11:11:13 UTC

(In reply to Michael Tuexen from comment #55)

I realize I have been sloppy with my data and previous post, and need to correct/clarify the info I provided before, and call attention to the fact that the dump trace i provided is from *before* the 24 March patch, which I mistakenly associated with a crash on 3 April on the patched kernel.


Data Summary (too much information)
============
- The system had 13 unexpected reboots between an intentional reboot 6 Jan and 5 April at intervals varying from 13 minutes to 17 days
- Crashdumps were captured 3 times, all with a "savecore[xxxxx] reboot after panic: tcp_do_segment: sent too much" message
- The last crash on an unpatched kernel was applied was 26 March which is the dump trace I provided 
- A kernel with the patch from 24 March was installed 26 March after that crash
- The system crashed 3 April on the patched kernel, and did not leave a dump behind, so there is no direct evidence that it is caused by this issue, but the affected system has never crashed for *any* reason on earlier kernels.


Current status
==============
- The system is running the HBSD 14-STABLE in which your commit is the last applied: 14.2-STABLE-HBSD FreeBSD 14.2-STABLE-HBSD  hardened/14-stable/master-n193911-ce071b70191e HARDENEDBSD-14-STABLE amd64
- I have applied the BlackBox Logging sysctls and am running tcplog_dumper, and will report back if (when) another crash occurs.

Comment 57 Don Lewis freebsd_committer

2025-04-18 07:37:23 UTC

A couple days ago I got this panic with a -CURRENT kernel built after the recent bugfix commits.  The machine was busy building packages and my internet connection was flakey.

Mon Apr 14 18:19:44 PDT 2025

FreeBSD zipper.catspoiler.org 15.0-CURRENT FreeBSD 15.0-CURRENT #6 n276073-67c1c
4dfd1cc: Mon Mar 24 19:27:24 PDT 2025     dl@zipper.catspoiler.org:/usr/obj/usr/
src/amd64.amd64/sys/GENERIC  amd64

panic: tcp_do_segment: sent too much

Reading symbols from /boot/kernel/kernel...
Reading symbols from /usr/lib/debug//boot/kernel/kernel.debug...
Reading symbols from /boot/kernel/linux.ko...
Reading symbols from /usr/lib/debug//boot/kernel/linux.ko.debug...
Reading symbols from /boot/kernel/linux_common.ko...
Reading symbols from /usr/lib/debug//boot/kernel/linux_common.ko.debug...
Reading symbols from /boot/kernel/mqueuefs.ko...
Reading symbols from /usr/lib/debug//boot/kernel/mqueuefs.ko.debug...
Reading symbols from /boot/kernel/linux64.ko...
Reading symbols from /usr/lib/debug//boot/kernel/linux64.ko.debug...
Reading symbols from /boot/kernel/geom_mirror.ko...
Reading symbols from /usr/lib/debug//boot/kernel/geom_mirror.ko.debug...
Reading symbols from /boot/kernel/zfs.ko...
Reading symbols from /usr/lib/debug//boot/kernel/zfs.ko.debug...
Reading symbols from /boot/kernel/fdescfs.ko...
Reading symbols from /usr/lib/debug//boot/kernel/fdescfs.ko.debug...
Reading symbols from /boot/kernel/acpi_wmi.ko...
Reading symbols from /usr/lib/debug//boot/kernel/acpi_wmi.ko.debug...
Reading symbols from /boot/kernel/intpm.ko...
Reading symbols from /usr/lib/debug//boot/kernel/intpm.ko.debug...
Reading symbols from /boot/kernel/smbus.ko...
Reading symbols from /usr/lib/debug//boot/kernel/smbus.ko.debug...
Reading symbols from /boot/kernel/uhid.ko...
Reading symbols from /usr/lib/debug//boot/kernel/uhid.ko.debug...
Reading symbols from /boot/kernel/ums.ko...
Reading symbols from /usr/lib/debug//boot/kernel/ums.ko.debug...
Reading symbols from /boot/kernel/nullfs.ko...
Reading symbols from /usr/lib/debug//boot/kernel/nullfs.ko.debug...
Reading symbols from /boot/kernel/mac_ntpd.ko...
Reading symbols from /usr/lib/debug//boot/kernel/mac_ntpd.ko.debug...
Reading symbols from /boot/kernel/linprocfs.ko...
Reading symbols from /usr/lib/debug//boot/kernel/linprocfs.ko.debug...
__curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:57
57              __asm("movq %%gs:%c1,%0" : "=r" (td)
(kgdb) #0  __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:57
        td = <optimized out>
#1  doadump (textdump=textdump@entry=0)
    at /usr/src/sys/kern/kern_shutdown.c:404
        error = 0
        coredump = <optimized out>
#2  0xffffffff804a34ba in db_dump (dummy=<optimized out>,
    dummy2=<optimized out>, dummy3=<optimized out>, dummy4=<optimized out>)
    at /usr/src/sys/ddb/db_command.c:596
        error = <optimized out>
#3  0xffffffff804a32ad in db_command (last_cmdp=<optimized out>,
    cmd_table=<optimized out>, dopager=true)
    at /usr/src/sys/ddb/db_command.c:508
        modif = "\000\000\000\000\000\000\000\000\024P\235\201\377\377\377\377x\
006D\003\000\370\377\377\000\000\000\000\000\000\000\000\b\000\000\0000\000\000\
000\320q\253Z\001\376\377\377pq\253Z\001\376\377\377\024P\235\201\377\377\377\37
7\020r\253Z\001\376\377\377dYJ\200\377\377\377\377@W\373\003\001\370\377\3770\36
3.\201\377\377\377\377\020P\235\201\377\377\377\377\000\000\000\000\000\000\000\
000@W\373\003\001\370\377\377"
        addr = -2135263629
        count = -1
        cmd = 0xffffffff8181a8b8 <db_cmds+616>
        have_addr = <optimized out>
        t = <optimized out>
        result = <optimized out>
#4  0xffffffff804a2f6d in db_command_loop ()
    at /usr/src/sys/ddb/db_command.c:555
No locals.
#5  0xffffffff804a6946 in db_trap (type=<optimized out>, code=<optimized out>)
    at /usr/src/sys/ddb/db_main.c:267
        jb = {{_jb = {0, -2193207102920, -2193207102784, -8791731251392,
              -2122206128, -2193207102912, -2193207102144, -2142607148,
              -2193207102816, -2137415265, -2118532256, 3}}}
        bkpt = false
        watchpt = false
        prev_jb = 0x0
        why = <optimized out>
#6  0xffffffff80ba80ff in kdb_trap (type=type@entry=3, code=code@entry=0,
    tf=tf@entry=0xfffffe015aab7540) at /usr/src/sys/kern/subr_kdb.c:790
        __pc = 0x0
        __pc = 0x0
        other_cpus = {__bits = {18446744071581860125, 18446741880502449296,
            13736095756378376981, 0, 17, 18446744071574325360,
            18446744071580898254, 18446741880502449344, 18446741880502449328,
            18446744071574320390, 4294967294, 44470858976, 0,
            18446744071580898253, 2167275808, 0}}
        be = 0xffffffff8181b450 <ddb_dbbe>
        intr = 70
        did_stop_cpus = <optimized out>
        handled = <optimized out>
#7  0xffffffff8109353b in trap (frame=<optimized out>)
    at /usr/src/sys/amd64/amd64/trap.c:608
        __pc = 0x0
        __pc = 0x0
        __pc = 0x0
        ksi = {ksi_link = {tqe_next = 0x11804a6adc,
            tqe_prev = 0xfffffe015aab7478}, ksi_info = {si_signo = 1521185992,
            si_errno = -511, si_code = 0, si_pid = 0, si_uid = 1521185976,
            si_status = -511, si_addr = 0xbea06a503faf0b15, si_value = {
              sival_int = 1004, sival_ptr = 0x3ec, sigval_int = 1004,
              sigval_ptr = 0x3ec}, _reason = {_fault = {_trapno = 1521188672},
              _timer = {_timerid = 1521188672, _overrun = -511}, _mesgq = {
                _mqd = 1521188672}, _poll = {_band = -2193207099584},
              _capsicum = {_syscall = 1521188672}, __spare__ = {
                __spare1__ = -2193207099584, __spare2__ = {0, 0, 0, 0, 1, 0,
                  1068436245}}}}, ksi_flags = 256, ksi_sigq = 0x0}
        signo = 1509952099
        ucode = -511
        td = <optimized out>
        p = <optimized out>
        dr6 = 0
        type = 3
        addr = <optimized out>
        pf = <optimized out>
        i = <optimized out>
#8  <signal handler called>
No locals.
#9  kdb_enter (why=<optimized out>, msg=<optimized out>)
    at /usr/src/sys/kern/subr_kdb.c:556
No locals.
#10 0xffffffff80b57f0b in vpanic (fmt=0xffffffff81210007 "%s: sent too much",
    ap=ap@entry=0xfffffe015aab7770) at /usr/src/sys/kern/kern_shutdown.c:967
        buf = "tcp_do_segment: sent too much", '\000' <repeats 226 times>
        __pc = 0x0
        __pc = 0x0
        __pc = 0x0
        other_cpus = {__bits = {65534, 0 <repeats 15 times>}}
        td = 0xfffff80103fb5740
        bootopt = <optimized out>
        newpanic = <optimized out>
#11 0xffffffff80b57d73 in panic (
    fmt=0xffffffff81b9c3a0 <cnputs_mtx> "G?\027\201\377\377\377\377")
    at /usr/src/sys/kern/kern_shutdown.c:892
        ap = {{gp_offset = 16, fp_offset = 48,
            overflow_arg_area = 0xfffffe015aab77a0,
            reg_save_area = 0xfffffe015aab7740}}
#12 0xffffffff80d5279d in tcp_do_segment (tp=0xfffff8045f238000,
    m=<optimized out>, th=0xfffff8008a06c396, drop_hdrlen=72, tlen=0,
    iptos=<optimized out>) at /usr/src/sys/netinet/tcp_input.c:1546
        to = {to_flags = 16, to_tsval = 1014279682, to_tsecr = 1048849,
          to_sacks = 0x106 <error: Cannot access memory at address 0x106>,
          to_signature = 0xffffffff81cded58 <mac_policy_rm> "7v#\201\377\377\377
\377",
          to_tfo_cookie = 0xffffffff811eab40 "/usr/src/sys/security/mac/mac_fram
ework.c", to_mss = 30720, to_wscale = 171 '\253', to_nsacks = 90 'Z',          to_tfo_len = 1 '\001', to_spare = 2159349026}
        maxseg = 1408
        inp = 0xfffff8045f238000
        needoutput = 0
        incforsyn = <optimized out>
        so = 0xfffff8007b541400
        inc = <optimized out>
        thflags = <optimized out>
        sack_changed = <optimized out>
        nsegs = 1
        s = <optimized out>
        tiwin = <optimized out>
        rstreason = <optimized out>
        todrop = <optimized out>
        acked = <optimized out>
        tfo_syn = <optimized out>
        mfree = <optimized out>
        ourfinisacked = <optimized out>
        win = <optimized out>
        close = <optimized out>
#13 0xffffffff80d4f1e0 in tcp_input_with_port (mp=mp@entry=0xfffffe015aab7a48,
    offp=offp@entry=0xfffffe015aab7a40, proto=<optimized out>, port=0)
    at /usr/src/sys/netinet/tcp_input.c:1165
        so = 0xfffff8007b541400
        to = {to_flags = 0, to_tsval = 4294967295, to_tsecr = 2195783824,
          to_sacks = 0xfffff8008aa76000 "\205\351\"\201\377\377\377\377",
          to_signature = 0x4133a5d681cded58 <error: Cannot access memory at addr
ess 0x4133a5d681cded58>,
          to_tfo_cookie = 0x1 <error: Cannot access memory at address 0x1>,
          to_mss = 63232, to_wscale = 88 'X', to_nsacks = 138 '\212',
          to_tfo_len = 0 '\000', to_spare = 2321086358}
        m = 0xfffff8008a06c300
        th = 0xfffff8008a06c396
        ip = 0x0
        inp = <optimized out>
        tp = 0xffffffff81b9c3a0 <cnputs_mtx>
        optp = 0xfffff8008a06c3aa "\001\001\b\n<t\256\002\320\0038{\255\336\336\
300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\
300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\
300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\
300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\
300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\
300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\
300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\
300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\
300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\
300\255\336\336\300"...
        optlen = 12
        tlen = <optimized out>
        rstreason = <optimized out>
        fwd_tag = 0x0
        ip6 = 0xfffff8008a06c36e
        s = 0x0
        off0 = <optimized out>
        iptos = 128 '\200'
        off = <optimized out>
        len = <optimized out>
        ipttl = <optimized out>
        thflags = <optimized out>
        drop_hdrlen = 72
        lookupflag = <optimized out>
        isipv6 = <optimized out>
#14 0xffffffff80d4dffa in tcp6_input_with_port (mp=0xfffffe015aab7a48,
    offp=0xfffffe015aab7a40, proto=<optimized out>, port=port@entry=0)
    at /usr/src/sys/netinet/tcp_input.c:600
        m = 0xfffff8008a06c300
        ip6 = <optimized out>
        ia6 = 0x12
#15 0xffffffff80d4f94b in tcp6_input (mp=0xffffffff81b9c3a0 <cnputs_mtx>,
    offp=0xfffffe015aab74c0, proto=-2128026873)
    at /usr/src/sys/netinet/tcp_input.c:607
No locals.
#16 0xffffffff80d9a5e8 in ip6_input (m=0x0)
    at /usr/src/sys/netinet6/ip6_input.c:941
        odst = {__u6_addr = {
            __u6_addr8 = "\000\367X\212\000\370\377\377\001\000\000\000\000\000\
000", __u6_addr16 = {63232, 35416, 63488, 65535, 1, 0, 0, 0}, __u6_addr32 = {
              2321086208, 4294965248, 1, 0}}}
        plen = 32
        rtalert = 4294967295
        off = 40
        ours = 1
        nxt = <optimized out>
        srcrt = <optimized out>
        rcvif = 0xfffff80117541800
        ip6 = <optimized out>
        ia = <optimized out>
        nest = 1
#17 0xffffffff80cb4b64 in netisr_dispatch_src (proto=proto@entry=6,
    source=source@entry=0, m=0xfffff8008a06c300)
    at /usr/src/sys/net/netisr.c:1151
        __pc = 0x0
        __pc = 0x0
        __pc = 0x0
        cpuid = 0
        dosignal = 0
        npp = 0xffffffff81cc8070 <netisr_proto+336>
        dispatch_policy = 2166940423
        nwsp = <optimized out>
        npwp = <optimized out>
        error = <optimized out>
        out_unlock = <optimized out>
#18 0xffffffff80cb4eef in netisr_dispatch (proto=2176435104, proto@entry=6,
    m=0xffffffff8128e307) at /usr/src/sys/net/netisr.c:1242
No locals.
#19 0xffffffff80c9737a in ether_demux (ifp=ifp@entry=0xfffff80117541800,
    m=0xfffff8008a06c300) at /usr/src/sys/net/if_ethersubr.c:942
        i = <optimized out>
        eh = <optimized out>
        ether_type = <optimized out>
        isr = 6
#20 0xffffffff80c98a3d in ether_input_internal (ifp=0xfffff80117541800,
    m=0xfffff8008a06c300) at /usr/src/sys/net/if_ethersubr.c:706
        eh = 0xfffff8008a06c360
        saved_vnet = 0xfffff8000342cb80
        etype = <optimized out>
        evl = <optimized out>
#21 ether_nh_input (m=<optimized out>) at /usr/src/sys/net/if_ethersubr.c:736
No locals.
#22 0xffffffff80cb4b64 in netisr_dispatch_src (proto=proto@entry=5,
    source=source@entry=0, m=0xfffff8008a06c300)
    at /usr/src/sys/net/netisr.c:1151
       __pc = 0x0
        __pc = 0x0
        __pc = 0x0
        cpuid = 0
        dosignal = 838860800
        npp = 0xffffffff81cc8038 <netisr_proto+280>
        dispatch_policy = 2166940423
        nwsp = <optimized out>
        npwp = <optimized out>
        error = <optimized out>
        out_unlock = <optimized out>
#23 0xffffffff80cb4eef in netisr_dispatch (proto=2176435104, proto@entry=5,
    m=0xffffffff8128e307) at /usr/src/sys/net/netisr.c:1242
No locals.
#24 0xffffffff80c977c5 in ether_input (ifp=0xfffff80117541800,
    m=0xfffffe015aab74c0) at /usr/src/sys/net/if_ethersubr.c:847
        printedonce = false
        et = {et_link = {tqe_next = 0x0, tqe_prev = 0x0}, et_td = 0x0,
          et_section = {bucket = 0}, et_old_priority = 0 '\000'}
        needs_epoch = <optimized out>
        saved_vnet = 0x0
        mn = 0x0
#25 0xffffffff80d5971f in tcp_lro_flush (lc=lc@entry=0xfffff80117533830,
    le=le@entry=0xfffffe015d88e690) at /usr/src/sys/netinet/tcp_lro.c:1117
No locals.
#26 0xffffffff80d59993 in tcp_lro_rx_done (lc=<optimized out>)
    at /usr/src/sys/netinet/tcp_lro.c:592
        le = 0xfffffe015d88e690
#27 tcp_lro_flush_all (lc=lc@entry=0xfffff80117533830)
    at /usr/src/sys/netinet/tcp_lro.c:1260
        __pc = 0x0
        __pc = 0x0
        saved_vnet = <optimized out>
        seq = <optimized out>
        x = <optimized out>
        nseq = <optimized out>
        done = <optimized out>
#28 0xffffffff80cb0787 in iflib_rxeof (rxq=rxq@entry=0xfffff80117533800,
    budget=<optimized out>) at /usr/src/sys/net/iflib.c:3095
        ri = {iri_qsidx = 0, iri_vtag = 0, iri_len = 86, iri_cidx = 325,
          iri_ifp = 0xfffff80117541800, iri_frags = 0xfffff801175339c0,
          iri_flowid = 911985026, iri_csum_flags = 201326592,
          iri_csum_data = 65535, iri_flags = 0 '\000', iri_nfrags = 1 '\001',
          iri_rsstype = 134 '\206', iri_pad = 0 '\000'}
        ctx = 0xfffff80117546c00
        sctx = 0xffffffff81836340 <igb_sctx_init>
        scctx = 0xfffff80117546ca8
        retval = <optimized out>
        v6_forwarding = <optimized out>
        v4_forwarding = <optimized out>
        lro_possible = <optimized out>
        ifp = 0xfffff80117541800
        mt = <optimized out>
        mh = <optimized out>
        rx_bytes = <optimized out>
        rx_pkts = <optimized out>
        cidxp = 0xfffff8011752aa00
        avail = 0
        saved_vnet = 0x0
        budget_left = <optimized out>
        err = <optimized out>
        m = <optimized out>
        i = <optimized out>
        fl = <optimized out>
        mf = <optimized out>
        lro_enabled = <optimized out>
#29 0xffffffff80caa99a in _task_fn_rx (context=0xfffff80117533800)
    at /usr/src/sys/net/iflib.c:4158
        __pc = 0x0
        work = 0
        rxq = 0xfffff80117533800
        ctx = 0xfffff80117546c00
        nmirq = <optimized out>
        more = <optimized out>
        budget = 29888
#30 0xffffffff80ba62ae in gtaskqueue_run_locked (
    queue=queue@entry=0xfffff800036d6000)
    at /usr/src/sys/kern/subr_gtaskqueue.c:368
        et = {et_link = {tqe_next = 0x0, tqe_prev = 0xfffffe0069f49ad8},
          et_td = 0xfffff80103fb5740, et_section = {bucket = 0},
          et_old_priority = 8 '\b'}
        tb = {tb_running = 0xfffff801175338b0, tb_seq = 3188829, tb_link = {
            le_next = 0x0, le_prev = 0xfffff800036d6010}}
        in_net_epoch = true
        in_net_epoch = true
        gtask = <optimized out>
#31 0xffffffff80ba5fe3 in gtaskqueue_thread_loop (
    arg=arg@entry=0xfffffe0116741008)
    at /usr/src/sys/kern/subr_gtaskqueue.c:544
        tqp = <optimized out>
        tq = 0xfffff800036d6000
#32 0xffffffff80b0b152 in fork_exit (
    callout=0xffffffff80ba5f10 <gtaskqueue_thread_loop>,
    arg=0xfffffe0116741008, frame=0xfffffe015aab7f40)
    at /usr/src/sys/kern/kern_fork.c:1152
        __pc = 0x0
        __pc = 0x0
        td = 0xfffff80103fb5740
        p = 0xffffffff81b9b060 <proc0>
        dtd = <optimized out>
#33 <signal handler called>
No locals.

Comment 58 Michael Tuexen freebsd_committer

2025-04-19 21:32:46 UTC

Created attachment 259712 [details]
Another reproducer

Comment 59 Michael Tuexen freebsd_committer

2025-04-19 22:01:55 UTC

(In reply to Michael Tuexen from comment #58)
Thanks to Kyle I was able to produce a reproducer for another problem which triggers the "sent too much" KASSERT.
Please note that this happens on current and stable/14.

(0)  The TCP connection is established with SACK, without TS, an MSS of 1220,
     and TCP_NODELAY is enabled.
(1)  The application sends 2020 bytes.
(2)  The TCP stacks sends a segment with 1220 bytes and one with 800 bytes.
     Both segments are lost.
(3)  The timer based retransmission of the first segment happens.
     CWND is now 1220.
(4)  When the ACK is received, the second segment is retransmitted.
     CWND is now 2440.
(5)  When the ACK is received, CWND becomes 3240.

This is the precondition.
The important point is that 2 * MSS < CWND < 3 * MSS holds.

Now the following happens:
(6)  The application sends 1018 bytes.
(7)  The TCP stacks sends a segment with 1018 bytes. This segment is lost.
(8)  The application sends 2703 bytes.
(9)  The TCP stack sends a segment with 1220 bytes. 
     The number of bytes not yet sent is 2703 - 1220 = 1483,
     the space left in the congestion window is 3240 - 1018 - 1220 = 1002.
     So no more segments are sent right now.
(10) The segment with 1220 bytes is SACKed.
(11) Therefore, the TCP stack sends out the next 1220 bytes of the remaining
     1483 bytes.
(12) Since the Nagle algorithm is turned off, the remaining 263 bytes are
     also send out, because the congestion control allows it.

The KASSERT triggers, sind in (11) and (12) more than an MSS are sent out.

For me, the sending looks completely valid.
So we need to weaken the KASSERT to cover these cases or remove it at all.

Comment 60 Michael Tuexen freebsd_committer

2025-04-19 23:09:59 UTC

A patch is under review D49922.

Comment 61 Kyle Evans freebsd_committer

2025-04-20 00:34:07 UTC

(In reply to Michael Tuexen from comment #59)

Awesome, thanks for investigating that!

Comment 62 commit-hook freebsd_committer

2025-05-01 16:15:35 UTC

A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=934caaec3afc43638c2a1da8fbe3b672566db4fe

commit 934caaec3afc43638c2a1da8fbe3b672566db4fe
Author:     Michael Tuexen <tuexen@FreeBSD.org>
AuthorDate: 2025-05-01 16:11:03 +0000
Commit:     Michael Tuexen <tuexen@FreeBSD.org>
CommitDate: 2025-05-01 16:11:03 +0000

    tcp: improve KASSERT in limited retransmit

    When doing a limited retransmit, allow up to 2 * MSS - 1 if the
    Nagle algorithm has been disabled.

    PR:                     282605
    Reviewed by:            cc, Peter Lei
    MFC after:              3 days
    Sponsored by:           Netflix, Inc.
    Differential Revision:  https://reviews.freebsd.org/D49922

 sys/netinet/tcp_input.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

Comment 63 commit-hook freebsd_committer

2025-05-06 19:17:41 UTC

A commit in branch stable/14 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=0906658c3409996b26518e67df48c01052ef934c

commit 0906658c3409996b26518e67df48c01052ef934c
Author:     Michael Tuexen <tuexen@FreeBSD.org>
AuthorDate: 2025-05-01 16:11:03 +0000
Commit:     Michael Tuexen <tuexen@FreeBSD.org>
CommitDate: 2025-05-06 19:15:39 +0000

    tcp: improve KASSERT in limited retransmit

    When doing a limited retransmit, allow up to 2 * MSS - 1 if the
    Nagle algorithm has been disabled.

    PR:                     282605
    Reviewed by:            cc, Peter Lei
    Sponsored by:           Netflix, Inc.
    Differential Revision:  https://reviews.freebsd.org/D49922

    (cherry picked from commit 934caaec3afc43638c2a1da8fbe3b672566db4fe)

 sys/netinet/tcp_input.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

Comment 64 commit-hook freebsd_committer

2025-05-06 21:58:57 UTC

A commit in branch releng/14.3 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=976aa65e3ea48f8864149dcbfc6f912a7ca2b67a

commit 976aa65e3ea48f8864149dcbfc6f912a7ca2b67a
Author:     Michael Tuexen <tuexen@FreeBSD.org>
AuthorDate: 2025-05-01 16:11:03 +0000
Commit:     Michael Tuexen <tuexen@FreeBSD.org>
CommitDate: 2025-05-06 19:18:24 +0000

    tcp: improve KASSERT in limited retransmit

    When doing a limited retransmit, allow up to 2 * MSS - 1 if the
    Nagle algorithm has been disabled.

    PR:                     282605
    Approved by:            re (cperciva)
    Reviewed by:            cc, Peter Lei
    Sponsored by:           Netflix, Inc.
    Differential Revision:  https://reviews.freebsd.org/D49922

    (cherry picked from commit 934caaec3afc43638c2a1da8fbe3b672566db4fe)
    (cherry picked from commit 0906658c3409996b26518e67df48c01052ef934c)

 sys/netinet/tcp_input.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

Comment 65 Michael Tuexen freebsd_committer

2025-05-13 11:15:30 UTC

Why is the see also link added? Looks like a different problem to me.

Comment 66 Andrey V. Elsukov freebsd_committer

2025-05-13 12:23:09 UTC

(In reply to Michael Tuexen from comment #65)
Yeah, sorry. It looked very similar.

Comment 67 Michael Tuexen freebsd_committer

2025-05-13 15:04:25 UTC

(In reply to Andrey V. Elsukov from comment #66)
Thanks for the quick reaction.

2khramtsov
ae
aehrenberg
avg
che
chs
cperciva
dch
franco
glebius
hobnailed
ivy
kevans
kib
markj
netchild
novel
peterj
rpokala
rscheff
shawn.webb
truckman
tuexen
weh