Bug 260393 - [tcp] Page Fault tcp_output/tcp_input
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 13.0-STABLE
Hardware: amd64 Any
Importance: --- Affects Only Me
Assignee: freebsd-net (Nobody)
Reported: 2021-12-13 17:49 UTC by Dobri Dobrev
Modified: 2022-10-12 07:00 UTC


Attachments
core.txt.1 (580.82 KB, text/plain)
2021-12-13 18:08 UTC, Dobri Dobrev

Description Dobri Dobrev 2021-12-13 17:49:20 UTC
I'm running 2 servers:
Ryzen 7 3800X with Intel(R) X550-T2 - stable/13-n248216-f1d2f22b34a
Xeon E5-1650 v4 with Intel(R) I350 (Copper) - stable/13-n248512-155748c1e75

Both crash with a "Page Fault" message; here is the output from kgdb:


# kgdb /boot/kernel/kernel /var/crash/vmcore.0
GNU gdb (GDB) 11.1 [GDB v11.1 for FreeBSD]
Copyright (C) 2021 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-portbld-freebsd13.0".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.
 
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /boot/kernel/kernel...
Reading symbols from /usr/lib/debug//boot/kernel/kernel.debug...
 
Unread portion of the kernel message buffer:
[193803] 
[193803] 
[193803] Fatal trap 12: page fault while in kernel mode
[193803] cpuid = 0; apic id = 00
[193803] fault virtual address  = 0x8
[193803] fault code             = supervisor read data, page not present
[193803] instruction pointer    = 0x20:0xffffffff80caf078
[193803] stack pointer          = 0x28:0xfffffe017e330850
[193803] frame pointer          = 0x28:0xfffffe017e330890
[193803] code segment           = base rx0, limit 0xfffff, type 0x1b
[193803]                        = DPL 0, pres 1, long 1, def32 0, gran 1
[193803] processor eflags       = interrupt enabled, resume, IOPL = 0
[193803] current process                = 0 (if_io_tqg_0)
[193803] trap number            = 12
[193803] panic: page fault
[193803] cpuid = 0
[193803] time = 1639284248
[193803] KDB: stack backtrace:
[193803] #0 0xffffffff80c60485 at kdb_backtrace+0x65
[193803] #1 0xffffffff80c12cdf at vpanic+0x17f
[193803] #2 0xffffffff80c12b53 at panic+0x43
[193803] #3 0xffffffff810982d5 at trap_fatal+0x385
[193803] #4 0xffffffff8109832f at trap_pfault+0x4f
[193803] #5 0xffffffff8106fae8 at calltrap+0x8
[193803] #6 0xffffffff80caf287 at sbdrop+0x37
[193803] #7 0xffffffff80dcce83 at tcp_do_segment+0x2d93
[193803] #8 0xffffffff80dc93b1 at tcp_input_with_port+0xb61
[193803] #9 0xffffffff80dca05b at tcp_input+0xb
[193803] #10 0xffffffff80dbb82f at ip_input+0x11f
[193803] #11 0xffffffff80d48849 at netisr_dispatch_src+0xb9
[193803] #12 0xffffffff80d2c7d8 at ether_demux+0x138
[193803] #13 0xffffffff80d2db65 at ether_nh_input+0x355
[193803] #14 0xffffffff80d48849 at netisr_dispatch_src+0xb9
[193803] #15 0xffffffff80d2cc09 at ether_input+0x69
[193803] #16 0xffffffff80d44cb7 at iflib_rxeof+0xc27
[193803] #17 0xffffffff80d3f302 at _task_fn_rx+0x72
[193803] Uptime: 2d5h50m3s
[193803] Dumping 5207 out of 130927 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%
 
__curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
55              __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu,
(kgdb)



And the 2nd one:

# kgdb /boot/kernel/kernel /var/crash/vmcore.3
GNU gdb (GDB) 11.1 [GDB v11.1 for FreeBSD]
Copyright (C) 2021 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-portbld-freebsd13.0".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.
 
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /boot/kernel/kernel...
Reading symbols from /usr/lib/debug//boot/kernel/kernel.debug...
 
Unread portion of the kernel message buffer:
IOPL = 0
[149983] current process                = 0 (if_io_tqg_6)
[149983] trap number            = 12
[149983] panic: page fault
[149983] cpuid = 6
[149983] time = 1639293246
[149983] KDB: stack backtrace:
[149983] #0 0xffffffff80c78ac5 at kdb_backtrace+0x65
[149983] #1 0xffffffff80c2a207 at vpanic+0x187
[149983] #2 0xffffffff80c2a073 at panic+0x43
[149983] #3 0xffffffff810b71c7 at trap_fatal+0x387
[149983] #4 0xffffffff810b721f at trap_pfault+0x4f
[149983] #5 0xffffffff810b689a at trap+0x26a
[149983] #6 0xffffffff8108e1b8 at calltrap+0x8
[149983] #7 0xffffffff80deee44 at tcp_output+0x11d4
[149983] #8 0xffffffff80de5fd0 at tcp_do_segment+0x2c00
[149983] #9 0xffffffff80de2702 at tcp_input_with_port+0xb82
[149983] #10 0xffffffff80de333b at tcp_input+0xb
[149983] #11 0xffffffff80dd4bf1 at ip_input+0x121
[149983] #12 0xffffffff80d6276a at netisr_dispatch_src+0xca
[149983] #13 0xffffffff80d467a8 at ether_demux+0x138
[149983] #14 0xffffffff80d47b4e at ether_nh_input+0x34e
[149983] #15 0xffffffff80d6276a at netisr_dispatch_src+0xca
[149983] #16 0xffffffff80d46bf9 at ether_input+0x69
[149983] #17 0xffffffff80d5eea3 at iflib_rxeof+0xc63
[149983] Uptime: 1d17h39m43s
[149983] Dumping 3384 out of 65425 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%
 
__curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
55              __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu,
(kgdb)
Comment 1 Dobri Dobrev 2021-12-13 18:08:47 UTC
Created attachment 230083
core.txt.1

Here's the core.txt.1 from the server whose kernel/world was most recently updated and rebuilt (last night).
Comment 2 Dobri Dobrev 2021-12-18 06:39:49 UTC
If any additional information is needed - let me know and I'll provide it.
The problem keeps happening once every 2-3 days, constantly, on both servers.
Comment 3 Michael Tuexen freebsd_committer freebsd_triage 2021-12-18 10:28:41 UTC
Is it possible to access the generated core files?
Comment 4 Dobri Dobrev 2021-12-18 12:33:55 UTC
(In reply to Michael Tuexen from comment #3)

Depends.

How do you want to access it? Can I access it for you and provide the necessary output? There is proprietary company software running on these servers, and if the crash dump contains part or all of the binaries, I would not be able to give you direct access to the dump.

Let me know.
Comment 5 Michael Tuexen freebsd_committer freebsd_triage 2021-12-18 21:28:15 UTC
(In reply to Dobri Dobrev from comment #4)
The kernel dump most likely contains stuff you don't want to share... So could you start kgdb with one of the cores and provide the output of `where`. If you do this for the first one, go up the stack until you are in `sbdrop` and provide also the output of `print *sb`.
Comment 6 Dobri Dobrev 2021-12-19 06:02:07 UTC
(In reply to Michael Tuexen from comment #5)

I updated to stable/13-n248590-b7da472979a. I'm waiting for it to crash and will then check the dump.
Comment 7 Michael Tuexen freebsd_committer freebsd_triage 2021-12-19 11:26:20 UTC
(In reply to Dobri Dobrev from comment #6)
When you are at the debugger, you can type `dump` and `reboot` and the kernel dump should be written to disk. After reboot, you can then use `sudo kgdb -c /var/crash/vmcore.last /boot/kernel/kernel` to start the debugger and we can have a look at it. You can leave and start the debugger multiple times. Just don't update the kernel during that time...
Comment 8 Dobri Dobrev 2021-12-20 07:11:41 UTC
(In reply to Michael Tuexen from comment #7)

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /boot/kernel/kernel...
Reading symbols from /usr/lib/debug//boot/kernel/kernel.debug...

Unread portion of the kernel message buffer:
[314230] processor eflags       = interrupt enabled, resume, IOPL = 0
[314230] current process                = 0 (if_io_tqg_1)
[314230] trap number            = 12
[314230] panic: page fault
[314230] cpuid = 1
[314230] time = 1639952536
[314230] KDB: stack backtrace:
[314230] #0 0xffffffff80c60dd5 at kdb_backtrace+0x65
[314230] #1 0xffffffff80c1336f at vpanic+0x17f
[314230] #2 0xffffffff80c131e3 at panic+0x43
[314230] #3 0xffffffff810991b5 at trap_fatal+0x385
[314230] #4 0xffffffff8109920f at trap_pfault+0x4f
[314230] #5 0xffffffff810705e8 at calltrap+0x8
[314230] #6 0xffffffff80dd5fa9 at tcp_output+0x1339
[314230] #7 0xffffffff80dcd382 at tcp_do_segment+0x2902
[314230] #8 0xffffffff80dc9d41 at tcp_input_with_port+0xb61
[314230] #9 0xffffffff80dca9eb at tcp_input+0xb
[314230] #10 0xffffffff80dbc1bf at ip_input+0x11f
[314230] #11 0xffffffff80d491a9 at netisr_dispatch_src+0xb9
[314230] #12 0xffffffff80d2d128 at ether_demux+0x138
[314230] #13 0xffffffff80d2e4b5 at ether_nh_input+0x355
[314230] #14 0xffffffff80d491a9 at netisr_dispatch_src+0xb9
[314230] #15 0xffffffff80d2d559 at ether_input+0x69
[314230] #16 0xffffffff80d45617 at iflib_rxeof+0xc27
[314230] #17 0xffffffff80d3fc62 at _task_fn_rx+0x72
[314230] Uptime: 3d15h17m10s
[314230] Dumping 5461 out of 130927 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%

__curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
55              __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu,
(kgdb) where
#0  __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
#1  doadump (textdump=<optimized out>) at /usr/src/sys/kern/kern_shutdown.c:399
#2  0xffffffff80c12f6c in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:487
#3  0xffffffff80c133de in vpanic (fmt=0xffffffff81191bdd "%s", ap=<optimized out>) at /usr/src/sys/kern/kern_shutdown.c:920
#4  0xffffffff80c131e3 in panic (fmt=<unavailable>) at /usr/src/sys/kern/kern_shutdown.c:844
#5  0xffffffff810991b5 in trap_fatal (frame=0xfffffe00d3bfd5b0, eva=24) at /usr/src/sys/amd64/amd64/trap.c:944
#6  0xffffffff8109920f in trap_pfault (frame=0xfffffe00d3bfd5b0, usermode=false, signo=<optimized out>, ucode=<optimized out>) at /usr/src/sys/amd64/amd64/trap.c:763
#7  <signal handler called>
#8  m_copydata (m=0x0, m@entry=0xfffff8010ee80d00, off=0, len=1, cp=<optimized out>) at /usr/src/sys/kern/uipc_mbuf.c:657
#9  0xffffffff80dd5fa9 in tcp_output (tp=<optimized out>) at /usr/src/sys/netinet/tcp_output.c:1081
#10 0xffffffff80dcd382 in tcp_do_segment (m=<optimized out>, th=<optimized out>, so=<optimized out>, tp=0xfffffe0251638870, drop_hdrlen=40, tlen=<optimized out>, iptos=0 '\000')
    at /usr/src/sys/netinet/tcp_input.c:2822
#11 0xffffffff80dc9d41 in tcp_input_with_port (mp=<optimized out>, offp=<optimized out>, proto=<optimized out>, port=port@entry=0) at /usr/src/sys/netinet/tcp_input.c:1400
#12 0xffffffff80dca9eb in tcp_input (mp=0xfffff8010ee80d00, offp=0x0, proto=1) at /usr/src/sys/netinet/tcp_input.c:1496
#13 0xffffffff80dbc1bf in ip_input (m=0x0) at /usr/src/sys/netinet/ip_input.c:834
#14 0xffffffff80d491a9 in netisr_dispatch_src (proto=1, source=source@entry=0, m=0xfffff8015b58d700) at /usr/src/sys/net/netisr.c:1143
#15 0xffffffff80d4957f in netisr_dispatch (proto=250088704, m=0x1) at /usr/src/sys/net/netisr.c:1234
#16 0xffffffff80d2d128 in ether_demux (ifp=ifp@entry=0xfffff80105343000, m=0x0) at /usr/src/sys/net/if_ethersubr.c:921
#17 0xffffffff80d2e4b5 in ether_input_internal (ifp=0xfffff80105343000, m=0x0) at /usr/src/sys/net/if_ethersubr.c:707
#18 ether_nh_input (m=<optimized out>) at /usr/src/sys/net/if_ethersubr.c:737
#19 0xffffffff80d491a9 in netisr_dispatch_src (proto=proto@entry=5, source=source@entry=0, m=m@entry=0xfffff8015b58d700) at /usr/src/sys/net/netisr.c:1143
#20 0xffffffff80d4957f in netisr_dispatch (proto=250088704, proto@entry=5, m=0x1, m@entry=0xfffff8015b58d700) at /usr/src/sys/net/netisr.c:1234
#21 0xffffffff80d2d559 in ether_input (ifp=<optimized out>, m=0xfffff8015b58d700) at /usr/src/sys/net/if_ethersubr.c:828
#22 0xffffffff80d45617 in iflib_rxeof (rxq=<optimized out>, rxq@entry=0xfffffe00d68b6340, budget=<optimized out>) at /usr/src/sys/net/iflib.c:3046
#23 0xffffffff80d3fc62 in _task_fn_rx (context=0xfffffe00d68b6340) at /usr/src/sys/net/iflib.c:3989
#24 0xffffffff80c5f80d in gtaskqueue_run_locked (queue=queue@entry=0xfffff80103920b00) at /usr/src/sys/kern/subr_gtaskqueue.c:371
#25 0xffffffff80c5f482 in gtaskqueue_thread_loop (arg=<optimized out>, arg@entry=0xfffffe00d6bd1020) at /usr/src/sys/kern/subr_gtaskqueue.c:547
#26 0xffffffff80bd053e in fork_exit (callout=0xffffffff80c5f3c0 <gtaskqueue_thread_loop>, arg=0xfffffe00d6bd1020, frame=0xfffffe00d3bfdf40) at /usr/src/sys/kern/kern_fork.c:1092
#27 <signal handler called>
#28 mi_startup () at /usr/src/sys/kern/init_main.c:322
Backtrace stopped: Cannot access memory at address 0xb
(kgdb) print *tcp_output
$1 = {int (struct tcpcb *)} 0xffffffff80dd4c70 <tcp_output>
(kgdb) print *m_copydata
$2 = {void (const struct mbuf *, int, int, caddr_t)} 0xffffffff80ca5bd0 <m_copydata>
(kgdb) print *tcp_do_segment
$3 = {void (struct mbuf *, struct tcphdr *, struct socket *, struct tcpcb *, int, int, uint8_t)} 0xffffffff80dcaa80 <tcp_do_segment>
(kgdb)
Comment 9 Michael Tuexen freebsd_committer freebsd_triage 2021-12-20 10:43:23 UTC
Please run
frame 8
list
print *(struct mbuf *)0xfffff8010ee80d00
frame 10
print *tp
frame 12
print **mp
Comment 10 Dobri Dobrev 2021-12-20 11:11:37 UTC
(In reply to Michael Tuexen from comment #9)

(kgdb) frame 8
#8  m_copydata (m=0x0, m@entry=0xfffff8010ee80d00, off=0, len=1, cp=<optimized out>) at /usr/src/sys/kern/uipc_mbuf.c:657
657                     count = min(m->m_len - off, len);
(kgdb) list
652                     off -= m->m_len;
653                     m = m->m_next;
654             }
655             while (len > 0) {
656                     KASSERT(m != NULL, ("m_copydata, length > size of mbuf chain"));
657                     count = min(m->m_len - off, len);
658                     if ((m->m_flags & M_EXTPG) != 0)
659                             m_copyfromunmapped(m, off, count, cp);
660                     else
661                             bcopy(mtod(m, caddr_t) + off, cp, count);
(kgdb) print *(struct mbuf *)0xfffff8010ee80d00
$1 = {{m_next = 0x0, m_slist = {sle_next = 0x0}, m_stailq = {stqe_next = 0x0}}, {m_nextpkt = 0x0, m_slistpkt = {sle_next = 0x0}, m_stailqpkt = {stqe_next = 0x0}}, 
  m_data = 0xfffff8015b91e528 "&i\365\267\254\350s\352,\025\216*\265\216\004\024\201j\256\245?\225<\020)W\214%\212\371\221$\205s\277LE<\326\340\032\267\377\366\214\217\235\215^)1x\377\342\032\234Ƃ\217]\211\375\333h\361\212\320nE\024\370\330\325S8\272\001y\023\304;\016:\017\032kT5\323\300\f\245MJd\n\025W\352c\321\062)Pl{/\263\320>6\231\362x\305\311\031ö\vy\356&É\265\343;_\273`\272\005\205\315m(\353쁞\001\223\254\371\037]UN\357\202%\201\364\033\r\232G$-N\251\262#\264\204\375\t\321\036\203\241\254\274\314ز\252jŹc.k\217\224#\235\206\241U\262\a\215I\035&\253j3"..., m_len = 24, m_type = 1, m_flags = 1, {{{m_pkthdr = {{snd_tag = 0x0, 
            rcvif = 0x0}, tags = {slh_first = 0x0}, len = 1337, flowid = 0, csum_flags = 0, fibnum = 0, numa_domain = 255 '\377', rsstype = 0 '\000', {rcv_tstmp = 0, {l2hlen = 0 '\000', l3hlen = 0 '\000', 
              l4hlen = 0 '\000', l5hlen = 0 '\000', inner_l2hlen = 0 '\000', inner_l3hlen = 0 '\000', inner_l4hlen = 0 '\000', inner_l5hlen = 0 '\000'}}, PH_per = {eight = "\000\000\000\000\377\377\000", sixteen = {
              0, 0, 65535, 0}, thirtytwo = {0, 65535}, sixtyfour = {281470681743360}, unintptr = {281470681743360}, ptr = 0xffff00000000}, PH_loc = {eight = "\000\000\000\000\000\000\000", sixteen = {0, 0, 0, 0}, 
            thirtytwo = {0, 0}, sixtyfour = {0}, unintptr = {0}, ptr = 0x0}}, {m_epg_npgs = 0 '\000', m_epg_nrdy = 0 '\000', m_epg_hdrlen = 0 '\000', m_epg_trllen = 0 '\000', m_epg_1st_off = 0, m_epg_last_len = 0, 
          m_epg_flags = 0 '\000', m_epg_record_type = 0 '\000', __spare = "\000", m_epg_enc_cnt = 0, m_epg_tls = 0x539, m_epg_so = 0xff000000000000, m_epg_seqno = 0, m_epg_stailq = {stqe_next = 0xffff00000000}}}, {
        m_ext = {{ext_count = 1, ext_cnt = 0x1}, ext_size = 2048, ext_type = 6, ext_flags = 1, {{ext_buf = 0xfffff8015b91e000 "\023\367\265R\030\254\212\342\220\255\331'\206\217\245f\223o\aH\205\277\222", 
              ext_arg2 = 0x0}, {extpg_pa = {18446735283447783424, 0, 0, 0, 0}, extpg_trail = '\000' <repeats 63 times>, extpg_hdr = '\000' <repeats 22 times>}}, ext_free = 0x0, ext_arg1 = 0x0}, 
        m_pktdat = 0xfffff8010ee80d58 "\001"}}, m_dat = 0xfffff8010ee80d20 ""}}
(kgdb) frame 10
#10 0xffffffff80dcd382 in tcp_do_segment (m=<optimized out>, th=<optimized out>, so=<optimized out>, tp=0xfffffe0251638870, drop_hdrlen=40, tlen=<optimized out>, iptos=0 '\000')
    at /usr/src/sys/netinet/tcp_input.c:2822
2822                                                    tcp_sack_partialack(tp, th);
(kgdb) print *tp
$2 = {t_inpcb = 0xfffff80a54294000, t_fb = 0xffffffff8193b000 <tcp_def_funcblk>, t_fb_ptr = 0x0, t_maxseg = 1360, t_logstate = 0, t_port = 0, t_state = 8, t_idle_reduce = 0, t_delayed_ack = 0, t_fin_is_rst = 0, 
  t_log_state_set = 0, bits_spare = 0, t_flags = 554697333, snd_una = 3223852179, snd_max = 3223852205, snd_nxt = 3223852204, snd_up = 3223850831, snd_wnd = 65292, snd_cwnd = 1359, t_peakrate_thr = 0, 
  ts_offset = 0, rfbuf_ts = 313886170, rcv_numsacks = 0, t_tsomax = 65535, t_tsomaxsegcount = 37, t_tsomaxsegsize = 4096, rcv_nxt = 2467824635, rcv_adv = 2467891323, rcv_wnd = 66688, t_flags2 = 1024, t_srtt = 3309, 
  t_rttvar = 287, ts_recent = 0, snd_scale = 2 '\002', rcv_scale = 6 '\006', snd_limited = 0 '\000', request_r_scale = 6 '\006', last_ack_sent = 2467824635, t_rcvtime = 2461112999, rcv_up = 2467824635, 
  t_segqlen = 0, t_segqmbuflen = 0, t_segq = {tqh_first = 0x0, tqh_last = 0xfffffe0251638900}, t_in_pkt = 0x0, t_tail_pkt = 0x0, t_timers = 0xfffffe0251638b18, t_vnet = 0xfffff801014c0580, snd_ssthresh = 2720, 
  snd_wl1 = 2467824635, snd_wl2 = 3223852179, irs = 2467822589, iss = 3223768989, t_acktime = 0, t_sndtime = 2460931776, ts_recent_age = 0, snd_recover = 3223852205, cl4_spare = 0, t_oobflags = 0 '\000', 
  t_iobc = 0 '\000', t_rxtcur = 64000, t_rxtshift = 11, t_rtttime = 0, t_rtseq = 3223852203, t_starttime = 2460765463, t_fbyte_in = 2460765472, t_fbyte_out = 2460765472, t_pmtud_saved_maxseg = 0, 
  t_blackhole_enter = 0, t_blackhole_exit = 0, t_rttmin = 30, t_rttbest = 3596, t_softerror = 0, max_sndwnd = 66640, snd_cwnd_prev = 8160, snd_ssthresh_prev = 2720, snd_recover_prev = 3223823643, t_sndzerowin = 0, 
  t_rttupdated = 9, snd_numholes = 1, t_badrxtwin = 2460781714, snd_holes = {tqh_first = 0xfffff806d12b8780, tqh_last = 0xfffff806d12b8790}, snd_fack = 3223852203, sackblks = {{start = 2467824634, 
      end = 2467824635}, {start = 0, end = 0}, {start = 0, end = 0}, {start = 0, end = 0}, {start = 0, end = 0}, {start = 0, end = 0}}, sackhint = {nexthole = 0xfffff806d12b8780, sack_bytes_rexmit = 0, 
    last_sack_ack = 3223852203, delivered_data = 12, sacked_bytes = 0, recover_fs = 1373, prr_delivered = 2722, prr_out = 4105}, t_rttlow = 84, rfbuf_cnt = 0, tod = 0x0, t_sndrexmitpack = 59, t_rcvoopack = 0, 
  t_toe = 0x0, cc_algo = 0xffffffff81937eb0 <newreno_cc_algo>, ccv = 0xfffffe0251638c60, osd = 0xfffffe0251638c88, t_bytes_acked = 0, t_maxunacktime = 0, t_keepinit = 0, t_keepidle = 0, t_keepintvl = 0, 
  t_keepcnt = 0, t_dupacks = 0, t_lognum = 0, t_loglimit = 5000, t_pacing_rate = -1, t_logs = {stqh_first = 0x0, stqh_last = 0xfffffe0251638a88}, t_lin = 0x0, t_lib = 0x0, t_output_caller = 0x0, t_stats = 0x0, 
  t_logsn = 0, gput_ts = 0, gput_seq = 0, gput_ack = 0, t_stats_gput_prev = 0, t_maxpeakrate = 0, t_sndtlppack = 0, t_sndtlpbyte = 0, t_sndbytes = 91397, t_snd_rxt_bytes = 61193, t_tfo_client_cookie_len = 0 '\000', 
  t_end_info_status = 0, t_tfo_pending = 0x0, t_tfo_cookie = {client = '\000' <repeats 15 times>, server = 0}, {t_end_info_bytes = "\000\000\000\000\000\000\000", t_end_info = 0}}
(kgdb) frame 12
#12 0xffffffff80dca9eb in tcp_input (mp=0xfffff8010ee80d00, offp=0x0, proto=1) at /usr/src/sys/netinet/tcp_input.c:1496
1496            return(tcp_input_with_port(mp, offp, proto, 0));
(kgdb) print **mp
Cannot access memory at address 0x0
(kgdb)
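
A note on the numbers above: the trap frame in comment 8 reports `eva=24`, and the faulting line reads `m->m_len` with `m=0x0`. That is consistent with the layout of `struct mbuf` on amd64, where `m_len` follows three 8-byte pointers. A minimal sketch, assuming a simplified model of the structure (`struct mbuf_model` and `null_m_len_fault_offset` are hypothetical names, not kernel code):

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Simplified model of the leading members of struct mbuf.
 * Assumption: only the first four fields matter for this calculation;
 * the real structure contains unions and many more members.
 */
struct mbuf_model {
	struct mbuf_model *m_next;	/* offset 0 on a 64-bit system */
	struct mbuf_model *m_nextpkt;	/* offset 8 */
	char		  *m_data;	/* offset 16 */
	int32_t		   m_len;	/* offset 24 */
};

/* Address touched when reading m->m_len through a NULL m. */
static size_t
null_m_len_fault_offset(void)
{
	return (offsetof(struct mbuf_model, m_len));
}
```

On a system with 8-byte pointers this returns 24, which matches the `eva` value in the backtrace.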
Comment 11 Michael Tuexen freebsd_committer freebsd_triage 2021-12-20 11:55:14 UTC
frame 12
print **mp
Comment 12 Michael Tuexen freebsd_committer freebsd_triage 2021-12-20 11:55:32 UTC
(In reply to Michael Tuexen from comment #11)
Sorry, I meant:

frame 12
print *mp
Comment 13 Dobri Dobrev 2021-12-20 11:58:43 UTC
(In reply to Michael Tuexen from comment #12)

(kgdb) frame 12
#12 0xffffffff80dca9eb in tcp_input (mp=0xfffff8010ee80d00, offp=0x0, proto=1) at /usr/src/sys/netinet/tcp_input.c:1496
1496            return(tcp_input_with_port(mp, offp, proto, 0));
(kgdb) print *mp
$3 = (struct mbuf *) 0x0
(kgdb)
Comment 14 Michael Tuexen freebsd_committer freebsd_triage 2021-12-20 19:20:35 UTC
frame 9
print mb
print *mb
print moff
print len
Comment 15 Dobri Dobrev 2021-12-20 19:57:44 UTC
(In reply to Michael Tuexen from comment #14)

(kgdb) frame 9
#9  0xffffffff80dd5fa9 in tcp_output (tp=<optimized out>) at /usr/src/sys/netinet/tcp_output.c:1081
1081                            m_copydata(mb, moff, len,
(kgdb) print mb
$1 = (struct mbuf *) 0xfffff8010ee80d00
(kgdb) print *mb
$2 = {{m_next = 0x0, m_slist = {sle_next = 0x0}, m_stailq = {stqe_next = 0x0}}, {m_nextpkt = 0x0, m_slistpkt = {sle_next = 0x0}, m_stailqpkt = {stqe_next = 0x0}}, 
  m_data = 0xfffff8015b91e528 "&i\365\267\254\350s\352,\025\216*\265\216\004\024\201j\256\245?\225<\020)W\214%\212\371\221$\205s\277LE<\326\340\032\267\377\366\214\217\235\215^)1x\377\342\032\234Ƃ\217]\211\375\333h\361\212\320nE\024\370\330\325S8\272\001y\023\304;\016:\017\032kT5\323\300\f\245MJd\n\025W\352c\321\062)Pl{/\263\320>6\231\362x\305\311\031ö\vy\356&É\265\343;_\273`\272\005\205\315m(\353쁞\001\223\254\371\037]UN\357\202%\201\364\033\r\232G$-N\251\262#\264\204\375\t\321\036\203\241\254\274\314ز\252jŹc.k\217\224#\235\206\241U\262\a\215I\035&\253j3"..., m_len = 24, m_type = 1, m_flags = 1, {{{m_pkthdr = {{snd_tag = 0x0, 
            rcvif = 0x0}, tags = {slh_first = 0x0}, len = 1337, flowid = 0, csum_flags = 0, fibnum = 0, numa_domain = 255 '\377', rsstype = 0 '\000', {rcv_tstmp = 0, {l2hlen = 0 '\000', l3hlen = 0 '\000', 
              l4hlen = 0 '\000', l5hlen = 0 '\000', inner_l2hlen = 0 '\000', inner_l3hlen = 0 '\000', inner_l4hlen = 0 '\000', inner_l5hlen = 0 '\000'}}, PH_per = {eight = "\000\000\000\000\377\377\000", sixteen = {
              0, 0, 65535, 0}, thirtytwo = {0, 65535}, sixtyfour = {281470681743360}, unintptr = {281470681743360}, ptr = 0xffff00000000}, PH_loc = {eight = "\000\000\000\000\000\000\000", sixteen = {0, 0, 0, 0}, 
            thirtytwo = {0, 0}, sixtyfour = {0}, unintptr = {0}, ptr = 0x0}}, {m_epg_npgs = 0 '\000', m_epg_nrdy = 0 '\000', m_epg_hdrlen = 0 '\000', m_epg_trllen = 0 '\000', m_epg_1st_off = 0, m_epg_last_len = 0, 
          m_epg_flags = 0 '\000', m_epg_record_type = 0 '\000', __spare = "\000", m_epg_enc_cnt = 0, m_epg_tls = 0x539, m_epg_so = 0xff000000000000, m_epg_seqno = 0, m_epg_stailq = {stqe_next = 0xffff00000000}}}, {
        m_ext = {{ext_count = 1, ext_cnt = 0x1}, ext_size = 2048, ext_type = 6, ext_flags = 1, {{ext_buf = 0xfffff8015b91e000 "\023\367\265R\030\254\212\342\220\255\331'\206\217\245f\223o\aH\205\277\222", 
              ext_arg2 = 0x0}, {extpg_pa = {18446735283447783424, 0, 0, 0, 0}, extpg_trail = '\000' <repeats 63 times>, extpg_hdr = '\000' <repeats 22 times>}}, ext_free = 0x0, ext_arg1 = 0x0}, 
        m_pktdat = 0xfffff8010ee80d58 "\001"}}, m_dat = 0xfffff8010ee80d20 ""}}
(kgdb) print moff
$3 = 0
(kgdb) print len
$4 = 1
(kgdb)
Comment 17 Michael Tuexen freebsd_committer freebsd_triage 2021-12-20 22:02:04 UTC
frame 8
print count
print m
print off
print len
Comment 18 Dobri Dobrev 2021-12-21 06:25:36 UTC
(In reply to Michael Tuexen from comment #17)

(kgdb) frame 8
#8  m_copydata (m=0x0, m@entry=0xfffff8010ee80d00, off=0, len=1, cp=<optimized out>) at /usr/src/sys/kern/uipc_mbuf.c:657
657                     count = min(m->m_len - off, len);
(kgdb) print count
$5 = <optimized out>
(kgdb) print m
$6 = (const struct mbuf *) 0x0
(kgdb) print off
$7 = 0
(kgdb) print len
$8 = 1
(kgdb)
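
With `m` already NULL while `len` is still 1, the copy loop has walked past the end of the chain. The userland sketch below models the same walk but keeps the bounds check in all builds. It is a model under assumptions, not the kernel's `m_copydata`: `struct buf_model` and `chain_copydata` are hypothetical, and returning -1 instead of faulting is a choice made for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an mbuf: a singly linked chain of buffers. */
struct buf_model {
	struct buf_model *next;
	const char	 *data;
	int		  len;
};

/*
 * Copy len bytes starting at offset off from the chain into cp.
 * Returns 0 on success, or -1 if the request runs past the end of the
 * chain (the situation in which the kernel dereferenced a NULL mbuf).
 */
static int
chain_copydata(const struct buf_model *m, int off, int len, char *cp)
{
	int count;

	/* Skip whole buffers that are covered by the offset. */
	while (off > 0) {
		if (m == NULL)
			return (-1);
		if (off < m->len)
			break;
		off -= m->len;
		m = m->next;
	}
	while (len > 0) {
		if (m == NULL)
			return (-1);	/* length > size of chain */
		count = m->len - off;
		if (count > len)
			count = len;
		memcpy(cp, m->data + off, (size_t)count);
		len -= count;
		cp += count;
		off = 0;
		m = m->next;
	}
	return (0);
}
```

For example, a single 24-byte buffer with `off = 24` and `len = 1` ends up with `m == NULL`, `off == 0`, and `len == 1`, the same values printed above. Whether that is how the kernel got there is not established by the dump.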
Comment 19 Dobri Dobrev 2021-12-21 19:36:59 UTC
(In reply to Michael Tuexen from comment #17)

If something more is needed - I'll provide as quickly as possible.
Comment 20 Michael Tuexen freebsd_committer freebsd_triage 2021-12-21 21:00:49 UTC
(In reply to Dobri Dobrev from comment #19)
Thanks. Right now I'm trying to figure out what could be going on.

Are you using anything non-default? Alternate CC module? Alternate stack? Are you using long lived TCP connections or short lived? High bandwidth? Any hint you can provide?
Comment 21 Dobri Dobrev 2021-12-21 21:11:45 UTC
(In reply to Michael Tuexen from comment #20)

I'm using the exact same settings on 12.2 w/o problems.

Most of the loader/sysctl are calomel defaults with maybe 2-3 settings changed in total.

Traffic is 10~50 mbit RX, 120~220 mbit TX.

PF enabled

Nginx 1.21.4 running with:
sendfile off
tcp_nopush off
keepalive_timeout 10s / 60s / 600s (different on different "servers")
keepalive_requests 50
websocket connections on some servers

The same configuration runs in 12.2 w/o any issues, current uptime ~67 days there.

The problem only appears on 13/stable (Haven't tried 13.0-release due to some things implemented in stable that haven't made their way to 13.0-rel).


Do you need anything else from the dump, or perhaps to test something?
Comment 22 Dobri Dobrev 2021-12-21 21:17:22 UTC
I only have one patch, to pf_table.c (3-4 lines in total, nothing major). It also works w/o any issues on 12.2, and from what I can see in the dump, pf is not related to the crash.

I'll be willing to test 13.0-release, if needed, just need to apply that patch & rebuild the kernel.
Comment 23 Michael Tuexen freebsd_committer freebsd_triage 2021-12-21 21:32:02 UTC
(In reply to Dobri Dobrev from comment #22)
I'm not interested in testing 13-release. But I would be interested if the problem also shows up if you don't use pf. Is that possible?
Comment 24 Dobri Dobrev 2021-12-21 21:35:00 UTC
(In reply to Michael Tuexen from comment #23)

Unfortunately PF is essential, I cannot disable it.

To answer a question you might be having - yes, I tested without the PF patch before I made the bug submission.
Comment 25 Michael Tuexen freebsd_committer freebsd_triage 2021-12-21 21:40:40 UTC
(In reply to Dobri Dobrev from comment #24)
The point I had in mind was to exclude pf from the system to be sure it is a TCP problem. But that does not seem to be possible. Thanks for the feedback. Will ask if I need more information...
Comment 26 Dobri Dobrev 2021-12-21 21:42:57 UTC
(In reply to Michael Tuexen from comment #25)

If you wish, I can test an earlier revision of 13/stable, before changes to these files (noticed you did some commits changing several of the files listed in the crash).

Let me know which revision I should pull.
Comment 27 Michael Tuexen freebsd_committer freebsd_triage 2021-12-21 21:47:01 UTC
(In reply to Dobri Dobrev from comment #26)
I actually don't know which version you should try. But you might pick some older version, give it a try, and do a binary search. It would help to know which commit introduced the problem...
Comment 28 Dobri Dobrev 2021-12-21 21:48:25 UTC
(In reply to Michael Tuexen from comment #27)

I'll try to do that sometime tomorrow, and let you know.
Comment 29 Michael Tuexen freebsd_committer freebsd_triage 2021-12-21 21:50:13 UTC
Great, thanks. How long does it take for a machine to panic?
Comment 30 Dobri Dobrev 2021-12-21 22:28:31 UTC
(In reply to Michael Tuexen from comment #29)

A few days.
Comment 31 Dobri Dobrev 2021-12-21 22:43:04 UTC
(In reply to Michael Tuexen from comment #29)

I'm building d04c12765cfa2bf0f33f7489d48843648073ce06 and will test it for a few days.
Comment 32 Michael Tuexen freebsd_committer freebsd_triage 2021-12-21 23:09:43 UTC
(In reply to Dobri Dobrev from comment #31)
Are you using a kernel build with INVARIANTS? If not, you might want to do that first. Maybe that gives a hint, because it might panic sooner...
Comment 33 Dobri Dobrev 2021-12-21 23:11:34 UTC
(In reply to Michael Tuexen from comment #32)

Let me know what "INVARIANTS" are and how to build the kernel with it.

I'm building the "GENERIC" config.
Comment 34 Michael Tuexen freebsd_committer freebsd_triage 2021-12-21 23:25:32 UTC
Add to the kernel config file GENERIC

options 	BUF_TRACKING		# Track buffer history
options 	DDB			# Support DDB.
options 	FULL_BUF_TRACKING	# Track more buffer history
options 	GDB			# Support remote GDB.
options 	DEADLKRES		# Enable the deadlock resolver
options 	INVARIANTS		# Enable calls of extra sanity checking
options 	INVARIANT_SUPPORT	# Extra sanity checks of internal structures, required by INVARIANTS
options 	QUEUE_MACRO_DEBUG_TRASH	# Trash queue(2) internal pointers on invalidation
options 	WITNESS			# Enable checks to detect deadlocks and cycles
options 	WITNESS_SKIPSPIN	# Don't run witness on spinlocks for speed
options 	MALLOC_DEBUG_MAXZONES=8	# Separate malloc(9) zones
options 	VERBOSE_SYSINIT=0	# Support debug.verbose_sysinit, off by default

and rebuild the kernel.
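
Of these options, INVARIANTS is the one that enables `KASSERT` checks such as the `m != NULL` assertion at uipc_mbuf.c:656 shown in comment 10. Without it the assertion compiles to nothing and the following line faults instead. A rough model of the two outcomes, with a runtime flag standing in for the compile-time option (all names here are hypothetical, not kernel code):

```c
#include <stdbool.h>
#include <stddef.h>

enum check_outcome {
	OUTCOME_OK,		/* pointer valid, nothing happens */
	OUTCOME_ASSERT_PANIC,	/* KASSERT fires with a descriptive message */
	OUTCOME_PAGE_FAULT	/* check compiled out, NULL is dereferenced */
};

/*
 * Model what happens when code guarded by KASSERT(m != NULL, ...)
 * is reached. The invariants argument stands in for a kernel built
 * with "options INVARIANTS".
 */
static enum check_outcome
null_check_outcome(const void *m, bool invariants)
{
	if (m != NULL)
		return (OUTCOME_OK);
	return (invariants ? OUTCOME_ASSERT_PANIC : OUTCOME_PAGE_FAULT);
}
```

The machine still panics either way, but with INVARIANTS the panic message names the violated condition, and other assertions may catch the inconsistency earlier.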
Comment 35 Dobri Dobrev 2021-12-21 23:27:40 UTC
(In reply to Michael Tuexen from comment #34)

I'll rebuild the kernel with these when I find which commit actually causes the problem (I suspect these options will slow down the system somewhat)
Comment 36 Hans Petter Selasky freebsd_committer freebsd_triage 2021-12-21 23:40:26 UTC
Hi,

I wonder if we need to subtract 1 from tp->snd_max, when TF_SENTFIN is set?

t_state = 8 

#define TCPS_LAST_ACK           8       /* had fin and close; await FIN ACK */

t_flags = 554697333 = 0x21100275

#define  TF_SENTFIN      0x00000010      /* have sent FIN */

I remember we did a similar fix some while back for SACK:

                /*
                 * Exclude FIN sequence space in
                 * the hole for the rescue retransmission,
                 * and also don't create a hole, if only
                 * the ACK for a FIN is outstanding.
                 */
                tcp_seq highdata = tp->snd_max;
                if (tp->t_flags & TF_SENTFIN)
                        highdata--;


Now in this piece of code leading up to the sbdrop() of 1 byte:

                if (tlen == 0) {
                        if (SEQ_GT(th->th_ack, tp->snd_una) &&
                            SEQ_LEQ(th->th_ack, tp->snd_max) &&
                            !IN_RECOVERY(tp->t_flags) &&
                            (to.to_flags & TOF_SACK) == 0 &&
                            TAILQ_EMPTY(&tp->snd_holes)) {

The SEQ_LEQ is compared against the wrong snd_max ?

       SEQ_LEQ(th->th_ack, tp->snd_max)

--HPS
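
The arithmetic in this comment can be checked with a small userland sketch. It assumes the `TF_SENTFIN` value quoted above and uses wraparound-safe comparisons modeled on the `SEQ_*` macros in netinet/tcp_seq.h; `sent_fin` and `highest_data_seq` are hypothetical helpers, and the sequence numbers are the ones from the `print *tp` output in comment 10.

```c
#include <stdbool.h>
#include <stdint.h>

#define	TF_SENTFIN	0x00000010u	/* have sent FIN */

typedef uint32_t tcp_seq;

/* Wraparound-safe sequence comparisons (modeled on tcp_seq.h). */
#define	SEQ_GT(a, b)	((int32_t)((a) - (b)) > 0)
#define	SEQ_LEQ(a, b)	((int32_t)((a) - (b)) <= 0)

/* True if the connection flags say a FIN has been sent. */
static bool
sent_fin(uint32_t t_flags)
{
	return ((t_flags & TF_SENTFIN) != 0);
}

/*
 * Highest sequence number that refers to data rather than to the FIN.
 * A FIN occupies one sequence number, so it is excluded when sent.
 */
static tcp_seq
highest_data_seq(tcp_seq snd_max, uint32_t t_flags)
{
	return (sent_fin(t_flags) ? snd_max - 1 : snd_max);
}
```

With `t_flags = 0x21100275` and `snd_max = 3223852205`, `sent_fin()` is true and `highest_data_seq()` gives 3223852204, which equals the `snd_nxt` value in the dump.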
Comment 37 Hans Petter Selasky freebsd_committer freebsd_triage 2021-12-21 23:47:13 UTC
And similarly:

acked = BYTES_THIS_ACK(tp, th);
if (tp->t_flags & TF_SENTFIN)
        acked--;

???
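
To make the question concrete, here is a sketch of that adjustment in userland C. It is only an illustration of the idea being asked about, not the change that was committed: `data_bytes_acked` is a hypothetical helper, and subtracting one only when the ACK reaches `snd_max` is an assumption of this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t tcp_seq;

/*
 * Number of data bytes newly acknowledged. The raw difference
 * th_ack - snd_una counts sequence space; if a FIN was sent and this
 * ACK covers it, one unit of that space is the FIN, not data.
 */
static uint32_t
data_bytes_acked(tcp_seq th_ack, tcp_seq snd_una, tcp_seq snd_max,
    bool fin_sent)
{
	uint32_t acked = th_ack - snd_una;

	if (fin_sent && th_ack == snd_max && acked > 0)
		acked--;
	return (acked);
}
```

With the values from comment 10 (`snd_una = 3223852179`, `snd_max = 3223852205`), an ACK for `snd_max` covers 26 units of sequence space but only 25 bytes of data once the FIN is excluded.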
Comment 38 Dobri Dobrev 2021-12-21 23:51:43 UTC
(In reply to Hans Petter Selasky from comment #37)

The thing is... when did something related to this get changed?
It must have happened between 12.2 and 13 at some point. If we can find the actual commit quickly, we can test.
Comment 39 Dobri Dobrev 2021-12-22 01:01:41 UTC
(In reply to Michael Tuexen from comment #34)
For some reason I'm unable to build world/kernel on older revisions. The build fails with: "ld: error: /usr/obj/usr/src/amd64.amd64/tmp/lib/libc.so.7: undefined reference to __sys_pdfork [--no-allow-shlib-undefined]"

No idea how to solve it.
Comment 40 Hans Petter Selasky freebsd_committer freebsd_triage 2021-12-22 08:36:59 UTC
You might need to run:

make toolchain

first.

--HPS
Comment 41 Hans Petter Selasky freebsd_committer freebsd_triage 2021-12-22 12:41:09 UTC
I note that in order to reach the sbdrop() where the panic happens we need to pass:

if (tp->t_state == TCPS_ESTABLISHED)

But:

tp->t_state = 8 (TCPS_LAST_ACK)

So that means there is a race somewhere.

--HPS
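
For reference, a small standalone sketch of the state check being discussed. The state numbering is what I believe sys/netinet/tcp_fsm.h uses (only TCPS_LAST_ACK = 8 is quoted in this thread, so treat the rest as an assumption); tcp_state_name() and passes_established_check() are hypothetical helpers.

```c
#include <stddef.h>

#define TCPS_ESTABLISHED 4  /* assumed value, see lead-in */

/* Hypothetical helper: map a t_state value to a readable name. */
static const char *
tcp_state_name(int t_state)
{
    static const char *const names[] = {
        "CLOSED", "LISTEN", "SYN_SENT", "SYN_RECEIVED", "ESTABLISHED",
        "CLOSE_WAIT", "FIN_WAIT_1", "CLOSING", "LAST_ACK", "FIN_WAIT_2",
        "TIME_WAIT"
    };
    size_t n = sizeof(names) / sizeof(names[0]);

    if (t_state < 0 || (size_t)t_state >= n)
        return ("?");
    return (names[t_state]);
}

/* The condition that guards the code path in question. */
static int
passes_established_check(int t_state)
{
    return (t_state == TCPS_ESTABLISHED);
}
```

With t_state = 8 the name is LAST_ACK and the check fails, which is why reaching that sbdrop() with this state looks inconsistent.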
Comment 42 Hans Petter Selasky freebsd_committer freebsd_triage 2021-12-22 12:46:27 UTC
Could you dump the INPCB as well:

print /x *tp->t_inpcb
Comment 43 Dobri Dobrev 2021-12-22 12:59:08 UTC
(In reply to Hans Petter Selasky from comment #42)

From which frame ?
Comment 44 Hans Petter Selasky freebsd_committer freebsd_triage 2021-12-22 13:29:46 UTC
(kgdb) frame 8
Comment 45 Dobri Dobrev 2021-12-22 13:55:41 UTC
(In reply to Hans Petter Selasky from comment #44)

(kgdb) where
#0  __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
#1  doadump (textdump=<optimized out>) at /usr/src/sys/kern/kern_shutdown.c:399
#2  0xffffffff80c12f6c in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:487
#3  0xffffffff80c133de in vpanic (fmt=0xffffffff81191bdd "%s", ap=<optimized out>) at /usr/src/sys/kern/kern_shutdown.c:920
#4  0xffffffff80c131e3 in panic (fmt=<unavailable>) at /usr/src/sys/kern/kern_shutdown.c:844
#5  0xffffffff810991b5 in trap_fatal (frame=0xfffffe00d3bfd5b0, eva=24) at /usr/src/sys/amd64/amd64/trap.c:944
#6  0xffffffff8109920f in trap_pfault (frame=0xfffffe00d3bfd5b0, usermode=false, signo=<optimized out>, ucode=<optimized out>) at /usr/src/sys/amd64/amd64/trap.c:763
#7  <signal handler called>
#8  m_copydata (m=0x0, m@entry=0xfffff8010ee80d00, off=0, len=1, cp=<optimized out>) at /usr/src/sys/kern/uipc_mbuf.c:657
#9  0xffffffff80dd5fa9 in tcp_output (tp=<optimized out>) at /usr/src/sys/netinet/tcp_output.c:1081
#10 0xffffffff80dcd382 in tcp_do_segment (m=<optimized out>, th=<optimized out>, so=<optimized out>, tp=0xfffffe0251638870, drop_hdrlen=40, tlen=<optimized out>, iptos=0 '\000') at /usr/src/sys/netinet/tcp_input.c:2822
#11 0xffffffff80dc9d41 in tcp_input_with_port (mp=<optimized out>, offp=<optimized out>, proto=<optimized out>, port=port@entry=0) at /usr/src/sys/netinet/tcp_input.c:1400
#12 0xffffffff80dca9eb in tcp_input (mp=0xfffff8010ee80d00, offp=0x0, proto=1) at /usr/src/sys/netinet/tcp_input.c:1496
#13 0xffffffff80dbc1bf in ip_input (m=0x0) at /usr/src/sys/netinet/ip_input.c:834
#14 0xffffffff80d491a9 in netisr_dispatch_src (proto=1, source=source@entry=0, m=0xfffff8015b58d700) at /usr/src/sys/net/netisr.c:1143
#15 0xffffffff80d4957f in netisr_dispatch (proto=250088704, m=0x1) at /usr/src/sys/net/netisr.c:1234
#16 0xffffffff80d2d128 in ether_demux (ifp=ifp@entry=0xfffff80105343000, m=0x0) at /usr/src/sys/net/if_ethersubr.c:921
#17 0xffffffff80d2e4b5 in ether_input_internal (ifp=0xfffff80105343000, m=0x0) at /usr/src/sys/net/if_ethersubr.c:707
#18 ether_nh_input (m=<optimized out>) at /usr/src/sys/net/if_ethersubr.c:737
#19 0xffffffff80d491a9 in netisr_dispatch_src (proto=proto@entry=5, source=source@entry=0, m=m@entry=0xfffff8015b58d700) at /usr/src/sys/net/netisr.c:1143
#20 0xffffffff80d4957f in netisr_dispatch (proto=250088704, proto@entry=5, m=0x1, m@entry=0xfffff8015b58d700) at /usr/src/sys/net/netisr.c:1234
#21 0xffffffff80d2d559 in ether_input (ifp=<optimized out>, m=0xfffff8015b58d700) at /usr/src/sys/net/if_ethersubr.c:828
#22 0xffffffff80d45617 in iflib_rxeof (rxq=<optimized out>, rxq@entry=0xfffffe00d68b6340, budget=<optimized out>) at /usr/src/sys/net/iflib.c:3046
#23 0xffffffff80d3fc62 in _task_fn_rx (context=0xfffffe00d68b6340) at /usr/src/sys/net/iflib.c:3989
#24 0xffffffff80c5f80d in gtaskqueue_run_locked (queue=queue@entry=0xfffff80103920b00) at /usr/src/sys/kern/subr_gtaskqueue.c:371
#25 0xffffffff80c5f482 in gtaskqueue_thread_loop (arg=<optimized out>, arg@entry=0xfffffe00d6bd1020) at /usr/src/sys/kern/subr_gtaskqueue.c:547
#26 0xffffffff80bd053e in fork_exit (callout=0xffffffff80c5f3c0 <gtaskqueue_thread_loop>, arg=0xfffffe00d6bd1020, frame=0xfffffe00d3bfdf40) at /usr/src/sys/kern/kern_fork.c:1092
#27 <signal handler called>
#28 mi_startup () at /usr/src/sys/kern/init_main.c:322
Backtrace stopped: Cannot access memory at address 0xb
(kgdb) frame 8
#8  m_copydata (m=0x0, m@entry=0xfffff8010ee80d00, off=0, len=1, cp=<optimized out>) at /usr/src/sys/kern/uipc_mbuf.c:657
warning: Source file is more recent than executable.
657                     count = min(m->m_len - off, len);
(kgdb) print /x *tp->t_inpcb
No symbol "tp" in current context.
(kgdb) print /x tp->t_inpcb
No symbol "tp" in current context.
(kgdb) 


There doesn't appear to be "tp" in frame 8 ...
Comment 46 Dobri Dobrev 2021-12-22 13:58:04 UTC
(In reply to Hans Petter Selasky from comment #44)
Here is from frame 9:
(kgdb) frame 9
#9  0xffffffff80dd5fa9 in tcp_output (tp=<optimized out>) at /usr/src/sys/netinet/tcp_output.c:1081
warning: Source file is more recent than executable.
1081                            m_copydata(mb, moff, len,
(kgdb) print /x *tp->t_inpcb
value has been optimized out
(kgdb) print /x tp->t_inpcb
value has been optimized out
(kgdb)
Comment 47 Hans Petter Selasky freebsd_committer freebsd_triage 2021-12-22 13:58:24 UTC
Try instead:

frame 10
print /x *tp->t_inpcb
Comment 48 Dobri Dobrev 2021-12-22 13:59:39 UTC
(In reply to Hans Petter Selasky from comment #47)

Here is from frame 10:

(kgdb)  frame 10
#10 0xffffffff80dcd382 in tcp_do_segment (m=<optimized out>, th=<optimized out>, so=<optimized out>, tp=0xfffffe0251638870, drop_hdrlen=40, tlen=<optimized out>, iptos=0 '\000') at /usr/src/sys/netinet/tcp_input.c:2822
warning: Source file is more recent than executable.
2822                                                    tcp_sack_partialack(tp, th);
(kgdb) print /x *tp->t_inpcb
$1 = {inp_hash = {cle_next = 0x0, cle_prev = 0xfffffe02092fde90}, inp_pcbgrouphash = {cle_next = 0x0, cle_prev = 0x0}, inp_lock = {lock_object = {lo_name = 0xffffffff8117b820, lo_flags = 0x56b0000, lo_data = 0x0, lo_witness = 0x0}, 
    rw_lock = 0xfffffe00d6bd4560}, inp_hpts = {tqe_next = 0x0, tqe_prev = 0x0}, inp_hpts_request = 0x0, inp_in_hpts = 0x0, inp_in_input = 0x0, inp_hpts_cpu = 0x0, inp_irq_cpu = 0x0, inp_refcount = 0x2, inp_flags = 0x8802000, 
  inp_flags2 = 0x0, inp_input_cpu = 0x0, inp_hpts_cpu_set = 0x0, inp_input_cpu_set = 0x0, inp_hpts_calls = 0x0, inp_input_calls = 0x0, inp_irq_cpu_set = 0x0, inp_spare_bits2 = 0x0, inp_numa_domain = 0xff, inp_ppcb = 0xfffffe0251638870, 
  inp_socket = 0xfffff8010ef223b0, inp_hptsslot = 0x0, inp_hpts_drop_reas = 0x0, inp_input = {tqe_next = 0x0, tqe_prev = 0x0}, inp_pcbinfo = 0xfffffe00d6a89758, inp_pcbgroup = 0x0, inp_pcbgroup_wild = {cle_next = 0x0, cle_prev = 0x0}, 
  inp_cred = 0xfffff80103fa9500, inp_flow = 0x0, inp_vflag = 0x1, inp_ip_ttl = 0x40, inp_ip_p = 0x0, inp_ip_minttl = 0x0, inp_flowid = 0x73b2783d, inp_snd_tag = 0x0, inp_flowtype = 0x82, inp_rss_listen_bucket = 0x0, inp_inc = {
    inc_flags = 0x0, inc_len = 0x0, inc_fibnum = 0x1, inc_ie = {ie_fport = 0x49c2, ie_lport = 0xf710, ie_dependfaddr = {id46_addr = {ia46_pad32 = {0x0, 0x0, 0x0}, ia46_addr4 = {s_addr = 0xd6a971c5}}, id6_addr = {__u6_addr = {
            __u6_addr8 = {0x0 <repeats 12 times>, 0xc5, 0x71, 0xa9, 0xd6}, __u6_addr16 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x71c5, 0xd6a9}, __u6_addr32 = {0x0, 0x0, 0x0, 0xd6a971c5}}}}, ie_dependladdr = {id46_addr = {ia46_pad32 = {0x0, 
            0x0, 0x0}, ia46_addr4 = {s_addr = 0xd011ca95}}, id6_addr = {__u6_addr = {__u6_addr8 = {0x0 <repeats 12 times>, 0x95, 0xca, 0x11, 0xd0}, __u6_addr16 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xca95, 0xd011}, __u6_addr32 = {0x0, 0x0, 
              0x0, 0xd011ca95}}}}, ie6_zoneid = 0x0}}, inp_label = 0x0, inp_sp = 0xfffff8084f4d5a20, {inp_ip_tos = 0x0, inp_options = 0x0, inp_moptions = 0x0}, {in6p_options = 0x0, in6p_outputopts = 0x0, in6p_moptions = 0x0, 
    in6p_icmp6filt = 0x0, in6p_cksum = 0x0, in6p_hops = 0x0}, inp_portlist = {cle_next = 0xfffff80bfc660d90, cle_prev = 0xfffff8080f614d00}, inp_phd = 0xfffff80105455c40, inp_gencnt = 0xc6f8d0f, spare_ptr = 0x0, inp_rt_cookie = 0x63, {
    inp_route = {ro_nh = 0xfffff8010e7a5e00, ro_lle = 0xfffff8015b783000, ro_prepend = 0x0, ro_plen = 0x0, ro_flags = 0x180, ro_mtu = 0x0, spare = 0x0, ro_dst = {sa_len = 0x10, sa_family = 0x2, sa_data = {0x0, 0x0, 0xc5, 0x71, 0xa9, 
          0xd6, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}}}, inp_route6 = {ro_nh = 0xfffff8010e7a5e00, ro_lle = 0xfffff8015b783000, ro_prepend = 0x0, ro_plen = 0x0, ro_flags = 0x180, ro_mtu = 0x0, spare = 0x0, ro_dst = {sin6_len = 0x10, 
        sin6_family = 0x2, sin6_port = 0x0, sin6_flowinfo = 0xd6a971c5, sin6_addr = {__u6_addr = {__u6_addr8 = {0x0 <repeats 16 times>}, __u6_addr16 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, __u6_addr32 = {0x0, 0x0, 0x0, 0x0}}}, 
        sin6_scope_id = 0x0}}}, inp_list = {cle_next = 0xfffff8015ba7dd90, cle_prev = 0xfffff8080f614d70}, inp_epoch_ctx = {data = {0x0, 0x0}}}
(kgdb)
Comment 49 Hans Petter Selasky freebsd_committer freebsd_triage 2021-12-22 14:05:31 UTC
Then see if you can get this working:

frame 10
print /x *(struct thread *)tp->t_inpcb.inp_lock.rw_lock
Comment 50 Hans Petter Selasky freebsd_committer freebsd_triage 2021-12-22 14:06:09 UTC
Then see if you can get this working:

frame 10
print /x *(struct thread *)(tp->t_inpcb->inp_lock.rw_lock)
Comment 51 Dobri Dobrev 2021-12-22 14:48:59 UTC
(In reply to Hans Petter Selasky from comment #50)

(kgdb) frame 10
#10 0xffffffff80dcd382 in tcp_do_segment (m=<optimized out>, th=<optimized out>, so=<optimized out>, tp=0xfffffe0251638870, drop_hdrlen=40, tlen=<optimized out>, iptos=0 '\000') at /usr/src/sys/netinet/tcp_input.c:2822
2822                                                    tcp_sack_partialack(tp, th);
(kgdb) print /x *(struct thread *)tp->t_inpcb.inp_lock.rw_lock
$3 = {td_lock = 0xfffffe00d68af0c0, td_proc = 0xffffffff81c8bea8, td_plist = {tqe_next = 0xfffffe00d6bd3e40, tqe_prev = 0xfffffe00d6bd4c90}, td_runq = {tqe_next = 0x0, tqe_prev = 0xfffffe00d68af190}, {td_slpq = {tqe_next = 0x0, 
      tqe_prev = 0xfffff801014b7700}, td_zombie = 0x0}, td_lockq = {tqe_next = 0x0, tqe_prev = 0xfffffe020cdd8bf8}, td_hash = {le_next = 0x0, le_prev = 0xfffffe00d6b08550}, td_cpuset = 0xfffff8010396f180, td_domain = {
    dr_policy = 0xffffffff818010b8, dr_iter = 0x0}, td_sel = 0x0, td_sleepqueue = 0xfffff801014b7700, td_turnstile = 0xfffff8015b64a300, td_rlqe = 0x0, td_umtxq = 0xfffff8010392b000, td_tid = 0x186aa, td_sigqueue = {sq_signals = {
      __bits = {0x0, 0x0, 0x0, 0x0}}, sq_kill = {__bits = {0x0, 0x0, 0x0, 0x0}}, sq_ptrace = {__bits = {0x0, 0x0, 0x0, 0x0}}, sq_list = {tqh_first = 0x0, tqh_last = 0xfffffe00d6bd4638}, sq_proc = 0xffffffff81c8bea8, sq_flags = 0x1}, 
  td_lend_user_pri = 0xff, td_allocdomain = 0x0, td_flags = 0x4010006, td_inhibitors = 0x0, td_pflags = 0x200000, td_pflags2 = 0x0, td_dupfd = 0x0, td_sqqueue = 0x0, td_wchan = 0x0, td_wmesg = 0x0, td_owepreempt = 0x0, td_tsqueue = 0x0, 
  td_locks = 0x0, td_rw_rlocks = 0x0, td_sx_slocks = 0x0, td_lk_slocks = 0x0, td_stopsched = 0x1, td_blocked = 0x0, td_lockname = 0x0, td_contested = {lh_first = 0x0}, td_sleeplocks = 0x0, td_intr_nesting_level = 0x0, td_pinned = 0x3, 
  td_realucred = 0xfffff801015fd800, td_ucred = 0xfffff801015fd800, td_limit = 0xfffff801015fd700, td_slptick = 0x0, td_blktick = 0x0, td_swvoltick = 0x92b19aa5, td_swinvoltick = 0x8a9cc00b, td_cow = 0x0, td_ru = {ru_utime = {
      tv_sec = 0x0, tv_usec = 0x0}, ru_stime = {tv_sec = 0x0, tv_usec = 0x0}, ru_maxrss = 0x0, ru_ixrss = 0x0, ru_idrss = 0x0, ru_isrss = 0x0, ru_minflt = 0x0, ru_majflt = 0x0, ru_nswap = 0x0, ru_inblock = 0x0, ru_oublock = 0x0, 
    ru_msgsnd = 0x0, ru_msgrcv = 0x0, ru_nsignals = 0x0, ru_nvcsw = 0x1a6a5356, ru_nivcsw = 0x3}, td_rux = {rux_runtime = 0x63a4695bd17, rux_uticks = 0x0, rux_sticks = 0x3d50f, rux_iticks = 0x0, rux_uu = 0x0, rux_su = 0x715e57c6, 
    rux_tu = 0x715e57c6}, td_incruntime = 0x807dd793, td_runtime = 0x63ac7110c3e, td_pticks = 0x3d55b, td_sticks = 0x4c, td_iticks = 0x0, td_uticks = 0x0, td_intrval = 0x0, td_oldsigmask = {__bits = {0x0, 0x0, 0x0, 0x0}}, 
  td_generation = 0x1a6a5359, td_sigstk = {ss_sp = 0x0, ss_size = 0x0, ss_flags = 0x0}, td_xsig = 0x0, td_profil_addr = 0x0, td_profil_ticks = 0x0, td_name = {0x69, 0x66, 0x5f, 0x69, 0x6f, 0x5f, 0x74, 0x71, 0x67, 0x5f, 0x31, 0x0, 0x0, 
    0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, td_fpop = 0x0, td_dbgflags = 0x0, td_si = {si_signo = 0x0, si_errno = 0x0, si_code = 0x0, si_pid = 0x0, si_uid = 0x0, si_status = 0x0, si_addr = 0x0, si_value = {sival_int = 0x0, sival_ptr = 0x0, 
      sigval_int = 0x0, sigval_ptr = 0x0}, _reason = {_fault = {_trapno = 0x0}, _timer = {_timerid = 0x0, _overrun = 0x0}, _mesgq = {_mqd = 0x0}, _poll = {_band = 0x0}, __spare__ = {__spare1__ = 0x0, __spare2__ = {0x0, 0x0, 0x0, 0x0, 
          0x0, 0x0, 0x0}}}}, td_ng_outbound = 0x0, td_osd = {osd_nslots = 0x0, osd_slots = 0x0, osd_next = {le_next = 0x0, le_prev = 0x0}}, td_map_def_user = 0x0, td_dbg_forked = 0x0, td_vp_reserved = 0x0, td_no_sleeping = 0x1, 
  td_su = 0x0, td_sleeptimo = 0x0, td_rtcgen = 0x0, td_errno = 0x0, td_vslock_sz = 0x0, td_kcov_info = 0x0, td_ucredref = 0x0, td_sigmask = {__bits = {0x0, 0x0, 0x0, 0x0}}, td_rqindex = 0x6, td_base_pri = 0x18, td_priority = 0x18, 
  td_pri_class = 0x3, td_user_pri = 0x7f, td_base_user_pri = 0x7f, td_unused_0 = 0x0, td_rb_list = 0x0, td_rbp_list = 0x0, td_rb_inact = 0x0, td_sa = {code = 0x0, callp = 0x0, args = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}}, 
  td_sigblock_ptr = 0x0, td_sigblock_val = 0x0, td_pcb = 0xfffffe00d6bd4a70, td_state = 0x4, td_uretoff = {tdu_retval = {0x0, 0x0}, tdu_off = 0x0}, td_cowgen = 0x0, td_slpcallout = {c_links = {le = {le_next = 0x0, le_prev = 0x0}, sle = {
        sle_next = 0x0}, tqe = {tqe_next = 0x0, tqe_prev = 0x0}}, c_time = 0x0, c_precision = 0x0, c_arg = 0x0, c_func = 0x0, c_lock = 0x0, c_flags = 0x0, c_iflags = 0x10, c_cpu = 0x0}, td_frame = 0xfffffe00d3bfdf40, 
  td_kstack = 0xfffffe00d3bfa000, td_kstack_pages = 0x4, td_critnest = 0x1, td_md = {md_spinlock_count = 0x1, md_saved_flags = 0x246, md_spurflt_addr = 0x0, md_invl_gen = {gen = 0x0, {link = {le_next = 0x1, le_prev = 0x0}, {next = 0x1, 
          saved_pri = 0x0}}}, md_efirt_tmp = 0x0, md_efirt_dis_pf = 0x0, md_pcb = {pcb_r15 = 0xffffffff81cde1c8, pcb_r14 = 0xfffffe00d6b53c80, pcb_r13 = 0xfffffe00d6bd4560, pcb_r12 = 0xfffffe00d3bfddb8, pcb_rbp = 0xfffffe00d3bfde50, 
      pcb_rsp = 0xfffffe00d3bfdda8, pcb_rbx = 0xfffffe00d68af0c0, pcb_rip = 0xffffffff80c45a59, pcb_fsbase = 0x0, pcb_gsbase = 0x0, pcb_kgsbase = 0x0, pcb_cr0 = 0x0, pcb_cr2 = 0x0, pcb_cr3 = 0x0, pcb_cr4 = 0x0, pcb_dr0 = 0x0, 
      pcb_dr1 = 0x0, pcb_dr2 = 0x0, pcb_dr3 = 0x0, pcb_dr6 = 0x0, pcb_dr7 = 0x0, pcb_gdt = {rd_limit = 0x0, rd_base = 0x0}, pcb_idt = {rd_limit = 0x0, rd_base = 0x0}, pcb_ldt = {rd_limit = 0x0, rd_base = 0x0}, pcb_tr = 0x0, 
      pcb_flags = 0x1, pcb_initial_fpucw = 0x0, pcb_onfault = 0x0, pcb_saved_ucr3 = 0x0, pcb_tssp = 0x0, pcb_efer = 0x0, pcb_star = 0x0, pcb_lstar = 0x0, pcb_cstar = 0x0, pcb_sfmask = 0x0, pcb_save = 0xfffffe00d6a6ed00, pcb_pad = {0x0, 
        0x0, 0x0, 0x0, 0x0}}, md_stack_base = 0xfffffe00d3bfe000, md_usr_fpu_save = 0xfffffe00d6a6ed00}, td_ar = 0x0, td_lprof = {{lh_first = 0x0}, {lh_first = 0x0}}, td_dtrace = 0xfffff80103920a00, td_vnet = 0xfffff801014c0580, 
  td_vnet_lpush = 0x0, td_intr_frame = 0x0, td_rfppwait_p = 0x0, td_ma = 0x0, td_ma_cnt = 0x0, td_emuldata = 0x0, td_lastcpu = 0x1, td_oncpu = 0x1, td_lkpi_task = 0x0, td_pmcpend = 0x0, td_coredump = 0x0, td_ktr_io_lim = 0x0}
(kgdb)
Comment 52 Hans Petter Selasky freebsd_committer freebsd_triage 2021-12-22 15:12:11 UTC
Hi,

There appear to be multiple dumps with different issues!

Decoding the thread name from the last printout you provided:

td_name = {0x69, 0x66, 0x5f, 0x69, 0x6f, 0x5f, 0x74, 0x71, 0x67, 0x5f, 0x
    0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0};

td_name = "if_io_tqg_1"

And according to the panic backtrace:

current process                = 0 (if_io_tqg_1)

So this is a different core dump, and probably a different issue.
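
The td_name decoding can be reproduced with a few lines of standalone C. The byte values are copied from the td_name field in the struct thread printout from comment #51; decode_name() is a hypothetical helper name.

```c
#include <stddef.h>

/* td_name bytes as printed by kgdb for the thread holding the lock. */
static const unsigned char td_name_bytes[] = {
    0x69, 0x66, 0x5f, 0x69, 0x6f, 0x5f, 0x74, 0x71, 0x67, 0x5f, 0x31,
    0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0
};

/*
 * Hypothetical helper: copy bytes up to the first NUL into out,
 * always NUL-terminating. Returns the number of characters copied.
 */
static size_t
decode_name(const unsigned char *bytes, size_t n, char *out, size_t outsz)
{
    size_t i = 0;

    if (outsz == 0)
        return (0);
    while (i < n && i + 1 < outsz && bytes[i] != 0) {
        out[i] = (char)bytes[i];
        i++;
    }
    out[i] = '\0';
    return (i);
}
```

Passing td_name_bytes and a 32-byte buffer yields an 11-character string that can be printed with puts().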

Can you repeat the GDB printouts from the thread with sbdrop() in the backtrace (if_io_tqg_1)?

I need to see:

print /x *tp
print /x *tp->t_inpcb

Just try searching all frames for these variables.

--HPS
Comment 53 Dobri Dobrev 2021-12-22 15:21:02 UTC
(In reply to Hans Petter Selasky from comment #52)

Hello,

The sbdrop() only appeared once, on one of the servers. I've since updated the kernel there and have not extracted any core dumps from that server (there's barely any traffic on it, so it hasn't crashed again).


I've only provided further details from the 2nd server (where sbdrop() doesn't appear in "where")


Also, the dump matches the currently installed kernel. I've updated the source tree and built a new world/kernel, but have not installed it.


If you wish, I'll install the newly built kernel/world, wait for it to crash again and start extracting data from the dump?

Let me know.
Comment 54 Hans Petter Selasky freebsd_committer freebsd_triage 2021-12-22 15:27:56 UTC
Let's use the updated machine for now and follow Michael's thread.
Comment 55 Dobri Dobrev 2021-12-22 15:29:47 UTC
(In reply to Hans Petter Selasky from comment #54)

So, do I hold off installing the kernel/world that I built today, or install it, wait for a crash and extract new data?
Comment 56 Michael Tuexen freebsd_committer freebsd_triage 2021-12-22 15:33:23 UTC
Let us try to use a kernel built with INVARIANTS, let it crash and look at the core. Don't update the system while we are doing this. And let us focus on one system and one crash at a time.
Comment 57 Dobri Dobrev 2021-12-22 16:03:11 UTC
(In reply to Michael Tuexen from comment #56)

I can build the kernel with invariants, however...

1. During runtime - is there a noticeable slowdown or anything that would otherwise interfere with production traffic / etc ?

2. When it crashes - is the downtime longer (crash dump generation, etc) than normal?

Let me know.
Comment 58 Michael Tuexen freebsd_committer freebsd_triage 2021-12-22 16:24:44 UTC
(In reply to Dobri Dobrev from comment #57)
1. During runtime it is slower, since it does additional checking.
2. Downtime is the same.
Comment 59 Hans Petter Selasky freebsd_committer freebsd_triage 2021-12-22 16:38:49 UTC
Hi Dobri,

Can you confirm that SACK is enabled?

sysctl net.inet.tcp.sack.enable

--HPS
Comment 60 Dobri Dobrev 2021-12-22 17:16:28 UTC
(In reply to Hans Petter Selasky from comment #59)

SACK is enabled.

I'm rebuilding the latest available stable/13 kernel with invariants.
When it crashes - I'll start providing data.

Hopefully the slowdown doesn't interfere with production traffic, because if it does - I'll have to remove the invariants.
Comment 61 Michael Tuexen freebsd_committer freebsd_triage 2021-12-22 18:27:43 UTC
(In reply to Dobri Dobrev from comment #60)
That's OK. Let's see what happens.
Comment 62 Dobri Dobrev 2021-12-22 18:34:46 UTC
(In reply to Michael Tuexen from comment #61)

Running stable/13-n248688-ecb7f44be90, waiting for a crash.
So far the invariants don't cause issues.
Comment 63 Hans Petter Selasky freebsd_committer freebsd_triage 2021-12-22 19:14:54 UTC
It could also be interesting to see the socket state, especially so_snd:

Can you try this:

frame 10
print /x *(tp->t_inpcb->inp_socket)

--HPS
Comment 64 Dobri Dobrev 2021-12-22 19:16:56 UTC
(In reply to Hans Petter Selasky from comment #63)

I already installed the new kernel and cleaned all the old dumps (there wasn't much space left in /).

I'll extract everything again after the next crash.
Comment 65 Dobri Dobrev 2021-12-22 22:24:57 UTC
(In reply to Hans Petter Selasky from comment #63)

So, here it is - I believe this is what we're looking for: "panic: tcp_m_copym, length > size of mbuf chain"

Unread portion of the kernel message buffer:
[12282] panic: tcp_m_copym, length > size of mbuf chain
[12282] cpuid = 1
[12282] time = 1640209960
[12282] KDB: stack backtrace:
[12282] db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe017fd62550
[12282] vpanic() at vpanic+0x17f/frame 0xfffffe017fd625a0
[12282] panic() at panic+0x43/frame 0xfffffe017fd62600
[12282] tcp_m_copym() at tcp_m_copym+0x41b/frame 0xfffffe017fd626b0
[12282] tcp_output() at tcp_output+0x1433/frame 0xfffffe017fd62890
[12282] tcp_do_segment() at tcp_do_segment+0x2b9a/frame 0xfffffe017fd62960
[12282] tcp_input_with_port() at tcp_input_with_port+0xb7d/frame 0xfffffe017fd62aa0
[12282] tcp_input() at tcp_input+0xb/frame 0xfffffe017fd62ab0
[12282] ip_input() at ip_input+0x192/frame 0xfffffe017fd62b40
[12282] netisr_dispatch_src() at netisr_dispatch_src+0xaf/frame 0xfffffe017fd62ba0
[12282] ether_demux() at ether_demux+0x16e/frame 0xfffffe017fd62bd0
[12282] ether_nh_input() at ether_nh_input+0x3f8/frame 0xfffffe017fd62c30
[12282] netisr_dispatch_src() at netisr_dispatch_src+0xaf/frame 0xfffffe017fd62c90
[12282] ether_input() at ether_input+0x99/frame 0xfffffe017fd62cf0
[12282] iflib_rxeof() at iflib_rxeof+0xe07/frame 0xfffffe017fd62e00
[12282] _task_fn_rx() at _task_fn_rx+0x7a/frame 0xfffffe017fd62e40
[12282] gtaskqueue_run_locked() at gtaskqueue_run_locked+0xa7/frame 0xfffffe017fd62ec0
[12282] gtaskqueue_thread_loop() at gtaskqueue_thread_loop+0xc2/frame 0xfffffe017fd62ef0
[12282] fork_exit() at fork_exit+0x80/frame 0xfffffe017fd62f30
[12282] fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe017fd62f30
[12282] --- trap 0, rip = 0x266300000000000, rsp = 0, rbp = 0 ---
[12282] KDB: enter: panic

__curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
55              __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu,
(kgdb) where
#0  __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
#1  doadump (textdump=textdump@entry=0) at /usr/src/sys/kern/kern_shutdown.c:399
#2  0xffffffff804c30fa in db_dump (dummy=<optimized out>, dummy2=<unavailable>, dummy3=<unavailable>, dummy4=<unavailable>) at /usr/src/sys/ddb/db_command.c:575
#3  0xffffffff804c2fb2 in db_command (last_cmdp=<optimized out>, cmd_table=<optimized out>, dopager=dopager@entry=1) at /usr/src/sys/ddb/db_command.c:482
#4  0xffffffff804c2c0d in db_command_loop () at /usr/src/sys/ddb/db_command.c:535
#5  0xffffffff804c60b6 in db_trap (type=<optimized out>, code=<optimized out>) at /usr/src/sys/ddb/db_main.c:270
#6  0xffffffff80c7a676 in kdb_trap (type=type@entry=3, code=code@entry=0, tf=tf@entry=0xfffffe017fd62480) at /usr/src/sys/kern/subr_kdb.c:733
#7  0xffffffff810ebd19 in trap (frame=0xfffffe017fd62480) at /usr/src/sys/amd64/amd64/trap.c:607
#8  <signal handler called>
#9  kdb_enter (why=0xffffffff812e57c1 "panic", msg=<optimized out>) at /usr/src/sys/kern/subr_kdb.c:506
#10 0xffffffff80c2c900 in vpanic (fmt=0xffffffff811c2a3b "tcp_m_copym, length > size of mbuf chain", ap=ap@entry=0xfffffe017fd625e0) at /usr/src/sys/kern/kern_shutdown.c:908
#11 0xffffffff80c2c693 in panic (fmt=0xffffffff81e9d040 <cnputs_mtx> "\302&*\201\377\377\377\377") at /usr/src/sys/kern/kern_shutdown.c:844
#12 0xffffffff80e11a3b in tcp_m_copym (m=0x0, m@entry=0xfffff80bc680b500, off0=1388, plen=<optimized out>, plen@entry=0xfffffe017fd6282c, seglimit=1, seglimit@entry=0, segsize=segsize@entry=0, sb=<optimized out>, 
    hw_tls=<optimized out>) at /usr/src/sys/netinet/tcp_output.c:2011
#13 0xffffffff80e0f893 in tcp_output (tp=<optimized out>) at /usr/src/sys/netinet/tcp_output.c:1091
#14 0xffffffff80e0607a in tcp_do_segment (m=<optimized out>, th=0xfffff80bc659e87a, so=<optimized out>, tp=0xfffffe0252e24000, drop_hdrlen=40, tlen=<optimized out>, iptos=0 '\000') at /usr/src/sys/netinet/tcp_input.c:2822
#15 0xffffffff80e025bd in tcp_input_with_port (mp=<optimized out>, offp=<optimized out>, proto=<optimized out>, port=port@entry=0) at /usr/src/sys/netinet/tcp_input.c:1400
#16 0xffffffff80e0340b in tcp_input (mp=0xffffffff81e9d040 <cnputs_mtx>, offp=0x80, proto=-2127893703) at /usr/src/sys/netinet/tcp_input.c:1496
#17 0xffffffff80df3d22 in ip_input (m=0x0) at /usr/src/sys/netinet/ip_input.c:834
#18 0xffffffff80d76f4f in netisr_dispatch_src (proto=1, source=source@entry=0, m=0xfffff80bc659e800) at /usr/src/sys/net/netisr.c:1143
#19 0xffffffff80d7729f in netisr_dispatch (proto=2179584064, m=0xffffffff812aeb39) at /usr/src/sys/net/netisr.c:1234
#20 0xffffffff80d5961e in ether_demux (ifp=ifp@entry=0xfffff8010731e800, m=0x80) at /usr/src/sys/net/if_ethersubr.c:921
#21 0xffffffff80d5ac98 in ether_input_internal (ifp=0xfffff8010731e800, m=0x80) at /usr/src/sys/net/if_ethersubr.c:707
#22 ether_nh_input (m=<optimized out>) at /usr/src/sys/net/if_ethersubr.c:737
#23 0xffffffff80d76f4f in netisr_dispatch_src (proto=proto@entry=5, source=source@entry=0, m=m@entry=0xfffff80bc659e800) at /usr/src/sys/net/netisr.c:1143
#24 0xffffffff80d7729f in netisr_dispatch (proto=2179584064, proto@entry=5, m=0xffffffff812aeb39, m@entry=0xfffff80bc659e800) at /usr/src/sys/net/netisr.c:1234
#25 0xffffffff80d59ae9 in ether_input (ifp=0xfffff8010731e800, m=0xfffff80bc659e800) at /usr/src/sys/net/if_ethersubr.c:828
#26 0xffffffff80d72cc7 in iflib_rxeof (rxq=<optimized out>, rxq@entry=0xfffffe017ff65340, budget=<optimized out>) at /usr/src/sys/net/iflib.c:3046
#27 0xffffffff80d6ca6a in _task_fn_rx (context=0xfffffe017ff65340) at /usr/src/sys/net/iflib.c:3989
#28 0xffffffff80c78927 in gtaskqueue_run_locked (queue=queue@entry=0xfffff80105860600) at /usr/src/sys/kern/subr_gtaskqueue.c:371
#29 0xffffffff80c78752 in gtaskqueue_thread_loop (arg=arg@entry=0xfffffe017fed5020) at /usr/src/sys/kern/subr_gtaskqueue.c:547
#30 0xffffffff80be4ce0 in fork_exit (callout=0xffffffff80c78690 <gtaskqueue_thread_loop>, arg=0xfffffe017fed5020, frame=0xfffffe017fd62f40) at /usr/src/sys/kern/kern_fork.c:1092
#31 <signal handler called>
#32 0x0266300000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x0
(kgdb)


Let me know what you need from the dump.
Comment 66 Michael Tuexen freebsd_committer freebsd_triage 2021-12-22 23:24:16 UTC
That was fast...

Let's start with:
frame 12
print *(struct mbuf *)0xfffff80bc680b500
print *(int32_t *)0xfffffe017fd6282c

frame 14
print *th
print *tp
Comment 67 Dobri Dobrev 2021-12-22 23:26:53 UTC
(In reply to Michael Tuexen from comment #66)

(kgdb) frame 12
#12 0xffffffff80e11a3b in tcp_m_copym (m=0x0, m@entry=0xfffff80bc680b500, off0=1388, plen=<optimized out>, plen@entry=0xfffffe017fd6282c, seglimit=1, seglimit@entry=0, segsize=segsize@entry=0, sb=<optimized out>, 
    hw_tls=<optimized out>) at /usr/src/sys/netinet/tcp_output.c:2011
2011                            KASSERT(len == M_COPYALL,
(kgdb) print *(struct mbuf *)0xfffff80bc680b500
$1 = {{m_next = 0x0, m_slist = {sle_next = 0x0}, m_stailq = {stqe_next = 0x0}}, {m_nextpkt = 0x0, m_slistpkt = {sle_next = 0x0}, m_stailqpkt = {stqe_next = 0x0}}, 
  m_data = 0xfffff8017874f000 "O\320mg\276\022\364u\353\271\061\270tI\356\063\227/\030\204\032d\\\252\274\261`PҲ\271\232F\343-\304\372\307<\031u\212\260\061ߐ\264\306i\361Vj\212\314ϓM\031R\257G\b\246\233\227\233,D\335C\220\273\022\025\223\251\361\211\222e+0M)\201\233\034e'\222\203\242h\201\017w\026\065\365\242خ\f\225\350\313\311\364$\244\262\265\370\375\237\f\206\303\r\"6\266F6\377\352\270\036?\022\fJ\032'\225\203Q\332Fy*d\225\373", <incomplete sequence \303>, m_len = 1999, m_type = 1, m_flags = 1, {{{m_pkthdr = {{snd_tag = 0x0, rcvif = 0x0}, tags = {slh_first = 0x0}, len = 1297, flowid = 0, csum_flags = 0, fibnum = 0, numa_domain = 255 '\377', 
          rsstype = 0 '\000', {rcv_tstmp = 0, {l2hlen = 0 '\000', l3hlen = 0 '\000', l4hlen = 0 '\000', l5hlen = 0 '\000', inner_l2hlen = 0 '\000', inner_l3hlen = 0 '\000', inner_l4hlen = 0 '\000', 
              inner_l5hlen = 0 '\000'}}, PH_per = {eight = "\000\000\000\000\377\377\000", sixteen = {0, 0, 65535, 0}, thirtytwo = {0, 65535}, sixtyfour = {281470681743360}, unintptr = {281470681743360}, 
            ptr = 0xffff00000000}, PH_loc = {eight = "\000\000\000\000\000\000\000", sixteen = {0, 0, 0, 0}, thirtytwo = {0, 0}, sixtyfour = {0}, unintptr = {0}, ptr = 0x0}}, {m_epg_npgs = 0 '\000', m_epg_nrdy = 0 '\000', 
          m_epg_hdrlen = 0 '\000', m_epg_trllen = 0 '\000', m_epg_1st_off = 0, m_epg_last_len = 0, m_epg_flags = 0 '\000', m_epg_record_type = 0 '\000', __spare = "\000", m_epg_enc_cnt = 0, m_epg_tls = 0x511, 
          m_epg_so = 0xff000000000000, m_epg_seqno = 0, m_epg_stailq = {stqe_next = 0xffff00000000}}}, {m_ext = {{ext_count = 2, ext_cnt = 0xdeadc0de00000002}, ext_size = 2048, ext_type = 6, ext_flags = 1, {{
              ext_buf = 0xfffff8017874f000 "O\320mg\276\022\364u\353\271\061\270tI\356\063\227/\030\204\032d\\\252\274\261`PҲ\271\232F\343-\304\372\307<\031u\212\260\061ߐ\264\306i\361Vj\212\314ϓM\031R\257G\b\246\233\227\233,D\335C\220\273\022\025\223\251\361\211\222e+0M)\201\233\034e'\222\203\242h\201\017w\026\065\365\242خ\f\225\350\313\311\364$\244\262\265\370\375\237\f\206\303\r\"6\266F6\377\352\270\036?\022\fJ\032'\225\203Q\332Fy*d\225\373", <incomplete sequence \303>, ext_arg2 = 0x0}, {extpg_pa = {18446735283932426240, 0, 16045693110842147038, 16045693110842147038, 16045693110842147038}, 
              extpg_trail = "\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255", <incomplete sequence \336>, extpg_hdr = "\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255"}}, ext_free = 0x0, 
          ext_arg1 = 0x0}, m_pktdat = 0xfffff80bc680b558 "\002"}}, m_dat = 0xfffff80bc680b520 ""}}
(kgdb) print *(int32_t *)0xfffffe017fd6282c
$2 = 612
(kgdb) frame 14
#14 0xffffffff80e0607a in tcp_do_segment (m=<optimized out>, th=0xfffff80bc659e87a, so=<optimized out>, tp=0xfffffe0252e24000, drop_hdrlen=40, tlen=<optimized out>, iptos=0 '\000') at /usr/src/sys/netinet/tcp_input.c:2822
2822                                                    tcp_sack_partialack(tp, th);
(kgdb) print *th
$3 = {th_sport = 43204, th_dport = 63248, th_seq = 2812027976, th_ack = 324807354, th_x2 = 0 '\000', th_off = 5 '\005', th_flags = 16 '\020', th_win = 16103, th_sum = 0, th_urp = 0}
(kgdb) print *tp
$4 = {t_inpcb = 0xfffff8090099b1f0, t_fb = 0xffffffff81b414a0 <tcp_def_funcblk>, t_fb_ptr = 0x0, t_maxseg = 1400, t_logstate = 0, t_port = 0, t_state = 6, t_idle_reduce = 0, t_delayed_ack = 0, t_fin_is_rst = 0, 
  t_log_state_set = 0, bits_spare = 0, t_flags = 554697333, snd_una = 324805966, snd_max = 324807967, snd_nxt = 324807967, snd_up = 324805966, snd_wnd = 65800, snd_cwnd = 1400, t_peakrate_thr = 0, ts_offset = 0, 
  rfbuf_ts = 12071754, rcv_numsacks = 0, t_tsomax = 65535, t_tsomaxsegcount = 37, t_tsomaxsegsize = 4096, rcv_nxt = 2812027976, rcv_adv = 2812093832, rcv_wnd = 65856, t_flags2 = 1024, t_srtt = 7549, t_rttvar = 947, 
  ts_recent = 0, snd_scale = 2 '\002', rcv_scale = 6 '\006', snd_limited = 0 '\000', request_r_scale = 6 '\006', last_ack_sent = 2812027976, t_rcvtime = 2159165013, rcv_up = 2812027976, t_segqlen = 0, t_segqmbuflen = 0, 
  t_segq = {tqh_first = 0x0, tqh_last = 0xfffffe0252e24090}, t_in_pkt = 0x0, t_tail_pkt = 0x0, t_timers = 0xfffffe0252e242a8, t_vnet = 0xfffff8010582fec0, snd_ssthresh = 2800, snd_wl1 = 2812027976, snd_wl2 = 324805966, 
  irs = 2812024397, iss = 324701574, t_acktime = 0, t_sndtime = 2159073224, ts_recent_age = 0, snd_recover = 324807967, cl4_spare = 0, t_oobflags = 0 '\000', t_iobc = 0 '\000', t_rxtcur = 64000, t_rxtshift = 8, 
  t_rtttime = 0, t_rtseq = 324807965, t_starttime = 2158904990, t_fbyte_in = 2158905017, t_fbyte_out = 2158905018, t_pmtud_saved_maxseg = 0, t_blackhole_enter = 0, t_blackhole_exit = 0, t_rttmin = 30, t_rttbest = 7842, 
  t_softerror = 0, max_sndwnd = 65800, snd_cwnd_prev = 5600, snd_ssthresh_prev = 2800, snd_recover_prev = 324776566, t_sndzerowin = 0, t_rttupdated = 15, snd_numholes = 1, t_badrxtwin = 2158964144, snd_holes = {
    tqh_first = 0xfffff806d01890a0, tqh_last = 0xfffff806d01890b0}, snd_fack = 324807354, sackblks = {{start = 2812027975, end = 2812027976}, {start = 0, end = 0}, {start = 0, end = 0}, {start = 0, end = 0}, {start = 0, 
      end = 0}, {start = 0, end = 0}}, sackhint = {nexthole = 0xfffff806d01890a0, sack_bytes_rexmit = 0, last_sack_ack = 324807354, delivered_data = 1388, sacked_bytes = 611, recover_fs = 3400, prr_delivered = 6800, 
    prr_out = 7588}, t_rttlow = 190, rfbuf_cnt = 0, tod = 0x0, t_sndrexmitpack = 47, t_rcvoopack = 0, t_toe = 0x0, cc_algo = 0xffffffff81b3e350 <newreno_cc_algo>, ccv = 0xfffffe0252e243f0, osd = 0xfffffe0252e24418, 
  t_bytes_acked = 0, t_maxunacktime = 0, t_keepinit = 0, t_keepidle = 0, t_keepintvl = 0, t_keepcnt = 0, t_dupacks = 0, t_lognum = 0, t_loglimit = 5000, t_pacing_rate = -1, t_logs = {stqh_first = 0x0, 
    stqh_last = 0xfffffe0252e24218}, t_lin = 0x0, t_lib = 0x0, t_output_caller = 0x0, t_stats = 0x0, t_logsn = 0, gput_ts = 0, gput_seq = 0, gput_ack = 0, t_stats_gput_prev = 0, t_maxpeakrate = 0, t_sndtlppack = 0, 
  t_sndtlpbyte = 0, t_sndbytes = 125990, t_snd_rxt_bytes = 40040, t_tfo_client_cookie_len = 0 '\000', t_end_info_status = 0, t_tfo_pending = 0x0, t_tfo_cookie = {client = '\000' <repeats 15 times>, server = 0}, {
    t_end_info_bytes = "\000\000\000\000\000\000\000", t_end_info = 0}}
(kgdb)
Comment 68 Michael Tuexen freebsd_committer freebsd_triage 2021-12-22 23:40:00 UTC
Thanks. Need to think when I'm more awake than now...
Comment 69 Dobri Dobrev 2021-12-22 23:40:46 UTC
(In reply to Michael Tuexen from comment #68)

Same. Let's continue tomorrow.
Comment 70 Hans Petter Selasky freebsd_committer freebsd_triage 2021-12-23 10:33:47 UTC
Could you also get:

frame 14
print /x *(tp->t_inpcb)
print /x *(tp->t_inpcb->inp_socket)

--HPS
Comment 71 Dobri Dobrev 2021-12-23 12:02:27 UTC
(In reply to Hans Petter Selasky from comment #70)

(kgdb) frame 14
#14 0xffffffff80e0607a in tcp_do_segment (m=<optimized out>, th=0xfffff80bc659e87a, so=<optimized out>, tp=0xfffffe0252e24000, drop_hdrlen=40, tlen=<optimized out>, iptos=0 '\000') at /usr/src/sys/netinet/tcp_input.c:2822
2822                                                    tcp_sack_partialack(tp, th);
(kgdb) print /x *(tp->t_inpcb)
$5 = {inp_hash = {cle_next = 0x0, cle_prev = 0xfffffe020ae2fe18}, inp_pcbgrouphash = {cle_next = 0x0, cle_prev = 0x0}, inp_lock = {lock_object = {lo_name = 0xffffffff811d9a83, lo_flags = 0x56b0000, lo_data = 0x0, 
      lo_witness = 0xfffff8207fd75100}, rw_lock = 0xfffffe017fed7720}, inp_hpts = {tqe_next = 0x0, tqe_prev = 0x0}, inp_hpts_request = 0x0, inp_in_hpts = 0x0, inp_in_input = 0x0, inp_hpts_cpu = 0x0, inp_irq_cpu = 0x0, 
  inp_refcount = 0x2, inp_flags = 0x8802000, inp_flags2 = 0x0, inp_input_cpu = 0x0, inp_hpts_cpu_set = 0x0, inp_input_cpu_set = 0x0, inp_hpts_calls = 0x0, inp_input_calls = 0x0, inp_irq_cpu_set = 0x0, inp_spare_bits2 = 0x0, 
  inp_numa_domain = 0xff, inp_ppcb = 0xfffffe0252e24000, inp_socket = 0xfffff80900858000, inp_hptsslot = 0x0, inp_hpts_drop_reas = 0x0, inp_input = {tqe_next = 0x0, tqe_prev = 0x0}, inp_pcbinfo = 0xfffffe00d856f758, 
  inp_pcbgroup = 0x0, inp_pcbgroup_wild = {cle_next = 0x0, cle_prev = 0x0}, inp_cred = 0xfffff80107538500, inp_flow = 0x0, inp_vflag = 0x1, inp_ip_ttl = 0x40, inp_ip_p = 0x0, inp_ip_minttl = 0x0, inp_flowid = 0x5e457bf3, 
  inp_snd_tag = 0x0, inp_flowtype = 0x82, inp_rss_listen_bucket = 0x0, inp_inc = {inc_flags = 0x0, inc_len = 0x0, inc_fibnum = 0x1, inc_ie = {ie_fport = 0xa8c4, ie_lport = 0xf710, ie_dependfaddr = {id46_addr = {
          ia46_pad32 = {0x0, 0x0, 0x0}, ia46_addr4 = {s_addr = 0x2f2912b5}}, id6_addr = {__u6_addr = {__u6_addr8 = {0x0 <repeats 12 times>, 0xb5, 0x12, 0x29, 0x2f}, __u6_addr16 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x12b5, 
              0x2f29}, __u6_addr32 = {0x0, 0x0, 0x0, 0x2f2912b5}}}}, ie_dependladdr = {id46_addr = {ia46_pad32 = {0x0, 0x0, 0x0}, ia46_addr4 = {s_addr = 0xd011ca95}}, id6_addr = {__u6_addr = {__u6_addr8 = {
              0x0 <repeats 12 times>, 0x95, 0xca, 0x11, 0xd0}, __u6_addr16 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xca95, 0xd011}, __u6_addr32 = {0x0, 0x0, 0x0, 0xd011ca95}}}}, ie6_zoneid = 0x0}}, inp_label = 0x0, 
  inp_sp = 0xfffff80c371a4160, {inp_ip_tos = 0x0, inp_options = 0x0, inp_moptions = 0x0}, {in6p_options = 0x0, in6p_outputopts = 0x0, in6p_moptions = 0x0, in6p_icmp6filt = 0x0, in6p_cksum = 0x0, in6p_hops = 0x0}, 
  inp_portlist = {cle_next = 0xfffff80c25133d90, cle_prev = 0xfffff80c2574e160}, inp_phd = 0xfffff80105bbbf00, inp_gencnt = 0xa07c7c, spare_ptr = 0x0, inp_rt_cookie = 0x63, {inp_route = {ro_nh = 0xfffff8016f136d00, 
      ro_lle = 0xfffff8013c8a2a80, ro_prepend = 0x0, ro_plen = 0x0, ro_flags = 0x180, ro_mtu = 0x0, spare = 0x0, ro_dst = {sa_len = 0x10, sa_family = 0x2, sa_data = {0x0, 0x0, 0xb5, 0x12, 0x29, 0x2f, 0x0, 0x0, 0x0, 0x0, 
          0x0, 0x0, 0x0, 0x0}}}, inp_route6 = {ro_nh = 0xfffff8016f136d00, ro_lle = 0xfffff8013c8a2a80, ro_prepend = 0x0, ro_plen = 0x0, ro_flags = 0x180, ro_mtu = 0x0, spare = 0x0, ro_dst = {sin6_len = 0x10, 
        sin6_family = 0x2, sin6_port = 0x0, sin6_flowinfo = 0x2f2912b5, sin6_addr = {__u6_addr = {__u6_addr8 = {0x0 <repeats 16 times>}, __u6_addr16 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, __u6_addr32 = {0x0, 0x0, 0x0, 
              0x0}}}, sin6_scope_id = 0x0}}}, inp_list = {cle_next = 0xfffff80c2580e000, cle_prev = 0xfffff80c2574e1d0}, inp_epoch_ctx = {data = {0x0, 0x0}}}
(kgdb) print /x *(tp->t_inpcb->inp_socket)
$6 = {so_lock = {lock_object = {lo_name = 0xffffffff81203282, lo_flags = 0x1430000, lo_data = 0x0, lo_witness = 0xfffff8207fd84300}, mtx_lock = 0x0}, so_count = 0x0, so_rdsel = {si_tdlist = {tqh_first = 0x0, 
      tqh_last = 0x0}, si_note = {kl_list = {slh_first = 0x0}, kl_lock = 0xffffffff80cd4440, kl_unlock = 0xffffffff80cd4480, kl_assert_lock = 0xffffffff80cd44c0, kl_lockarg = 0xfffff80900858000, kl_autodestroy = 0x0}, 
    si_mtx = 0x0}, so_wrsel = {si_tdlist = {tqh_first = 0x0, tqh_last = 0x0}, si_note = {kl_list = {slh_first = 0x0}, kl_lock = 0xffffffff80cd4550, kl_unlock = 0xffffffff80cd4590, kl_assert_lock = 0xffffffff80cd45d0, 
      kl_lockarg = 0xfffff80900858000, kl_autodestroy = 0x0}, si_mtx = 0x0}, so_type = 0x1, so_options = 0x10004, so_linger = 0x0, so_state = 0x410b, so_pcb = 0xfffff8090099b1f0, so_vnet = 0xfffff8010582fec0, 
  so_proto = 0xffffffff81b3be40, so_timeo = 0x0, so_error = 0x0, so_rerror = 0x0, so_sigio = 0x0, so_cred = 0xfffff80107538500, so_label = 0x0, so_gencnt = 0xa23ff6, so_emuldata = 0x0, so_dtor = 0x0, osd = {
    osd_nslots = 0x0, osd_slots = 0x0, osd_next = {le_next = 0x0, le_prev = 0x0}}, so_fibnum = 0x1, so_user_cookie = 0x0, so_ts_clock = 0x0, so_max_pacing_rate = 0x0, {{so_rcv = {sb_mtx = {lock_object = {
            lo_name = 0xffffffff8127fc0a, lo_flags = 0x1030000, lo_data = 0x0, lo_witness = 0xfffff8207fd74800}, mtx_lock = 0x0}, sb_sx = {lock_object = {lo_name = 0xffffffff812debe2, lo_flags = 0x2330000, lo_data = 0x0, 
            lo_witness = 0xfffff8207fd84400}, sx_lock = 0x1}, sb_sel = 0xfffff80900858028, sb_state = 0x20, sb_mb = 0x0, sb_mbtail = 0x0, sb_lastrecord = 0x0, sb_sndptr = 0x0, sb_fnrdy = 0x0, sb_sndptroff = 0x0, 
        sb_acc = 0x0, sb_ccc = 0x0, sb_hiwat = 0x10108, sb_mbcnt = 0x0, sb_mcnt = 0x0, sb_ccnt = 0x0, sb_mbmax = 0x80840, sb_ctl = 0x0, sb_tlscc = 0x0, sb_tlsdcc = 0x0, sb_lowat = 0x1, sb_timeo = 0x0, sb_tls_seqno = 0x0, 
        sb_tls_info = 0x0, sb_mtls = 0x0, sb_mtlstail = 0x0, sb_flags = 0x800, sb_upcall = 0x0, sb_upcallarg = 0x0, sb_aiojobq = {tqh_first = 0x0, tqh_last = 0xfffff80900858230}, sb_aiotask = {ta_link = {stqe_next = 0x0}, 
          ta_pending = 0x0, ta_priority = 0x0, ta_flags = 0x0, ta_func = 0xffffffff80caceb0, ta_context = 0xfffff80900858000}}, so_snd = {sb_mtx = {lock_object = {lo_name = 0xffffffff81296bb1, lo_flags = 0x1030000, 
            lo_data = 0x0, lo_witness = 0xfffff8207fd74780}, mtx_lock = 0xfffffe017fed7720}, sb_sx = {lock_object = {lo_name = 0xffffffff8130e57d, lo_flags = 0x2330000, lo_data = 0x0, lo_witness = 0xfffff8207fd84380}, 
          sx_lock = 0x1}, sb_sel = 0xfffff80900858070, sb_state = 0x10, sb_mb = 0xfffff80bc680b500, sb_mbtail = 0xfffff80bc680b500, sb_lastrecord = 0xfffff80bc680b500, sb_sndptr = 0xfffff80bc680b500, sb_fnrdy = 0x0, 
        sb_sndptroff = 0x0, sb_acc = 0x7cf, sb_ccc = 0x7cf, sb_hiwat = 0x10108, sb_mbcnt = 0x900, sb_mcnt = 0x1, sb_ccnt = 0x1, sb_mbmax = 0x80840, sb_ctl = 0x0, sb_tlscc = 0x0, sb_tlsdcc = 0x0, sb_lowat = 0x800, 
        sb_timeo = 0x0, sb_tls_seqno = 0x0, sb_tls_info = 0x0, sb_mtls = 0x0, sb_mtlstail = 0x0, sb_flags = 0x800, sb_upcall = 0x0, sb_upcallarg = 0x0, sb_aiojobq = {tqh_first = 0x0, tqh_last = 0xfffff80900858348}, 
        sb_aiotask = {ta_link = {stqe_next = 0x0}, ta_pending = 0x0, ta_priority = 0x0, ta_flags = 0x0, ta_func = 0xffffffff80cad6f0, ta_context = 0xfffff80900858000}}, so_list = {tqe_next = 0xffffffffffffffff, 
        tqe_prev = 0xffffffffffffffff}, so_listen = 0x0, so_qstate = 0x0, so_peerlabel = 0x0, so_oobmark = 0x0, so_ktls_rx_list = {stqe_next = 0x0}}, {sol_incomp = {tqh_first = 0xffffffff8127fc0a, tqh_last = 0x1030000}, 
      sol_comp = {tqh_first = 0xfffff8207fd74800, tqh_last = 0x0}, sol_qlen = 0x812debe2, sol_incqlen = 0xffffffff, sol_qlimit = 0x2330000, sol_accept_filter = 0xfffff8207fd84400, sol_accept_filter_arg = 0x1, 
      sol_accept_filter_str = 0xfffff80900858028, sol_upcall = 0x20, sol_upcallarg = 0x0, sol_sbrcv_lowat = 0x0, sol_sbsnd_lowat = 0x0, sol_sbrcv_hiwat = 0x0, sol_sbsnd_hiwat = 0x0, sol_sbrcv_flags = 0x0, 
      sol_sbsnd_flags = 0x0, sol_sbrcv_timeo = 0x0, sol_sbsnd_timeo = 0x0, sol_lastover = {tv_sec = 0x1010800000000, tv_usec = 0x0}, sol_overcount = 0x0}}}
(kgdb)
Comment 72 Hans Petter Selasky freebsd_committer freebsd_triage 2021-12-23 14:00:03 UTC
And also:

frame 14
print /x *(tp->t_inpcb->inp_socket->so_snd.sb_mb)

This dumps the faulty mbuf. I see that so_snd reports bytes available, let's see if that matches the mbuf:

sb_acc = 0x7cf, sb_ccc = 0x7cf, sb_mcnt = 0x1, 

--HPS
Comment 73 Dobri Dobrev 2021-12-23 15:04:20 UTC
(In reply to Hans Petter Selasky from comment #72)

(kgdb) frame 14
#14 0xffffffff80e0607a in tcp_do_segment (m=<optimized out>, th=0xfffff80bc659e87a, so=<optimized out>, tp=0xfffffe0252e24000, drop_hdrlen=40, tlen=<optimized out>, iptos=0 '\000') at /usr/src/sys/netinet/tcp_input.c:2822
2822                                                    tcp_sack_partialack(tp, th);
(kgdb) print /x *(tp->t_inpcb->inp_socket->so_snd.sb_mb)
$7 = {{m_next = 0x0, m_slist = {sle_next = 0x0}, m_stailq = {stqe_next = 0x0}}, {m_nextpkt = 0x0, m_slistpkt = {sle_next = 0x0}, m_stailqpkt = {stqe_next = 0x0}}, m_data = 0xfffff8017874f000, m_len = 0x7cf, m_type = 0x1, 
  m_flags = 0x1, {{{m_pkthdr = {{snd_tag = 0x0, rcvif = 0x0}, tags = {slh_first = 0x0}, len = 0x511, flowid = 0x0, csum_flags = 0x0, fibnum = 0x0, numa_domain = 0xff, rsstype = 0x0, {rcv_tstmp = 0x0, {l2hlen = 0x0, 
              l3hlen = 0x0, l4hlen = 0x0, l5hlen = 0x0, inner_l2hlen = 0x0, inner_l3hlen = 0x0, inner_l4hlen = 0x0, inner_l5hlen = 0x0}}, PH_per = {eight = {0x0, 0x0, 0x0, 0x0, 0xff, 0xff, 0x0, 0x0}, sixteen = {0x0, 0x0, 
              0xffff, 0x0}, thirtytwo = {0x0, 0xffff}, sixtyfour = {0xffff00000000}, unintptr = {0xffff00000000}, ptr = 0xffff00000000}, PH_loc = {eight = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, sixteen = {0x0, 0x0, 0x0, 
              0x0}, thirtytwo = {0x0, 0x0}, sixtyfour = {0x0}, unintptr = {0x0}, ptr = 0x0}}, {m_epg_npgs = 0x0, m_epg_nrdy = 0x0, m_epg_hdrlen = 0x0, m_epg_trllen = 0x0, m_epg_1st_off = 0x0, m_epg_last_len = 0x0, 
          m_epg_flags = 0x0, m_epg_record_type = 0x0, __spare = {0x0, 0x0}, m_epg_enc_cnt = 0x0, m_epg_tls = 0x511, m_epg_so = 0xff000000000000, m_epg_seqno = 0x0, m_epg_stailq = {stqe_next = 0xffff00000000}}}, {m_ext = {{
            ext_count = 0x2, ext_cnt = 0xdeadc0de00000002}, ext_size = 0x800, ext_type = 0x6, ext_flags = 0x1, {{ext_buf = 0xfffff8017874f000, ext_arg2 = 0x0}, {extpg_pa = {0xfffff8017874f000, 0x0, 0xdeadc0dedeadc0de, 
                0xdeadc0dedeadc0de, 0xdeadc0dedeadc0de}, extpg_trail = {0xde, 0xc0, 0xad, 0xde, 0xde, 0xc0, 0xad, 0xde, 0xde, 0xc0, 0xad, 0xde, 0xde, 0xc0, 0xad, 0xde, 0xde, 0xc0, 0xad, 0xde, 0xde, 0xc0, 0xad, 0xde, 0xde, 
                0xc0, 0xad, 0xde, 0xde, 0xc0, 0xad, 0xde, 0xde, 0xc0, 0xad, 0xde, 0xde, 0xc0, 0xad, 0xde, 0xde, 0xc0, 0xad, 0xde, 0xde, 0xc0, 0xad, 0xde, 0xde, 0xc0, 0xad, 0xde, 0xde, 0xc0, 0xad, 0xde, 0xde, 0xc0, 0xad, 
                0xde, 0xde, 0xc0, 0xad, 0xde}, extpg_hdr = {0xde, 0xc0, 0xad, 0xde, 0xde, 0xc0, 0xad, 0xde, 0xde, 0xc0, 0xad, 0xde, 0xde, 0xc0, 0xad, 0xde, 0xde, 0xc0, 0xad, 0xde, 0xde, 0xc0, 0xad}}}, ext_free = 0x0, 
          ext_arg1 = 0x0}, m_pktdat = 0xfffff80bc680b558}}, m_dat = 0xfffff80bc680b520}}
(kgdb)
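
A quick cross-check of the numbers above, as a minimal Python sketch rather than anything taken from the kernel. It models the send buffer's mbuf chain as a list of m_len values copied from this dump (a single mbuf, since m_next = 0x0) and compares the total with sb_acc and sb_mcnt from comment 71. The helper name chain_bytes is made up for illustration.

```python
# Values copied from the kgdb output in this report (comments 71 and 73).
SB_ACC = 0x7CF       # so_snd.sb_acc: bytes available in the send buffer
SB_MCNT = 0x1        # so_snd.sb_mcnt: number of mbufs in the send buffer
MBUF_LENS = [0x7CF]  # m_len of each mbuf in the chain; m_next = 0x0, so only one


def chain_bytes(lens):
    """Sum m_len over a chain, as a stand-in for walking the m_next pointers."""
    return sum(lens)


total = chain_bytes(MBUF_LENS)
print(f"mbufs={len(MBUF_LENS)} bytes={total} sb_acc={SB_ACC}")
```

On these values the socket buffer accounting and the single mbuf agree, so the send buffer itself looks internally consistent at this point.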
Comment 74 Hans Petter Selasky freebsd_committer freebsd_triage 2021-12-23 17:03:20 UTC
And this one:

print /x *(int *)0xfffffe017fd6282c

--HPS
Comment 75 Dobri Dobrev 2021-12-23 17:04:32 UTC
(In reply to Hans Petter Selasky from comment #74)

From frame 14:

(kgdb) print /x *(int *)0xfffffe017fd6282c
$8 = 0x264
(kgdb)
Comment 76 Dobri Dobrev 2021-12-24 13:56:37 UTC
Is there anything more you need from the dump, or a potential fix to test on the server?
Comment 77 Michael Tuexen freebsd_committer freebsd_triage 2021-12-24 14:28:37 UTC
(In reply to Dobri Dobrev from comment #76)
Not right now from my side. I will set up a local test system and explore a couple of packet flows...
Comment 78 Dobri Dobrev 2022-01-10 18:30:06 UTC
(In reply to Michael Tuexen from comment #77)

Any news?
Comment 79 Michael Tuexen freebsd_committer freebsd_triage 2022-01-10 18:52:41 UTC
(In reply to Dobri Dobrev from comment #78)
Not yet. I'll bring it up at the transport telco next Thursday.
Comment 80 Dobri Dobrev 2022-01-10 18:56:29 UTC
(In reply to Michael Tuexen from comment #79)

Btw, I moved over to -release, no issues there.
Comment 81 Hans Petter Selasky freebsd_committer freebsd_triage 2022-01-12 20:49:08 UTC
Are you loading any modules from /boot/modules, typically installed from ports, which access the network? Just curious.
Comment 82 Dobri Dobrev 2022-01-12 20:53:25 UTC
(In reply to Hans Petter Selasky from comment #81)

Here are the modules loaded on both Stable and Release:

13-Stable:
Id Refs Address                Size Name
 1   36 0xffffffff80200000  1f2a788 kernel
 2    3 0xffffffff8212b000     3cb0 smbus.ko
 3    1 0xffffffff82130000     2870 accf_data.ko
 4    1 0xffffffff82133000     2e88 accf_http.ko
 5    1 0xffffffff82136000     ab70 opensolaris.ko
 6    1 0xffffffff82141000    11578 ipmi.ko
 7    1 0xffffffff82153000    734d0 pf.ko
 8    1 0xffffffff821c7000    77650 if_igb.ko
 9    1 0xffffffff826e5000     3378 acpi_wmi.ko
10    1 0xffffffff826e9000     3218 intpm.ko
11    1 0xffffffff826ed000     64b8 if_gre.ko
12    1 0xffffffff826f4000     3530 fdescfs.ko

13-Release:
Id Refs Address                Size Name
 1   36 0xffffffff80200000  1f11c18 kernel
 2    3 0xffffffff82112000     44f0 smbus.ko
 3    1 0xffffffff82117000    12a80 ipmi.ko
 4    1 0xffffffff8212a000    79c70 if_igb.ko
 5    1 0xffffffff821a4000     b7b8 opensolaris.ko
 6    1 0xffffffff821b0000     32b0 accf_http.ko
 7    1 0xffffffff821b4000    5c3b0 pf.ko
 8    1 0xffffffff82211000     2b58 accf_data.ko
 9    1 0xffffffff826e5000     3378 acpi_wmi.ko
10    1 0xffffffff826e9000     3218 intpm.ko
11    1 0xffffffff826ed000     64b8 if_gre.ko
12    1 0xffffffff826f4000     3530 fdescfs.ko
Comment 83 Gleb Smirnoff freebsd_committer freebsd_triage 2022-01-13 17:09:08 UTC
Dobri,

Kernel crash minidumps (which is what we produce by default) do not contain any data from userland pages. So no proprietary binaries that were running would be leaked if you share the core. What could leak is the data that was flowing through the network stack at the moment of the crash. Sharing the cores with Michael might make things faster.
Comment 84 Dobri Dobrev 2022-01-14 08:01:40 UTC
(In reply to Gleb Smirnoff from comment #83)

From what I remember the crash dump files were ~3.5-3.6 GB. Does that correspond to a minidump crash size?
Comment 85 Dobri Dobrev 2022-01-14 08:15:25 UTC
(In reply to Gleb Smirnoff from comment #83)

Also, how can I confirm that no actual binaries exist in the dump?
Comment 86 Gleb Smirnoff freebsd_committer freebsd_triage 2022-01-14 14:48:02 UTC
Yes, a full dump would be exactly the size of your physical RAM. There are two ways to check: search the dump file for a sample of the data you are concerned about leaking, or open the dump in kgdb, find the process you are interested in, try to read pages that belong to it, and make sure that kgdb fails to read them.
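
The first approach (searching the dump for a known sample) could be sketched as below. This is only an illustration in Python, not an existing tool: the function name find_bytes, the chunk size, and the demo data are all assumptions. It reads the file in chunks so a multi-gigabyte dump does not have to fit in memory, and keeps a small overlap so a pattern that straddles a chunk boundary is still found.

```python
import os
import tempfile


def find_bytes(path, needle, chunk_size=1 << 20):
    """Return the offset of the first occurrence of needle in the file, or -1."""
    if not needle:
        raise ValueError("needle must not be empty")
    overlap = len(needle) - 1   # bytes carried over between chunks
    offset = 0                  # file offset of the start of the current window
    tail = b""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                return -1
            window = tail + chunk
            pos = window.find(needle)
            if pos != -1:
                return offset + pos
            tail = window[-overlap:] if overlap else b""
            offset += len(window) - len(tail)


# Demo on a throwaway file; the pattern deliberately straddles an 8-byte chunk boundary.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"A" * 13 + b"SECRET" + b"B" * 5)
    demo_path = tmp.name
try:
    hit = find_bytes(demo_path, b"SECRET", chunk_size=8)
    miss = find_bytes(demo_path, b"PRIVATE", chunk_size=8)
finally:
    os.remove(demo_path)
print(hit, miss)
```

Against a real dump the call would look something like find_bytes("/var/crash/vmcore.0", sample), where sample is a byte string taken from the data in question. A miss only shows that this exact byte sequence is absent; it says nothing about the same data stored in another encoding.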
Comment 87 Dobri Dobrev 2022-01-15 06:49:50 UTC
(In reply to Gleb Smirnoff from comment #86)

The problem is ... I switched to -release. On -stable it crashed very frequently resulting in disruptions of customer traffic, which is something I'd like to avoid.

Right now I don't have a server running stable.

Were you not able to reproduce the issue on your setups?
Nginx with http/https traffic ~400-1200 rps throughout the day resulted in crashes varying from once every hour to once every 2-4 days.
Comment 88 Michael Tuexen freebsd_committer freebsd_triage 2022-01-15 22:31:25 UTC
(In reply to Dobri Dobrev from comment #87)
I tried to trigger a panic on a local system using packetdrill scripts, but was not able to do so. So without being able to reproduce this on a system I can access, it is hard to figure out what is going on...
Comment 89 Dobri Dobrev 2022-02-21 09:29:40 UTC
(In reply to Michael Tuexen from comment #88)

I don't suppose there's anything new regarding this issue?

I could do a separate 13-stable install, place some traffic and see if it'd crash again, however, I need to be *sure* the dump won't contain any running binaries, ssl cert-key pairs, etc.

Let me know how.
Comment 90 Michael Tuexen freebsd_committer freebsd_triage 2022-02-21 10:35:28 UTC
(In reply to Dobri Dobrev from comment #89)

I would suggest the following:

1. During this week I will MFC all TCP related changes to stable/13, which I think should go into 13.1. I'll ping you when it is done.

2. You update to that state and do your testing.

That way we can test what will be in 13.1. I want to make sure that this version does not have the issue you were experiencing.

If the problem still persists, having access to the core would help. According to Gleb, no critical information should be in there. If you prefer, you can give me access to a machine under your control that has the core file, and you can monitor any command I run.

If the above two steps are fine with you, that would be great. Please let me know.
Comment 91 Michael Tuexen freebsd_committer freebsd_triage 2022-02-23 01:19:22 UTC
(In reply to Dobri Dobrev from comment #89)
OK, I MFCed all relevant changes I wanted to MFC. It would be great if you could update a machine to stable/13, test it, and report the outcome of the test.
Comment 92 Dobri Dobrev 2022-04-16 16:44:03 UTC
(In reply to Michael Tuexen from comment #91)

I haven't had a chance to update to the latest stable/13.

Just got a crash after 112 days uptime on stable/13-n248590-b7da472979a

Here's what kgdb shows:

Reading symbols from /boot/kernel/kernel...
Reading symbols from /usr/lib/debug//boot/kernel/kernel.debug...

Unread portion of the kernel message buffer:
[9705874] panic: page fault
[9705874] cpuid = 5
[9705874] time = 1650110040
[9705874] KDB: stack backtrace:
[9705874] #0 0xffffffff80c60dd5 at kdb_backtrace+0x65
[9705874] #1 0xffffffff80c1336f at vpanic+0x17f
[9705874] #2 0xffffffff80c131e3 at panic+0x43
[9705874] #3 0xffffffff810991b5 at trap_fatal+0x385
[9705874] #4 0xffffffff8109920f at trap_pfault+0x4f
[9705874] #5 0xffffffff810705e8 at calltrap+0x8
[9705874] #6 0xffffffff80dd5fa9 at tcp_output+0x1339
[9705874] #7 0xffffffff80dcd382 at tcp_do_segment+0x2902
[9705874] #8 0xffffffff80dc9d41 at tcp_input_with_port+0xb61
[9705874] #9 0xffffffff80dca9eb at tcp_input+0xb
[9705874] #10 0xffffffff80dbc1bf at ip_input+0x11f
[9705874] #11 0xffffffff80d491a9 at netisr_dispatch_src+0xb9
[9705874] #12 0xffffffff80d2d128 at ether_demux+0x138
[9705874] #13 0xffffffff80d2e4b5 at ether_nh_input+0x355
[9705874] #14 0xffffffff80d491a9 at netisr_dispatch_src+0xb9
[9705874] #15 0xffffffff80d2d559 at ether_input+0x69
[9705874] #16 0xffffffff80d45617 at iflib_rxeof+0xc27
[9705874] #17 0xffffffff80d3fc62 at _task_fn_rx+0x72
[9705874] Uptime: 112d8h4m34s
[9705874] Dumping 11660 out of 65425 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%

__curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
55      __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu,
(kgdb) where
#0  __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
#1  doadump (textdump=<optimized out>) at /usr/src/sys/kern/kern_shutdown.c:399
#2  0xffffffff80c12f6c in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:487
#3  0xffffffff80c133de in vpanic (fmt=0xffffffff81191bdd "%s", ap=<optimized out>) at /usr/src/sys/kern/kern_shutdown.c:920
#4  0xffffffff80c131e3 in panic (fmt=<unavailable>) at /usr/src/sys/kern/kern_shutdown.c:844
#5  0xffffffff810991b5 in trap_fatal (frame=0xfffffe0069f535b0, eva=24) at /usr/src/sys/amd64/amd64/trap.c:944
#6  0xffffffff8109920f in trap_pfault (frame=0xfffffe0069f535b0, usermode=false, signo=<optimized out>, ucode=<optimized out>) at /usr/src/sys/amd64/amd64/trap.c:763
#7  <signal handler called>
#8  m_copydata (m=0x0, m@entry=0xfffff801e9cc5b00, off=0, len=1, cp=<optimized out>) at /usr/src/sys/kern/uipc_mbuf.c:657
#9  0xffffffff80dd5fa9 in tcp_output (tp=<optimized out>) at /usr/src/sys/netinet/tcp_output.c:1081
#10 0xffffffff80dcd382 in tcp_do_segment (m=<optimized out>, th=<optimized out>, so=<optimized out>, tp=0xfffffe01a0d0b870, drop_hdrlen=52, tlen=<optimized out>, 
    iptos=0 '\000') at /usr/src/sys/netinet/tcp_input.c:2822
#11 0xffffffff80dc9d41 in tcp_input_with_port (mp=<optimized out>, offp=<optimized out>, proto=<optimized out>, port=port@entry=0) at /usr/src/sys/netinet/tcp_input.c:1400
#12 0xffffffff80dca9eb in tcp_input (mp=0xfffff801e9cc5b00, offp=0x0, proto=1) at /usr/src/sys/netinet/tcp_input.c:1496
#13 0xffffffff80dbc1bf in ip_input (m=0x0) at /usr/src/sys/netinet/ip_input.c:834
#14 0xffffffff80d491a9 in netisr_dispatch_src (proto=1, source=source@entry=0, m=0xfffff8002c41c400) at /usr/src/sys/net/netisr.c:1143
#15 0xffffffff80d4957f in netisr_dispatch (proto=3922483968, m=0x1) at /usr/src/sys/net/netisr.c:1234
#16 0xffffffff80d2d128 in ether_demux (ifp=ifp@entry=0xfffff80001ed8000, m=0x0) at /usr/src/sys/net/if_ethersubr.c:921
#17 0xffffffff80d2e4b5 in ether_input_internal (ifp=0xfffff80001ed8000, m=0x0) at /usr/src/sys/net/if_ethersubr.c:707
#18 ether_nh_input (m=<optimized out>) at /usr/src/sys/net/if_ethersubr.c:737
#19 0xffffffff80d491a9 in netisr_dispatch_src (proto=proto@entry=5, source=source@entry=0, m=m@entry=0xfffff8002c41c400) at /usr/src/sys/net/netisr.c:1143
#20 0xffffffff80d4957f in netisr_dispatch (proto=3922483968, proto@entry=5, m=0x1, m@entry=0xfffff8002c41c400) at /usr/src/sys/net/netisr.c:1234
#21 0xffffffff80d2d559 in ether_input (ifp=<optimized out>, m=0xfffff8002c41c400) at /usr/src/sys/net/if_ethersubr.c:828
#22 0xffffffff80d45617 in iflib_rxeof (rxq=<optimized out>, rxq@entry=0xfffffe0114b00040, budget=<optimized out>) at /usr/src/sys/net/iflib.c:3046
#23 0xffffffff80d3fc62 in _task_fn_rx (context=0xfffffe0114b00040) at /usr/src/sys/net/iflib.c:3989
#24 0xffffffff80c5f80d in gtaskqueue_run_locked (queue=queue@entry=0xfffff80001d68800) at /usr/src/sys/kern/subr_gtaskqueue.c:371
#25 0xffffffff80c5f482 in gtaskqueue_thread_loop (arg=<optimized out>, arg@entry=0xfffffe0114a7b080) at /usr/src/sys/kern/subr_gtaskqueue.c:547
#26 0xffffffff80bd053e in fork_exit (callout=0xffffffff80c5f3c0 <gtaskqueue_thread_loop>, arg=0xfffffe0114a7b080, frame=0xfffffe0069f53f40)
    at /usr/src/sys/kern/kern_fork.c:1092
#27 <signal handler called>
#28 mi_startup () at /usr/src/sys/kern/init_main.c:322
Backtrace stopped: Cannot access memory at address 0x17
(kgdb)


Let me know if I should update to latest stable/13, or if you'd want to examine the crashdump the same way we did before - you tell me what you need, I do it and provide it here.

Regards,
D
Comment 93 Richard Scheffenegger freebsd_committer freebsd_triage 2022-06-10 22:18:09 UTC
The current thinking is that SACK rescue retransmissions (in FreeBSD 13 this is gated by net.inet.tcp.rfc6675_pipe=1) very rarely create an entry that is apparently beyond the valid data range.

While under most common circumstances, a final FIN bit in the sequence space is taken care of, it seems that there may be some double-counting for the FIN bit.

In most of the inspected cores, we found:

TCP state: LAST_ACK (FIN received and also FIN sent)
SACK loss recovery triggered
A cumulative ACK before all outstanding data was received
The remote client "disappears" for a significant amount of time (7 to 12 retransmission timeouts), but may re-appear again just prior.
snd_max consistently 2 counts above the last data, instead of the expected 1 (for the FIN bit).
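
The last point can be checked against the dump posted earlier in this report. The following is a minimal sketch in Python, using only values copied from the print *tp output and from so_snd in comment 71. It assumes that the send buffer starts at snd_una, that is, that already acknowledged data has been dropped from it; the helper name seq_diff is made up for illustration.

```python
MOD = 1 << 32  # TCP sequence numbers wrap at 2**32


def seq_diff(a, b):
    """Return a - b in 32-bit sequence space."""
    return (a - b) % MOD


# Values copied from the kgdb output earlier in this report.
snd_una = 324805966   # tp->snd_una
snd_max = 324807967   # tp->snd_max
sb_acc = 0x7CF        # so_snd.sb_acc, bytes still held in the send buffer

outstanding = seq_diff(snd_max, snd_una)   # sequence space sent but not yet acked
extra = outstanding - sb_acc               # sequence space not backed by buffered data
print(outstanding, sb_acc, extra)
```

Under that assumption, the unacknowledged sequence space exceeds the buffered data by 2, where 1 would be expected for a single FIN.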

However, it is still unclear under what circumstances this double-counting happens, possibly when the persist timer triggers, and a few other conditions are also fulfilled - maybe a race condition between normal packet processing and a timer firing.

In short: disabling the rfc6675 enhanced SACK features (more correct pipeline accounting, rescue retransmissions) should avoid the trigger of the panic, without addressing the root cause of when/why the FIN bit is double-counted...

Would you be willing to run an instrumented kernel, which either panics (full core dump) or prints various state when inconsistencies are detected in this space, while ignoring/addressing them "on the fly" without panicking?
Comment 94 Dobri Dobrev 2022-08-28 19:52:29 UTC
Just got a crash on 13.1 -- stable/13-n252201
And this is with net.inet.tcp.rfc6675_pipe=0

Here's kgdb:

# kgdb /boot/kernel/kernel /var/crash/vmcore.4 
GNU gdb (GDB) 11.1 [GDB v11.1 for FreeBSD]
Copyright (C) 2021 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-portbld-freebsd13.0".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /boot/kernel/kernel...
Reading symbols from /usr/lib/debug//boot/kernel/kernel.debug...

Unread portion of the kernel message buffer:
[91] frame pointer              = 0x28:0xfffffe0069f536e0
[91] code segment               = base rx0, limit 0xfffff, type 0x1b
[91]                    = DPL 0, pres 1, long 1, def32 0, gran 1
[92] processor eflags   = interrupt enabled, resume, IOPL = 0
[92] current process            = 0 (if_io_tqg_5)
[92] trap number                = 12
[92] panic: page fault
[92] cpuid = 5
[92] time = 1661715643
[92] KDB: stack backtrace:
[92] #0 0xffffffff80c50045 at kdb_backtrace+0x65
[92] #1 0xffffffff80c02e81 at vpanic+0x151
[92] #2 0xffffffff80c02d23 at panic+0x43
[92] #3 0xffffffff8109fd57 at trap_fatal+0x387
[92] #4 0xffffffff8109fdaf at trap_pfault+0x4f
[92] #5 0xffffffff81077288 at calltrap+0x8
[92] #6 0xffffffff80dc7699 at tcp_output+0x1339
[92] #7 0xffffffff80dbedab at tcp_do_segment+0x2c9b
[92] #8 0xffffffff80dbb3e1 at tcp_input_with_port+0xb61
[92] #9 0xffffffff80dbc07b at tcp_input+0xb
[92] #10 0xffffffff80dad8f8 at ip_input+0x118
[92] #11 0xffffffff80d3a729 at netisr_dispatch_src+0xb9
[92] #12 0xffffffff80d1e974 at ether_demux+0x144
[92] #13 0xffffffff80d1fcd6 at ether_nh_input+0x346
[92] #14 0xffffffff80d3a729 at netisr_dispatch_src+0xb9
[92] #15 0xffffffff80d1ed99 at ether_input+0x69
[92] #16 0xffffffff80d36c3b at iflib_rxeof+0xbcb
[92] #17 0xffffffff80d314c2 at _task_fn_rx+0x72
[92] Uptime: 1m32s
[92] Dumping 2355 out of 65425 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%

__curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
55              __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu,
(kgdb) where
#0  __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
#1  dump_savectx () at /usr/src/sys/kern/kern_shutdown.c:394
#2  0xffffffff80c02a78 in dumpsys (di=0x0) at /usr/src/sys/x86/include/dump.h:87
#3  doadump (textdump=<optimized out>) at /usr/src/sys/kern/kern_shutdown.c:423
#4  kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:497
#5  0xffffffff80c02eee in vpanic (fmt=<optimized out>, ap=ap@entry=0xfffffe0069f534c0) at /usr/src/sys/kern/kern_shutdown.c:930
#6  0xffffffff80c02d23 in panic (fmt=<unavailable>) at /usr/src/sys/kern/kern_shutdown.c:854
#7  0xffffffff8109fd57 in trap_fatal (frame=0xfffffe0069f535b0, eva=24) at /usr/src/sys/amd64/amd64/trap.c:940
#8  0xffffffff8109fdaf in trap_pfault (frame=0xfffffe0069f535b0, usermode=false, signo=<optimized out>, ucode=<optimized out>) at /usr/src/sys/amd64/amd64/trap.c:759
#9  <signal handler called>
#10 m_copydata (m=0x0, m@entry=0xfffff8000dc30e00, off=0, len=1, cp=<optimized out>) at /usr/src/sys/kern/uipc_mbuf.c:659
#11 0xffffffff80dc7699 in tcp_output (tp=0xfffffe019e765950) at /usr/src/sys/netinet/tcp_output.c:1084
#12 0xffffffff80dbedab in tcp_do_segment (m=0xfffff8002ad7e100, th=0xfffff8002ad7e17a, so=0xfffff801cb635000, tp=0xfffffe019e765950, drop_hdrlen=64, tlen=<optimized out>, iptos=0 '\000')
    at /usr/src/sys/netinet/tcp_input.c:2822
#13 0xffffffff80dbb3e1 in tcp_input_with_port (mp=<optimized out>, offp=<optimized out>, proto=<optimized out>, port=port@entry=0) at /usr/src/sys/netinet/tcp_input.c:1400
#14 0xffffffff80dbc07b in tcp_input (mp=0xfffff8000dc30e00, offp=0x0, proto=1) at /usr/src/sys/netinet/tcp_input.c:1496
#15 0xffffffff80dad8f8 in ip_input (m=0x0) at /usr/src/sys/netinet/ip_input.c:839
#16 0xffffffff80d3a729 in netisr_dispatch_src (proto=1, source=source@entry=0, m=0xfffff8002ad7e100) at /usr/src/sys/net/netisr.c:1143
#17 0xffffffff80d3aaff in netisr_dispatch (proto=230886912, m=0x1) at /usr/src/sys/net/netisr.c:1234
#18 0xffffffff80d1e974 in ether_demux (ifp=ifp@entry=0xfffff800023a6800, m=0x0) at /usr/src/sys/net/if_ethersubr.c:921
#19 0xffffffff80d1fcd6 in ether_input_internal (ifp=0xfffff800023a6800, m=0x0) at /usr/src/sys/net/if_ethersubr.c:707
#20 ether_nh_input (m=<optimized out>) at /usr/src/sys/net/if_ethersubr.c:737
#21 0xffffffff80d3a729 in netisr_dispatch_src (proto=proto@entry=5, source=source@entry=0, m=m@entry=0xfffff8002ad7e100) at /usr/src/sys/net/netisr.c:1143
#22 0xffffffff80d3aaff in netisr_dispatch (proto=230886912, proto@entry=5, m=0x1, m@entry=0xfffff8002ad7e100) at /usr/src/sys/net/netisr.c:1234
#23 0xffffffff80d1ed99 in ether_input (ifp=<optimized out>, m=0xfffff8002ad7e100) at /usr/src/sys/net/if_ethersubr.c:828
#24 0xffffffff80d36c3b in iflib_rxeof (rxq=rxq@entry=0xfffffe0114b0f040, budget=<optimized out>) at /usr/src/sys/net/iflib.c:3046
#25 0xffffffff80d314c2 in _task_fn_rx (context=0xfffffe0114b0f040) at /usr/src/sys/net/iflib.c:3989
#26 0xffffffff80c4ea5d in gtaskqueue_run_locked (queue=queue@entry=0xfffff80001d6b800) at /usr/src/sys/kern/subr_gtaskqueue.c:371
#27 0xffffffff80c4e6c3 in gtaskqueue_thread_loop (arg=arg@entry=0xfffffe0114a7f080) at /usr/src/sys/kern/subr_gtaskqueue.c:547
#28 0xffffffff80bbfafe in fork_exit (callout=0xffffffff80c4e600 <gtaskqueue_thread_loop>, arg=0xfffffe0114a7f080, frame=0xfffffe0069f53f40) at /usr/src/sys/kern/kern_fork.c:1103
#29 <signal handler called>
#30 mi_startup () at /usr/src/sys/kern/init_main.c:322
Backtrace stopped: Cannot access memory at address 0x17
(kgdb)
Comment 95 commit-hook freebsd_committer freebsd_triage 2022-09-19 10:50:32 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=6d9e911fbadf3b409802a211c1dae9b47cb5a2b8

commit 6d9e911fbadf3b409802a211c1dae9b47cb5a2b8
Author:     Michael Tuexen <tuexen@FreeBSD.org>
AuthorDate: 2022-09-19 10:42:43 +0000
Commit:     Michael Tuexen <tuexen@FreeBSD.org>
CommitDate: 2022-09-19 10:49:31 +0000

    tcp: fix computation of offset

    Only update the offset if actually retransmitting from the
    scoreboard. If not done correctly, this may result in
    trying to (re)-transmit data not being in the socket
    buffer and therefore resulting in a panic.

    PR:                     264257
    PR:                     263445
    PR:                     260393
    Reviewed by:            rscheff@
    MFC after:              3 days
    Sponsored by:           Netflix, Inc.
    Differential Revision:  https://reviews.freebsd.org/D36626

 sys/netinet/tcp_output.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
Comment 96 commit-hook freebsd_committer freebsd_triage 2022-09-22 10:17:55 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=5ae83e0d871bc7cbe4dcc9a33d37eb689e631efe

commit 5ae83e0d871bc7cbe4dcc9a33d37eb689e631efe
Author:     Michael Tuexen <tuexen@FreeBSD.org>
AuthorDate: 2022-09-22 10:12:11 +0000
Commit:     Michael Tuexen <tuexen@FreeBSD.org>
CommitDate: 2022-09-22 10:12:11 +0000

    tcp: send ACKs when requested

    When doing Limited Transmit send an ACK when needed by the protocol
    processing (like sending ACKs with a DSACK block).

    PR:                     264257
    PR:                     263445
    PR:                     260393
    Reviewed by:            rscheff@
    MFC after:              3 days
    Sponsored by:           Netflix, Inc.
    Differential Revision:  https://reviews.freebsd.org/D36631

 sys/netinet/tcp_input.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
Comment 97 commit-hook freebsd_committer freebsd_triage 2022-09-22 11:31:10 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=a743fc8826fa348b09d219632594c537f8e5690e

commit a743fc8826fa348b09d219632594c537f8e5690e
Author:     Richard Scheffenegger <rscheff@FreeBSD.org>
AuthorDate: 2022-09-22 10:55:25 +0000
Commit:     Richard Scheffenegger <rscheff@FreeBSD.org>
CommitDate: 2022-09-22 11:28:43 +0000

    tcp: fix cwnd restricted SACK retransmission loop

    While doing the initial SACK retransmission segment while heavily cwnd
    constrained, tcp_output can erroneously send out the entire sendbuffer
    again. This may happen after a retransmission timeout, which resets
    snd_nxt to snd_una while the SACK scoreboard is still populated.

    Reviewed By:            tuexen, #transport
    PR:                     264257
    PR:                     263445
    PR:                     260393
    MFC after:              3 days
    Sponsored by:           NetApp, Inc.
    Differential Revision:  https://reviews.freebsd.org/D36637

 sys/netinet/tcp_output.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
Comment 98 commit-hook freebsd_committer freebsd_triage 2022-09-25 10:05:45 UTC
A commit in branch stable/12 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=3651c4f42285644938e2f5bc924ab8c7ed857f83

commit 3651c4f42285644938e2f5bc924ab8c7ed857f83
Author:     Richard Scheffenegger <rscheff@FreeBSD.org>
AuthorDate: 2022-09-22 10:55:25 +0000
Commit:     Richard Scheffenegger <rscheff@FreeBSD.org>
CommitDate: 2022-09-25 08:52:56 +0000

    tcp: fix cwnd restricted SACK retransmission loop

    While doing the initial SACK retransmission segment while heavily cwnd
    constrained, tcp_output can erroneously send out the entire sendbuffer
    again. This may happen after a retransmission timeout, which resets
    snd_nxt to snd_una while the SACK scoreboard is still populated.

    Reviewed By:            tuexen, #transport
    PR:                     264257
    PR:                     263445
    PR:                     260393
    MFC after:              3 days
    Sponsored by:           NetApp, Inc.
    Differential Revision:  https://reviews.freebsd.org/D36637

    (cherry picked from commit a743fc8826fa348b09d219632594c537f8e5690e)

 sys/netinet/tcp_output.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
Comment 99 commit-hook freebsd_committer freebsd_triage 2022-09-25 10:05:49 UTC
A commit in branch stable/12 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=9e69e009c86f259653610f3c337253b79381c7a7

commit 9e69e009c86f259653610f3c337253b79381c7a7
Author:     Michael Tuexen <tuexen@FreeBSD.org>
AuthorDate: 2022-09-22 10:12:11 +0000
Commit:     Richard Scheffenegger <rscheff@FreeBSD.org>
CommitDate: 2022-09-25 08:46:54 +0000

    tcp: send ACKs when requested

    When doing Limited Transmit send an ACK when needed by the protocol
    processing (like sending ACKs with a DSACK block).

    PR:                     264257
    PR:                     263445
    PR:                     260393
    Reviewed by:            rscheff@
    MFC after:              3 days
    Sponsored by:           Netflix, Inc.
    Differential Revision:  https://reviews.freebsd.org/D36631

    (cherry picked from commit 5ae83e0d871bc7cbe4dcc9a33d37eb689e631efe)

 sys/netinet/tcp_input.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
Comment 100 commit-hook freebsd_committer freebsd_triage 2022-09-25 10:05:54 UTC
A commit in branch stable/12 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=26370413d43bfd65500270ff331ae6bdf0f54133

commit 26370413d43bfd65500270ff331ae6bdf0f54133
Author:     Michael Tuexen <tuexen@FreeBSD.org>
AuthorDate: 2022-09-19 10:42:43 +0000
Commit:     Richard Scheffenegger <rscheff@FreeBSD.org>
CommitDate: 2022-09-25 08:41:54 +0000

    tcp: fix computation of offset

    Only update the offset if actually retransmitting from the
    scoreboard. If not done correctly, this may result in
    trying to (re)-transmit data not being in the socket
    buffer and therefore resulting in a panic.

    PR:                     264257
    PR:                     263445
    PR:                     260393
    Reviewed by:            rscheff@
    MFC after:              3 days
    Sponsored by:           Netflix, Inc.
    Differential Revision:  https://reviews.freebsd.org/D36626

    (cherry picked from commit 6d9e911fbadf3b409802a211c1dae9b47cb5a2b8)

 sys/netinet/tcp_output.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
Comment 101 commit-hook freebsd_committer freebsd_triage 2022-09-25 10:06:07 UTC
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=0612d3000b974f31de15c90c77bf43f121fc8656

commit 0612d3000b974f31de15c90c77bf43f121fc8656
Author:     Michael Tuexen <tuexen@FreeBSD.org>
AuthorDate: 2022-09-19 10:42:43 +0000
Commit:     Richard Scheffenegger <rscheff@FreeBSD.org>
CommitDate: 2022-09-25 08:54:18 +0000

    tcp: fix computation of offset

    Only update the offset if actually retransmitting from the
    scoreboard. If not done correctly, this may result in
    trying to (re)-transmit data not being in the socket
    buffer and therefore resulting in a panic.

    PR:                     264257
    PR:                     263445
    PR:                     260393
    Reviewed by:            rscheff@
    MFC after:              3 days
    Sponsored by:           Netflix, Inc.
    Differential Revision:  https://reviews.freebsd.org/D36626

    (cherry picked from commit 6d9e911fbadf3b409802a211c1dae9b47cb5a2b8)

 sys/netinet/tcp_output.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
Comment 102 commit-hook freebsd_committer freebsd_triage 2022-09-25 10:06:08 UTC
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=f9edad0054652e020b8214f61c0e454fd48101a6

commit f9edad0054652e020b8214f61c0e454fd48101a6
Author:     Michael Tuexen <tuexen@FreeBSD.org>
AuthorDate: 2022-09-22 10:12:11 +0000
Commit:     Richard Scheffenegger <rscheff@FreeBSD.org>
CommitDate: 2022-09-25 08:55:41 +0000

    tcp: send ACKs when requested

    When doing Limited Transmit send an ACK when needed by the protocol
    processing (like sending ACKs with a DSACK block).

    PR:                     264257
    PR:                     263445
    PR:                     260393
    Reviewed by:            rscheff@
    MFC after:              3 days
    Sponsored by:           Netflix, Inc.
    Differential Revision:  https://reviews.freebsd.org/D36631

    (cherry picked from commit 5ae83e0d871bc7cbe4dcc9a33d37eb689e631efe)

 sys/netinet/tcp_input.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
Comment 103 commit-hook freebsd_committer freebsd_triage 2022-09-25 10:06:09 UTC
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=c1f9a81e7bfe354dfa4f191d5180426f76bc514b

commit c1f9a81e7bfe354dfa4f191d5180426f76bc514b
Author:     Richard Scheffenegger <rscheff@FreeBSD.org>
AuthorDate: 2022-09-22 10:55:25 +0000
Commit:     Richard Scheffenegger <rscheff@FreeBSD.org>
CommitDate: 2022-09-25 08:56:28 +0000

    tcp: fix cwnd restricted SACK retransmission loop

    While doing the initial SACK retransmission segment while heavily cwnd
    constrained, tcp_output can erroneously send out the entire sendbuffer
    again. This may happen after a retransmission timeout, which resets
    snd_nxt to snd_una while the SACK scoreboard is still populated.

    Reviewed By:            tuexen, #transport
    PR:                     264257
    PR:                     263445
    PR:                     260393
    MFC after:              3 days
    Sponsored by:           NetApp, Inc.
    Differential Revision:  https://reviews.freebsd.org/D36637

    (cherry picked from commit a743fc8826fa348b09d219632594c537f8e5690e)

 sys/netinet/tcp_output.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
Comment 104 Michael Tuexen freebsd_committer freebsd_triage 2022-10-12 07:00:55 UTC
I think this issue is fixed. If the problem still exists, please re-open.