I'm running 2 servers:
Ryzen 7 3800X with Intel(R) X550-T2 - stable/13-n248216-f1d2f22b34a
Xeon E5-1650 v4 with Intel(R) I350 (Copper) - stable/13-n248512-155748c1e75

Both crash with a "Page Fault" message. Here is the output from kgdb:

# kgdb /boot/kernel/kernel /var/crash/vmcore.0
GNU gdb (GDB) 11.1 [GDB v11.1 for FreeBSD]
Copyright (C) 2021 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-portbld-freebsd13.0".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /boot/kernel/kernel...
Reading symbols from /usr/lib/debug//boot/kernel/kernel.debug...
Unread portion of the kernel message buffer:
[193803]
[193803]
[193803] Fatal trap 12: page fault while in kernel mode
[193803] cpuid = 0; apic id = 00
[193803] fault virtual address = 0x8
[193803] fault code = supervisor read data, page not present
[193803] instruction pointer = 0x20:0xffffffff80caf078
[193803] stack pointer = 0x28:0xfffffe017e330850
[193803] frame pointer = 0x28:0xfffffe017e330890
[193803] code segment = base rx0, limit 0xfffff, type 0x1b
[193803] = DPL 0, pres 1, long 1, def32 0, gran 1
[193803] processor eflags = interrupt enabled, resume, IOPL = 0
[193803] current process = 0 (if_io_tqg_0)
[193803] trap number = 12
[193803] panic: page fault
[193803] cpuid = 0
[193803] time = 1639284248
[193803] KDB: stack backtrace:
[193803] #0 0xffffffff80c60485 at kdb_backtrace+0x65
[193803] #1 0xffffffff80c12cdf at vpanic+0x17f
[193803] #2 0xffffffff80c12b53 at panic+0x43
[193803] #3 0xffffffff810982d5 at trap_fatal+0x385
[193803] #4 0xffffffff8109832f at trap_pfault+0x4f
[193803] #5 0xffffffff8106fae8 at calltrap+0x8
[193803] #6 0xffffffff80caf287 at sbdrop+0x37
[193803] #7 0xffffffff80dcce83 at tcp_do_segment+0x2d93
[193803] #8 0xffffffff80dc93b1 at tcp_input_with_port+0xb61
[193803] #9 0xffffffff80dca05b at tcp_input+0xb
[193803] #10 0xffffffff80dbb82f at ip_input+0x11f
[193803] #11 0xffffffff80d48849 at netisr_dispatch_src+0xb9
[193803] #12 0xffffffff80d2c7d8 at ether_demux+0x138
[193803] #13 0xffffffff80d2db65 at ether_nh_input+0x355
[193803] #14 0xffffffff80d48849 at netisr_dispatch_src+0xb9
[193803] #15 0xffffffff80d2cc09 at ether_input+0x69
[193803] #16 0xffffffff80d44cb7 at iflib_rxeof+0xc27
[193803] #17 0xffffffff80d3f302 at _task_fn_rx+0x72
[193803] Uptime: 2d5h50m3s
[193803] Dumping 5207 out of 130927 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%
__curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
55 __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu,
(kgdb)

And the 2nd one:

# kgdb /boot/kernel/kernel /var/crash/vmcore.3
GNU gdb (GDB) 11.1 [GDB v11.1 for FreeBSD]
Copyright (C) 2021 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-portbld-freebsd13.0".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /boot/kernel/kernel...
Reading symbols from /usr/lib/debug//boot/kernel/kernel.debug...
Unread portion of the kernel message buffer:
IOPL = 0
[149983] current process = 0 (if_io_tqg_6)
[149983] trap number = 12
[149983] panic: page fault
[149983] cpuid = 6
[149983] time = 1639293246
[149983] KDB: stack backtrace:
[149983] #0 0xffffffff80c78ac5 at kdb_backtrace+0x65
[149983] #1 0xffffffff80c2a207 at vpanic+0x187
[149983] #2 0xffffffff80c2a073 at panic+0x43
[149983] #3 0xffffffff810b71c7 at trap_fatal+0x387
[149983] #4 0xffffffff810b721f at trap_pfault+0x4f
[149983] #5 0xffffffff810b689a at trap+0x26a
[149983] #6 0xffffffff8108e1b8 at calltrap+0x8
[149983] #7 0xffffffff80deee44 at tcp_output+0x11d4
[149983] #8 0xffffffff80de5fd0 at tcp_do_segment+0x2c00
[149983] #9 0xffffffff80de2702 at tcp_input_with_port+0xb82
[149983] #10 0xffffffff80de333b at tcp_input+0xb
[149983] #11 0xffffffff80dd4bf1 at ip_input+0x121
[149983] #12 0xffffffff80d6276a at netisr_dispatch_src+0xca
[149983] #13 0xffffffff80d467a8 at ether_demux+0x138
[149983] #14 0xffffffff80d47b4e at ether_nh_input+0x34e
[149983] #15 0xffffffff80d6276a at netisr_dispatch_src+0xca
[149983] #16 0xffffffff80d46bf9 at ether_input+0x69
[149983] #17 0xffffffff80d5eea3 at iflib_rxeof+0xc63
[149983] Uptime: 1d17h39m43s
[149983] Dumping 3384 out of 65425 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%
__curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
55 __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu,
(kgdb)
Created attachment 230083: core.txt.1
Here's the core.txt.1 from the server whose kernel/world was most recently updated and rebuilt (last night).
If any additional information is needed, let me know and I'll provide it. The problem recurs consistently, once every 2-3 days, on both servers.
Is it possible to access the generated core files?
(In reply to Michael Tuexen from comment #3) Depends. How do you want to access it? Can I access it for you and provide the necessary output? There is proprietary company software running on these servers, and if the crash dump contains parts/whole of the binaries - I would not be able to provide you with direct access to the dump. Let me know.
(In reply to Dobri Dobrev from comment #4) The kernel dump most likely contains stuff you don't want to share... So could you start kgdb with one of the cores and provide the output of `where`. If you do this for the first one, go up the stack until you are in `sbdrop` and provide also the output of `print *sb`.
(In reply to Michael Tuexen from comment #5) I updated to stable/13-n248590-b7da472979a. I'm waiting for it to crash and will then check the dump.
(In reply to Dobri Dobrev from comment #6) When you are at the debugger, you can type `dump` and `reboot` and the kernel dump should be written to disk. After reboot, you can then use `sudo kgdb -c /var/crash/vmcore.last /boot/kernel/kernel` to start the debugger and we can have a look at it. You can leave and start the debugger multiple times. Just don't update the kernel during that time...
(In reply to Michael Tuexen from comment #7)
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /boot/kernel/kernel...
Reading symbols from /usr/lib/debug//boot/kernel/kernel.debug...
Unread portion of the kernel message buffer:
[314230] processor eflags = interrupt enabled, resume, IOPL = 0
[314230] current process = 0 (if_io_tqg_1)
[314230] trap number = 12
[314230] panic: page fault
[314230] cpuid = 1
[314230] time = 1639952536
[314230] KDB: stack backtrace:
[314230] #0 0xffffffff80c60dd5 at kdb_backtrace+0x65
[314230] #1 0xffffffff80c1336f at vpanic+0x17f
[314230] #2 0xffffffff80c131e3 at panic+0x43
[314230] #3 0xffffffff810991b5 at trap_fatal+0x385
[314230] #4 0xffffffff8109920f at trap_pfault+0x4f
[314230] #5 0xffffffff810705e8 at calltrap+0x8
[314230] #6 0xffffffff80dd5fa9 at tcp_output+0x1339
[314230] #7 0xffffffff80dcd382 at tcp_do_segment+0x2902
[314230] #8 0xffffffff80dc9d41 at tcp_input_with_port+0xb61
[314230] #9 0xffffffff80dca9eb at tcp_input+0xb
[314230] #10 0xffffffff80dbc1bf at ip_input+0x11f
[314230] #11 0xffffffff80d491a9 at netisr_dispatch_src+0xb9
[314230] #12 0xffffffff80d2d128 at ether_demux+0x138
[314230] #13 0xffffffff80d2e4b5 at ether_nh_input+0x355
[314230] #14 0xffffffff80d491a9 at netisr_dispatch_src+0xb9
[314230] #15 0xffffffff80d2d559 at ether_input+0x69
[314230] #16 0xffffffff80d45617 at iflib_rxeof+0xc27
[314230] #17 0xffffffff80d3fc62 at _task_fn_rx+0x72
[314230] Uptime: 3d15h17m10s
[314230] Dumping 5461 out of 130927 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%
__curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
55 __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu,
(kgdb) where
#0 __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
#1 doadump (textdump=<optimized out>) at /usr/src/sys/kern/kern_shutdown.c:399
#2 0xffffffff80c12f6c in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:487
#3 0xffffffff80c133de in vpanic (fmt=0xffffffff81191bdd "%s", ap=<optimized out>) at /usr/src/sys/kern/kern_shutdown.c:920
#4 0xffffffff80c131e3 in panic (fmt=<unavailable>) at /usr/src/sys/kern/kern_shutdown.c:844
#5 0xffffffff810991b5 in trap_fatal (frame=0xfffffe00d3bfd5b0, eva=24) at /usr/src/sys/amd64/amd64/trap.c:944
#6 0xffffffff8109920f in trap_pfault (frame=0xfffffe00d3bfd5b0, usermode=false, signo=<optimized out>, ucode=<optimized out>) at /usr/src/sys/amd64/amd64/trap.c:763
#7 <signal handler called>
#8 m_copydata (m=0x0, m@entry=0xfffff8010ee80d00, off=0, len=1, cp=<optimized out>) at /usr/src/sys/kern/uipc_mbuf.c:657
#9 0xffffffff80dd5fa9 in tcp_output (tp=<optimized out>) at /usr/src/sys/netinet/tcp_output.c:1081
#10 0xffffffff80dcd382 in tcp_do_segment (m=<optimized out>, th=<optimized out>, so=<optimized out>, tp=0xfffffe0251638870, drop_hdrlen=40, tlen=<optimized out>, iptos=0 '\000') at /usr/src/sys/netinet/tcp_input.c:2822
#11 0xffffffff80dc9d41 in tcp_input_with_port (mp=<optimized out>, offp=<optimized out>, proto=<optimized out>, port=port@entry=0) at /usr/src/sys/netinet/tcp_input.c:1400
#12 0xffffffff80dca9eb in tcp_input (mp=0xfffff8010ee80d00, offp=0x0, proto=1) at /usr/src/sys/netinet/tcp_input.c:1496
#13 0xffffffff80dbc1bf in ip_input (m=0x0) at /usr/src/sys/netinet/ip_input.c:834
#14 0xffffffff80d491a9 in netisr_dispatch_src (proto=1, source=source@entry=0, m=0xfffff8015b58d700) at /usr/src/sys/net/netisr.c:1143
#15 0xffffffff80d4957f in netisr_dispatch (proto=250088704, m=0x1) at /usr/src/sys/net/netisr.c:1234
#16 0xffffffff80d2d128 in ether_demux (ifp=ifp@entry=0xfffff80105343000, m=0x0) at /usr/src/sys/net/if_ethersubr.c:921
#17 0xffffffff80d2e4b5 in ether_input_internal (ifp=0xfffff80105343000, m=0x0) at /usr/src/sys/net/if_ethersubr.c:707
#18 ether_nh_input (m=<optimized out>) at /usr/src/sys/net/if_ethersubr.c:737
#19 0xffffffff80d491a9 in netisr_dispatch_src (proto=proto@entry=5, source=source@entry=0, m=m@entry=0xfffff8015b58d700) at /usr/src/sys/net/netisr.c:1143
#20 0xffffffff80d4957f in netisr_dispatch (proto=250088704, proto@entry=5, m=0x1, m@entry=0xfffff8015b58d700) at /usr/src/sys/net/netisr.c:1234
#21 0xffffffff80d2d559 in ether_input (ifp=<optimized out>, m=0xfffff8015b58d700) at /usr/src/sys/net/if_ethersubr.c:828
#22 0xffffffff80d45617 in iflib_rxeof (rxq=<optimized out>, rxq@entry=0xfffffe00d68b6340, budget=<optimized out>) at /usr/src/sys/net/iflib.c:3046
#23 0xffffffff80d3fc62 in _task_fn_rx (context=0xfffffe00d68b6340) at /usr/src/sys/net/iflib.c:3989
#24 0xffffffff80c5f80d in gtaskqueue_run_locked (queue=queue@entry=0xfffff80103920b00) at /usr/src/sys/kern/subr_gtaskqueue.c:371
#25 0xffffffff80c5f482 in gtaskqueue_thread_loop (arg=<optimized out>, arg@entry=0xfffffe00d6bd1020) at /usr/src/sys/kern/subr_gtaskqueue.c:547
#26 0xffffffff80bd053e in fork_exit (callout=0xffffffff80c5f3c0 <gtaskqueue_thread_loop>, arg=0xfffffe00d6bd1020, frame=0xfffffe00d3bfdf40) at /usr/src/sys/kern/kern_fork.c:1092
#27 <signal handler called>
#28 mi_startup () at /usr/src/sys/kern/init_main.c:322
Backtrace stopped: Cannot access memory at address 0xb
(kgdb) print *tcp_output
$1 = {int (struct tcpcb *)} 0xffffffff80dd4c70 <tcp_output>
(kgdb) print *m_copydata
$2 = {void (const struct mbuf *, int, int, caddr_t)} 0xffffffff80ca5bd0 <m_copydata>
(kgdb) print *tcp_do_segment
$3 = {void (struct mbuf *, struct tcphdr *, struct socket *, struct tcpcb *, int, int, uint8_t)} 0xffffffff80dcaa80 <tcp_do_segment>
(kgdb)
Please run:
frame 8
list
print *(struct mbuf *)0xfffff8010ee80d00
frame 10
print *tp
frame 12
print **mp
(In reply to Michael Tuexen from comment #9) (kgdb) frame 8 #8 m_copydata (m=0x0, m@entry=0xfffff8010ee80d00, off=0, len=1, cp=<optimized out>) at /usr/src/sys/kern/uipc_mbuf.c:657 657 count = min(m->m_len - off, len); (kgdb) list 652 off -= m->m_len; 653 m = m->m_next; 654 } 655 while (len > 0) { 656 KASSERT(m != NULL, ("m_copydata, length > size of mbuf chain")); 657 count = min(m->m_len - off, len); 658 if ((m->m_flags & M_EXTPG) != 0) 659 m_copyfromunmapped(m, off, count, cp); 660 else 661 bcopy(mtod(m, caddr_t) + off, cp, count); (kgdb) print *(struct mbuf *)0xfffff8010ee80d00 $1 = {{m_next = 0x0, m_slist = {sle_next = 0x0}, m_stailq = {stqe_next = 0x0}}, {m_nextpkt = 0x0, m_slistpkt = {sle_next = 0x0}, m_stailqpkt = {stqe_next = 0x0}}, m_data = 0xfffff8015b91e528 "&i\365\267\254\350s\352,\025\216*\265\216\004\024\201j\256\245?\225<\020)W\214%\212\371\221$\205s\277LE<\326\340\032\267\377\366\214\217\235\215^)1x\377\342\032\234Ƃ\217]\211\375\333h\361\212\320nE\024\370\330\325S8\272\001y\023\304;\016:\017\032kT5\323\300\f\245MJd\n\025W\352c\321\062)Pl{/\263\320>6\231\362x\305\311\031ö\vy\356&É\265\343;_\273`\272\005\205\315m(\353쁞\001\223\254\371\037]UN\357\202%\201\364\033\r\232G$-N\251\262#\264\204\375\t\321\036\203\241\254\274\314ز\252jŹc.k\217\224#\235\206\241U\262\a\215I\035&\253j3"..., m_len = 24, m_type = 1, m_flags = 1, {{{m_pkthdr = {{snd_tag = 0x0, rcvif = 0x0}, tags = {slh_first = 0x0}, len = 1337, flowid = 0, csum_flags = 0, fibnum = 0, numa_domain = 255 '\377', rsstype = 0 '\000', {rcv_tstmp = 0, {l2hlen = 0 '\000', l3hlen = 0 '\000', l4hlen = 0 '\000', l5hlen = 0 '\000', inner_l2hlen = 0 '\000', inner_l3hlen = 0 '\000', inner_l4hlen = 0 '\000', inner_l5hlen = 0 '\000'}}, PH_per = {eight = "\000\000\000\000\377\377\000", sixteen = { 0, 0, 65535, 0}, thirtytwo = {0, 65535}, sixtyfour = {281470681743360}, unintptr = {281470681743360}, ptr = 0xffff00000000}, PH_loc = {eight = "\000\000\000\000\000\000\000", sixteen = {0, 0, 0, 0}, thirtytwo = {0, 0}, 
sixtyfour = {0}, unintptr = {0}, ptr = 0x0}}, {m_epg_npgs = 0 '\000', m_epg_nrdy = 0 '\000', m_epg_hdrlen = 0 '\000', m_epg_trllen = 0 '\000', m_epg_1st_off = 0, m_epg_last_len = 0, m_epg_flags = 0 '\000', m_epg_record_type = 0 '\000', __spare = "\000", m_epg_enc_cnt = 0, m_epg_tls = 0x539, m_epg_so = 0xff000000000000, m_epg_seqno = 0, m_epg_stailq = {stqe_next = 0xffff00000000}}}, { m_ext = {{ext_count = 1, ext_cnt = 0x1}, ext_size = 2048, ext_type = 6, ext_flags = 1, {{ext_buf = 0xfffff8015b91e000 "\023\367\265R\030\254\212\342\220\255\331'\206\217\245f\223o\aH\205\277\222", ext_arg2 = 0x0}, {extpg_pa = {18446735283447783424, 0, 0, 0, 0}, extpg_trail = '\000' <repeats 63 times>, extpg_hdr = '\000' <repeats 22 times>}}, ext_free = 0x0, ext_arg1 = 0x0}, m_pktdat = 0xfffff8010ee80d58 "\001"}}, m_dat = 0xfffff8010ee80d20 ""}} (kgdb) frame 10 #10 0xffffffff80dcd382 in tcp_do_segment (m=<optimized out>, th=<optimized out>, so=<optimized out>, tp=0xfffffe0251638870, drop_hdrlen=40, tlen=<optimized out>, iptos=0 '\000') at /usr/src/sys/netinet/tcp_input.c:2822 2822 tcp_sack_partialack(tp, th); (kgdb) print *tp $2 = {t_inpcb = 0xfffff80a54294000, t_fb = 0xffffffff8193b000 <tcp_def_funcblk>, t_fb_ptr = 0x0, t_maxseg = 1360, t_logstate = 0, t_port = 0, t_state = 8, t_idle_reduce = 0, t_delayed_ack = 0, t_fin_is_rst = 0, t_log_state_set = 0, bits_spare = 0, t_flags = 554697333, snd_una = 3223852179, snd_max = 3223852205, snd_nxt = 3223852204, snd_up = 3223850831, snd_wnd = 65292, snd_cwnd = 1359, t_peakrate_thr = 0, ts_offset = 0, rfbuf_ts = 313886170, rcv_numsacks = 0, t_tsomax = 65535, t_tsomaxsegcount = 37, t_tsomaxsegsize = 4096, rcv_nxt = 2467824635, rcv_adv = 2467891323, rcv_wnd = 66688, t_flags2 = 1024, t_srtt = 3309, t_rttvar = 287, ts_recent = 0, snd_scale = 2 '\002', rcv_scale = 6 '\006', snd_limited = 0 '\000', request_r_scale = 6 '\006', last_ack_sent = 2467824635, t_rcvtime = 2461112999, rcv_up = 2467824635, t_segqlen = 0, t_segqmbuflen = 0, t_segq = {tqh_first 
= 0x0, tqh_last = 0xfffffe0251638900}, t_in_pkt = 0x0, t_tail_pkt = 0x0, t_timers = 0xfffffe0251638b18, t_vnet = 0xfffff801014c0580, snd_ssthresh = 2720, snd_wl1 = 2467824635, snd_wl2 = 3223852179, irs = 2467822589, iss = 3223768989, t_acktime = 0, t_sndtime = 2460931776, ts_recent_age = 0, snd_recover = 3223852205, cl4_spare = 0, t_oobflags = 0 '\000', t_iobc = 0 '\000', t_rxtcur = 64000, t_rxtshift = 11, t_rtttime = 0, t_rtseq = 3223852203, t_starttime = 2460765463, t_fbyte_in = 2460765472, t_fbyte_out = 2460765472, t_pmtud_saved_maxseg = 0, t_blackhole_enter = 0, t_blackhole_exit = 0, t_rttmin = 30, t_rttbest = 3596, t_softerror = 0, max_sndwnd = 66640, snd_cwnd_prev = 8160, snd_ssthresh_prev = 2720, snd_recover_prev = 3223823643, t_sndzerowin = 0, t_rttupdated = 9, snd_numholes = 1, t_badrxtwin = 2460781714, snd_holes = {tqh_first = 0xfffff806d12b8780, tqh_last = 0xfffff806d12b8790}, snd_fack = 3223852203, sackblks = {{start = 2467824634, end = 2467824635}, {start = 0, end = 0}, {start = 0, end = 0}, {start = 0, end = 0}, {start = 0, end = 0}, {start = 0, end = 0}}, sackhint = {nexthole = 0xfffff806d12b8780, sack_bytes_rexmit = 0, last_sack_ack = 3223852203, delivered_data = 12, sacked_bytes = 0, recover_fs = 1373, prr_delivered = 2722, prr_out = 4105}, t_rttlow = 84, rfbuf_cnt = 0, tod = 0x0, t_sndrexmitpack = 59, t_rcvoopack = 0, t_toe = 0x0, cc_algo = 0xffffffff81937eb0 <newreno_cc_algo>, ccv = 0xfffffe0251638c60, osd = 0xfffffe0251638c88, t_bytes_acked = 0, t_maxunacktime = 0, t_keepinit = 0, t_keepidle = 0, t_keepintvl = 0, t_keepcnt = 0, t_dupacks = 0, t_lognum = 0, t_loglimit = 5000, t_pacing_rate = -1, t_logs = {stqh_first = 0x0, stqh_last = 0xfffffe0251638a88}, t_lin = 0x0, t_lib = 0x0, t_output_caller = 0x0, t_stats = 0x0, t_logsn = 0, gput_ts = 0, gput_seq = 0, gput_ack = 0, t_stats_gput_prev = 0, t_maxpeakrate = 0, t_sndtlppack = 0, t_sndtlpbyte = 0, t_sndbytes = 91397, t_snd_rxt_bytes = 61193, t_tfo_client_cookie_len = 0 '\000', t_end_info_status = 
0, t_tfo_pending = 0x0, t_tfo_cookie = {client = '\000' <repeats 15 times>, server = 0}, {t_end_info_bytes = "\000\000\000\000\000\000\000", t_end_info = 0}} (kgdb) frame 12 #12 0xffffffff80dca9eb in tcp_input (mp=0xfffff8010ee80d00, offp=0x0, proto=1) at /usr/src/sys/netinet/tcp_input.c:1496 1496 return(tcp_input_with_port(mp, offp, proto, 0)); (kgdb) print **mp Cannot access memory at address 0x0 (kgdb)
frame 12
print **mp
(In reply to Michael Tuexen from comment #11)
Sorry, I meant:
frame 12
print *mp
(In reply to Michael Tuexen from comment #12)
(kgdb) frame 12
#12 0xffffffff80dca9eb in tcp_input (mp=0xfffff8010ee80d00, offp=0x0, proto=1) at /usr/src/sys/netinet/tcp_input.c:1496
1496 return(tcp_input_with_port(mp, offp, proto, 0));
(kgdb) print *mp
$3 = (struct mbuf *) 0x0
(kgdb)
frame 9
print mb
print *mb
print moff
print len
(In reply to Michael Tuexen from comment #14) (kgdb) frame 9 #9 0xffffffff80dd5fa9 in tcp_output (tp=<optimized out>) at /usr/src/sys/netinet/tcp_output.c:1081 1081 m_copydata(mb, moff, len, (kgdb) print mb $1 = (struct mbuf *) 0xfffff8010ee80d00 (kgdb) print *mb $2 = {{m_next = 0x0, m_slist = {sle_next = 0x0}, m_stailq = {stqe_next = 0x0}}, {m_nextpkt = 0x0, m_slistpkt = {sle_next = 0x0}, m_stailqpkt = {stqe_next = 0x0}}, m_data = 0xfffff8015b91e528 "&i\365\267\254\350s\352,\025\216*\265\216\004\024\201j\256\245?\225<\020)W\214%\212\371\221$\205s\277LE<\326\340\032\267\377\366\214\217\235\215^)1x\377\342\032\234Ƃ\217]\211\375\333h\361\212\320nE\024\370\330\325S8\272\001y\023\304;\016:\017\032kT5\323\300\f\245MJd\n\025W\352c\321\062)Pl{/\263\320>6\231\362x\305\311\031ö\vy\356&É\265\343;_\273`\272\005\205\315m(\353쁞\001\223\254\371\037]UN\357\202%\201\364\033\r\232G$-N\251\262#\264\204\375\t\321\036\203\241\254\274\314ز\252jŹc.k\217\224#\235\206\241U\262\a\215I\035&\253j3"..., m_len = 24, m_type = 1, m_flags = 1, {{{m_pkthdr = {{snd_tag = 0x0, rcvif = 0x0}, tags = {slh_first = 0x0}, len = 1337, flowid = 0, csum_flags = 0, fibnum = 0, numa_domain = 255 '\377', rsstype = 0 '\000', {rcv_tstmp = 0, {l2hlen = 0 '\000', l3hlen = 0 '\000', l4hlen = 0 '\000', l5hlen = 0 '\000', inner_l2hlen = 0 '\000', inner_l3hlen = 0 '\000', inner_l4hlen = 0 '\000', inner_l5hlen = 0 '\000'}}, PH_per = {eight = "\000\000\000\000\377\377\000", sixteen = { 0, 0, 65535, 0}, thirtytwo = {0, 65535}, sixtyfour = {281470681743360}, unintptr = {281470681743360}, ptr = 0xffff00000000}, PH_loc = {eight = "\000\000\000\000\000\000\000", sixteen = {0, 0, 0, 0}, thirtytwo = {0, 0}, sixtyfour = {0}, unintptr = {0}, ptr = 0x0}}, {m_epg_npgs = 0 '\000', m_epg_nrdy = 0 '\000', m_epg_hdrlen = 0 '\000', m_epg_trllen = 0 '\000', m_epg_1st_off = 0, m_epg_last_len = 0, m_epg_flags = 0 '\000', m_epg_record_type = 0 '\000', __spare = "\000", m_epg_enc_cnt = 0, m_epg_tls = 0x539, m_epg_so = 0xff000000000000, 
m_epg_seqno = 0, m_epg_stailq = {stqe_next = 0xffff00000000}}}, { m_ext = {{ext_count = 1, ext_cnt = 0x1}, ext_size = 2048, ext_type = 6, ext_flags = 1, {{ext_buf = 0xfffff8015b91e000 "\023\367\265R\030\254\212\342\220\255\331'\206\217\245f\223o\aH\205\277\222", ext_arg2 = 0x0}, {extpg_pa = {18446735283447783424, 0, 0, 0, 0}, extpg_trail = '\000' <repeats 63 times>, extpg_hdr = '\000' <repeats 22 times>}}, ext_free = 0x0, ext_arg1 = 0x0}, m_pktdat = 0xfffff8010ee80d58 "\001"}}, m_dat = 0xfffff8010ee80d20 ""}} (kgdb) print moff $3 = 0 (kgdb) print len $4 = 1 (kgdb)
frame 8
print count
print m
print off
print len
(In reply to Michael Tuexen from comment #17)
(kgdb) frame 8
#8 m_copydata (m=0x0, m@entry=0xfffff8010ee80d00, off=0, len=1, cp=<optimized out>) at /usr/src/sys/kern/uipc_mbuf.c:657
657 count = min(m->m_len - off, len);
(kgdb) print count
$5 = <optimized out>
(kgdb) print m
$6 = (const struct mbuf *) 0x0
(kgdb) print off
$7 = 0
(kgdb) print len
$8 = 1
(kgdb)
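For readers following the frame 8 output, here is a minimal user-space sketch of the kind of chain walk that m_copydata performs. It is a simplified model, not the kernel code: `model_mbuf` and `model_copydata` are hypothetical names, only the leading fields of the structure are modelled (in the order the kgdb dump prints them), and the function returns -1 where the kernel would read m->m_len through a NULL pointer. With three pointer-sized fields in front of it, m_len sits at offset 24 on amd64, which is consistent with eva=24 in the trap_fatal frame of the `where` output.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Simplified stand-in for struct mbuf (hypothetical, user space only).
 * Only the leading fields are modelled, in the order shown by kgdb.
 */
struct model_mbuf {
	struct model_mbuf *m_next;    /* next buffer in the chain */
	struct model_mbuf *m_nextpkt; /* next packet (unused in this model) */
	char              *m_data;    /* start of the data */
	int                m_len;     /* bytes of data in this buffer */
};

/*
 * Checked variant of the copy loop: returns 0 on success, or -1 if the
 * chain ends before 'len' bytes have been copied. The second case is the
 * condition the KASSERT in m_copydata is meant to catch.
 */
static int
model_copydata(const struct model_mbuf *m, int off, int len, char *cp)
{
	/* Skip whole buffers until 'off' falls inside the current one. */
	while (off > 0) {
		if (m == NULL)
			return (-1);
		if (off < m->m_len)
			break;
		off -= m->m_len;
		m = m->m_next;
	}
	while (len > 0) {
		int count;

		if (m == NULL)
			return (-1); /* the kernel would fault on m->m_len here */
		count = m->m_len - off;
		if (count > len)
			count = len;
		memcpy(cp, m->m_data + off, (size_t)count);
		len -= count;
		cp += count;
		off = 0;
		m = m->m_next;
	}
	return (0);
}
```

In this model, a single 24-byte buffer with off=0 and len=1 copies fine, so the sketch only shows the general failure mode (asking for more bytes than the chain holds) and makes no claim about why m was NULL in this particular dump.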
(In reply to Michael Tuexen from comment #17) If something more is needed - I'll provide as quickly as possible.
(In reply to Dobri Dobrev from comment #19) Thanks. Right now I'm trying to figure out what could be going on. Are you using anything non-default? Alternate CC module? Alternate stack? Are you using long lived TCP connections or short lived? High bandwidth? Any hint you can provide?
(In reply to Michael Tuexen from comment #20)
I'm using the exact same settings on 12.2 without problems.
Most of the loader/sysctl settings are calomel defaults, with maybe 2-3 settings changed in total.
Traffic is 10~50 mbit RX, 120~220 mbit TX.
PF is enabled.
Nginx 1.21.4 is running with:
sendfile off
tcp_nopush off
keepalive_timeout 10s / 60s / 600s (different on different "servers")
keepalive_requests 50
websocket connections on some servers

The same configuration runs on 12.2 without any issues; current uptime there is ~67 days. The problem only appears on 13/stable (I haven't tried 13.0-release, because some things implemented in stable haven't made their way to 13.0-rel).
Do you need anything else from the dump, or should I test something?
I only have 1 patch, to pf_table.c (3-4 lines in total, nothing major). It also works without any issues on 12.2, and from what I can see in the dump, pf is not related to the crash. I'm willing to test 13.0-release if needed; I just need to apply that patch and rebuild the kernel.
(In reply to Dobri Dobrev from comment #22) I'm not interested in testing 13-release. But I would be interested if the problem also shows up if you don't use pf. Is that possible?
(In reply to Michael Tuexen from comment #23) Unfortunately PF is essential, I cannot disable it. To answer a question you might be having - yes, I tested without the PF patch before I made the bug submission.
(In reply to Dobri Dobrev from comment #24) The point I had in mind was to exclude pf from the system to be sure it is a TCP problem. But that does not seem to be possible. Thanks for the feedback. Will ask if I need more information...
(In reply to Michael Tuexen from comment #25) If you wish, I can test an earlier revision of 13/stable, before changes to these files (noticed you did some commits changing several of the files listed in the crash). Let me know which revision I should pull.
(In reply to Dobri Dobrev from comment #26) I actually don't know which version you should try. But you might pick some older version, give it a try, and do a binary search. It would help to know which commit introduced the problem...
(In reply to Michael Tuexen from comment #27) I'll try to do that sometime tomorrow, and let you know.
Great, thanks. How long does it take for a machine to panic?
(In reply to Michael Tuexen from comment #29) A few days.
(In reply to Michael Tuexen from comment #29) I'm building d04c12765cfa2bf0f33f7489d48843648073ce06 and will test it for a few days.
(In reply to Dobri Dobrev from comment #31) Are you using a kernel build with INVARIANTS? If not, you might want to do that first. Maybe that gives a hint, because it might panic sooner...
(In reply to Michael Tuexen from comment #32) Let me know what "INVARIANTS" are and how to build the kernel with it. I'm building the "GENERIC" config.
Add to the kernel config file GENERIC:
options BUF_TRACKING # Track buffer history
options DDB # Support DDB.
options FULL_BUF_TRACKING # Track more buffer history
options GDB # Support remote GDB.
options DEADLKRES # Enable the deadlock resolver
options INVARIANTS # Enable calls of extra sanity checking
options INVARIANT_SUPPORT # Extra sanity checks of internal structures, required by INVARIANTS
options QUEUE_MACRO_DEBUG_TRASH # Trash queue(2) internal pointers on invalidation
options WITNESS # Enable checks to detect deadlocks and cycles
options WITNESS_SKIPSPIN # Don't run witness on spinlocks for speed
options MALLOC_DEBUG_MAXZONES=8 # Separate malloc(9) zones
options VERBOSE_SYSINIT=0 # Support debug.verbose_sysinit, off by default
and rebuild the kernel.
(In reply to Michael Tuexen from comment #34) I'll rebuild the kernel with these when I find which commit actually causes the problem (I suspect these options will slow down the system somewhat)
Hi,

I wonder if we need to subtract 1 from tp->snd_max when TF_SENTFIN is set?

t_state = 8
#define TCPS_LAST_ACK 8 /* had fin and close; await FIN ACK */

t_flags = 554697333 = 0x21100275
#define TF_SENTFIN 0x00000010 /* have sent FIN */

I remember we did a similar fix a while back for SACK:

	/*
	 * Exclude FIN sequence space in
	 * the hole for the rescue retransmission,
	 * and also don't create a hole, if only
	 * the ACK for a FIN is outstanding.
	 */
	tcp_seq highdata = tp->snd_max;
	if (tp->t_flags & TF_SENTFIN)
		highdata--;

Now, in this piece of code leading up to the sbdrop() of 1 byte:

	if (tlen == 0) {
		if (SEQ_GT(th->th_ack, tp->snd_una) &&
		    SEQ_LEQ(th->th_ack, tp->snd_max) &&
		    !IN_RECOVERY(tp->t_flags) &&
		    (to.to_flags & TOF_SACK) == 0 &&
		    TAILQ_EMPTY(&tp->snd_holes)) {

Is the SEQ_LEQ compared against the wrong snd_max?

	SEQ_LEQ(th->th_ack, tp->snd_max)

--HPS
And similarly:

	acked = BYTES_THIS_ACK(tp, th);
	if (SENTFIN)
		acked--;

???
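To make the question above concrete, here is a small user-space sketch that decodes the t_flags value from the dump and applies the "exclude the FIN from snd_max" adjustment. It is an illustration under assumptions, not a proposed patch: `model_highdata` and `model_ack_in_data_range` are hypothetical helpers, the comparison macros only imitate the wrap-safe style of the kernel's SEQ_* macros, and th->th_ack is not available from the dump (th is optimized out), so any ACK value passed in is an assumed input.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t tcp_seq;

/* Wrap-safe sequence comparisons, written in the style of the SEQ_* macros. */
#define SEQ_GT(a, b)  ((int32_t)((a) - (b)) > 0)
#define SEQ_LEQ(a, b) ((int32_t)((a) - (b)) <= 0)

#define TF_SENTFIN    0x00000010u /* have sent FIN (value quoted above) */
#define TCPS_LAST_ACK 8           /* value quoted above */

/* Highest sequence number covering data: snd_max, minus one if a FIN was sent. */
static tcp_seq
model_highdata(tcp_seq snd_max, uint32_t t_flags)
{
	tcp_seq highdata = snd_max;

	if (t_flags & TF_SENTFIN)
		highdata--;
	return (highdata);
}

/*
 * Hypothetical check: does th_ack acknowledge new data bytes, as opposed
 * to acknowledging only the FIN? Returns 1 if so, 0 otherwise.
 */
static int
model_ack_in_data_range(tcp_seq th_ack, tcp_seq snd_una, tcp_seq snd_max,
    uint32_t t_flags)
{
	return (SEQ_GT(th_ack, snd_una) &&
	    SEQ_LEQ(th_ack, model_highdata(snd_max, t_flags)));
}
```

With the values from the tp dump (t_flags = 554697333, snd_una = 3223852179, snd_max = 3223852205), TF_SENTFIN is set and model_highdata() yields 3223852204. An assumed ACK equal to snd_max would then fall outside the data range, while the unadjusted SEQ_LEQ against snd_max would accept it.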
(In reply to Hans Petter Selasky from comment #37) The thing is... when did something related to this get changed? It must have happened at some point between 12.2 and 13. If we can find the actual commit quickly, we can test.
(In reply to Michael Tuexen from comment #34) For some reason I'm unable to build world/kernel on older revisions, due to:
"ld: error: /usr/obj/usr/src/amd64.amd64/tmp/lib/libc.so.7: undefined reference to __sys_pdfork [--no-allow-shlib-undefined]"
No idea how to solve it.
You might need to run this first:
make toolchain
--HPS
I note that in order to reach the sbdrop() where the panic happens, we need to pass:
	if (tp->t_state == TCPS_ESTABLISHED)
But:
	tp->t_state = 8 (TCPS_LAST_ACK)
So that means there is a race somewhere.
--HPS
Could you dump the INPCB as well:
print /x *tp->t_inpcb
(In reply to Hans Petter Selasky from comment #42) From which frame ?
(kgdb) frame 8
(In reply to Hans Petter Selasky from comment #44)
(kgdb) where
#0 __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
#1 doadump (textdump=<optimized out>) at /usr/src/sys/kern/kern_shutdown.c:399
#2 0xffffffff80c12f6c in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:487
#3 0xffffffff80c133de in vpanic (fmt=0xffffffff81191bdd "%s", ap=<optimized out>) at /usr/src/sys/kern/kern_shutdown.c:920
#4 0xffffffff80c131e3 in panic (fmt=<unavailable>) at /usr/src/sys/kern/kern_shutdown.c:844
#5 0xffffffff810991b5 in trap_fatal (frame=0xfffffe00d3bfd5b0, eva=24) at /usr/src/sys/amd64/amd64/trap.c:944
#6 0xffffffff8109920f in trap_pfault (frame=0xfffffe00d3bfd5b0, usermode=false, signo=<optimized out>, ucode=<optimized out>) at /usr/src/sys/amd64/amd64/trap.c:763
#7 <signal handler called>
#8 m_copydata (m=0x0, m@entry=0xfffff8010ee80d00, off=0, len=1, cp=<optimized out>) at /usr/src/sys/kern/uipc_mbuf.c:657
#9 0xffffffff80dd5fa9 in tcp_output (tp=<optimized out>) at /usr/src/sys/netinet/tcp_output.c:1081
#10 0xffffffff80dcd382 in tcp_do_segment (m=<optimized out>, th=<optimized out>, so=<optimized out>, tp=0xfffffe0251638870, drop_hdrlen=40, tlen=<optimized out>, iptos=0 '\000') at /usr/src/sys/netinet/tcp_input.c:2822
#11 0xffffffff80dc9d41 in tcp_input_with_port (mp=<optimized out>, offp=<optimized out>, proto=<optimized out>, port=port@entry=0) at /usr/src/sys/netinet/tcp_input.c:1400
#12 0xffffffff80dca9eb in tcp_input (mp=0xfffff8010ee80d00, offp=0x0, proto=1) at /usr/src/sys/netinet/tcp_input.c:1496
#13 0xffffffff80dbc1bf in ip_input (m=0x0) at /usr/src/sys/netinet/ip_input.c:834
#14 0xffffffff80d491a9 in netisr_dispatch_src (proto=1, source=source@entry=0, m=0xfffff8015b58d700) at /usr/src/sys/net/netisr.c:1143
#15 0xffffffff80d4957f in netisr_dispatch (proto=250088704, m=0x1) at /usr/src/sys/net/netisr.c:1234
#16 0xffffffff80d2d128 in ether_demux (ifp=ifp@entry=0xfffff80105343000, m=0x0) at /usr/src/sys/net/if_ethersubr.c:921
#17 0xffffffff80d2e4b5 in ether_input_internal (ifp=0xfffff80105343000, m=0x0) at /usr/src/sys/net/if_ethersubr.c:707
#18 ether_nh_input (m=<optimized out>) at /usr/src/sys/net/if_ethersubr.c:737
#19 0xffffffff80d491a9 in netisr_dispatch_src (proto=proto@entry=5, source=source@entry=0, m=m@entry=0xfffff8015b58d700) at /usr/src/sys/net/netisr.c:1143
#20 0xffffffff80d4957f in netisr_dispatch (proto=250088704, proto@entry=5, m=0x1, m@entry=0xfffff8015b58d700) at /usr/src/sys/net/netisr.c:1234
#21 0xffffffff80d2d559 in ether_input (ifp=<optimized out>, m=0xfffff8015b58d700) at /usr/src/sys/net/if_ethersubr.c:828
#22 0xffffffff80d45617 in iflib_rxeof (rxq=<optimized out>, rxq@entry=0xfffffe00d68b6340, budget=<optimized out>) at /usr/src/sys/net/iflib.c:3046
#23 0xffffffff80d3fc62 in _task_fn_rx (context=0xfffffe00d68b6340) at /usr/src/sys/net/iflib.c:3989
#24 0xffffffff80c5f80d in gtaskqueue_run_locked (queue=queue@entry=0xfffff80103920b00) at /usr/src/sys/kern/subr_gtaskqueue.c:371
#25 0xffffffff80c5f482 in gtaskqueue_thread_loop (arg=<optimized out>, arg@entry=0xfffffe00d6bd1020) at /usr/src/sys/kern/subr_gtaskqueue.c:547
#26 0xffffffff80bd053e in fork_exit (callout=0xffffffff80c5f3c0 <gtaskqueue_thread_loop>, arg=0xfffffe00d6bd1020, frame=0xfffffe00d3bfdf40) at /usr/src/sys/kern/kern_fork.c:1092
#27 <signal handler called>
#28 mi_startup () at /usr/src/sys/kern/init_main.c:322
Backtrace stopped: Cannot access memory at address 0xb
(kgdb) frame 8
#8 m_copydata (m=0x0, m@entry=0xfffff8010ee80d00, off=0, len=1, cp=<optimized out>) at /usr/src/sys/kern/uipc_mbuf.c:657
warning: Source file is more recent than executable.
657 count = min(m->m_len - off, len);
(kgdb) print /x *tp->t_inpcb
No symbol "tp" in current context.
(kgdb) print /x tp->t_inpcb
No symbol "tp" in current context.
(kgdb)

There doesn't appear to be "tp" in frame 8 ...
(In reply to Hans Petter Selasky from comment #44) Here is from frame 9: (kgdb) frame 9 #9 0xffffffff80dd5fa9 in tcp_output (tp=<optimized out>) at /usr/src/sys/netinet/tcp_output.c:1081 warning: Source file is more recent than executable. 1081 m_copydata(mb, moff, len, (kgdb) print /x *tp->t_inpcb value has been optimized out (kgdb) print /x tp->t_inpcb value has been optimized out (kgdb)
Try instead: frame 10 print /x *tp->t_inpcb
(In reply to Hans Petter Selasky from comment #47) Here is from frame 10: (kgdb) frame 10 #10 0xffffffff80dcd382 in tcp_do_segment (m=<optimized out>, th=<optimized out>, so=<optimized out>, tp=0xfffffe0251638870, drop_hdrlen=40, tlen=<optimized out>, iptos=0 '\000') at /usr/src/sys/netinet/tcp_input.c:2822 warning: Source file is more recent than executable. 2822 tcp_sack_partialack(tp, th); (kgdb) print /x *tp->t_inpcb $1 = {inp_hash = {cle_next = 0x0, cle_prev = 0xfffffe02092fde90}, inp_pcbgrouphash = {cle_next = 0x0, cle_prev = 0x0}, inp_lock = {lock_object = {lo_name = 0xffffffff8117b820, lo_flags = 0x56b0000, lo_data = 0x0, lo_witness = 0x0}, rw_lock = 0xfffffe00d6bd4560}, inp_hpts = {tqe_next = 0x0, tqe_prev = 0x0}, inp_hpts_request = 0x0, inp_in_hpts = 0x0, inp_in_input = 0x0, inp_hpts_cpu = 0x0, inp_irq_cpu = 0x0, inp_refcount = 0x2, inp_flags = 0x8802000, inp_flags2 = 0x0, inp_input_cpu = 0x0, inp_hpts_cpu_set = 0x0, inp_input_cpu_set = 0x0, inp_hpts_calls = 0x0, inp_input_calls = 0x0, inp_irq_cpu_set = 0x0, inp_spare_bits2 = 0x0, inp_numa_domain = 0xff, inp_ppcb = 0xfffffe0251638870, inp_socket = 0xfffff8010ef223b0, inp_hptsslot = 0x0, inp_hpts_drop_reas = 0x0, inp_input = {tqe_next = 0x0, tqe_prev = 0x0}, inp_pcbinfo = 0xfffffe00d6a89758, inp_pcbgroup = 0x0, inp_pcbgroup_wild = {cle_next = 0x0, cle_prev = 0x0}, inp_cred = 0xfffff80103fa9500, inp_flow = 0x0, inp_vflag = 0x1, inp_ip_ttl = 0x40, inp_ip_p = 0x0, inp_ip_minttl = 0x0, inp_flowid = 0x73b2783d, inp_snd_tag = 0x0, inp_flowtype = 0x82, inp_rss_listen_bucket = 0x0, inp_inc = { inc_flags = 0x0, inc_len = 0x0, inc_fibnum = 0x1, inc_ie = {ie_fport = 0x49c2, ie_lport = 0xf710, ie_dependfaddr = {id46_addr = {ia46_pad32 = {0x0, 0x0, 0x0}, ia46_addr4 = {s_addr = 0xd6a971c5}}, id6_addr = {__u6_addr = { __u6_addr8 = {0x0 <repeats 12 times>, 0xc5, 0x71, 0xa9, 0xd6}, __u6_addr16 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x71c5, 0xd6a9}, __u6_addr32 = {0x0, 0x0, 0x0, 0xd6a971c5}}}}, ie_dependladdr = {id46_addr = 
{ia46_pad32 = {0x0, 0x0, 0x0}, ia46_addr4 = {s_addr = 0xd011ca95}}, id6_addr = {__u6_addr = {__u6_addr8 = {0x0 <repeats 12 times>, 0x95, 0xca, 0x11, 0xd0}, __u6_addr16 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xca95, 0xd011}, __u6_addr32 = {0x0, 0x0, 0x0, 0xd011ca95}}}}, ie6_zoneid = 0x0}}, inp_label = 0x0, inp_sp = 0xfffff8084f4d5a20, {inp_ip_tos = 0x0, inp_options = 0x0, inp_moptions = 0x0}, {in6p_options = 0x0, in6p_outputopts = 0x0, in6p_moptions = 0x0, in6p_icmp6filt = 0x0, in6p_cksum = 0x0, in6p_hops = 0x0}, inp_portlist = {cle_next = 0xfffff80bfc660d90, cle_prev = 0xfffff8080f614d00}, inp_phd = 0xfffff80105455c40, inp_gencnt = 0xc6f8d0f, spare_ptr = 0x0, inp_rt_cookie = 0x63, { inp_route = {ro_nh = 0xfffff8010e7a5e00, ro_lle = 0xfffff8015b783000, ro_prepend = 0x0, ro_plen = 0x0, ro_flags = 0x180, ro_mtu = 0x0, spare = 0x0, ro_dst = {sa_len = 0x10, sa_family = 0x2, sa_data = {0x0, 0x0, 0xc5, 0x71, 0xa9, 0xd6, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}}}, inp_route6 = {ro_nh = 0xfffff8010e7a5e00, ro_lle = 0xfffff8015b783000, ro_prepend = 0x0, ro_plen = 0x0, ro_flags = 0x180, ro_mtu = 0x0, spare = 0x0, ro_dst = {sin6_len = 0x10, sin6_family = 0x2, sin6_port = 0x0, sin6_flowinfo = 0xd6a971c5, sin6_addr = {__u6_addr = {__u6_addr8 = {0x0 <repeats 16 times>}, __u6_addr16 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, __u6_addr32 = {0x0, 0x0, 0x0, 0x0}}}, sin6_scope_id = 0x0}}}, inp_list = {cle_next = 0xfffff8015ba7dd90, cle_prev = 0xfffff8080f614d70}, inp_epoch_ctx = {data = {0x0, 0x0}}} (kgdb)
Then see if you can get this working: frame 10 print /x *(struct thread *)tp->t_inpcb.inp_lock.rw_lock
Then see if you can get this working: frame 10 print /x *(struct thread *)(tp->t_inpcb->inp_lock.rw_lock)
(In reply to Hans Petter Selasky from comment #50) (kgdb) frame 10 #10 0xffffffff80dcd382 in tcp_do_segment (m=<optimized out>, th=<optimized out>, so=<optimized out>, tp=0xfffffe0251638870, drop_hdrlen=40, tlen=<optimized out>, iptos=0 '\000') at /usr/src/sys/netinet/tcp_input.c:2822 2822 tcp_sack_partialack(tp, th); (kgdb) print /x *(struct thread *)tp->t_inpcb.inp_lock.rw_lock $3 = {td_lock = 0xfffffe00d68af0c0, td_proc = 0xffffffff81c8bea8, td_plist = {tqe_next = 0xfffffe00d6bd3e40, tqe_prev = 0xfffffe00d6bd4c90}, td_runq = {tqe_next = 0x0, tqe_prev = 0xfffffe00d68af190}, {td_slpq = {tqe_next = 0x0, tqe_prev = 0xfffff801014b7700}, td_zombie = 0x0}, td_lockq = {tqe_next = 0x0, tqe_prev = 0xfffffe020cdd8bf8}, td_hash = {le_next = 0x0, le_prev = 0xfffffe00d6b08550}, td_cpuset = 0xfffff8010396f180, td_domain = { dr_policy = 0xffffffff818010b8, dr_iter = 0x0}, td_sel = 0x0, td_sleepqueue = 0xfffff801014b7700, td_turnstile = 0xfffff8015b64a300, td_rlqe = 0x0, td_umtxq = 0xfffff8010392b000, td_tid = 0x186aa, td_sigqueue = {sq_signals = { __bits = {0x0, 0x0, 0x0, 0x0}}, sq_kill = {__bits = {0x0, 0x0, 0x0, 0x0}}, sq_ptrace = {__bits = {0x0, 0x0, 0x0, 0x0}}, sq_list = {tqh_first = 0x0, tqh_last = 0xfffffe00d6bd4638}, sq_proc = 0xffffffff81c8bea8, sq_flags = 0x1}, td_lend_user_pri = 0xff, td_allocdomain = 0x0, td_flags = 0x4010006, td_inhibitors = 0x0, td_pflags = 0x200000, td_pflags2 = 0x0, td_dupfd = 0x0, td_sqqueue = 0x0, td_wchan = 0x0, td_wmesg = 0x0, td_owepreempt = 0x0, td_tsqueue = 0x0, td_locks = 0x0, td_rw_rlocks = 0x0, td_sx_slocks = 0x0, td_lk_slocks = 0x0, td_stopsched = 0x1, td_blocked = 0x0, td_lockname = 0x0, td_contested = {lh_first = 0x0}, td_sleeplocks = 0x0, td_intr_nesting_level = 0x0, td_pinned = 0x3, td_realucred = 0xfffff801015fd800, td_ucred = 0xfffff801015fd800, td_limit = 0xfffff801015fd700, td_slptick = 0x0, td_blktick = 0x0, td_swvoltick = 0x92b19aa5, td_swinvoltick = 0x8a9cc00b, td_cow = 0x0, td_ru = {ru_utime = { tv_sec = 0x0, tv_usec = 
0x0}, ru_stime = {tv_sec = 0x0, tv_usec = 0x0}, ru_maxrss = 0x0, ru_ixrss = 0x0, ru_idrss = 0x0, ru_isrss = 0x0, ru_minflt = 0x0, ru_majflt = 0x0, ru_nswap = 0x0, ru_inblock = 0x0, ru_oublock = 0x0, ru_msgsnd = 0x0, ru_msgrcv = 0x0, ru_nsignals = 0x0, ru_nvcsw = 0x1a6a5356, ru_nivcsw = 0x3}, td_rux = {rux_runtime = 0x63a4695bd17, rux_uticks = 0x0, rux_sticks = 0x3d50f, rux_iticks = 0x0, rux_uu = 0x0, rux_su = 0x715e57c6, rux_tu = 0x715e57c6}, td_incruntime = 0x807dd793, td_runtime = 0x63ac7110c3e, td_pticks = 0x3d55b, td_sticks = 0x4c, td_iticks = 0x0, td_uticks = 0x0, td_intrval = 0x0, td_oldsigmask = {__bits = {0x0, 0x0, 0x0, 0x0}}, td_generation = 0x1a6a5359, td_sigstk = {ss_sp = 0x0, ss_size = 0x0, ss_flags = 0x0}, td_xsig = 0x0, td_profil_addr = 0x0, td_profil_ticks = 0x0, td_name = {0x69, 0x66, 0x5f, 0x69, 0x6f, 0x5f, 0x74, 0x71, 0x67, 0x5f, 0x31, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, td_fpop = 0x0, td_dbgflags = 0x0, td_si = {si_signo = 0x0, si_errno = 0x0, si_code = 0x0, si_pid = 0x0, si_uid = 0x0, si_status = 0x0, si_addr = 0x0, si_value = {sival_int = 0x0, sival_ptr = 0x0, sigval_int = 0x0, sigval_ptr = 0x0}, _reason = {_fault = {_trapno = 0x0}, _timer = {_timerid = 0x0, _overrun = 0x0}, _mesgq = {_mqd = 0x0}, _poll = {_band = 0x0}, __spare__ = {__spare1__ = 0x0, __spare2__ = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}}}}, td_ng_outbound = 0x0, td_osd = {osd_nslots = 0x0, osd_slots = 0x0, osd_next = {le_next = 0x0, le_prev = 0x0}}, td_map_def_user = 0x0, td_dbg_forked = 0x0, td_vp_reserved = 0x0, td_no_sleeping = 0x1, td_su = 0x0, td_sleeptimo = 0x0, td_rtcgen = 0x0, td_errno = 0x0, td_vslock_sz = 0x0, td_kcov_info = 0x0, td_ucredref = 0x0, td_sigmask = {__bits = {0x0, 0x0, 0x0, 0x0}}, td_rqindex = 0x6, td_base_pri = 0x18, td_priority = 0x18, td_pri_class = 0x3, td_user_pri = 0x7f, td_base_user_pri = 0x7f, td_unused_0 = 0x0, td_rb_list = 0x0, td_rbp_list = 0x0, td_rb_inact = 0x0, td_sa = {code = 0x0, callp = 0x0, args = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 
0x0}}, td_sigblock_ptr = 0x0, td_sigblock_val = 0x0, td_pcb = 0xfffffe00d6bd4a70, td_state = 0x4, td_uretoff = {tdu_retval = {0x0, 0x0}, tdu_off = 0x0}, td_cowgen = 0x0, td_slpcallout = {c_links = {le = {le_next = 0x0, le_prev = 0x0}, sle = { sle_next = 0x0}, tqe = {tqe_next = 0x0, tqe_prev = 0x0}}, c_time = 0x0, c_precision = 0x0, c_arg = 0x0, c_func = 0x0, c_lock = 0x0, c_flags = 0x0, c_iflags = 0x10, c_cpu = 0x0}, td_frame = 0xfffffe00d3bfdf40, td_kstack = 0xfffffe00d3bfa000, td_kstack_pages = 0x4, td_critnest = 0x1, td_md = {md_spinlock_count = 0x1, md_saved_flags = 0x246, md_spurflt_addr = 0x0, md_invl_gen = {gen = 0x0, {link = {le_next = 0x1, le_prev = 0x0}, {next = 0x1, saved_pri = 0x0}}}, md_efirt_tmp = 0x0, md_efirt_dis_pf = 0x0, md_pcb = {pcb_r15 = 0xffffffff81cde1c8, pcb_r14 = 0xfffffe00d6b53c80, pcb_r13 = 0xfffffe00d6bd4560, pcb_r12 = 0xfffffe00d3bfddb8, pcb_rbp = 0xfffffe00d3bfde50, pcb_rsp = 0xfffffe00d3bfdda8, pcb_rbx = 0xfffffe00d68af0c0, pcb_rip = 0xffffffff80c45a59, pcb_fsbase = 0x0, pcb_gsbase = 0x0, pcb_kgsbase = 0x0, pcb_cr0 = 0x0, pcb_cr2 = 0x0, pcb_cr3 = 0x0, pcb_cr4 = 0x0, pcb_dr0 = 0x0, pcb_dr1 = 0x0, pcb_dr2 = 0x0, pcb_dr3 = 0x0, pcb_dr6 = 0x0, pcb_dr7 = 0x0, pcb_gdt = {rd_limit = 0x0, rd_base = 0x0}, pcb_idt = {rd_limit = 0x0, rd_base = 0x0}, pcb_ldt = {rd_limit = 0x0, rd_base = 0x0}, pcb_tr = 0x0, pcb_flags = 0x1, pcb_initial_fpucw = 0x0, pcb_onfault = 0x0, pcb_saved_ucr3 = 0x0, pcb_tssp = 0x0, pcb_efer = 0x0, pcb_star = 0x0, pcb_lstar = 0x0, pcb_cstar = 0x0, pcb_sfmask = 0x0, pcb_save = 0xfffffe00d6a6ed00, pcb_pad = {0x0, 0x0, 0x0, 0x0, 0x0}}, md_stack_base = 0xfffffe00d3bfe000, md_usr_fpu_save = 0xfffffe00d6a6ed00}, td_ar = 0x0, td_lprof = {{lh_first = 0x0}, {lh_first = 0x0}}, td_dtrace = 0xfffff80103920a00, td_vnet = 0xfffff801014c0580, td_vnet_lpush = 0x0, td_intr_frame = 0x0, td_rfppwait_p = 0x0, td_ma = 0x0, td_ma_cnt = 0x0, td_emuldata = 0x0, td_lastcpu = 0x1, td_oncpu = 0x1, td_lkpi_task = 0x0, td_pmcpend = 0x0, td_coredump = 
0x0, td_ktr_io_lim = 0x0} (kgdb)
Hi, There appear to be multiple dumps with different issues! Decoding the thread name from the last printout you provided: td_name = {0x69, 0x66, 0x5f, 0x69, 0x6f, 0x5f, 0x74, 0x71, 0x67, 0x5f, 0x31, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}; td_name = "if_io_tqg_1" And according to the panic backtrace: current process = 0 (if_io_tqg_1) So this is a different core dump, probably a different issue. Can you repeat the GDB printouts for the thread with sbdrop() in the backtrace (if_io_tqg_1)? I need to see: print /x *tp print /x *tp->t_inpcb Just try searching all frames for these variables. --HPS
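For anyone following along, the td_name decoding above can be reproduced outside of kgdb. This is only a small sketch: the byte values are copied from the struct thread printout earlier in the thread, and the helper name decode_td_name is made up for illustration.

```python
# Bytes of td_name as printed by kgdb with "print /x" (copied from the
# struct thread dump above); the array is NUL-padded to 20 entries.
TD_NAME = [0x69, 0x66, 0x5f, 0x69, 0x6f, 0x5f, 0x74, 0x71, 0x67, 0x5f,
           0x31, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0]


def decode_td_name(raw):
    """Hypothetical helper: turn a NUL-padded byte list into a string."""
    data = bytes(raw)
    # Keep everything up to the first NUL terminator, as a C string would.
    return data.split(b"\x00", 1)[0].decode("ascii")


if __name__ == "__main__":
    print(decode_td_name(TD_NAME))
```

Printing the field without /x (print tp->... or print td->td_name) should show the string directly in kgdb, so this is mostly useful when only the hex printout is available.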
(In reply to Hans Petter Selasky from comment #52) Hello, The sbdrop() only appeared once, on one of the servers. I've since updated the kernel there and have not extracted core dumps etc. from that server at all (there's barely any traffic on it, so it hasn't crashed again). I've only provided further details from the 2nd server (where sbdrop() doesn't appear in "where"). Also, the dump matches the currently installed kernel. I've updated the source tree and built a new world/kernel, but have not installed it. If you wish, I'll install the newly built kernel/world, wait for it to crash again and start extracting data from the dump. Let me know.
Let's use the updated machine for now and follow Michael's thread.
(In reply to Hans Petter Selasky from comment #54) So, do I hold off installing the kernel/world that I've built today, or install it, wait for a crash and extract new data?
Let us try to use a kernel built with INVARIANTS, let it crash and look at the core. Don't update the system while we are doing this. And let us focus on one system and one crash at a time.
(In reply to Michael Tuexen from comment #56) I can build the kernel with INVARIANTS, however... 1. During runtime: is there a noticeable slowdown or anything else that would interfere with production traffic? 2. When it crashes: is the downtime (crash dump generation, etc.) longer than normal? Let me know.
(In reply to Dobri Dobrev from comment #57) 1. During runtime it is slower, since it does additional checking. 2. Downtime is the same.
Hi Dobri, Can you confirm that SACK is enabled? sysctl net.inet.tcp.sack.enable --HPS
(In reply to Hans Petter Selasky from comment #59) SACK is enabled. I'm rebuilding the latest available stable/13 kernel with invariants. When it crashes - I'll start providing data. Hopefully the slowdown doesn't interfere with production traffic, because if it does - I'll have to remove the invariants.
(In reply to Dobri Dobrev from comment #60) That's OK. Let's see what happens.
(In reply to Michael Tuexen from comment #61) Running stable/13-n248688-ecb7f44be90, waiting for a crash. So far the invariants don't cause issues.
It could also be interesting to see the socket state, especially so_snd: Can you try this: frame 10 print /x *(tp->t_inpcb->inp_socket) --HPS
(In reply to Hans Petter Selasky from comment #63) I already installed the new kernel and cleaned all the old dumps (there wasn't much space left in /). I'll extract everything again after the next crash.
(In reply to Hans Petter Selasky from comment #63) So, here it is - I believe this is what we're looking for: "panic: tcp_m_copym, length > size of mbuf chain" Unread portion of the kernel message buffer: [12282] panic: tcp_m_copym, length > size of mbuf chain [12282] cpuid = 1 [12282] time = 1640209960 [12282] KDB: stack backtrace: [12282] db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe017fd62550 [12282] vpanic() at vpanic+0x17f/frame 0xfffffe017fd625a0 [12282] panic() at panic+0x43/frame 0xfffffe017fd62600 [12282] tcp_m_copym() at tcp_m_copym+0x41b/frame 0xfffffe017fd626b0 [12282] tcp_output() at tcp_output+0x1433/frame 0xfffffe017fd62890 [12282] tcp_do_segment() at tcp_do_segment+0x2b9a/frame 0xfffffe017fd62960 [12282] tcp_input_with_port() at tcp_input_with_port+0xb7d/frame 0xfffffe017fd62aa0 [12282] tcp_input() at tcp_input+0xb/frame 0xfffffe017fd62ab0 [12282] ip_input() at ip_input+0x192/frame 0xfffffe017fd62b40 [12282] netisr_dispatch_src() at netisr_dispatch_src+0xaf/frame 0xfffffe017fd62ba0 [12282] ether_demux() at ether_demux+0x16e/frame 0xfffffe017fd62bd0 [12282] ether_nh_input() at ether_nh_input+0x3f8/frame 0xfffffe017fd62c30 [12282] netisr_dispatch_src() at netisr_dispatch_src+0xaf/frame 0xfffffe017fd62c90 [12282] ether_input() at ether_input+0x99/frame 0xfffffe017fd62cf0 [12282] iflib_rxeof() at iflib_rxeof+0xe07/frame 0xfffffe017fd62e00 [12282] _task_fn_rx() at _task_fn_rx+0x7a/frame 0xfffffe017fd62e40 [12282] gtaskqueue_run_locked() at gtaskqueue_run_locked+0xa7/frame 0xfffffe017fd62ec0 [12282] gtaskqueue_thread_loop() at gtaskqueue_thread_loop+0xc2/frame 0xfffffe017fd62ef0 [12282] fork_exit() at fork_exit+0x80/frame 0xfffffe017fd62f30 [12282] fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe017fd62f30 [12282] --- trap 0, rip = 0x266300000000000, rsp = 0, rbp = 0 --- [12282] KDB: enter: panic __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55 55 __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu, 
(kgdb) where #0 __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55 #1 doadump (textdump=textdump@entry=0) at /usr/src/sys/kern/kern_shutdown.c:399 #2 0xffffffff804c30fa in db_dump (dummy=<optimized out>, dummy2=<unavailable>, dummy3=<unavailable>, dummy4=<unavailable>) at /usr/src/sys/ddb/db_command.c:575 #3 0xffffffff804c2fb2 in db_command (last_cmdp=<optimized out>, cmd_table=<optimized out>, dopager=dopager@entry=1) at /usr/src/sys/ddb/db_command.c:482 #4 0xffffffff804c2c0d in db_command_loop () at /usr/src/sys/ddb/db_command.c:535 #5 0xffffffff804c60b6 in db_trap (type=<optimized out>, code=<optimized out>) at /usr/src/sys/ddb/db_main.c:270 #6 0xffffffff80c7a676 in kdb_trap (type=type@entry=3, code=code@entry=0, tf=tf@entry=0xfffffe017fd62480) at /usr/src/sys/kern/subr_kdb.c:733 #7 0xffffffff810ebd19 in trap (frame=0xfffffe017fd62480) at /usr/src/sys/amd64/amd64/trap.c:607 #8 <signal handler called> #9 kdb_enter (why=0xffffffff812e57c1 "panic", msg=<optimized out>) at /usr/src/sys/kern/subr_kdb.c:506 #10 0xffffffff80c2c900 in vpanic (fmt=0xffffffff811c2a3b "tcp_m_copym, length > size of mbuf chain", ap=ap@entry=0xfffffe017fd625e0) at /usr/src/sys/kern/kern_shutdown.c:908 #11 0xffffffff80c2c693 in panic (fmt=0xffffffff81e9d040 <cnputs_mtx> "\302&*\201\377\377\377\377") at /usr/src/sys/kern/kern_shutdown.c:844 #12 0xffffffff80e11a3b in tcp_m_copym (m=0x0, m@entry=0xfffff80bc680b500, off0=1388, plen=<optimized out>, plen@entry=0xfffffe017fd6282c, seglimit=1, seglimit@entry=0, segsize=segsize@entry=0, sb=<optimized out>, hw_tls=<optimized out>) at /usr/src/sys/netinet/tcp_output.c:2011 #13 0xffffffff80e0f893 in tcp_output (tp=<optimized out>) at /usr/src/sys/netinet/tcp_output.c:1091 #14 0xffffffff80e0607a in tcp_do_segment (m=<optimized out>, th=0xfffff80bc659e87a, so=<optimized out>, tp=0xfffffe0252e24000, drop_hdrlen=40, tlen=<optimized out>, iptos=0 '\000') at /usr/src/sys/netinet/tcp_input.c:2822 #15 0xffffffff80e025bd in tcp_input_with_port 
(mp=<optimized out>, offp=<optimized out>, proto=<optimized out>, port=port@entry=0) at /usr/src/sys/netinet/tcp_input.c:1400 #16 0xffffffff80e0340b in tcp_input (mp=0xffffffff81e9d040 <cnputs_mtx>, offp=0x80, proto=-2127893703) at /usr/src/sys/netinet/tcp_input.c:1496 #17 0xffffffff80df3d22 in ip_input (m=0x0) at /usr/src/sys/netinet/ip_input.c:834 #18 0xffffffff80d76f4f in netisr_dispatch_src (proto=1, source=source@entry=0, m=0xfffff80bc659e800) at /usr/src/sys/net/netisr.c:1143 #19 0xffffffff80d7729f in netisr_dispatch (proto=2179584064, m=0xffffffff812aeb39) at /usr/src/sys/net/netisr.c:1234 #20 0xffffffff80d5961e in ether_demux (ifp=ifp@entry=0xfffff8010731e800, m=0x80) at /usr/src/sys/net/if_ethersubr.c:921 #21 0xffffffff80d5ac98 in ether_input_internal (ifp=0xfffff8010731e800, m=0x80) at /usr/src/sys/net/if_ethersubr.c:707 #22 ether_nh_input (m=<optimized out>) at /usr/src/sys/net/if_ethersubr.c:737 #23 0xffffffff80d76f4f in netisr_dispatch_src (proto=proto@entry=5, source=source@entry=0, m=m@entry=0xfffff80bc659e800) at /usr/src/sys/net/netisr.c:1143 #24 0xffffffff80d7729f in netisr_dispatch (proto=2179584064, proto@entry=5, m=0xffffffff812aeb39, m@entry=0xfffff80bc659e800) at /usr/src/sys/net/netisr.c:1234 #25 0xffffffff80d59ae9 in ether_input (ifp=0xfffff8010731e800, m=0xfffff80bc659e800) at /usr/src/sys/net/if_ethersubr.c:828 #26 0xffffffff80d72cc7 in iflib_rxeof (rxq=<optimized out>, rxq@entry=0xfffffe017ff65340, budget=<optimized out>) at /usr/src/sys/net/iflib.c:3046 #27 0xffffffff80d6ca6a in _task_fn_rx (context=0xfffffe017ff65340) at /usr/src/sys/net/iflib.c:3989 #28 0xffffffff80c78927 in gtaskqueue_run_locked (queue=queue@entry=0xfffff80105860600) at /usr/src/sys/kern/subr_gtaskqueue.c:371 #29 0xffffffff80c78752 in gtaskqueue_thread_loop (arg=arg@entry=0xfffffe017fed5020) at /usr/src/sys/kern/subr_gtaskqueue.c:547 #30 0xffffffff80be4ce0 in fork_exit (callout=0xffffffff80c78690 <gtaskqueue_thread_loop>, arg=0xfffffe017fed5020, 
frame=0xfffffe017fd62f40) at /usr/src/sys/kern/kern_fork.c:1092 #31 <signal handler called> #32 0x0266300000000000 in ?? () Backtrace stopped: Cannot access memory at address 0x0 (kgdb) Let me know what you need from the dump.
That was fast... Let's start with: frame 12 print *(struct mbuf *)0xfffff80bc680b500 print *(int32_t *)0xfffffe017fd6282c frame 14 print *th print *tp
(In reply to Michael Tuexen from comment #66) (kgdb) frame 12 #12 0xffffffff80e11a3b in tcp_m_copym (m=0x0, m@entry=0xfffff80bc680b500, off0=1388, plen=<optimized out>, plen@entry=0xfffffe017fd6282c, seglimit=1, seglimit@entry=0, segsize=segsize@entry=0, sb=<optimized out>, hw_tls=<optimized out>) at /usr/src/sys/netinet/tcp_output.c:2011 2011 KASSERT(len == M_COPYALL, (kgdb) print *(struct mbuf *)0xfffff80bc680b500 $1 = {{m_next = 0x0, m_slist = {sle_next = 0x0}, m_stailq = {stqe_next = 0x0}}, {m_nextpkt = 0x0, m_slistpkt = {sle_next = 0x0}, m_stailqpkt = {stqe_next = 0x0}}, m_data = 0xfffff8017874f000 "O\320mg\276\022\364u\353\271\061\270tI\356\063\227/\030\204\032d\\\252\274\261`PҲ\271\232F\343-\304\372\307<\031u\212\260\061ߐ\264\306i\361Vj\212\314ϓM\031R\257G\b\246\233\227\233,D\335C\220\273\022\025\223\251\361\211\222e+0M)\201\233\034e'\222\203\242h\201\017w\026\065\365\242خ\f\225\350\313\311\364$\244\262\265\370\375\237\f\206\303\r\"6\266F6\377\352\270\036?\022\fJ\032'\225\203Q\332Fy*d\225\373", <incomplete sequence \303>, m_len = 1999, m_type = 1, m_flags = 1, {{{m_pkthdr = {{snd_tag = 0x0, rcvif = 0x0}, tags = {slh_first = 0x0}, len = 1297, flowid = 0, csum_flags = 0, fibnum = 0, numa_domain = 255 '\377', rsstype = 0 '\000', {rcv_tstmp = 0, {l2hlen = 0 '\000', l3hlen = 0 '\000', l4hlen = 0 '\000', l5hlen = 0 '\000', inner_l2hlen = 0 '\000', inner_l3hlen = 0 '\000', inner_l4hlen = 0 '\000', inner_l5hlen = 0 '\000'}}, PH_per = {eight = "\000\000\000\000\377\377\000", sixteen = {0, 0, 65535, 0}, thirtytwo = {0, 65535}, sixtyfour = {281470681743360}, unintptr = {281470681743360}, ptr = 0xffff00000000}, PH_loc = {eight = "\000\000\000\000\000\000\000", sixteen = {0, 0, 0, 0}, thirtytwo = {0, 0}, sixtyfour = {0}, unintptr = {0}, ptr = 0x0}}, {m_epg_npgs = 0 '\000', m_epg_nrdy = 0 '\000', m_epg_hdrlen = 0 '\000', m_epg_trllen = 0 '\000', m_epg_1st_off = 0, m_epg_last_len = 0, m_epg_flags = 0 '\000', m_epg_record_type = 0 '\000', __spare = "\000", m_epg_enc_cnt = 
0, m_epg_tls = 0x511, m_epg_so = 0xff000000000000, m_epg_seqno = 0, m_epg_stailq = {stqe_next = 0xffff00000000}}}, {m_ext = {{ext_count = 2, ext_cnt = 0xdeadc0de00000002}, ext_size = 2048, ext_type = 6, ext_flags = 1, {{ ext_buf = 0xfffff8017874f000 "O\320mg\276\022\364u\353\271\061\270tI\356\063\227/\030\204\032d\\\252\274\261`PҲ\271\232F\343-\304\372\307<\031u\212\260\061ߐ\264\306i\361Vj\212\314ϓM\031R\257G\b\246\233\227\233,D\335C\220\273\022\025\223\251\361\211\222e+0M)\201\233\034e'\222\203\242h\201\017w\026\065\365\242خ\f\225\350\313\311\364$\244\262\265\370\375\237\f\206\303\r\"6\266F6\377\352\270\036?\022\fJ\032'\225\203Q\332Fy*d\225\373", <incomplete sequence \303>, ext_arg2 = 0x0}, {extpg_pa = {18446735283932426240, 0, 16045693110842147038, 16045693110842147038, 16045693110842147038}, extpg_trail = "\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255", <incomplete sequence \336>, extpg_hdr = "\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255"}}, ext_free = 0x0, ext_arg1 = 0x0}, m_pktdat = 0xfffff80bc680b558 "\002"}}, m_dat = 0xfffff80bc680b520 ""}} (kgdb) print *(int32_t *)0xfffffe017fd6282c $2 = 612 (kgdb) frame 14 #14 0xffffffff80e0607a in tcp_do_segment (m=<optimized out>, th=0xfffff80bc659e87a, so=<optimized out>, tp=0xfffffe0252e24000, drop_hdrlen=40, tlen=<optimized out>, iptos=0 '\000') at /usr/src/sys/netinet/tcp_input.c:2822 2822 tcp_sack_partialack(tp, th); (kgdb) print *th $3 = {th_sport = 43204, th_dport = 63248, th_seq = 2812027976, th_ack = 324807354, th_x2 = 0 '\000', th_off = 5 '\005', th_flags = 16 '\020', th_win = 16103, th_sum = 0, th_urp = 0} (kgdb) print *tp $4 = {t_inpcb = 0xfffff8090099b1f0, t_fb = 0xffffffff81b414a0 <tcp_def_funcblk>, t_fb_ptr = 0x0, t_maxseg = 1400, t_logstate = 0, t_port = 
0, t_state = 6, t_idle_reduce = 0, t_delayed_ack = 0, t_fin_is_rst = 0, t_log_state_set = 0, bits_spare = 0, t_flags = 554697333, snd_una = 324805966, snd_max = 324807967, snd_nxt = 324807967, snd_up = 324805966, snd_wnd = 65800, snd_cwnd = 1400, t_peakrate_thr = 0, ts_offset = 0, rfbuf_ts = 12071754, rcv_numsacks = 0, t_tsomax = 65535, t_tsomaxsegcount = 37, t_tsomaxsegsize = 4096, rcv_nxt = 2812027976, rcv_adv = 2812093832, rcv_wnd = 65856, t_flags2 = 1024, t_srtt = 7549, t_rttvar = 947, ts_recent = 0, snd_scale = 2 '\002', rcv_scale = 6 '\006', snd_limited = 0 '\000', request_r_scale = 6 '\006', last_ack_sent = 2812027976, t_rcvtime = 2159165013, rcv_up = 2812027976, t_segqlen = 0, t_segqmbuflen = 0, t_segq = {tqh_first = 0x0, tqh_last = 0xfffffe0252e24090}, t_in_pkt = 0x0, t_tail_pkt = 0x0, t_timers = 0xfffffe0252e242a8, t_vnet = 0xfffff8010582fec0, snd_ssthresh = 2800, snd_wl1 = 2812027976, snd_wl2 = 324805966, irs = 2812024397, iss = 324701574, t_acktime = 0, t_sndtime = 2159073224, ts_recent_age = 0, snd_recover = 324807967, cl4_spare = 0, t_oobflags = 0 '\000', t_iobc = 0 '\000', t_rxtcur = 64000, t_rxtshift = 8, t_rtttime = 0, t_rtseq = 324807965, t_starttime = 2158904990, t_fbyte_in = 2158905017, t_fbyte_out = 2158905018, t_pmtud_saved_maxseg = 0, t_blackhole_enter = 0, t_blackhole_exit = 0, t_rttmin = 30, t_rttbest = 7842, t_softerror = 0, max_sndwnd = 65800, snd_cwnd_prev = 5600, snd_ssthresh_prev = 2800, snd_recover_prev = 324776566, t_sndzerowin = 0, t_rttupdated = 15, snd_numholes = 1, t_badrxtwin = 2158964144, snd_holes = { tqh_first = 0xfffff806d01890a0, tqh_last = 0xfffff806d01890b0}, snd_fack = 324807354, sackblks = {{start = 2812027975, end = 2812027976}, {start = 0, end = 0}, {start = 0, end = 0}, {start = 0, end = 0}, {start = 0, end = 0}, {start = 0, end = 0}}, sackhint = {nexthole = 0xfffff806d01890a0, sack_bytes_rexmit = 0, last_sack_ack = 324807354, delivered_data = 1388, sacked_bytes = 611, recover_fs = 3400, prr_delivered = 6800, prr_out 
= 7588}, t_rttlow = 190, rfbuf_cnt = 0, tod = 0x0, t_sndrexmitpack = 47, t_rcvoopack = 0, t_toe = 0x0, cc_algo = 0xffffffff81b3e350 <newreno_cc_algo>, ccv = 0xfffffe0252e243f0, osd = 0xfffffe0252e24418, t_bytes_acked = 0, t_maxunacktime = 0, t_keepinit = 0, t_keepidle = 0, t_keepintvl = 0, t_keepcnt = 0, t_dupacks = 0, t_lognum = 0, t_loglimit = 5000, t_pacing_rate = -1, t_logs = {stqh_first = 0x0, stqh_last = 0xfffffe0252e24218}, t_lin = 0x0, t_lib = 0x0, t_output_caller = 0x0, t_stats = 0x0, t_logsn = 0, gput_ts = 0, gput_seq = 0, gput_ack = 0, t_stats_gput_prev = 0, t_maxpeakrate = 0, t_sndtlppack = 0, t_sndtlpbyte = 0, t_sndbytes = 125990, t_snd_rxt_bytes = 40040, t_tfo_client_cookie_len = 0 '\000', t_end_info_status = 0, t_tfo_pending = 0x0, t_tfo_cookie = {client = '\000' <repeats 15 times>, server = 0}, { t_end_info_bytes = "\000\000\000\000\000\000\000", t_end_info = 0}} (kgdb)
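A rough way to see why the assertion fires with the numbers printed above. This is only a sketch in Python, not the kernel code: copy_from_chain is a simplified, hypothetical model of walking an mbuf chain by offset and length, and the chain is reduced to a list of m_len values. The inputs are taken from the printouts: off0=1388 (frame 12), the length 612 read from plen, and the single mbuf with m_len = 1999 and m_next = 0x0.

```python
def seq_sub(a, b):
    """Difference a - b in 32-bit TCP sequence space (wraps modulo 2**32)."""
    return (a - b) & 0xFFFFFFFF


class ChainTooShort(Exception):
    """Stands in for the kernel panic in this model."""


def copy_from_chain(mbuf_lens, off, length):
    """Simplified model: skip `off` bytes, then consume `length` bytes.

    `mbuf_lens` is a list of m_len values, one per mbuf in the chain.
    Returns the number of bytes consumed, or raises if the chain ends first.
    """
    i = 0
    # Skip whole mbufs that lie entirely before the offset.
    while off > 0:
        if i >= len(mbuf_lens):
            raise ChainTooShort("offset > size of mbuf chain")
        if off < mbuf_lens[i]:
            break
        off -= mbuf_lens[i]
        i += 1
    copied = 0
    # Consume the requested length, mbuf by mbuf.
    while length > 0:
        if i >= len(mbuf_lens):
            raise ChainTooShort("length > size of mbuf chain")
        n = min(length, mbuf_lens[i] - off)
        copied += n
        length -= n
        off = 0
        i += 1
    return copied


if __name__ == "__main__":
    # Values copied from the kgdb output above.
    th_ack, snd_una = 324807354, 324805966
    print("th_ack - snd_una =", seq_sub(th_ack, snd_una))
    try:
        copy_from_chain([1999], off=1388, length=612)
    except ChainTooShort as exc:
        print("model panics:", exc)
```

With these inputs the request covers bytes 1388 through 1999 of a buffer that holds only 1999 bytes, so the model runs off the end of the chain by one byte. Note that th_ack - snd_una also comes out as 1388, the same as off0 and sackhint.delivered_data; whether that is how the offset was actually derived is not established by this sketch.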
Thanks. Need to think when I'm more awake than now...
(In reply to Michael Tuexen from comment #68) Same. Let's continue tomorrow.
Could you also get: frame 14 print /x *(tp->t_inpcb) print /x *(tp->t_inpcb->inp_socket) --HPS
(In reply to Hans Petter Selasky from comment #70) (kgdb) frame 14 #14 0xffffffff80e0607a in tcp_do_segment (m=<optimized out>, th=0xfffff80bc659e87a, so=<optimized out>, tp=0xfffffe0252e24000, drop_hdrlen=40, tlen=<optimized out>, iptos=0 '\000') at /usr/src/sys/netinet/tcp_input.c:2822 2822 tcp_sack_partialack(tp, th); (kgdb) print /x *(tp->t_inpcb) $5 = {inp_hash = {cle_next = 0x0, cle_prev = 0xfffffe020ae2fe18}, inp_pcbgrouphash = {cle_next = 0x0, cle_prev = 0x0}, inp_lock = {lock_object = {lo_name = 0xffffffff811d9a83, lo_flags = 0x56b0000, lo_data = 0x0, lo_witness = 0xfffff8207fd75100}, rw_lock = 0xfffffe017fed7720}, inp_hpts = {tqe_next = 0x0, tqe_prev = 0x0}, inp_hpts_request = 0x0, inp_in_hpts = 0x0, inp_in_input = 0x0, inp_hpts_cpu = 0x0, inp_irq_cpu = 0x0, inp_refcount = 0x2, inp_flags = 0x8802000, inp_flags2 = 0x0, inp_input_cpu = 0x0, inp_hpts_cpu_set = 0x0, inp_input_cpu_set = 0x0, inp_hpts_calls = 0x0, inp_input_calls = 0x0, inp_irq_cpu_set = 0x0, inp_spare_bits2 = 0x0, inp_numa_domain = 0xff, inp_ppcb = 0xfffffe0252e24000, inp_socket = 0xfffff80900858000, inp_hptsslot = 0x0, inp_hpts_drop_reas = 0x0, inp_input = {tqe_next = 0x0, tqe_prev = 0x0}, inp_pcbinfo = 0xfffffe00d856f758, inp_pcbgroup = 0x0, inp_pcbgroup_wild = {cle_next = 0x0, cle_prev = 0x0}, inp_cred = 0xfffff80107538500, inp_flow = 0x0, inp_vflag = 0x1, inp_ip_ttl = 0x40, inp_ip_p = 0x0, inp_ip_minttl = 0x0, inp_flowid = 0x5e457bf3, inp_snd_tag = 0x0, inp_flowtype = 0x82, inp_rss_listen_bucket = 0x0, inp_inc = {inc_flags = 0x0, inc_len = 0x0, inc_fibnum = 0x1, inc_ie = {ie_fport = 0xa8c4, ie_lport = 0xf710, ie_dependfaddr = {id46_addr = { ia46_pad32 = {0x0, 0x0, 0x0}, ia46_addr4 = {s_addr = 0x2f2912b5}}, id6_addr = {__u6_addr = {__u6_addr8 = {0x0 <repeats 12 times>, 0xb5, 0x12, 0x29, 0x2f}, __u6_addr16 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x12b5, 0x2f29}, __u6_addr32 = {0x0, 0x0, 0x0, 0x2f2912b5}}}}, ie_dependladdr = {id46_addr = {ia46_pad32 = {0x0, 0x0, 0x0}, ia46_addr4 = {s_addr = 
0xd011ca95}}, id6_addr = {__u6_addr = {__u6_addr8 = { 0x0 <repeats 12 times>, 0x95, 0xca, 0x11, 0xd0}, __u6_addr16 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xca95, 0xd011}, __u6_addr32 = {0x0, 0x0, 0x0, 0xd011ca95}}}}, ie6_zoneid = 0x0}}, inp_label = 0x0, inp_sp = 0xfffff80c371a4160, {inp_ip_tos = 0x0, inp_options = 0x0, inp_moptions = 0x0}, {in6p_options = 0x0, in6p_outputopts = 0x0, in6p_moptions = 0x0, in6p_icmp6filt = 0x0, in6p_cksum = 0x0, in6p_hops = 0x0}, inp_portlist = {cle_next = 0xfffff80c25133d90, cle_prev = 0xfffff80c2574e160}, inp_phd = 0xfffff80105bbbf00, inp_gencnt = 0xa07c7c, spare_ptr = 0x0, inp_rt_cookie = 0x63, {inp_route = {ro_nh = 0xfffff8016f136d00, ro_lle = 0xfffff8013c8a2a80, ro_prepend = 0x0, ro_plen = 0x0, ro_flags = 0x180, ro_mtu = 0x0, spare = 0x0, ro_dst = {sa_len = 0x10, sa_family = 0x2, sa_data = {0x0, 0x0, 0xb5, 0x12, 0x29, 0x2f, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}}}, inp_route6 = {ro_nh = 0xfffff8016f136d00, ro_lle = 0xfffff8013c8a2a80, ro_prepend = 0x0, ro_plen = 0x0, ro_flags = 0x180, ro_mtu = 0x0, spare = 0x0, ro_dst = {sin6_len = 0x10, sin6_family = 0x2, sin6_port = 0x0, sin6_flowinfo = 0x2f2912b5, sin6_addr = {__u6_addr = {__u6_addr8 = {0x0 <repeats 16 times>}, __u6_addr16 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, __u6_addr32 = {0x0, 0x0, 0x0, 0x0}}}, sin6_scope_id = 0x0}}}, inp_list = {cle_next = 0xfffff80c2580e000, cle_prev = 0xfffff80c2574e1d0}, inp_epoch_ctx = {data = {0x0, 0x0}}} (kgdb) print /x *(tp->t_inpcb->inp_socket) $6 = {so_lock = {lock_object = {lo_name = 0xffffffff81203282, lo_flags = 0x1430000, lo_data = 0x0, lo_witness = 0xfffff8207fd84300}, mtx_lock = 0x0}, so_count = 0x0, so_rdsel = {si_tdlist = {tqh_first = 0x0, tqh_last = 0x0}, si_note = {kl_list = {slh_first = 0x0}, kl_lock = 0xffffffff80cd4440, kl_unlock = 0xffffffff80cd4480, kl_assert_lock = 0xffffffff80cd44c0, kl_lockarg = 0xfffff80900858000, kl_autodestroy = 0x0}, si_mtx = 0x0}, so_wrsel = {si_tdlist = {tqh_first = 0x0, tqh_last = 0x0}, si_note = {kl_list 
= {slh_first = 0x0}, kl_lock = 0xffffffff80cd4550, kl_unlock = 0xffffffff80cd4590, kl_assert_lock = 0xffffffff80cd45d0, kl_lockarg = 0xfffff80900858000, kl_autodestroy = 0x0}, si_mtx = 0x0}, so_type = 0x1, so_options = 0x10004, so_linger = 0x0, so_state = 0x410b, so_pcb = 0xfffff8090099b1f0, so_vnet = 0xfffff8010582fec0, so_proto = 0xffffffff81b3be40, so_timeo = 0x0, so_error = 0x0, so_rerror = 0x0, so_sigio = 0x0, so_cred = 0xfffff80107538500, so_label = 0x0, so_gencnt = 0xa23ff6, so_emuldata = 0x0, so_dtor = 0x0, osd = { osd_nslots = 0x0, osd_slots = 0x0, osd_next = {le_next = 0x0, le_prev = 0x0}}, so_fibnum = 0x1, so_user_cookie = 0x0, so_ts_clock = 0x0, so_max_pacing_rate = 0x0, {{so_rcv = {sb_mtx = {lock_object = { lo_name = 0xffffffff8127fc0a, lo_flags = 0x1030000, lo_data = 0x0, lo_witness = 0xfffff8207fd74800}, mtx_lock = 0x0}, sb_sx = {lock_object = {lo_name = 0xffffffff812debe2, lo_flags = 0x2330000, lo_data = 0x0, lo_witness = 0xfffff8207fd84400}, sx_lock = 0x1}, sb_sel = 0xfffff80900858028, sb_state = 0x20, sb_mb = 0x0, sb_mbtail = 0x0, sb_lastrecord = 0x0, sb_sndptr = 0x0, sb_fnrdy = 0x0, sb_sndptroff = 0x0, sb_acc = 0x0, sb_ccc = 0x0, sb_hiwat = 0x10108, sb_mbcnt = 0x0, sb_mcnt = 0x0, sb_ccnt = 0x0, sb_mbmax = 0x80840, sb_ctl = 0x0, sb_tlscc = 0x0, sb_tlsdcc = 0x0, sb_lowat = 0x1, sb_timeo = 0x0, sb_tls_seqno = 0x0, sb_tls_info = 0x0, sb_mtls = 0x0, sb_mtlstail = 0x0, sb_flags = 0x800, sb_upcall = 0x0, sb_upcallarg = 0x0, sb_aiojobq = {tqh_first = 0x0, tqh_last = 0xfffff80900858230}, sb_aiotask = {ta_link = {stqe_next = 0x0}, ta_pending = 0x0, ta_priority = 0x0, ta_flags = 0x0, ta_func = 0xffffffff80caceb0, ta_context = 0xfffff80900858000}}, so_snd = {sb_mtx = {lock_object = {lo_name = 0xffffffff81296bb1, lo_flags = 0x1030000, lo_data = 0x0, lo_witness = 0xfffff8207fd74780}, mtx_lock = 0xfffffe017fed7720}, sb_sx = {lock_object = {lo_name = 0xffffffff8130e57d, lo_flags = 0x2330000, lo_data = 0x0, lo_witness = 0xfffff8207fd84380}, sx_lock = 0x1}, sb_sel 
= 0xfffff80900858070, sb_state = 0x10, sb_mb = 0xfffff80bc680b500, sb_mbtail = 0xfffff80bc680b500, sb_lastrecord = 0xfffff80bc680b500, sb_sndptr = 0xfffff80bc680b500, sb_fnrdy = 0x0, sb_sndptroff = 0x0, sb_acc = 0x7cf, sb_ccc = 0x7cf, sb_hiwat = 0x10108, sb_mbcnt = 0x900, sb_mcnt = 0x1, sb_ccnt = 0x1, sb_mbmax = 0x80840, sb_ctl = 0x0, sb_tlscc = 0x0, sb_tlsdcc = 0x0, sb_lowat = 0x800, sb_timeo = 0x0, sb_tls_seqno = 0x0, sb_tls_info = 0x0, sb_mtls = 0x0, sb_mtlstail = 0x0, sb_flags = 0x800, sb_upcall = 0x0, sb_upcallarg = 0x0, sb_aiojobq = {tqh_first = 0x0, tqh_last = 0xfffff80900858348}, sb_aiotask = {ta_link = {stqe_next = 0x0}, ta_pending = 0x0, ta_priority = 0x0, ta_flags = 0x0, ta_func = 0xffffffff80cad6f0, ta_context = 0xfffff80900858000}}, so_list = {tqe_next = 0xffffffffffffffff, tqe_prev = 0xffffffffffffffff}, so_listen = 0x0, so_qstate = 0x0, so_peerlabel = 0x0, so_oobmark = 0x0, so_ktls_rx_list = {stqe_next = 0x0}}, {sol_incomp = {tqh_first = 0xffffffff8127fc0a, tqh_last = 0x1030000}, sol_comp = {tqh_first = 0xfffff8207fd74800, tqh_last = 0x0}, sol_qlen = 0x812debe2, sol_incqlen = 0xffffffff, sol_qlimit = 0x2330000, sol_accept_filter = 0xfffff8207fd84400, sol_accept_filter_arg = 0x1, sol_accept_filter_str = 0xfffff80900858028, sol_upcall = 0x20, sol_upcallarg = 0x0, sol_sbrcv_lowat = 0x0, sol_sbsnd_lowat = 0x0, sol_sbrcv_hiwat = 0x0, sol_sbsnd_hiwat = 0x0, sol_sbrcv_flags = 0x0, sol_sbsnd_flags = 0x0, sol_sbrcv_timeo = 0x0, sol_sbsnd_timeo = 0x0, sol_lastover = {tv_sec = 0x1010800000000, tv_usec = 0x0}, sol_overcount = 0x0}}} (kgdb)
And also: frame 14 print /x *(tp->t_inpcb->inp_socket->so_snd.sb_mb) This dumps the faulty mbuf. I see that so_snd reports bytes available, let's see if that matches the mbuf: sb_acc = 0x7cf, sb_ccc = 0x7cf, sb_mcnt = 0x1, --HPS
(In reply to Hans Petter Selasky from comment #72) (kgdb) frame 14 #14 0xffffffff80e0607a in tcp_do_segment (m=<optimized out>, th=0xfffff80bc659e87a, so=<optimized out>, tp=0xfffffe0252e24000, drop_hdrlen=40, tlen=<optimized out>, iptos=0 '\000') at /usr/src/sys/netinet/tcp_input.c:2822 2822 tcp_sack_partialack(tp, th); (kgdb) print /x *(tp->t_inpcb->inp_socket->so_snd.sb_mb) $7 = {{m_next = 0x0, m_slist = {sle_next = 0x0}, m_stailq = {stqe_next = 0x0}}, {m_nextpkt = 0x0, m_slistpkt = {sle_next = 0x0}, m_stailqpkt = {stqe_next = 0x0}}, m_data = 0xfffff8017874f000, m_len = 0x7cf, m_type = 0x1, m_flags = 0x1, {{{m_pkthdr = {{snd_tag = 0x0, rcvif = 0x0}, tags = {slh_first = 0x0}, len = 0x511, flowid = 0x0, csum_flags = 0x0, fibnum = 0x0, numa_domain = 0xff, rsstype = 0x0, {rcv_tstmp = 0x0, {l2hlen = 0x0, l3hlen = 0x0, l4hlen = 0x0, l5hlen = 0x0, inner_l2hlen = 0x0, inner_l3hlen = 0x0, inner_l4hlen = 0x0, inner_l5hlen = 0x0}}, PH_per = {eight = {0x0, 0x0, 0x0, 0x0, 0xff, 0xff, 0x0, 0x0}, sixteen = {0x0, 0x0, 0xffff, 0x0}, thirtytwo = {0x0, 0xffff}, sixtyfour = {0xffff00000000}, unintptr = {0xffff00000000}, ptr = 0xffff00000000}, PH_loc = {eight = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, sixteen = {0x0, 0x0, 0x0, 0x0}, thirtytwo = {0x0, 0x0}, sixtyfour = {0x0}, unintptr = {0x0}, ptr = 0x0}}, {m_epg_npgs = 0x0, m_epg_nrdy = 0x0, m_epg_hdrlen = 0x0, m_epg_trllen = 0x0, m_epg_1st_off = 0x0, m_epg_last_len = 0x0, m_epg_flags = 0x0, m_epg_record_type = 0x0, __spare = {0x0, 0x0}, m_epg_enc_cnt = 0x0, m_epg_tls = 0x511, m_epg_so = 0xff000000000000, m_epg_seqno = 0x0, m_epg_stailq = {stqe_next = 0xffff00000000}}}, {m_ext = {{ ext_count = 0x2, ext_cnt = 0xdeadc0de00000002}, ext_size = 0x800, ext_type = 0x6, ext_flags = 0x1, {{ext_buf = 0xfffff8017874f000, ext_arg2 = 0x0}, {extpg_pa = {0xfffff8017874f000, 0x0, 0xdeadc0dedeadc0de, 0xdeadc0dedeadc0de, 0xdeadc0dedeadc0de}, extpg_trail = {0xde, 0xc0, 0xad, 0xde, 0xde, 0xc0, 0xad, 0xde, 0xde, 0xc0, 0xad, 0xde, 0xde, 0xc0, 0xad, 
0xde, 0xde, 0xc0, 0xad, 0xde, 0xde, 0xc0, 0xad, 0xde, 0xde, 0xc0, 0xad, 0xde, 0xde, 0xc0, 0xad, 0xde, 0xde, 0xc0, 0xad, 0xde, 0xde, 0xc0, 0xad, 0xde, 0xde, 0xc0, 0xad, 0xde, 0xde, 0xc0, 0xad, 0xde, 0xde, 0xc0, 0xad, 0xde, 0xde, 0xc0, 0xad, 0xde, 0xde, 0xc0, 0xad, 0xde, 0xde, 0xc0, 0xad, 0xde}, extpg_hdr = {0xde, 0xc0, 0xad, 0xde, 0xde, 0xc0, 0xad, 0xde, 0xde, 0xc0, 0xad, 0xde, 0xde, 0xc0, 0xad, 0xde, 0xde, 0xc0, 0xad, 0xde, 0xde, 0xc0, 0xad}}}, ext_free = 0x0, ext_arg1 = 0x0}, m_pktdat = 0xfffff80bc680b558}}, m_dat = 0xfffff80bc680b520}} (kgdb)
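The comparison asked for in comment #72 comes down to a little arithmetic. Below is a minimal sketch in Python, with the values copied from the two kgdb dumps above. The helper name `sockbuf_matches_chain` is hypothetical, and the check itself (the so_snd byte counters should equal the sum of m_len over the chain, and sb_mcnt should equal the number of mbufs) is an assumption about what "matches" means here.

```python
# Values copied from the kgdb output above (so_snd and its single mbuf).
so_snd = {"sb_acc": 0x7CF, "sb_ccc": 0x7CF, "sb_mcnt": 0x1}
mbuf_chain = [{"m_len": 0x7CF, "m_next": 0x0}]  # m_next = 0x0: one mbuf only


def sockbuf_matches_chain(sb, chain):
    """Hypothetical helper: do the socket buffer counters agree with the chain?"""
    total_len = sum(m["m_len"] for m in chain)
    return (
        sb["sb_mcnt"] == len(chain)
        and sb["sb_ccc"] == total_len
        and sb["sb_acc"] == total_len
    )


consistent = sockbuf_matches_chain(so_snd, mbuf_chain)
print(consistent, so_snd["sb_ccc"])
```

Under these assumptions the counters and the mbuf agree (0x7cf is 1999 bytes held in one mbuf), so the send buffer accounting on its own does not look corrupted in this dump.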
And this one: print /x *(int *)0xfffffe017fd6282c --HPS
(In reply to Hans Petter Selasky from comment #74) From frame 14: (kgdb) print /x *(int *)0xfffffe017fd6282c $8 = 0x264 (kgdb)
Anything more you need from the dump, or a potential fix to test on the server?
(In reply to Dobri Dobrev from comment #76) Not right now from my side. I will set up a local test system and explore a couple of packet flows...
(In reply to Michael Tuexen from comment #77) Any news?
(In reply to Dobri Dobrev from comment #78) Not yet. I'll bring it up at the transport telco next Thursday.
(In reply to Michael Tuexen from comment #79) Btw, I moved over to -release, no issues there.
Are you loading any modules from /boot/modules, typically installed from ports, which access the network? Just curious.
(In reply to Hans Petter Selasky from comment #81) Here are the modules loaded on both Stable and Release: 13-Stable: Id Refs Address Size Name 1 36 0xffffffff80200000 1f2a788 kernel 2 3 0xffffffff8212b000 3cb0 smbus.ko 3 1 0xffffffff82130000 2870 accf_data.ko 4 1 0xffffffff82133000 2e88 accf_http.ko 5 1 0xffffffff82136000 ab70 opensolaris.ko 6 1 0xffffffff82141000 11578 ipmi.ko 7 1 0xffffffff82153000 734d0 pf.ko 8 1 0xffffffff821c7000 77650 if_igb.ko 9 1 0xffffffff826e5000 3378 acpi_wmi.ko 10 1 0xffffffff826e9000 3218 intpm.ko 11 1 0xffffffff826ed000 64b8 if_gre.ko 12 1 0xffffffff826f4000 3530 fdescfs.ko 13-Release: Id Refs Address Size Name 1 36 0xffffffff80200000 1f11c18 kernel 2 3 0xffffffff82112000 44f0 smbus.ko 3 1 0xffffffff82117000 12a80 ipmi.ko 4 1 0xffffffff8212a000 79c70 if_igb.ko 5 1 0xffffffff821a4000 b7b8 opensolaris.ko 6 1 0xffffffff821b0000 32b0 accf_http.ko 7 1 0xffffffff821b4000 5c3b0 pf.ko 8 1 0xffffffff82211000 2b58 accf_data.ko 9 1 0xffffffff826e5000 3378 acpi_wmi.ko 10 1 0xffffffff826e9000 3218 intpm.ko 11 1 0xffffffff826ed000 64b8 if_gre.ko 12 1 0xffffffff826f4000 3530 fdescfs.ko
Dobri, kernel crash minidumps (this is what we do by default) will not have any data for userland pages. So, no proprietary binaries running would be leaked, if you share the core. What could leak is the data that was flowing through the network stack at the moment of crash. Maybe sharing cores with Michael will make things faster.
(In reply to Gleb Smirnoff from comment #83) From what I remember the crash dump files were ~3.5-3.6 GB. Does that correspond to a minidump crash size?
(In reply to Gleb Smirnoff from comment #83) Also, how can I confirm that no actual binaries exist in the dump?
Yes, a full dump would be exactly the size of your physical RAM. There are two ways to check: search the dump file for a sample of the data you are concerned about leaking, or open the dump in kgdb, find the process you are interested in, try to read pages that belong to it, and make sure that kgdb fails to read them.
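The first of the two checks can be sketched as a chunked byte search, so that a multi-gigabyte dump never has to fit in memory. This is only an illustration: the function name `file_contains` and the sample byte strings are hypothetical, and a negative result only shows that this exact byte sequence is absent, not that nothing related to it is in the dump.

```python
import os
import tempfile


def file_contains(path, needle, chunk_size=1 << 20):
    """Return True if the byte string `needle` occurs anywhere in the file."""
    if not needle:
        raise ValueError("needle must not be empty")
    overlap = len(needle) - 1
    tail = b""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                return False
            window = tail + chunk
            if needle in window:
                return True
            # Keep the last len(needle) - 1 bytes so that a match straddling
            # two chunks is still found on the next iteration.
            tail = window[-overlap:] if overlap else b""


# Self-check on a throwaway file. For a real dump, pass its path instead,
# e.g. file_contains("/var/crash/vmcore.0", b"<bytes you worry about>").
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"A" * 10 + b"SECRET" + b"B" * 10)
try:
    found = file_contains(tmp.name, b"SECRET", chunk_size=4)
    missing = file_contains(tmp.name, b"ABSENT", chunk_size=4)
finally:
    os.unlink(tmp.name)
print(found, missing)
```

A sensible sample would be a distinctive run of bytes from the file you care about (part of a binary, or a line from a key file), searched for once per sample.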
(In reply to Gleb Smirnoff from comment #86) The problem is ... I switched to -release. On -stable it crashed very frequently resulting in disruptions of customer traffic, which is something I'd like to avoid. Right now I don't have a server running stable. Were you not able to reproduce the issue on your setups? Nginx with http/https traffic ~400-1200 rps throughout the day resulted in crashes varying from once every hour to once every 2-4 days.
(In reply to Dobri Dobrev from comment #87) I tried to trigger a panic on a local system using packetdrill scripts, but was not able to do so. So without being able to reproduce this on a system I can access, it is hard to figure out what is going on...
(In reply to Michael Tuexen from comment #88) I don't suppose there's anything new regarding this issue? I could do a separate 13-stable install, place some traffic and see if it'd crash again, however, I need to be *sure* the dump won't contain any running binaries, ssl cert-key pairs, etc. Let me know how.
(In reply to Dobri Dobrev from comment #89) I would suggest the following:
1. During this week I will MFC all TCP-related changes to stable/13 which I think should go into 13.1. I'll ping you when it is done.
2. You update to that state and do your testing.
That way we can test what will be in 13.1. I'd like to confirm that that version does not have the issue you were experiencing. If the problem still persists, having access to the core would help. According to Gleb, no critical information should be in there. If you prefer, you can give me access to a machine under your control that has the core file, and you can monitor any command I run. If the above two steps are fine with you, that would be great. Please let me know.
(In reply to Dobri Dobrev from comment #89) OK, I MFCed all relevant changes I wanted to MFC. It would be great if you could update a machine to stable/13, test it, and report the outcome of the test.
(In reply to Michael Tuexen from comment #91) I hadn't had a chance to update to the latest stable/13. Just got a crash after 112 days uptime on stable/13-n248590-b7da472979a Here's what kgdb shows: Reading symbols from /boot/kernel/kernel... Reading symbols from /usr/lib/debug//boot/kernel/kernel.debug... Unread portion of the kernel message buffer: [9705874] panic: page fault [9705874] cpuid = 5 [9705874] time = 1650110040 [9705874] KDB: stack backtrace: [9705874] #0 0xffffffff80c60dd5 at kdb_backtrace+0x65 [9705874] #1 0xffffffff80c1336f at vpanic+0x17f [9705874] #2 0xffffffff80c131e3 at panic+0x43 [9705874] #3 0xffffffff810991b5 at trap_fatal+0x385 [9705874] #4 0xffffffff8109920f at trap_pfault+0x4f [9705874] #5 0xffffffff810705e8 at calltrap+0x8 [9705874] #6 0xffffffff80dd5fa9 at tcp_output+0x1339 [9705874] #7 0xffffffff80dcd382 at tcp_do_segment+0x2902 [9705874] #8 0xffffffff80dc9d41 at tcp_input_with_port+0xb61 [9705874] #9 0xffffffff80dca9eb at tcp_input+0xb [9705874] #10 0xffffffff80dbc1bf at ip_input+0x11f [9705874] #11 0xffffffff80d491a9 at netisr_dispatch_src+0xb9 [9705874] #12 0xffffffff80d2d128 at ether_demux+0x138 [9705874] #13 0xffffffff80d2e4b5 at ether_nh_input+0x355 [9705874] #14 0xffffffff80d491a9 at netisr_dispatch_src+0xb9 [9705874] #15 0xffffffff80d2d559 at ether_input+0x69 [9705874] #16 0xffffffff80d45617 at iflib_rxeof+0xc27 [9705874] #17 0xffffffff80d3fc62 at _task_fn_rx+0x72 [9705874] Uptime: 112d8h4m34s [9705874] Dumping 11660 out of 65425 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91% __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55 55 __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu, (kgdb) where #0 __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55 #1 doadump (textdump=<optimized out>) at /usr/src/sys/kern/kern_shutdown.c:399 #2 0xffffffff80c12f6c in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:487 #3 0xffffffff80c133de in vpanic (fmt=0xffffffff81191bdd "%s", ap=<optimized out>) 
at /usr/src/sys/kern/kern_shutdown.c:920 #4 0xffffffff80c131e3 in panic (fmt=<unavailable>) at /usr/src/sys/kern/kern_shutdown.c:844 #5 0xffffffff810991b5 in trap_fatal (frame=0xfffffe0069f535b0, eva=24) at /usr/src/sys/amd64/amd64/trap.c:944 #6 0xffffffff8109920f in trap_pfault (frame=0xfffffe0069f535b0, usermode=false, signo=<optimized out>, ucode=<optimized out>) at /usr/src/sys/amd64/amd64/trap.c:763 #7 <signal handler called> #8 m_copydata (m=0x0, m@entry=0xfffff801e9cc5b00, off=0, len=1, cp=<optimized out>) at /usr/src/sys/kern/uipc_mbuf.c:657 #9 0xffffffff80dd5fa9 in tcp_output (tp=<optimized out>) at /usr/src/sys/netinet/tcp_output.c:1081 #10 0xffffffff80dcd382 in tcp_do_segment (m=<optimized out>, th=<optimized out>, so=<optimized out>, tp=0xfffffe01a0d0b870, drop_hdrlen=52, tlen=<optimized out>, iptos=0 '\000') at /usr/src/sys/netinet/tcp_input.c:2822 #11 0xffffffff80dc9d41 in tcp_input_with_port (mp=<optimized out>, offp=<optimized out>, proto=<optimized out>, port=port@entry=0) at /usr/src/sys/netinet/tcp_input.c:1400 #12 0xffffffff80dca9eb in tcp_input (mp=0xfffff801e9cc5b00, offp=0x0, proto=1) at /usr/src/sys/netinet/tcp_input.c:1496 #13 0xffffffff80dbc1bf in ip_input (m=0x0) at /usr/src/sys/netinet/ip_input.c:834 #14 0xffffffff80d491a9 in netisr_dispatch_src (proto=1, source=source@entry=0, m=0xfffff8002c41c400) at /usr/src/sys/net/netisr.c:1143 #15 0xffffffff80d4957f in netisr_dispatch (proto=3922483968, m=0x1) at /usr/src/sys/net/netisr.c:1234 #16 0xffffffff80d2d128 in ether_demux (ifp=ifp@entry=0xfffff80001ed8000, m=0x0) at /usr/src/sys/net/if_ethersubr.c:921 #17 0xffffffff80d2e4b5 in ether_input_internal (ifp=0xfffff80001ed8000, m=0x0) at /usr/src/sys/net/if_ethersubr.c:707 #18 ether_nh_input (m=<optimized out>) at /usr/src/sys/net/if_ethersubr.c:737 #19 0xffffffff80d491a9 in netisr_dispatch_src (proto=proto@entry=5, source=source@entry=0, m=m@entry=0xfffff8002c41c400) at /usr/src/sys/net/netisr.c:1143 #20 0xffffffff80d4957f in netisr_dispatch 
(proto=3922483968, proto@entry=5, m=0x1, m@entry=0xfffff8002c41c400) at /usr/src/sys/net/netisr.c:1234 #21 0xffffffff80d2d559 in ether_input (ifp=<optimized out>, m=0xfffff8002c41c400) at /usr/src/sys/net/if_ethersubr.c:828 #22 0xffffffff80d45617 in iflib_rxeof (rxq=<optimized out>, rxq@entry=0xfffffe0114b00040, budget=<optimized out>) at /usr/src/sys/net/iflib.c:3046 #23 0xffffffff80d3fc62 in _task_fn_rx (context=0xfffffe0114b00040) at /usr/src/sys/net/iflib.c:3989 #24 0xffffffff80c5f80d in gtaskqueue_run_locked (queue=queue@entry=0xfffff80001d68800) at /usr/src/sys/kern/subr_gtaskqueue.c:371 #25 0xffffffff80c5f482 in gtaskqueue_thread_loop (arg=<optimized out>, arg@entry=0xfffffe0114a7b080) at /usr/src/sys/kern/subr_gtaskqueue.c:547 #26 0xffffffff80bd053e in fork_exit (callout=0xffffffff80c5f3c0 <gtaskqueue_thread_loop>, arg=0xfffffe0114a7b080, frame=0xfffffe0069f53f40) at /usr/src/sys/kern/kern_fork.c:1092 #27 <signal handler called> #28 mi_startup () at /usr/src/sys/kern/init_main.c:322 Backtrace stopped: Cannot access memory at address 0x17 (kgdb) Let me know if I should update to latest stable/13, or if you'd want to examine the crashdump the same way we did before - you tell me what you need, I do it and provide it here. Regards, D
The current thinking is that SACK rescue retransmissions (in FreeBSD 13 this is gated by net.inet.tcp.rfc6675_pipe=1) very rarely create an entry which apparently is beyond the valid data range. While under most common circumstances a final FIN bit in the sequence space is taken care of, it seems that there may be some double-counting of the FIN bit. In most of the inspected cores, we found:
- TCP state: LAST_ACK (FIN received and also FIN sent)
- SACK loss recovery triggered
- A cumulative ACK before all outstanding data was received
- The remote client "disappears" for a significant amount of time (7 to 12 retransmission timeouts), but may re-appear again just prior.
- snd_max consistently 2 counts above the last data, instead of the expected 1 (for the FIN bit).
However, it is still unclear under what circumstances this double-counting happens: possibly when the persist timer triggers and a few other conditions are also fulfilled, or maybe through a race condition between normal packet processing and a timer firing. In short: disabling the rfc6675 enhanced SACK features (more correct pipeline accounting, rescue retransmissions) should address the immediate cause of the panic, while not addressing the root cause of when/why the FIN bit is double-counted... Would you be willing to run an instrumented kernel, which either panics (full core dump) or prints various state when inconsistencies are detected in this space, while ignoring/addressing them "on the fly" without panicking?
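The "2 counts instead of 1" observation can be written down as a small consistency check. The sketch below is a toy model in Python, not kernel code: the function name `seq_space_excess` and all numbers in the example are made up for illustration, and it assumes that the sequence space between snd_una and snd_max should be fully explained by the unacknowledged bytes plus at most one unit for a sent FIN.

```python
SEQ_MOD = 1 << 32  # TCP sequence numbers wrap at 2**32


def seq_space_excess(snd_una, snd_max, unacked_bytes, fin_sent):
    """Sequence-space units in [snd_una, snd_max) not explained by data plus one FIN.

    0 means consistent; 1 would match the suspected FIN double-count.
    """
    outstanding = (snd_max - snd_una) % SEQ_MOD
    expected = unacked_bytes + (1 if fin_sent else 0)
    return outstanding - expected


# Illustrative values only (chosen so the sequence numbers wrap around).
snd_una = 0xFFFFFF00
unacked = 1999
good_snd_max = (snd_una + unacked + 1) % SEQ_MOD  # data plus one FIN
bad_snd_max = (snd_una + unacked + 2) % SEQ_MOD   # one unit too many

excess_good = seq_space_excess(snd_una, good_snd_max, unacked, fin_sent=True)
excess_bad = seq_space_excess(snd_una, bad_snd_max, unacked, fin_sent=True)
print(excess_good, excess_bad)
```

An instrumented kernel of the kind proposed here could log the connection state whenever a check along these lines returns a non-zero value, instead of waiting for the later page fault.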
Just got a crash on 13.1 -- stable/13-n252201 And this is with net.inet.tcp.rfc6675_pipe=0 Here's kgdb: # kgdb /boot/kernel/kernel /var/crash/vmcore.4 GNU gdb (GDB) 11.1 [GDB v11.1 for FreeBSD] Copyright (C) 2021 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "x86_64-portbld-freebsd13.0". Type "show configuration" for configuration details. For bug reporting instructions, please see: <https://www.gnu.org/software/gdb/bugs/>. Find the GDB manual and other documentation resources online at: <http://www.gnu.org/software/gdb/documentation/>. For help, type "help". Type "apropos word" to search for commands related to "word"... Reading symbols from /boot/kernel/kernel... Reading symbols from /usr/lib/debug//boot/kernel/kernel.debug... 
Unread portion of the kernel message buffer: [91] frame pointer = 0x28:0xfffffe0069f536e0 [91] code segment = base rx0, limit 0xfffff, type 0x1b [91] = DPL 0, pres 1, long 1, def32 0, gran 1 [92] processor eflags = interrupt enabled, resume, IOPL = 0 [92] current process = 0 (if_io_tqg_5) [92] trap number = 12 [92] panic: page fault [92] cpuid = 5 [92] time = 1661715643 [92] KDB: stack backtrace: [92] #0 0xffffffff80c50045 at kdb_backtrace+0x65 [92] #1 0xffffffff80c02e81 at vpanic+0x151 [92] #2 0xffffffff80c02d23 at panic+0x43 [92] #3 0xffffffff8109fd57 at trap_fatal+0x387 [92] #4 0xffffffff8109fdaf at trap_pfault+0x4f [92] #5 0xffffffff81077288 at calltrap+0x8 [92] #6 0xffffffff80dc7699 at tcp_output+0x1339 [92] #7 0xffffffff80dbedab at tcp_do_segment+0x2c9b [92] #8 0xffffffff80dbb3e1 at tcp_input_with_port+0xb61 [92] #9 0xffffffff80dbc07b at tcp_input+0xb [92] #10 0xffffffff80dad8f8 at ip_input+0x118 [92] #11 0xffffffff80d3a729 at netisr_dispatch_src+0xb9 [92] #12 0xffffffff80d1e974 at ether_demux+0x144 [92] #13 0xffffffff80d1fcd6 at ether_nh_input+0x346 [92] #14 0xffffffff80d3a729 at netisr_dispatch_src+0xb9 [92] #15 0xffffffff80d1ed99 at ether_input+0x69 [92] #16 0xffffffff80d36c3b at iflib_rxeof+0xbcb [92] #17 0xffffffff80d314c2 at _task_fn_rx+0x72 [92] Uptime: 1m32s [92] Dumping 2355 out of 65425 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91% __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55 55 __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu, (kgdb) where #0 __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55 #1 dump_savectx () at /usr/src/sys/kern/kern_shutdown.c:394 #2 0xffffffff80c02a78 in dumpsys (di=0x0) at /usr/src/sys/x86/include/dump.h:87 #3 doadump (textdump=<optimized out>) at /usr/src/sys/kern/kern_shutdown.c:423 #4 kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:497 #5 0xffffffff80c02eee in vpanic (fmt=<optimized out>, ap=ap@entry=0xfffffe0069f534c0) at /usr/src/sys/kern/kern_shutdown.c:930 #6 
0xffffffff80c02d23 in panic (fmt=<unavailable>) at /usr/src/sys/kern/kern_shutdown.c:854 #7 0xffffffff8109fd57 in trap_fatal (frame=0xfffffe0069f535b0, eva=24) at /usr/src/sys/amd64/amd64/trap.c:940 #8 0xffffffff8109fdaf in trap_pfault (frame=0xfffffe0069f535b0, usermode=false, signo=<optimized out>, ucode=<optimized out>) at /usr/src/sys/amd64/amd64/trap.c:759 #9 <signal handler called> #10 m_copydata (m=0x0, m@entry=0xfffff8000dc30e00, off=0, len=1, cp=<optimized out>) at /usr/src/sys/kern/uipc_mbuf.c:659 #11 0xffffffff80dc7699 in tcp_output (tp=0xfffffe019e765950) at /usr/src/sys/netinet/tcp_output.c:1084 #12 0xffffffff80dbedab in tcp_do_segment (m=0xfffff8002ad7e100, th=0xfffff8002ad7e17a, so=0xfffff801cb635000, tp=0xfffffe019e765950, drop_hdrlen=64, tlen=<optimized out>, iptos=0 '\000') at /usr/src/sys/netinet/tcp_input.c:2822 #13 0xffffffff80dbb3e1 in tcp_input_with_port (mp=<optimized out>, offp=<optimized out>, proto=<optimized out>, port=port@entry=0) at /usr/src/sys/netinet/tcp_input.c:1400 #14 0xffffffff80dbc07b in tcp_input (mp=0xfffff8000dc30e00, offp=0x0, proto=1) at /usr/src/sys/netinet/tcp_input.c:1496 #15 0xffffffff80dad8f8 in ip_input (m=0x0) at /usr/src/sys/netinet/ip_input.c:839 #16 0xffffffff80d3a729 in netisr_dispatch_src (proto=1, source=source@entry=0, m=0xfffff8002ad7e100) at /usr/src/sys/net/netisr.c:1143 #17 0xffffffff80d3aaff in netisr_dispatch (proto=230886912, m=0x1) at /usr/src/sys/net/netisr.c:1234 #18 0xffffffff80d1e974 in ether_demux (ifp=ifp@entry=0xfffff800023a6800, m=0x0) at /usr/src/sys/net/if_ethersubr.c:921 #19 0xffffffff80d1fcd6 in ether_input_internal (ifp=0xfffff800023a6800, m=0x0) at /usr/src/sys/net/if_ethersubr.c:707 #20 ether_nh_input (m=<optimized out>) at /usr/src/sys/net/if_ethersubr.c:737 #21 0xffffffff80d3a729 in netisr_dispatch_src (proto=proto@entry=5, source=source@entry=0, m=m@entry=0xfffff8002ad7e100) at /usr/src/sys/net/netisr.c:1143 #22 0xffffffff80d3aaff in netisr_dispatch (proto=230886912, proto@entry=5, 
m=0x1, m@entry=0xfffff8002ad7e100) at /usr/src/sys/net/netisr.c:1234 #23 0xffffffff80d1ed99 in ether_input (ifp=<optimized out>, m=0xfffff8002ad7e100) at /usr/src/sys/net/if_ethersubr.c:828 #24 0xffffffff80d36c3b in iflib_rxeof (rxq=rxq@entry=0xfffffe0114b0f040, budget=<optimized out>) at /usr/src/sys/net/iflib.c:3046 #25 0xffffffff80d314c2 in _task_fn_rx (context=0xfffffe0114b0f040) at /usr/src/sys/net/iflib.c:3989 #26 0xffffffff80c4ea5d in gtaskqueue_run_locked (queue=queue@entry=0xfffff80001d6b800) at /usr/src/sys/kern/subr_gtaskqueue.c:371 #27 0xffffffff80c4e6c3 in gtaskqueue_thread_loop (arg=arg@entry=0xfffffe0114a7f080) at /usr/src/sys/kern/subr_gtaskqueue.c:547 #28 0xffffffff80bbfafe in fork_exit (callout=0xffffffff80c4e600 <gtaskqueue_thread_loop>, arg=0xfffffe0114a7f080, frame=0xfffffe0069f53f40) at /usr/src/sys/kern/kern_fork.c:1103 #29 <signal handler called> #30 mi_startup () at /usr/src/sys/kern/init_main.c:322 Backtrace stopped: Cannot access memory at address 0x17 (kgdb)
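The faulting frame in both tcp_output backtraces is m_copydata (m=0x0, off=0, len=1) with eva=24, which is what one would expect if the requested offset was exactly equal to the amount of data in the send buffer chain. The toy model below walks a chain the same way, in Python rather than C. The class `Mbuf` and the function `copydata` are hypothetical stand-ins; the explicit None checks are where a kernel without such a check would dereference a NULL pointer. Reading eva=24 as the offset of m_len assumes three 8-byte fields precede it on amd64, as the field order in the struct printed earlier suggests; the thread does not state this.

```python
class Mbuf:
    """Toy stand-in for struct mbuf: a data buffer plus a link to the next mbuf."""

    def __init__(self, data, next_mbuf=None):
        self.data = data
        self.next = next_mbuf


def copydata(m, off, length):
    """Copy `length` bytes starting `off` bytes into the chain."""
    out = bytearray()
    # Skip whole mbufs until reaching the one that contains `off`.
    while off > 0:
        if m is None:
            raise LookupError("offset is beyond the end of the chain")
        if off < len(m.data):
            break
        off -= len(m.data)
        m = m.next
    # Copy across mbufs until `length` bytes have been collected.
    while length > 0:
        if m is None:
            # Corresponds to m=0x0, off=0, len=1 in the backtrace.
            raise LookupError("ran off the end of the chain (m is NULL)")
        count = min(len(m.data) - off, length)
        out += m.data[off:off + count]
        length -= count
        off = 0
        m = m.next
    return bytes(out)


chain = Mbuf(b"x" * 1999)          # one mbuf holding 1999 bytes
last_byte = copydata(chain, 1998, 1)  # last valid byte: fine
try:
    copydata(chain, 1999, 1)          # one past the data: walks off the chain
    walked_off = False
except LookupError:
    walked_off = True
print(last_byte, walked_off)
```

In this model a request for a single byte at an offset one past the buffered data fails in the second loop with off already reduced to 0, matching the arguments shown in the m_copydata frame.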
A commit in branch main references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=6d9e911fbadf3b409802a211c1dae9b47cb5a2b8 commit 6d9e911fbadf3b409802a211c1dae9b47cb5a2b8 Author: Michael Tuexen <tuexen@FreeBSD.org> AuthorDate: 2022-09-19 10:42:43 +0000 Commit: Michael Tuexen <tuexen@FreeBSD.org> CommitDate: 2022-09-19 10:49:31 +0000 tcp: fix computation of offset Only update the offset if actually retransmitting from the scoreboard. If not done correctly, this may result in trying to (re)-transmit data not being being in the socket buffe and therefore resulting in a panic. PR: 264257 PR: 263445 PR: 260393 Reviewed by: rscheff@ MFC after: 3 days Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D36626 sys/netinet/tcp_output.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
A commit in branch main references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=5ae83e0d871bc7cbe4dcc9a33d37eb689e631efe commit 5ae83e0d871bc7cbe4dcc9a33d37eb689e631efe Author: Michael Tuexen <tuexen@FreeBSD.org> AuthorDate: 2022-09-22 10:12:11 +0000 Commit: Michael Tuexen <tuexen@FreeBSD.org> CommitDate: 2022-09-22 10:12:11 +0000 tcp: send ACKs when requested When doing Limited Transmit send an ACK when needed by the protocol processing (like sending ACKs with a DSACK block). PR: 264257 PR: 263445 PR: 260393 Reviewed by: rscheff@ MFC after: 3 days Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D36631 sys/netinet/tcp_input.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
A commit in branch main references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=a743fc8826fa348b09d219632594c537f8e5690e commit a743fc8826fa348b09d219632594c537f8e5690e Author: Richard Scheffenegger <rscheff@FreeBSD.org> AuthorDate: 2022-09-22 10:55:25 +0000 Commit: Richard Scheffenegger <rscheff@FreeBSD.org> CommitDate: 2022-09-22 11:28:43 +0000 tcp: fix cwnd restricted SACK retransmission loop While doing the initial SACK retransmission segment while heavily cwnd constrained, tcp_ouput can erroneously send out the entire sendbuffer again. This may happen after an retransmission timeout, which resets snd_nxt to snd_una while the SACK scoreboard is still populated. Reviewed By: tuexen, #transport PR: 264257 PR: 263445 PR: 260393 MFC after: 3 days Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D36637 sys/netinet/tcp_output.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
A commit in branch stable/12 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=3651c4f42285644938e2f5bc924ab8c7ed857f83 commit 3651c4f42285644938e2f5bc924ab8c7ed857f83 Author: Richard Scheffenegger <rscheff@FreeBSD.org> AuthorDate: 2022-09-22 10:55:25 +0000 Commit: Richard Scheffenegger <rscheff@FreeBSD.org> CommitDate: 2022-09-25 08:52:56 +0000 tcp: fix cwnd restricted SACK retransmission loop While doing the initial SACK retransmission segment while heavily cwnd constrained, tcp_ouput can erroneously send out the entire sendbuffer again. This may happen after an retransmission timeout, which resets snd_nxt to snd_una while the SACK scoreboard is still populated. Reviewed By: tuexen, #transport PR: 264257 PR: 263445 PR: 260393 MFC after: 3 days Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D36637 (cherry picked from commit a743fc8826fa348b09d219632594c537f8e5690e) sys/netinet/tcp_output.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
A commit in branch stable/12 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=9e69e009c86f259653610f3c337253b79381c7a7 commit 9e69e009c86f259653610f3c337253b79381c7a7 Author: Michael Tuexen <tuexen@FreeBSD.org> AuthorDate: 2022-09-22 10:12:11 +0000 Commit: Richard Scheffenegger <rscheff@FreeBSD.org> CommitDate: 2022-09-25 08:46:54 +0000 tcp: send ACKs when requested When doing Limited Transmit send an ACK when needed by the protocol processing (like sending ACKs with a DSACK block). PR: 264257 PR: 263445 PR: 260393 Reviewed by: rscheff@ MFC after: 3 days Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D36631 (cherry picked from commit 5ae83e0d871bc7cbe4dcc9a33d37eb689e631efe) sys/netinet/tcp_input.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
A commit in branch stable/12 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=26370413d43bfd65500270ff331ae6bdf0f54133 commit 26370413d43bfd65500270ff331ae6bdf0f54133 Author: Michael Tuexen <tuexen@FreeBSD.org> AuthorDate: 2022-09-19 10:42:43 +0000 Commit: Richard Scheffenegger <rscheff@FreeBSD.org> CommitDate: 2022-09-25 08:41:54 +0000 tcp: fix computation of offset Only update the offset if actually retransmitting from the scoreboard. If not done correctly, this may result in trying to (re)-transmit data not being being in the socket buffe and therefore resulting in a panic. PR: 264257 PR: 263445 PR: 260393 Reviewed by: rscheff@ MFC after: 3 days Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D36626 (cherry picked from commit 6d9e911fbadf3b409802a211c1dae9b47cb5a2b8) sys/netinet/tcp_output.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
A commit in branch stable/13 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=0612d3000b974f31de15c90c77bf43f121fc8656 commit 0612d3000b974f31de15c90c77bf43f121fc8656 Author: Michael Tuexen <tuexen@FreeBSD.org> AuthorDate: 2022-09-19 10:42:43 +0000 Commit: Richard Scheffenegger <rscheff@FreeBSD.org> CommitDate: 2022-09-25 08:54:18 +0000 tcp: fix computation of offset Only update the offset if actually retransmitting from the scoreboard. If not done correctly, this may result in trying to (re)-transmit data not being being in the socket buffe and therefore resulting in a panic. PR: 264257 PR: 263445 PR: 260393 Reviewed by: rscheff@ MFC after: 3 days Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D36626 (cherry picked from commit 6d9e911fbadf3b409802a211c1dae9b47cb5a2b8) sys/netinet/tcp_output.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
A commit in branch stable/13 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=f9edad0054652e020b8214f61c0e454fd48101a6 commit f9edad0054652e020b8214f61c0e454fd48101a6 Author: Michael Tuexen <tuexen@FreeBSD.org> AuthorDate: 2022-09-22 10:12:11 +0000 Commit: Richard Scheffenegger <rscheff@FreeBSD.org> CommitDate: 2022-09-25 08:55:41 +0000 tcp: send ACKs when requested When doing Limited Transmit send an ACK when needed by the protocol processing (like sending ACKs with a DSACK block). PR: 264257 PR: 263445 PR: 260393 Reviewed by: rscheff@ MFC after: 3 days Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D36631 (cherry picked from commit 5ae83e0d871bc7cbe4dcc9a33d37eb689e631efe) sys/netinet/tcp_input.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
A commit in branch stable/13 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=c1f9a81e7bfe354dfa4f191d5180426f76bc514b commit c1f9a81e7bfe354dfa4f191d5180426f76bc514b Author: Richard Scheffenegger <rscheff@FreeBSD.org> AuthorDate: 2022-09-22 10:55:25 +0000 Commit: Richard Scheffenegger <rscheff@FreeBSD.org> CommitDate: 2022-09-25 08:56:28 +0000 tcp: fix cwnd restricted SACK retransmission loop While doing the initial SACK retransmission segment while heavily cwnd constrained, tcp_ouput can erroneously send out the entire sendbuffer again. This may happen after an retransmission timeout, which resets snd_nxt to snd_una while the SACK scoreboard is still populated. Reviewed By: tuexen, #transport PR: 264257 PR: 263445 PR: 260393 MFC after: 3 days Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D36637 (cherry picked from commit a743fc8826fa348b09d219632594c537f8e5690e) sys/netinet/tcp_output.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
I think this issue is fixed. If the problem still exists, please re-open.