During IPSEC regression, got Kernel Panic when ran traffic over tunnel mode and did ipsec restart in parallel Unread portion of the kernel message buffer: [8118.158276] panic: sackhint bytes rtx >= 0 [8118.169026] cpuid = 8 [8118.183306] time = 1702046494 [8118.197593] KDB: stack backtrace: [8118.212342] db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe01579543b0 [8118.227297] vpanic() at vpanic+0x132/frame 0xfffffe01579544e0 [8118.241552] panic() at panic+0x43/frame 0xfffffe0157954540 [8118.255964] tcp_sack_doack() at tcp_sack_doack+0x83c/frame 0xfffffe01579545d0 [8118.270235] tcp_do_segment() at tcp_do_segment+0x20f2/frame 0xfffffe01579546b0 [8118.284532] tcp_input_with_port() at tcp_input_with_port+0x1157/frame 0xfffffe0157954800 [8118.298806] tcp_input() at tcp_input+0xb/frame 0xfffffe0157954810 [8118.313084] ip_input() at ip_input+0x2ab/frame 0xfffffe0157954870 [8118.327340] netisr_dispatch_src() at netisr_dispatch_src+0xad/frame 0xfffffe01579548d0 [8118.341622] gif_input() at gif_input+0x241/frame 0xfffffe0157954920 [8118.355939] in_gif_input() at in_gif_input+0x74/frame 0xfffffe0157954960 [8118.370225] encap_input() at encap_input+0xd7/frame 0xfffffe01579549d0 [8118.384523] encap4_input() at encap4_input+0x26/frame 0xfffffe0157954a00 [8118.398803] ip_input() at ip_input+0x2ab/frame 0xfffffe0157954a60 [8118.413058] netisr_dispatch_src() at netisr_dispatch_src+0xad/frame 0xfffffe0157954ac0 [8118.427347] ether_demux() at ether_demux+0x17a/frame 0xfffffe0157954af0 [8118.441647] ether_nh_input() at ether_nh_input+0x39f/frame 0xfffffe0157954b40 [8118.455928] netisr_dispatch_src() at netisr_dispatch_src+0xad/frame 0xfffffe0157954ba0 [8118.470195] ether_input() at ether_input+0xd9/frame 0xfffffe0157954c00 [8118.484537] tcp_lro_flush_all() at tcp_lro_flush_all+0xdc/frame 0xfffffe0157954c50 [8118.498580] mlx5e_rx_cq_comp() at mlx5e_rx_cq_comp+0x1241/frame 0xfffffe0157954d70 [8118.512841] mlx5_cq_completion() at mlx5_cq_completion+0x74/frame 0xfffffe0157954dd0 [8118.527130] mlx5_eq_int() at mlx5_eq_int+0x28a/frame 0xfffffe0157954e30 [8118.541408] mlx5_msix_handler() at mlx5_msix_handler+0x15/frame 0xfffffe0157954e40 [8118.555698] lkpi_irq_handler() at lkpi_irq_handler+0x29/frame 0xfffffe0157954e60 [8118.570117] ithread_loop() at ithread_loop+0x266/frame 0xfffffe0157954ef0 [8118.584389] fork_exit() at fork_exit+0x82/frame 0xfffffe0157954f30 [8118.598289] fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe0157954f30 [8118.611898] --- trap 0, rip = 0, rsp = 0, rbp = 0 --- [8118.626197] Uptime: 2h15m18s [8118.663701] Dumping 1731 out of 32701 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91% __curthread () at /usr/kernel_git/sys/amd64/include/pcpu_aux.h:57 57 __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu, (kgdb) #0 __curthread () at /usr/kernel_git/sys/amd64/include/pcpu_aux.h:57 #1 doadump (textdump=textdump@entry=1) at /usr/kernel_git/sys/kern/kern_shutdown.c:403 #2 0xffffffff80b3b390 in kern_reboot (howto=260) at /usr/kernel_git/sys/kern/kern_shutdown.c:521 #3 0xffffffff80b3b88f in vpanic (fmt=0xffffffff8114668e "sackhint bytes rtx >= 0", ap=ap@entry=0xfffffe0157954520) at /usr/kernel_git/sys/kern/kern_shutdown.c:965 #4 0xffffffff80b3b633 in panic (fmt=<unavailable>) at /usr/kernel_git/sys/kern/kern_shutdown.c:889 #5 0xffffffff80d3ba6c in tcp_sack_doack (tp=<optimized out>, to=0xfffffe01579545e8, th_ack=<optimized out>) at /usr/kernel_git/sys/netinet/tcp_sack.c:779 #6 0xffffffff80d2cbe2 in tcp_do_segment (tp=0xfffff8022bcf0000, tp@entry=<error reading variable: value is not available>, m=0xfffff80368fe5000, m@entry=<error reading variable: value is not available>, th=0xfffff80368fe5098, th@entry=<error reading variable: value is not available>, drop_hdrlen=64, drop_hdrlen@entry=<error reading variable: value is not available>, tlen=<optimized out>, tlen@entry=<error reading variable: value is not available>, iptos=<unavailable>, iptos@entry=<error reading variable: value is not available>) at /usr/kernel_git/sys/netinet/tcp_input.c:2501 #7 0xffffffff80d2a0a7 in tcp_input_with_port (mp=<optimized out>, offp=<optimized out>, proto=<optimized out>, port=port@entry=0) at /usr/kernel_git/sys/netinet/tcp_input.c:1160 #8 0xffffffff80d2aadb in tcp_input (mp=<unavailable>, offp=<unavailable>, proto=<unavailable>) at /usr/kernel_git/sys/netinet/tcp_input.c:1492 #9 0xffffffff80d1888b in ip_input (m=0x0, m@entry=<error reading variable: value is not available>) at /usr/kernel_git/sys/netinet/ip_input.c:861 #10 0xffffffff80c9393d in netisr_dispatch_src (proto=proto@entry=1, source=source@entry=0, m=0xfffff80368fe5000) at /usr/kernel_git/sys/net/netisr.c:1152 #11 0xffffffff80c93c7f in netisr_dispatch (proto=<unavailable>, proto@entry=1, m=<unavailable>) at /usr/kernel_git/sys/net/netisr.c:1243 #12 0xffffffff80c77da1 in gif_input (m=<unavailable>, m@entry=0xfffff80368fe5000, ifp=ifp@entry=0xfffff8000b26c800, proto=proto@entry=4, ecn=<optimized out>, ecn@entry=0 '\000') at /usr/kernel_git/sys/net/if_gif.c:587 #13 0xffffffff80d092c4 in in_gif_input (m=0xfffff80368fe5000, off=20, proto=4, arg=<optimized out>) at /usr/kernel_git/sys/netinet/in_gif.c:315 #14 0xffffffff80d15717 in encap_input (head=<optimized out>, m=0xfffff80368fe5000, off=20, proto=proto@entry=4) at /usr/kernel_git/sys/netinet/ip_encap.c:328 #15 0xffffffff80d15606 in encap4_input (mp=0xfffffe0157954a30, offp=0xfffffe0157954a2c, proto=4) at /usr/kernel_git/sys/netinet/ip_encap.c:370 #16 0xffffffff80d1888b in ip_input (m=0xfffff80368fe5000, m@entry=<error reading variable: value is not available>) at /usr/kernel_git/sys/netinet/ip_input.c:861 #17 0xffffffff80c9393d in netisr_dispatch_src (proto=proto@entry=1, source=source@entry=0, m=0xfffff80368fe5000) at /usr/kernel_git/sys/net/netisr.c:1152 #18 0xffffffff80c93c7f in netisr_dispatch (proto=<unavailable>, proto@entry=1, m=<unavailable>) at /usr/kernel_git/sys/net/netisr.c:1243 #19 0xffffffff80c7610a in ether_demux (ifp=ifp@entry=0xfffff80112341800, m=<unavailable>) at /usr/kernel_git/sys/net/if_ethersubr.c:952 #20 0xffffffff80c7779f in ether_input_internal (ifp=0xfffff80112341800, m=<unavailable>) at /usr/kernel_git/sys/net/if_ethersubr.c:716 #21 ether_nh_input (m=<unavailable>, m@entry=<error reading variable: value is not available>) at /usr/kernel_git/sys/net/if_ethersubr.c:746 #22 0xffffffff80c9393d in netisr_dispatch_src (proto=proto@entry=5, source=source@entry=0, m=m@entry=0xfffff80368fe5000) at /usr/kernel_git/sys/net/netisr.c:1152 #23 0xffffffff80c93c7f in netisr_dispatch (proto=<unavailable>, proto@entry=5, m=<unavailable>, m@entry=0xfffff80368fe5000) at /usr/kernel_git/sys/net/netisr.c:1243 #24 0xffffffff80c76569 in ether_input (ifp=0xfffff80112341800, ifp@entry=<error reading variable: value is not available>, m=0xfffff80368fe5000, m@entry=<error reading variable: value is not available>) at /usr/kernel_git/sys/net/if_ethersubr.c:857 #25 0xffffffff80d33f4c in tcp_lro_flush_all (lc=0xfffffe0158ae5dc0) at /usr/kernel_git/sys/netinet/tcp_lro.c:1255 #26 0xffffffff82778971 in mlx5_cq_arm (cq=0xfffffe0158ae5ce0, cmd=0, uar_page=<unavailable>, doorbell_lock=0x0, cons_index=<optimized out>) --Type <RET> for more, q to quit, c to continue without paging-- at ../../dev/mlx5/cq.h:149 #27 mlx5e_cq_arm (cq=0xfffffe0158ae5cc0, dblock=0x0) at ../../dev/mlx5/mlx5_en/en.h:1273 #28 mlx5e_rx_cq_comp (mcq=<optimized out>, eqe=<optimized out>) at /tmp/ofed_freebsd_3_0/sys/modules/mlx5en/../../dev/mlx5/mlx5_en/mlx5_en_rx.c:672 #29 0xffffffff826a80a4 in mlx5_cq_completion (dev=0xfffffe01581e6000, eqe=0xfffffe0158316bc0) at /tmp/ofed_freebsd_3_0/sys/modules/mlx5/../../dev/mlx5/mlx5_core/mlx5_cq.c:73 #30 0xffffffff826ab2ba in mlx5_eq_int (dev=0xfffffe01581e6000, eq=0xfffff800056a5700) at /tmp/ofed_freebsd_3_0/sys/modules/mlx5/../../dev/mlx5/mlx5_core/mlx5_eq.c:275 #31 0xffffffff826aab35 in kfree (ptr=0xfffff80003c2de00) at ../../compat/linuxkpi/common/include/linux/slab.h:201 #32 mlx5_create_map_eq (dev=0xfffff8004e10e740, eq=0xfffff80003c2de58, vecidx=<optimized out>, nent=<optimized out>, mask=8590000128) at /tmp/ofed_freebsd_3_0/sys/modules/mlx5/../../dev/mlx5/mlx5_core/mlx5_eq.c:498 #33 0xffffffff826f3a99 in ?? () at /tmp/ofed_freebsd_3_0/sys/modules/linuxkpi/../../compat/linuxkpi/common/src/linux_interrupt.c:110 from /boot/kernel/linuxkpi.ko #34 0x0000000000000000 in ?? () (kgdb) PANIC: r-freeb08.mtr.labs.mlnx dumped core - see /var/crash/vmcore.6 Fri Dec 8 16:44:17 IST 2023 FreeBSD r-freeb08.mtr.labs.mlnx 15.0-CURRENT FreeBSD 15.0-CURRENT #142 ipsec-n266961-b5163c1700a8: Fri Dec 8 14:14:09 IST 2023 root@r-freeb08.mtr.labs.mlnx:/usr/obj/usr/kernel_git/amd64.amd64/sys/LATEST amd64
Is this reproducible in any way?
(In reply to Michael Tuexen from comment #1) Yes it seems we get it on each internal regressions' run.
For me it looks as if a special handling is missing for the transient IPSEC errors, similar to how EPERM from fw is handled.
Can you please enable bblogging: Enable BBR logging for all TCP sessions: # sysctl net.inet.tcp.bb.log_auto_mode=4 # sysctl net.inet.tcp.bb.log_auto_all=1 # sysctl net.inet.tcp.bb.log_auto_ratio=1 when this panic is easily reproducible? Alternatively, a tcpdump of the tcp session(s) within the ipsec tunnel (not a interface packet capture, but after the decoding). The core should then contain full chains of the relevant information we could extract offline... The full sequence of tcp packets from the onset of packet loss and the ACKs (with SACKs) from the remote side should yield some clue why this is happening. This appears to be some kind of SACK (selective acknowledgment) accounting error - but no recent change in this area...
Hey, this panic reproduced for me with all required params sysctl net.inet.tcp.bb.log_auto_mode=4 # sysctl net.inet.tcp.bb.log_auto_all=1 # sysctl net.inet.tcp.bb.log_auto_ratio=1 BT Full: __curthread () at /usr/kernel_git/sys/amd64/include/pcpu_aux.h:57 57 __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu, (kgdb) bt full #0 __curthread () at /usr/kernel_git/sys/amd64/include/pcpu_aux.h:57 td = <optimized out> #1 doadump (textdump=textdump@entry=1) at /usr/kernel_git/sys/kern/kern_shutdown.c:403 error = 0 coredump = <optimized out> #2 0xffffffff80b3b380 in kern_reboot (howto=260) at /usr/kernel_git/sys/kern/kern_shutdown.c:521 once = 0 __pc = <optimized out> #3 0xffffffff80b3b87f in vpanic (fmt=0xffffffff8114668e "sackhint bytes rtx >= 0", ap=ap@entry=0xfffffe0157943520) at /usr/kernel_git/sys/kern/kern_shutdown.c:965 buf = "sackhint bytes rtx >= 0", '\000' <repeats 232 times> __pc = <optimized out> __pc = <optimized out> __pc = <optimized out> other_cpus = {__bits = {3839, 0 <repeats 15 times>}} td = 0xfffff8014e8d4740 bootopt = <unavailable> newpanic = <optimized out> #4 0xffffffff80b3b623 in panic (fmt=<unavailable>) at /usr/kernel_git/sys/kern/kern_shutdown.c:889 ap = {{gp_offset = 8, fp_offset = 48, overflow_arg_area = 0xfffffe0157943550, reg_save_area = 0xfffffe01579434f0}} #5 0xffffffff80d3bb3c in tcp_sack_doack (tp=<optimized out>, to=0xfffffe01579435e8, th_ack=<optimized out>) at /usr/kernel_git/sys/netinet/tcp_sack.c:779 sack_blocks = {{start = 331596902, end = 331600458}, {start = 3162811008, end = 4294965253}, {start = 16, end = 0}, {start = 0, end = 0}, {start = 16, end = 0}} loss_hiack = 331600458 loss_thresh = 0 loss_sblks = 0 notlost_bytes = <unavailable> num_sack_blks = <optimized out> sack_changed = <optimized out> delivered_data = <optimized out> left_edge_delta = 3556 i = <optimized out> sack = <optimized out> j = <optimized out> sblkp = 0xfffffe0157943560 temp = <optimized out> cur = 0xfffff80635d66cc0 #6 0xffffffff80d2ccb2 in tcp_do_segment (tp=0xfffff805bc84aa80, tp@entry=<error reading variable: value is not available>, m=0xfffff80126113700, m@entry=<error reading variable: value is not available>, th=0xfffff80126113798, th@entry=<error reading variable: value is not available>, drop_hdrlen=64, drop_hdrlen@entry=<error reading variable: value is not available>, tlen=<optimized out>, tlen@entry=<error reading variable: value is not available>, iptos=<unavailable>, iptos@entry=<error reading variable: value is not available>) at /usr/kernel_git/sys/netinet/tcp_input.c:2501 to = {to_flags = 144, to_tsval = 2889934141, to_tsecr = 7369540, to_sacks = 0xfffff801261137bc "\023\303\305Q\023\303\3112\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\3--Type <RET> for more, q to quit, c to continue without paging-- 36\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255", <incomplete sequence \336>..., to_signature = 0xfffffe0157943640 "", to_tfo_cookie = 0xffffffff80b345d7 <_rm_runlock_debug+103> "\200\273:\001", to_mss = 584, to_wscale = 118 'v', to_nsacks = 1 '\001', to_tfo_len = 255 '\377', to_spare = 638662400} inp = 0xfffff805bc84aa80 needoutput = 0 incforsyn = 0 so = 0xfffff8011beddb40 inc = 0xfffff805bc84ab0c thflags = 16 sack_changed = SACK_NOCHANGE nsegs = 1 s = <optimized out> rstreason = <optimized out> tiwin = 62208 todrop = <optimized out> acked = <optimized out> tfo_syn = <optimized out> mfree = <optimized out> ourfinisacked = <optimized out> maxseg = <optimized out> win = <optimized out> close = <optimized out> #7 0xffffffff80d2a177 in tcp_input_with_port (mp=<optimized out>, offp=<optimized out>, proto=<optimized out>, port=port@entry=0) at /usr/kernel_git/sys/netinet/tcp_input.c:1160 to = {to_flags = 0, to_tsval = 4294967295, to_tsecr = 3128890304, to_sacks = 0xfffffe0157943720 "@7\224W\001\376\377\377\202\340\n\201\377\377\377\377\024", to_signature = 0xffffffff80ac4e7e <timercb+398> "\353\023E1\366H\211\3371\366\350\263\364\377\377\203", <incomplete sequence \324>, to_tfo_cookie = 0x1cc9 <error: Cannot access memory at address 0x1cc9>, to_mss = 34406, to_wscale = 254 '\376', to_nsacks = 163 '\243', to_tfo_len = 208 '\320', to_spare = 1469331520} m = 0xfffff80126113700 th = 0xfffff80126113798 ip = 0xfffff80126113784 inp = <optimized out> tp = <optimized out> so = <optimized out> optp = 0xfffff801261137ac "\001\001\b\n\254@\345=C\277\255\323\001\001\005\n\023\303\305Q\023\303\3112\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255", <incomplete sequence \336>... optlen = 24 tlen = <optimized out> rstreason = <optimized out> --Type <RET> for more, q to quit, c to continue without paging-- fwd_tag = 0x0 ip6 = 0x0 s = 0x0 off0 = <optimized out> iptos = <optimized out> off = <optimized out> len = <optimized out> ipttl = <optimized out> thflags = <optimized out> drop_hdrlen = 64 lookupflag = 5 isipv6 = <optimized out> #8 0xffffffff80d2abab in tcp_input (mp=<unavailable>, offp=<unavailable>, proto=<unavailable>) at /usr/kernel_git/sys/netinet/tcp_input.c:1492 No locals. #9 0xffffffff80d1892b in ip_input (m=0x0, m@entry=<error reading variable: value is not available>) at /usr/kernel_git/sys/netinet/ip_input.c:861 ip = 0xfffff80126113784 ia = <optimized out> hlen = 20 dchg = <optimized out> ifp = <optimized out> ip_len = <optimized out> sum = <optimized out> odst = <optimized out> strong_es = <optimized out> ifa = <optimized out> #10 0xffffffff80c9393d in netisr_dispatch_src (proto=proto@entry=1, source=source@entry=0, m=0xfffff80126113700) at /usr/kernel_git/sys/net/netisr.c:1152 __pc = <optimized out> __pc = <optimized out> __pc = <optimized out> npp = <optimized out> nwsp = <optimized out> npwp = <optimized out> error = <optimized out> cpuid = <optimized out> dosignal = <optimized out> dispatch_policy = <optimized out> out_unlock = <optimized out> #11 0xffffffff80c93c7f in netisr_dispatch (proto=<unavailable>, proto@entry=1, m=<unavailable>) at /usr/kernel_git/sys/net/netisr.c:1243 No locals. #12 0xffffffff80c77da1 in gif_input (m=<unavailable>, m@entry=0xfffff80126113700, ifp=ifp@entry=0xfffff805208ce800, proto=proto@entry=4, ecn=<optimized out>, ecn@entry=0 '\000') at /usr/kernel_git/sys/net/if_gif.c:587 af = 2 ip = <optimized out> --Type <RET> for more, q to quit, c to continue without paging-- t = <optimized out> ip6 = <optimized out> isr = 1 n = <optimized out> eip = <optimized out> oldifp = <optimized out> eh = <optimized out> #13 0xffffffff80d09324 in in_gif_input (m=0xfffff80126113700, off=20, proto=4, arg=<optimized out>) at /usr/kernel_git/sys/netinet/in_gif.c:315 sc = <optimized out> gifp = 0xfffff805208ce800 ip = <optimized out> ecn = <optimized out> #14 0xffffffff80d157b7 in encap_input (head=<optimized out>, m=0xfffff80126113700, off=20, proto=proto@entry=4) at /usr/kernel_git/sys/netinet/ip_encap.c:328 encap_et = {et_link = {tqe_next = 0x0, tqe_prev = 0xfffffe0157943d80}, et_td = 0xfffff8014e8d4740, et_section = {bucket = 1}, et_old_priority = 4 '\004'} match = 0xfffff800031c9c00 matchprio = <optimized out> ep = 0xfffff800031c9c00 ret = <optimized out> arg = <unavailable> #15 0xffffffff80d156a6 in encap4_input (mp=0xfffffe0157943a30, offp=0xfffffe0157943a2c, proto=4) at /usr/kernel_git/sys/netinet/ip_encap.c:370 No locals. #16 0xffffffff80d1892b in ip_input (m=0xfffff80126113700, m@entry=<error reading variable: value is not available>) at /usr/kernel_git/sys/netinet/ip_input.c:861 ip = 0xfffff80126113770 ia = <optimized out> hlen = 20 dchg = <optimized out> ifp = <optimized out> ip_len = <optimized out> sum = <optimized out> odst = <optimized out> strong_es = <optimized out> ifa = <optimized out> #17 0xffffffff80c9393d in netisr_dispatch_src (proto=proto@entry=1, source=source@entry=0, m=0xfffff80126113700) at /usr/kernel_git/sys/net/netisr.c:1152 __pc = <optimized out> __pc = <optimized out> __pc = <optimized out> npp = <optimized out> nwsp = <optimized out> npwp = <optimized out> error = <optimized out> cpuid = <optimized out> dosignal = <optimized out> dispatch_policy = <optimized out> --Type <RET> for more, q to quit, c to continue without paging-- out_unlock = <optimized out> #18 0xffffffff80c93c7f in netisr_dispatch (proto=<unavailable>, proto@entry=1, m=<unavailable>) at /usr/kernel_git/sys/net/netisr.c:1243 No locals. #19 0xffffffff80c760fa in ether_demux (ifp=ifp@entry=0xfffff8011b78e000, m=<unavailable>) at /usr/kernel_git/sys/net/if_ethersubr.c:952 i = <optimized out> eh = <optimized out> ether_type = <optimized out> isr = 1 #20 0xffffffff80c7779f in ether_input_internal (ifp=0xfffff8011b78e000, m=<unavailable>) at /usr/kernel_git/sys/net/if_ethersubr.c:716 eh = 0xfffff80126113762 saved_vnet = 0xfffff80003260f80 etype = <optimized out> evl = <optimized out> #21 ether_nh_input (m=<unavailable>, m@entry=<error reading variable: value is not available>) at /usr/kernel_git/sys/net/if_ethersubr.c:746 No locals. #22 0xffffffff80c9393d in netisr_dispatch_src (proto=proto@entry=5, source=source@entry=0, m=m@entry=0xfffff80126113700) at /usr/kernel_git/sys/net/netisr.c:1152 __pc = <optimized out> __pc = <optimized out> __pc = <optimized out> npp = <optimized out> nwsp = <optimized out> npwp = <optimized out> error = <optimized out> cpuid = <optimized out> dosignal = <optimized out> dispatch_policy = <optimized out> out_unlock = <optimized out> #23 0xffffffff80c93c7f in netisr_dispatch (proto=<unavailable>, proto@entry=5, m=<unavailable>, m@entry=0xfffff80126113700) at /usr/kernel_git/sys/net/netisr.c:1243 No locals. #24 0xffffffff80c76559 in ether_input (ifp=0xfffff8011b78e000, ifp@entry=<error reading variable: value is not available>, m=0xfffff80126113700, m@entry=<error reading variable: value is not available>) at /usr/kernel_git/sys/net/if_ethersubr.c:857 printedonce = false et = {et_link = {tqe_next = 0x40, tqe_prev = 0xfffffe01586b1000}, et_td = 0x4, et_section = {bucket = 64}, et_old_priority = 0 '\000'} needs_epoch = false saved_vnet = 0xfffff80003260f80 mn = 0x0 #25 0xffffffff80d3401c in tcp_lro_flush_all (lc=0xfffffe0158c21dc0) at /usr/kernel_git/sys/netinet/tcp_lro.c:1255 mb = 0xfffff80126113700 __pc = <optimized out> __pc = <optimized out> saved_vnet = 0x0 seq = 13775403306773905408 x = 572 --Type <RET> for more, q to quit, c to continue without paging-- nseq = <optimized out> done = <optimized out> #26 0xffffffff82778971 in mlx5_cq_arm (cq=0xfffffe0158c21ce0, cmd=0, uar_page=<unavailable>, doorbell_lock=0x0, cons_index=<optimized out>) at ../../dev/mlx5/cq.h:149 ci = <unavailable> doorbell = <optimized out> sn = <optimized out> #27 mlx5e_cq_arm (cq=0xfffffe0158c21cc0, dblock=0x0) at ../../dev/mlx5/mlx5_en/en.h:1273 mcq = 0xfffffe0158c21ce0 #28 mlx5e_rx_cq_comp (mcq=<optimized out>, eqe=<optimized out>) at /tmp/ofed_freebsd_3_0/sys/modules/mlx5en/../../dev/mlx5/mlx5_en/mlx5_en_rx.c:672 c = 0xfffffe0158c21b80 rq = 0xfffffe0158c21b80 i = 768 #29 0xffffffff826a80a4 in mlx5_cq_completion (dev=0xfffffe01581c4000, eqe=0xfffffe01582f9500) at /tmp/ofed_freebsd_3_0/sys/modules/mlx5/../../dev/mlx5/mlx5_core/mlx5_cq.c:73 et = {et_link = {tqe_next = 0xfffffe0157943978, tqe_prev = 0xfffffe0157943e78}, et_td = 0xfffff8014e8d4740, et_section = {bucket = 1}, et_old_priority = 4 '\004'} table = 0xfffffe015822e848 cqn = 1065 cq = <optimized out> do_lock = <optimized out> #30 0xffffffff826ab2ba in mlx5_eq_int (dev=0xfffffe01581c4000, eq=0xfffff8014ceaf700) at /tmp/ofed_freebsd_3_0/sys/modules/mlx5/../../dev/mlx5/mlx5_core/mlx5_eq.c:275 eqes_found = <optimized out> set_ci = 0 rsn = 88748544 cqn = <optimized out> port = <optimized out> eqe = <optimized out> #31 0xffffffff826aab35 in kfree (ptr=0xfffff8000549f800) at ../../compat/linuxkpi/common/include/linux/slab.h:201 No locals. #32 mlx5_create_map_eq (dev=0xfffff8014e8d4740, eq=0xfffff8000549f858, vecidx=<optimized out>, nent=<optimized out>, mask=8590000128) at /tmp/ofed_freebsd_3_0/sys/modules/mlx5/../../dev/mlx5/mlx5_core/mlx5_eq.c:498 out = {1, 3671661053, 1478680496, 4294966785} priv = 0xfffff8014e93e6f0 err = <optimized out> inlen = <optimized out> in = 0xfffff8000549f800 pas = <optimized out> eqc = 0xfffff8000549f810 err_irq = <optimized out> #33 0xffffffff826f3a99 in ?? () at /tmp/ofed_freebsd_3_0/sys/modules/linuxkpi/../../compat/linuxkpi/common/src/linux_interrupt.c:110 from /boot/kernel/linuxkpi.ko VBAD = VBAD TDS_RUNQ = TDS_RUNQ PRISON_STATE_ALIVE = PRISON_STATE_ALIVE VSTATE_CONSTRUCTED = VSTATE_CONSTRUCTED --Type <RET> for more, q to quit, c to continue without paging-- VDIR = VDIR TDS_INACTIVE = TDS_INACTIVE PRS_NEW = PRS_NEW INTR_ENTROPY = INTR_ENTROPY VFIFO = VFIFO VLASTTYPE = VMARKER INTR_TYPE_CLK = INTR_TYPE_CLK TDS_RUNNING = TDS_RUNNING pci_channel_io_perm_failure = pci_channel_io_perm_failure VSTATE_UNINITIALIZED = VSTATE_UNINITIALIZED pci_channel_io_frozen = pci_channel_io_frozen PRS_NORMAL = PRS_NORMAL INTR_TYPE_MISC = INTR_TYPE_MISC VLASTSTATE = VSTATE_DEAD UIO_USERSPACE = UIO_USERSPACE INTR_TYPE_BIO = INTR_TYPE_BIO INTR_EXCL = INTR_EXCL VCHR = VCHR TDS_INHIBITED = TDS_INHIBITED INTR_TYPE_AV = INTR_TYPE_AV VNON = VNON VMARKER = VMARKER PRISON_STATE_DYING = PRISON_STATE_DYING INTR_TYPE_TTY = INTR_TYPE_TTY PRS_ZOMBIE = PRS_ZOMBIE INTR_MPSAFE = INTR_MPSAFE pci_channel_io_normal = pci_channel_io_normal INTR_TYPE_CAM = INTR_TYPE_CAM INTR_MD1 = INTR_MD1 INTR_MD2 = INTR_MD2 INTR_MD3 = INTR_MD3 INTR_MD4 = INTR_MD4 VBLK = VBLK VLNK = VLNK IRQ_WAKE_THREAD = IRQ_WAKE_THREAD VREG = VREG VSTATE_DEAD = VSTATE_DEAD VSOCK = VSOCK UIO_WRITE = UIO_WRITE PRISON_STATE_INVALID = PRISON_STATE_INVALID UIO_READ = UIO_READ VSTATE_DESTROYING = VSTATE_DESTROYING UIO_SYSSPACE = UIO_SYSSPACE --Type <RET> for more, q to quit, c to continue without paging-- IRQ_HANDLED = IRQ_HANDLED IRQ_NONE = IRQ_NONE UIO_NOCOPY = UIO_NOCOPY TDS_CAN_RUN = TDS_CAN_RUN INTR_TYPE_NET = INTR_TYPE_NET #34 0x0000000000000000 in ?? () (kgdb) frame 6 #6 0xffffffff80d2ccb2 in tcp_do_segment (tp=0xfffff805bc84aa80, tp@entry=<error reading variable: value is not available>, m=0xfffff80126113700, m@entry=<error reading variable: value is not available>, th=0xfffff80126113798, th@entry=<error reading variable: value is not available>, drop_hdrlen=64, drop_hdrlen@entry=<error reading variable: value is not available>, tlen=<optimized out>, tlen@entry=<error reading variable: value is not available>, iptos=<unavailable>, iptos@entry=<error reading variable: value is not available>) at /usr/kernel_git/sys/netinet/tcp_input.c:2501 2501 if (((sack_changed = tcp_sack_doack(tp, &to, th->th_ack)) != 0) && (kgdb) p *tp $1 = {t_inpcb = {inp_hash_exact = {cle_next = 0x0, cle_prev = 0xfffffe010c8f8368}, inp_hash_wild = {cle_next = 0x0, cle_prev = 0x0}, inp_lock = {lock_object = { lo_name = 0xffffffff810e52d1 "tcpinp", lo_flags = 90898432, lo_data = 0, lo_witness = 0xfffff8087fd75000}, rw_lock = 18446735283229378368}, inp_refcount = 1, inp_flags = 8256, inp_flags2 = 0, inp_numa_domain = 255 '\377', inp_ppcb = 0x0, inp_socket = 0xfffff8011beddb40, inp_pcbinfo = 0xfffffe003c1dfa28, inp_cred = 0xfffff80003bce200, inp_flow = 0, inp_vflag = 1 '\001', inp_ip_ttl = 64 '@', inp_ip_p = 0 '\000', inp_ip_minttl = 0 '\000', inp_flowid = 739267202, inp_smr = 0, inp_snd_tag = 0x0, inp_flowtype = 191, inp_inc = {inc_flags = 0 '\000', inc_len = 0 '\000', inc_fibnum = 0, inc_ie = {ie_fport = 52038, ie_lport = 4295, ie_dependfaddr = {id46_addr = {ia46_pad32 = {0, 0, 0}, ia46_addr4 = {s_addr = 349685013}}, id6_addr = {__u6_addr = { __u6_addr8 = '\000' <repeats 12 times>, "\025\305\327\024", __u6_addr16 = {0, 0, 0, 0, 0, 0, 50453, 5335}, __u6_addr32 = {0, 0, 0, 349685013}}}}, ie_dependladdr = {id46_addr = {ia46_pad32 = {0, 0, 0}, ia46_addr4 = {s_addr = 3489170453}}, id6_addr = {__u6_addr = { __u6_addr8 = '\000' <repeats 12 times>, "\025\204\370", <incomplete sequence \317>, __u6_addr16 = {0, 0, 0, 0, 0, 0, 33813, 53240}, __u6_addr32 = {0, 0, 0, 3489170453}}}}, ie6_zoneid = 0}}, inp_label = 0x0, inp_sp = 0xfffff8011b940640, {inp_ip_tos = 0 '\000', inp_options = 0x0, inp_moptions = 0x0}, { in6p_options = 0x0, in6p_outputopts = 0x0, in6p_moptions = 0x0, in6p_icmp6filt = 0x0, in6p_cksum = 0, in6p_hops = 0}, inp_portlist = {cle_next = 0x0, cle_prev = 0xfffff8047c7e0cc0}, inp_phd = 0xfffff8047c7e0cc0, inp_gencnt = 1632, spare_ptr = 0x0, inp_rt_cookie = 41, {inp_route = {ro_nh = 0xfffff806d1796600, ro_lle = 0x0, ro_prepend = 0x0, ro_plen = 0, ro_flags = 256, ro_mtu = 0, spare = 0, ro_dst = {sa_len = 16 '\020', sa_family = 2 '\002', sa_data = "\000\000\025\305\327\024\000\000\000\000\000\000\000"}}, inp_route6 = {ro_nh = 0xfffff806d1796600, ro_lle = 0x0, ro_prepend = 0x0, ro_plen = 0, ro_flags = 256, ro_mtu = 0, spare = 0, ro_dst = {sin6_len = 16 '\020', sin6_family = 2 '\002', sin6_port = 0, sin6_flowinfo = 349685013, sin6_addr = { __u6_addr = {__u6_addr8 = '\000' <repeats 15 times>, __u6_addr16 = {0, 0, 0, 0, 0, 0, 0, 0}, __u6_addr32 = {0, 0, 0, 0}}}, sin6_scope_id = 0}}}, inp_list = { cle_next = 0xfffff805bcbe3000, cle_prev = 0xfffff8067fe066b8}}, t_fb = 0xffffffff8170dbb0 <tcp_def_funcblk>, t_fb_ptr = 0x0, t_callout = {c_links = {le = { le_next = 0xfffff805bcb536d8, le_prev = 0xfffffe003bfb0e98}, sle = {sle_next = 0xfffff805bcb536d8}, tqe = {tqe_next = 0xfffff805bcb536d8, tqe_prev = 0xfffffe003bfb0e98}}, c_time = 31653168034496, c_precision = 73014439, c_arg = 0xfffff805bc84aa80, c_func = 0xffffffff80d488e0 <tcp_timer_enter>, c_lock = 0xfffff805bc84aaa0, c_flags = 2, c_iflags = 20, c_cpu = 0}, t_timers = {31653168034496, 9223372036854775807, 62563177950162, 9223372036854775807, 9223372036854775807}, t_precisions = {73014439, 0, 1932735150000, 0, 0}, t_hpts = {tqe_next = 0x0, tqe_prev = 0x0}, t_inqueue = {stqh_first = 0x0, stqh_last = 0xfffff805bc84acb8}, t_hpts_request = 0, t_hpts_slot = 0, t_hpts_drop_reas = 0, t_hpts_gencnt = 0, t_hpts_cpu = 65535, t_lro_cpu = 65535, t_in_hpts = IHPTS_NONE, t_maxseg = 1240, _t_logstate = 4, t_port = 0, t_state = 4, t_idle_reduce = 0, t_delayed_ack = 0, t_fin_is_rst = 0, t_log_state_set = 0, bits_spare = 0, t_flags = 1882194913, snd_una = 331596902, snd_max = 331600458, snd_nxt = 331600458, snd_up = 331596902, snd_wnd = 64512, snd_cwnd = 1240, ts_offset = 1129265807, rfbuf_ts = 7366590, rcv_numsacks = 0, t_tsomax = 0, t_tsomaxsegcount = 0, t_tsomaxsegsize = 0, rcv_nxt = 420000108, rcv_adv = 420065836, rcv_wnd = 65728, t_flags2 = 1034, t_srtt = 834, t_rttvar = 186, ts_recent = 2889934141, snd_scale = 6 '\006', rcv_scale = 6 '\006', snd_limited = 2 '\002', request_r_scale = 6 '\006', last_ack_sent = 420000108, t_rcvtime = 2154252203, rcv_up = 420000108, t_segqlen = 0, t_segqmbuflen = 0, t_segq = {tqh_first = 0x0, tqh_last = 0xfffff805bc84ad50}, snd_ssthresh = 2456, snd_wl1 = 420000108, snd_wl2 = 331596902, irs = 420000079, iss = 331585966, t_acktime = 2154252203, t_sndtime = 2154250587, ts_recent_age = 7369555, snd_recover = 331600458, t_oobflags = 0 '\000', t_iobc = 0 '\000', t_nic_ktls_xmit = 0 '\000', t_nic_ktls_xmit_dis = 0 '\000', t_nic_ktls_spare = 0 '\000', t_rxtcur = 272, t_rxtshift = 0, t_rtttime = 0, t_rtseq = 331598236, t_starttime = 2154249273, t_fbyte_in = 2154250253, t_fbyte_out = 2154249285, t_pmtud_saved_maxseg = 0, t_blackhole_enter = 0, t_blackhole_exit = 0, t_rttmin = 30, t_softerror = 0, max_sndwnd = 65728, snd_cwnd_prev = 1228, snd_ssthresh_prev = 2456, snd_recover_prev = 331600458, t_sndzerowin = 0, snd_numholes = 1, t_badrxtwin = 7368668, snd_holes = {tqh_first = 0xfffff80635d66cc0, tqh_last = 0xfffff80635d66cd0}, snd_fack = 331600458, sackblks = {{start = 420000080, end = 420000108}, {start = 0, end = 0}, {start = 0, end = 0}, {start = 0, end = 0}, {start = 0, end = 0}, {start = 0, end = 0}}, sackhint = {nexthole = 0xfffff80635d66cc0, sack_bytes_rexmit = -235, last_sack_ack = 331600458, delivered_data = 1228, sacked_bytes = 3321, recover_fs = 4784, prr_delivered = 2456, prr_out = 235, hole_bytes = 235, lost_bytes = 235}, t_rttlow = 15, rfbuf_cnt = 0, tod = 0x0, t_sndrexmitpack = 35, t_rcvoopack = 0, t_toe = 0x0, t_cc = 0xffffffff81709ff8 <cubic_cc_algo>, t_ccv = {cc_data = 0xfffff8014ec14e00, bytes_this_ack = 1228, curack = 331596902, flags = 2, type = 6, ccvc = { tcp = 0xfffff805bc84aa80, sctp = 0xfffff805bc84aa80}, nsegs = 1, labc = 0 '\000'}, t_bytes_acked = 0, t_maxunacktime = 0, t_keepinit = 0, t_keepidle = 0, t_keepintvl = 0, t_keepcnt = 0, t_dupacks = 3, t_lognum = 169, t_loglimit = 5000, t_rcep = 0, t_scep = 0, t_pacing_rate = -1, t_logs = { stqh_first = 0xfffff8063fd6a9c0, stqh_last = 0xfffff806d08db9c0}, t_lin = 0x0, t_lib = 0x0, t_output_caller = 0x0, t_stats = 0x0, t_logsn = 169, gput_ts = 0, gput_seq = 0, gput_ack = 0, t_stats_gput_prev = 0, t_maxpeakrate = 0, t_sndtlppack = 0, t_sndtlpbyte = 0, t_sndbytes = 21742, t_snd_rxt_bytes = 15964, t_dsack_bytes = 3449, t_dsack_tlp_bytes = 0, t_dsack_pack = 5, t_tmr_granularity = 1 '\001', t_rttupdated = 14 '\016', t_tfo_client_cookie_len = 0 '\000', t_end_info_status = 0, t_tfo_pending = 0x0, t_tfo_cookie = {client = '\000' <repeats 15 times>, server = 0}, {t_end_info_bytes = "\000\000\000\000\000\000\000", --Type <RET> for more, q to quit, c to continue without paging-- t_end_info = 0}, t_osd = {osd_nslots = 1, osd_slots = 0xfffff80003e20720, osd_next = {le_next = 0xfffff805bcbe34f8, le_prev = 0xfffff8067fe06a48}}, _t_logpoint = 0 '\000'} Please let me know if need additional information
can you please provide the core (w/ kernel & kernel.debug)? The bblog data will have captured the evolution how we ended up in this state - but it currently requires the core to undig all that information manually. And I spotted an oversight in the tcp_output error path, but that's not the issue here... But the problem happens quite early in the affected TCP session: iss = 331585966 recover = 14492 fack = 14492 una = 10936 max = 14492 in frame 6, the incoming tcp header option sack appears to cover: 11171-12164(993) in frame 5, the sack block seems to cover the entire range from una to max (10936-14492) - but this is probably after processing it in full. The sack_bytes_rexmit is -235, or the delta between snd_una and the left edge of the to_sack block... I wonder if this is a DSACK block on a cumulative ack, after the holes have been retransmitted, ending up miscalculating this.. (So with reordering of packet delivery or ACKs carrying SACK information).
Here is the tgz file with the kernel.full and vmcore https://kib.kiev.ua/kib/ipsec_bug.tgz Please let me know when you download the file, so I can remove it
(In reply to Daniel from comment #7) Got it transferred to freefall. Thx.
I can see a number of retransmission timeouts (so the theory around EPERM on the ipsec may be valid). There were 3 reordering/loss episodes in that short flow (169 logging events), and during the last one, the ACK has a SACK block, where the right edge moved left compared to an ACK received prior to that... Which is peculiar (maybe reordering induced by the packet held up in processing?), but obviously the tcp stack shouldn't loose it's bearings accounting for such events. I'll update this once I have decoded more of the timing and events.
It appears that the issue may have to do with some LRO going on as well on the receiver side - which doesn't ACK each individual packet, but typically 2 consecutive one, even while the receiver knows loss/reordering is on-going. At that stage the receiver normally would ACK every packet received... The ultimate issue appears to happen after a RTO timeout, when the last outstanding (2-packet) hole is filled, and the cumulative ACK covering it all also contains a DSACK block at the beginning of the former hole... Normally, such a DSACK block should be discareded and ignored during the SACK update process - not really sure why it's apparently processed erraneously, leading to the panic...
e3b9058e5cd0f541da596624a366e14cabcf2e2a / D43085 was found early on during a code review after this issue. Completed the inspection of the bblog traces - and confirmed that the above fix actually addressed the root cause. The issue is indeed connected to an error during ip_output (55 - ENOBUFS), which had to be dealt with appropriately by the TCP stack. The in-depth explanation: When there is no response from the remote side for some time, TCP will retransmit some packets as classical retransmits after a (lengthy) timeout - RTO - outside of SACK. When an ACK with SACK information now arrives, the SACK scoreboard is re-populated (the RTO had previously cleared it all). At a subsequent transmission opportunity, the SACK hole is transmitted - if this transmission fails, the unrolling of the transmission could end up not in the expected state, if the entire hole was to be transmitted. This effect got introduced by the Lost Retransmission Detection (LRD) feature, where the hole->rxmit pointer tracks when a retransmission ought to have arrived at the receiver. Once a hole is fully retransmitted, it set to snd_recover. So, in order to elicit this: LRD has to be enabled (now enabled by default in CURRENT) SACK loss recovery must have started RTO occurs Overlapping packets get retransmitted by RTO without SACK SACK information is recovered SACK retransmits same data as RTO - and the entrie hole, but fails due to an internal error If the non-SACK transmission eventually makes it, because of the improper handling of the SACK hole retransmit pointer, we could end up with incorrect accounting - running in to the panic. In the bblog, we can see the ip_output error for the SACK retransmission and preceeding this, a RTO retransmission. Furthermore, the hole->rxmit pointer is exactly the size of the hole beneath snd_recover.
Daniel, Konstantin, can you verify that you don't experience the problem anyone when Richards fix is in place? Best regards Michael
Hi Looks like the issue is fixed! unable to reproduce the issue Thanks,
(In reply to Daniel from comment #13) Perfect! Thanks for double checking. Best regards Michael
A commit in branch stable/14 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=ac83f3fa44f3c64ddb4f817aa4c2032546a1ac92 commit ac83f3fa44f3c64ddb4f817aa4c2032546a1ac92 Author: Richard Scheffenegger <rscheff@FreeBSD.org> AuthorDate: 2023-12-18 12:01:28 +0000 Commit: Richard Scheffenegger <rscheff@FreeBSD.org> CommitDate: 2023-12-24 13:32:29 +0000 tcp: properly unroll SACK transmission on TX error with LRD PR: 275798 Reported by: ddaniel@nvidia.com Reviewed by: tuexen, #transport MFC after: 1 week Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D43085 (cherry picked from commit e3b9058e5cd0f541da596624a366e14cabcf2e2a) sys/netinet/tcp_output.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)