Bug 264257 - [tcp] Panic: Fatal trap 12: page fault while in kernel mode (if_io_tqg_4) - m_copydata ... at /usr/src/sys/kern/uipc_mbuf.c:659
Summary: [tcp] Panic: Fatal trap 12: page fault while in kernel mode (if_io_tqg_4) - m...
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 13.1-RELEASE
Hardware: amd64 Any
Importance: --- Affects Only Me
Assignee: freebsd-net (Nobody)
URL:
Keywords: crash, needs-qa
Depends on:
Blocks: 264030
 
Reported: 2022-05-26 16:00 UTC by iron.udjin
Modified: 2023-03-26 03:40 UTC
15 users

See Also:
koobs: maintainer-feedback? (net)
koobs: maintainer-feedback? (hselasky)
markj: maintainer-feedback-
koobs: maintainer-feedback? (jhb)
tuexen: mfc-stable13+
tuexen: mfc-stable12+


Attachments
sysctl.conf (3.14 KB, text/plain) - 2022-06-06 20:48 UTC, iron.udjin
loader.conf (1.18 KB, text/plain) - 2022-06-06 20:48 UTC, iron.udjin
D35446 patch to apply clean against 13.1-RELEASE (4.08 KB, patch) - 2022-06-11 20:29 UTC, Richard Scheffenegger
clean patch against 13.1-RELEASE (4.10 KB, patch) - 2022-06-11 20:54 UTC, Richard Scheffenegger
more logging (4.77 KB, patch) - 2022-06-14 17:30 UTC, Richard Scheffenegger
Fix TCPPCAP (586 bytes, patch) - 2022-06-15 20:58 UTC, Michael Tuexen
symptomatic logging patch (3.06 KB, patch) - 2022-06-21 08:42 UTC, Richard Scheffenegger
Fix offset computation (565 bytes, patch) - 2022-09-18 13:09 UTC, Michael Tuexen
Fix offset computation (stable/13) (552 bytes, patch) - 2022-09-18 13:17 UTC, Michael Tuexen
Updated: Fix offset computation (stable/13) (641 bytes, patch) - 2022-09-18 13:51 UTC, Michael Tuexen
Updated: Fix offset computation (releng/13.1) (654 bytes, patch) - 2022-09-18 13:53 UTC, Michael Tuexen

Description iron.udjin 2022-05-26 16:00:56 UTC
# uname -a
FreeBSD new 13.1-RELEASE FreeBSD 13.1-RELEASE releng/13.1-n250148-fc952ac2212 GENERIC amd64

Fatal trap 12: page fault while in kernel mode
cpuid = 4; apic id = 04
fault virtual address	= 0x18
fault code		= supervisor read data, page not present
instruction pointer	= 0x20:0xffffffff80cae31d
stack pointer	       = 0x28:0xfffffe017cb5b670
frame pointer	       = 0x28:0xfffffe017cb5b6e0
code segment		= base rx0, limit 0xfffff, type 0x1b
			= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags	= interrupt enabled, resume, IOPL = 0
current process		= 0 (if_io_tqg_4)
trap number		= 12
panic: page fault
cpuid = 4
time = 1653578974
KDB: stack backtrace:
#0 0xffffffff80c69465 at kdb_backtrace+0x65
#1 0xffffffff80c1bb1f at vpanic+0x17f
#2 0xffffffff80c1b993 at panic+0x43
#3 0xffffffff810afdf5 at trap_fatal+0x385
#4 0xffffffff810afe4f at trap_pfault+0x4f
#5 0xffffffff81087528 at calltrap+0x8
#6 0xffffffff80de07c9 at tcp_output+0x1339
#7 0xffffffff80dd7eed at tcp_do_segment+0x2cfd
#8 0xffffffff80dd44b1 at tcp_input_with_port+0xb61
#9 0xffffffff80dd515b at tcp_input+0xb
#10 0xffffffff80dc691f at ip_input+0x11f
#11 0xffffffff80d53089 at netisr_dispatch_src+0xb9
#12 0xffffffff80d36ea8 at ether_demux+0x138
#13 0xffffffff80d38235 at ether_nh_input+0x355
#14 0xffffffff80d53089 at netisr_dispatch_src+0xb9
#15 0xffffffff80d372d9 at ether_input+0x69
#16 0xffffffff80d4f4d7 at iflib_rxeof+0xc27
#17 0xffffffff80d49b22 at _task_fn_rx+0x72
Uptime: 10d13h2m40s
Dumping 24936 out of 130956 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%

__curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
55		__asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu,
(kgdb) bt
#0  __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
#1  doadump (textdump=<optimized out>) at /usr/src/sys/kern/kern_shutdown.c:399
#2  0xffffffff80c1b71c in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:487
#3  0xffffffff80c1bb8e in vpanic (fmt=0xffffffff811b4fb9 "%s", ap=<optimized out>) at /usr/src/sys/kern/kern_shutdown.c:920
#4  0xffffffff80c1b993 in panic (fmt=<unavailable>) at /usr/src/sys/kern/kern_shutdown.c:844
#5  0xffffffff810afdf5 in trap_fatal (frame=0xfffffe017cb5b5b0, eva=24) at /usr/src/sys/amd64/amd64/trap.c:944
#6  0xffffffff810afe4f in trap_pfault (frame=0xfffffe017cb5b5b0, usermode=false, signo=<optimized out>, ucode=<optimized out>) at /usr/src/sys/amd64/amd64/trap.c:763
#7  <signal handler called>
#8  m_copydata (m=0x0, m@entry=0xfffff81ab7913e00, off=0, len=1, cp=<optimized out>) at /usr/src/sys/kern/uipc_mbuf.c:659
#9  0xffffffff80de07c9 in tcp_output (tp=<optimized out>) at /usr/src/sys/netinet/tcp_output.c:1081
#10 0xffffffff80dd7eed in tcp_do_segment (m=<optimized out>, th=<optimized out>, so=<optimized out>, tp=0xfffffe025fb86518, drop_hdrlen=52, tlen=<optimized out>, iptos=0 '\000') at /usr/src/sys/netinet/tcp_input.c:2637
#11 0xffffffff80dd44b1 in tcp_input_with_port (mp=<optimized out>, offp=<optimized out>, proto=<optimized out>, port=port@entry=0) at /usr/src/sys/netinet/tcp_input.c:1400
#12 0xffffffff80dd515b in tcp_input (mp=0xfffff81ab7913e00, offp=0x0, proto=1) at /usr/src/sys/netinet/tcp_input.c:1496
#13 0xffffffff80dc691f in ip_input (m=0x0) at /usr/src/sys/netinet/ip_input.c:839
#14 0xffffffff80d53089 in netisr_dispatch_src (proto=1, source=source@entry=0, m=0xfffff808938aa000) at /usr/src/sys/net/netisr.c:1143
#15 0xffffffff80d5345f in netisr_dispatch (proto=3079749120, m=0x1) at /usr/src/sys/net/netisr.c:1234
#16 0xffffffff80d36ea8 in ether_demux (ifp=ifp@entry=0xfffff80105fb0000, m=0x0) at /usr/src/sys/net/if_ethersubr.c:921
#17 0xffffffff80d38235 in ether_input_internal (ifp=0xfffff80105fb0000, m=0x0) at /usr/src/sys/net/if_ethersubr.c:707
#18 ether_nh_input (m=<optimized out>) at /usr/src/sys/net/if_ethersubr.c:737
#19 0xffffffff80d53089 in netisr_dispatch_src (proto=proto@entry=5, source=source@entry=0, m=m@entry=0xfffff808938aa000) at /usr/src/sys/net/netisr.c:1143
#20 0xffffffff80d5345f in netisr_dispatch (proto=3079749120, proto@entry=5, m=0x1, m@entry=0xfffff808938aa000) at /usr/src/sys/net/netisr.c:1234
#21 0xffffffff80d372d9 in ether_input (ifp=<optimized out>, m=0xfffff808938aa000) at /usr/src/sys/net/if_ethersubr.c:828
#22 0xffffffff80d4f4d7 in iflib_rxeof (rxq=<optimized out>, rxq@entry=0xfffffe0205951680, budget=<optimized out>) at /usr/src/sys/net/iflib.c:3047
#23 0xffffffff80d49b22 in _task_fn_rx (context=0xfffffe0205951680) at /usr/src/sys/net/iflib.c:3990
#24 0xffffffff80c67e9d in gtaskqueue_run_locked (queue=queue@entry=0xfffff801039ce700) at /usr/src/sys/kern/subr_gtaskqueue.c:371
#25 0xffffffff80c67b12 in gtaskqueue_thread_loop (arg=<optimized out>, arg@entry=0xfffffe017d1ad068) at /usr/src/sys/kern/subr_gtaskqueue.c:547
#26 0xffffffff80bd8a5e in fork_exit (callout=0xffffffff80c67a50 <gtaskqueue_thread_loop>, arg=0xfffffe017d1ad068, frame=0xfffffe017cb5bf40) at /usr/src/sys/kern/kern_fork.c:1093
#27 <signal handler called>
#28 mi_startup () at /usr/src/sys/kern/init_main.c:322
Backtrace stopped: Cannot access memory at address 0x14
Comment 1 iron.udjin 2022-05-26 16:29:59 UTC
(kgdb) list *0xffffffff80cae31d
0xffffffff80cae31d is in m_copydata (/usr/src/sys/kern/uipc_mbuf.c:659).
654			off -= m->m_len;
655			m = m->m_next;
656		}
657		while (len > 0) {
658			KASSERT(m != NULL, ("m_copydata, length > size of mbuf chain"));
659			count = min(m->m_len - off, len);
660			if ((m->m_flags & M_EXTPG) != 0)
661				m_copyfromunmapped(m, off, count, cp);
662			else
663				bcopy(mtod(m, caddr_t) + off, cp, count);
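
To illustrate why this faults at virtual address 0x18: without INVARIANTS the KASSERT at line 658 compiles away, so when the requested range runs one byte past the end of the mbuf chain, m ends up NULL and the m->m_len read dereferences a NULL pointer at the offset of m_len. A minimal user-space sketch of that pattern (a simplified stand-in, not the kernel code; the struct only mirrors the first four mbuf header fields on LP64):

#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for the first mbuf header fields on LP64; m_len lands at offset 0x18,
 * matching the fault virtual address reported above when m is NULL. */
struct mbuf_sk {
	struct mbuf_sk *m_next;    /* 0x00 */
	struct mbuf_sk *m_nextpkt; /* 0x08 */
	char           *m_data;    /* 0x10 */
	int             m_len;     /* 0x18 */
};

/* Copy loop in the style of m_copydata(); the m != NULL assertions are elided,
 * as they are in a kernel built without INVARIANTS. */
static void
copydata_sk(const struct mbuf_sk *m, int off, int len, char *cp)
{
	int count;

	while (off > 0) {                  /* skip mbufs fully covered by off */
		if (off < m->m_len)
			break;
		off -= m->m_len;
		m = m->m_next;             /* may become NULL here */
	}
	while (len > 0) {
		/* with off equal to the chain length and len == 1, m is NULL
		 * here and reading m->m_len faults at virtual address 0x18 */
		count = (m->m_len - off < len) ? m->m_len - off : len;
		memcpy(cp, m->m_data + off, count);
		len -= count;
		cp += count;
		off = 0;
		m = m->m_next;
	}
}

int
main(void)
{
	char data[500] = { 0 }, out[4];
	struct mbuf_sk m = { NULL, NULL, data, (int)sizeof(data) };

	printf("offsetof(m_len) = %#zx\n", offsetof(struct mbuf_sk, m_len));
	copydata_sk(&m, 0, 1, out);     /* fine: the byte exists */
	/* copydata_sk(&m, 500, 1, out);   one byte past the chain -> NULL deref */
	return 0;
}

This matches the crash frame, where off is already 0 and the local m is NULL even though m@entry points at a valid mbuf: the skip loop consumed the whole chain before the copy loop touched m->m_len.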
Comment 2 Michael Tuexen freebsd_committer freebsd_triage 2022-05-27 00:16:53 UTC
Can you provide information on how to reproduce it?
Comment 3 iron.udjin 2022-05-27 00:26:23 UTC
(In reply to Michael Tuexen from comment #2)

Unfortunately, I don't know how to reproduce this bug. The server had been running for a week or so without any problem. Today it suddenly panicked and restarted. The workload is typical: web server, PostgreSQL, S3 (minio).
Comment 4 iron.udjin 2022-05-27 00:28:47 UTC
(In reply to Michael Tuexen from comment #2)

Do you need all of my sysctls from /etc/sysctl.conf and /boot/loader.conf?
Comment 5 John Baldwin freebsd_committer freebsd_triage 2022-05-27 16:13:51 UTC
Looks like it walked off the end of the mbuf chain as it tried to copy one byte too many.  You could try going up to frame 9 (tcp_output) to see if the arguments passed to m_copydata() are still around (it looks like we know the mbuf chain via m@entry in frame 8, but knowing the original length and offset and confirming it walked off the end would be good).  You'd have to figure out why the length was wrong, though, and that might need more digging in the tp or the like.
Comment 6 iron.udjin 2022-05-27 16:56:41 UTC
(In reply to John Baldwin from comment #5)

Oh... I'm not sure that I can do it by myself. I'm not so experienced in debugging. Can I send you a compressed core dump file?
Comment 7 John Baldwin freebsd_committer freebsd_triage 2022-05-27 17:01:39 UTC
Hmm, unfortunately it would need to be the contents of /boot/kernel and /usr/lib/debug/boot/kernel along with the vmcore.  However, if you can stick those all in a tarball I can take a look at them.  It might be simpler to store them at a URL and mail that to me directly.
Comment 8 iron.udjin 2022-05-27 18:17:13 UTC
(In reply to John Baldwin from comment #7)

Emailed you a link to the archive.
Comment 9 Michael Tuexen freebsd_committer freebsd_triage 2022-05-27 19:18:34 UTC
(In reply to iron.udjin from comment #8)
You might also send the link to me (tuexen@freebsd.org) and Richard (rscheff@freebsd.org), if you are willing to do so.
Comment 10 iron.udjin 2022-05-27 19:28:12 UTC
(In reply to Michael Tuexen from comment #9)

Done.
Comment 11 Michael Tuexen freebsd_committer freebsd_triage 2022-05-27 22:20:32 UTC
(In reply to iron.udjin from comment #10)
Thanks! I was able to download the tarball and will also have a look at it.
Comment 12 Michael Tuexen freebsd_committer freebsd_triage 2022-06-06 16:06:20 UTC
Are you using KTLS?
Comment 13 iron.udjin 2022-06-06 20:12:34 UTC
(In reply to Michael Tuexen from comment #12)

Yes, with www/nginx-devel
Comment 14 Michael Tuexen freebsd_committer freebsd_triage 2022-06-06 20:38:50 UTC
(In reply to iron.udjin from comment #13)
Do you observe the panics on a regular basis? If so, could you disable KTLS and see if they still happen?

Can you also post /etc/sysctl.conf and /boot/loader.conf?
Comment 15 iron.udjin 2022-06-06 20:47:49 UTC
(In reply to Michael Tuexen from comment #14)

It doesn't panic on a regular basis. The server has been running for 11 days without a panic with KTLS enabled.
Comment 16 iron.udjin 2022-06-06 20:48:14 UTC
Created attachment 234498 [details]
sysctl.conf
Comment 17 iron.udjin 2022-06-06 20:48:37 UTC
Created attachment 234499 [details]
loader.conf
Comment 18 Michael Tuexen freebsd_committer freebsd_triage 2022-06-06 21:01:21 UTC
(In reply to iron.udjin from comment #17)
Thanks. Are you actually using HTCP as the congestion control module? Which network interface cards are you using? Are they doing HW TLS?
Comment 19 iron.udjin 2022-06-06 21:07:36 UTC
(In reply to Michael Tuexen from comment #18)

ixl0@pci0:33:0:0:	class=0x020000 rev=0x02 hdr=0x00 vendor=0x8086 device=0x15ff subvendor=0x1849 subdevice=0x0000
    vendor     = 'Intel Corporation'
    device     = 'Ethernet Controller X710 for 10GBASE-T'
    class      = network
    subclass   = ethernet

The network card isn't doing HW TLS. I use the ktls_ocf module.

Currently I don't use HTCP, but I was using it before.

# sysctl net.inet.tcp.cc.algorithm
net.inet.tcp.cc.algorithm: newreno
Comment 20 Michael Tuexen freebsd_committer freebsd_triage 2022-06-06 22:00:34 UTC
(In reply to iron.udjin from comment #19)
OK, thanks for the information.

My current thinking is that the problem might be related to using KTLS in a situation where TCP sends a FIN but still has outstanding data. KTLS is tested a lot with the RACK stack, which does NOT send a FIN if there is outstanding data. This could be the problem with len = 1, since the FIN conceptually takes one byte in the sequence number space, but there is no corresponding byte in the data stream...
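
A worked example of that accounting (illustrative numbers, not taken from the core): with snd_una = 1000 and 500 bytes in the send buffer, the data occupies sequence numbers 1000-1499, the FIN conceptually consumes 1500, and snd_max becomes 1501. If the stack then tries to (re)send the FIN's sequence number as if it were data, it asks m_copydata() for one byte just past the end of a 500-byte send buffer - exactly the kind of len = 1 call that walks off the mbuf chain.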
Comment 21 Dobri Dobrev 2022-06-10 11:11:07 UTC
(In reply to Michael Tuexen from comment #12)
My workload is also web-related - nginx, grafana, influxdb
No KTLS, so I doubt it's related, and I have the same panic.
Last occurrences: on server #1 - 54 days, on server #2 - 154 days.
Comment 22 iron.udjin 2022-06-10 11:19:24 UTC
(In reply to Dobri Dobrev from comment #21)

Yes, it's most likely not related to KTLS, because I experienced such crashes on 13.0-STABLE with no KTLS enabled (see bug #258183).
Comment 23 Michael Tuexen freebsd_committer freebsd_triage 2022-06-10 12:52:30 UTC
(In reply to iron.udjin from comment #22)
Thanks for providing the information. We are trying to figure out what might be going on...
Comment 24 Richard Scheffenegger freebsd_committer freebsd_triage 2022-06-10 22:18:57 UTC
The current thinking is that SACK rescue retransmissions (in FreeBSD 13 this is gated by net.inet.tcp.rfc6675_pipe=1) very rarely create an entry that is apparently beyond the valid data range.

While under most common circumstances a final FIN bit in the sequence space is taken care of, there seems to be some double-counting of the FIN bit.

In most of the inspected cores, we found:

TCP state: LAST_ACK (FIN received and also FIN sent)
SACK loss recovery triggered
A cumulative ACK before all outstanding data was received
The remote client "disappears" for a significant amount of time (7 to 12 retransmission timeouts), but may re-appear again just prior.
snd_max consistently 2 counts above the last data, instead of the expected 1 (for the FIN bit).

However, it is still unclear under what circumstances this double-counting happens - possibly when the persist timer triggers and a few other conditions are also fulfilled, maybe a race condition between normal packet processing and a timer firing.

In short: disabling the RFC 6675 enhanced SACK features (more correct pipe accounting, rescue retransmissions) should avoid the panic, while not addressing the root cause of when/why the FIN bit gets double-accounted...

Would you be willing to run an instrumented kernel, which either panics (full core dump) or spews out various state when inconsistencies are detected in this space - while ignoring/addressing them "on the fly" without panicking?
Comment 25 iron.udjin 2022-06-11 16:36:04 UTC
(In reply to Richard Scheffenegger from comment #24)

I can build and run a custom kernel, but the problem is that I cannot reproduce the crash.
The panic occurred when my server was under a DDoS attack. Since then it has been running for almost a month without any problem, with net.inet.tcp.rfc6675_pipe=1 and KTLS enabled.
Comment 26 Michael Tuexen freebsd_committer freebsd_triage 2022-06-11 17:16:11 UTC
(In reply to iron.udjin from comment #25)
Interesting information. Do you know what the traffic pattern was while under attack? Any chance you have a .pcap file and are willing to share it (privately)?
Comment 27 Richard Scheffenegger freebsd_committer freebsd_triage 2022-06-11 17:40:23 UTC
(In reply to iron.udjin from comment #25)

I've prepared a patch against main (may need some manual tweaking to apply to 13.1-RELEASE as of now) 
wget https://reviews.freebsd.org/D35446?id=106838&download=true

If the kernel is built with INVARIANTS, it should panic early on, once an inconsistency between the socket send buffer and the TCP state variables is detected - instead of panicking a few packets later, when that inconsistency results in an invalid pointer access...

If the kernel is built without INVARIANTS, the kernel log buffer (dmesg) should provide some hints as to when/where the inconsistency first occurred, which may give more indirect clues. But it would address the inconsistency right away and continue operation.

If the panic was observed during a DDoS, this strengthens the suspicion that there is a race condition (double-accounting for the FIN bit). However, prior to the introduction of SACK rescue retransmissions, this never materially affected TCP operation, as the socket buffer data would be used directly to determine what sequence range to send, rather than the SACK scoreboard data.
Comment 28 iron.udjin 2022-06-11 17:44:21 UTC
(In reply to Michael Tuexen from comment #26)

When my website started to experience overload from thousands of distributed requests like "GET / HTTP/2.0", I switched it to Cloudflare and configured filters that kept the website alive. So there wasn't any direct traffic from clients to the server. Regarding the traffic pattern: all traffic was TCP/HTTPS on port 443 from Cloudflare. Currently there is no DDoS attack and no crashes. I guess capturing a .pcap file now would be useless.
Comment 29 iron.udjin 2022-06-11 17:47:14 UTC
(In reply to Richard Scheffenegger from comment #27)

Please tell me which options should be enabled when I rebuild the kernel. Anything else besides INVARIANTS?
Comment 30 Michael Tuexen freebsd_committer freebsd_triage 2022-06-11 18:45:03 UTC
(In reply to iron.udjin from comment #29)
In addition, I would also suggest
options TCPPCAP
That would allow us to see the last n packets in/out per endpoint in the kernel dump. Just compiling it in does not enable the logging. If we actually want to do the logging, we need to do something like
sudo sysctl net.inet.tcp.tcp_pcap_packets=10
to store the last 10 packets in each direction. We can see if your system can handle this, since it uses mbufs.
Comment 31 iron.udjin 2022-06-11 18:57:40 UTC
(In reply to Richard Scheffenegger from comment #27)

Failed to apply patch for 13.1-RELEASE source code:

|Index: sys/netinet/tcp_sack.c
|===================================================================
|--- sys/netinet/tcp_sack.c
|+++ sys/netinet/tcp_sack.c
--------------------------
Patching file sys/netinet/tcp_sack.c using Plan A...
Hunk #1 succeeded at 879 with fuzz 1 (offset -19 lines).
Hunk #2 failed at 962.
1 out of 2 hunks failed--saving rejects to sys/netinet/tcp_sack.c.rej
done
Comment 32 Richard Scheffenegger freebsd_committer freebsd_triage 2022-06-11 20:29:45 UTC
Created attachment 234626 [details]
D35446 patch to apply clean against 13.1-RELEASE
Comment 33 Richard Scheffenegger freebsd_committer freebsd_triage 2022-06-11 20:49:57 UTC
Comment on attachment 234626 [details]
D35446 patch to apply clean against 13.1-RELEASE

missed removing the goto label to make the compiler happy.
Comment 34 Richard Scheffenegger freebsd_committer freebsd_triage 2022-06-11 20:54:25 UTC
Created attachment 234627 [details]
clean patch against 13.1-RELEASE
Comment 35 Richard Scheffenegger freebsd_committer freebsd_triage 2022-06-11 21:00:06 UTC
(In reply to iron.udjin from comment #31)
Try the patch attached just now, which should apply cleanly.

Note that if the hypothesis about snd_max getting infrequently incremented twice holds, you may observe more frequent panics in an INVARIANTS kernel - prior to these added checks, that oddity would not materially affect the TCP session. Only the new feature of SACK rescue retransmissions - which should also happen very infrequently, at the same time as a FIN bit - exposed this.

In that case, you may want to switch to a kernel without INVARIANTS (which would still log those occurrences, but not panic), or the default 13.1-RELEASE kernel with rfc6675_pipe disabled (to disable rescue retransmissions) for stable operation.
Comment 36 Christos Chatzaras 2022-06-11 21:17:37 UTC
Today I had a kernel panic on a server. Could it be the same bug?


Fatal trap 12: page fault while in kernel mode
cpuid = 7; apic id = 07
fault virtual address	= 0x18
fault code		= supervisor read data, page not present
instruction pointer	= 0x20:0xffffffff80cae31d
stack pointer	        = 0x28:0xfffffe00e00445f0
frame pointer	        = 0x28:0xfffffe00e0044660
code segment		= base rx0, limit 0xfffff, type 0x1b
			= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags	= interrupt enabled, resume, IOPL = 0
current process		= 0 (if_io_tqg_7)
trap number		= 12
panic: page fault
cpuid = 7
time = 1654981222
KDB: stack backtrace:
#0 0xffffffff80c69465 at kdb_backtrace+0x65
#1 0xffffffff80c1bb1f at vpanic+0x17f
#2 0xffffffff80c1b993 at panic+0x43
#3 0xffffffff810afdf5 at trap_fatal+0x385
#4 0xffffffff810afe4f at trap_pfault+0x4f
#5 0xffffffff81087528 at calltrap+0x8
#6 0xffffffff80de07c9 at tcp_output+0x1339
#7 0xffffffff80dd7eed at tcp_do_segment+0x2cfd
#8 0xffffffff80dd44b1 at tcp_input_with_port+0xb61
#9 0xffffffff80dd515b at tcp_input+0xb
#10 0xffffffff80dc691f at ip_input+0x11f
#11 0xffffffff80d53089 at netisr_dispatch_src+0xb9
#12 0xffffffff80d36ea8 at ether_demux+0x138
#13 0xffffffff80d38235 at ether_nh_input+0x355
#14 0xffffffff80d53089 at netisr_dispatch_src+0xb9
#15 0xffffffff80d372d9 at ether_input+0x69
#16 0xffffffff80ddd9f4 at tcp_lro_flush+0x2f4
#17 0xffffffff80dddd3b at tcp_lro_flush_all+0x1bb
Uptime: 20d1h1m33s
Dumping 2502 out of 32501 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%

__curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
warning: Source file is more recent than executable.
55		__asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu,
(kgdb) #0  __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
#1  doadump (textdump=<optimized out>)
    at /usr/src/sys/kern/kern_shutdown.c:399
#2  0xffffffff80c1b71c in kern_reboot (howto=260)
    at /usr/src/sys/kern/kern_shutdown.c:487
#3  0xffffffff80c1bb8e in vpanic (fmt=0xffffffff811b4fb9 "%s", 
    ap=<optimized out>) at /usr/src/sys/kern/kern_shutdown.c:920
#4  0xffffffff80c1b993 in panic (fmt=<unavailable>)
    at /usr/src/sys/kern/kern_shutdown.c:844
#5  0xffffffff810afdf5 in trap_fatal (frame=0xfffffe00e0044530, eva=24)
    at /usr/src/sys/amd64/amd64/trap.c:944
#6  0xffffffff810afe4f in trap_pfault (frame=0xfffffe00e0044530, 
    usermode=false, signo=<optimized out>, ucode=<optimized out>)
    at /usr/src/sys/amd64/amd64/trap.c:763
#7  <signal handler called>
#8  m_copydata (m=0x0, m@entry=0xfffff80405398400, off=0, len=1, 
    cp=<optimized out>) at /usr/src/sys/kern/uipc_mbuf.c:659
#9  0xffffffff80de07c9 in tcp_output (tp=<optimized out>)
    at /usr/src/sys/netinet/tcp_output.c:1081
#10 0xffffffff80dd7eed in tcp_do_segment (m=<optimized out>, 
    th=<optimized out>, so=<optimized out>, tp=0xfffffe013f603000, 
    drop_hdrlen=40, tlen=<optimized out>, iptos=0 '\000')
    at /usr/src/sys/netinet/tcp_input.c:2637
#11 0xffffffff80dd44b1 in tcp_input_with_port (mp=<optimized out>, 
    offp=<optimized out>, proto=<optimized out>, port=port@entry=0)
    at /usr/src/sys/netinet/tcp_input.c:1400
#12 0xffffffff80dd515b in tcp_input (mp=0xfffff80405398400, offp=0x0, proto=1)
    at /usr/src/sys/netinet/tcp_input.c:1496
#13 0xffffffff80dc691f in ip_input (m=0x0)
    at /usr/src/sys/netinet/ip_input.c:839
#14 0xffffffff80d53089 in netisr_dispatch_src (proto=1, 
    source=source@entry=0, m=0xfffff8002f330900)
    at /usr/src/sys/net/netisr.c:1143
#15 0xffffffff80d5345f in netisr_dispatch (proto=87655424, m=0x1)
    at /usr/src/sys/net/netisr.c:1234
#16 0xffffffff80d36ea8 in ether_demux (ifp=ifp@entry=0xfffff80004454000, 
    m=0x0) at /usr/src/sys/net/if_ethersubr.c:921
#17 0xffffffff80d38235 in ether_input_internal (ifp=0xfffff80004454000, m=0x0)
    at /usr/src/sys/net/if_ethersubr.c:707
#18 ether_nh_input (m=<optimized out>) at /usr/src/sys/net/if_ethersubr.c:737
#19 0xffffffff80d53089 in netisr_dispatch_src (proto=proto@entry=5, 
    source=source@entry=0, m=m@entry=0xfffff8002f330900)
    at /usr/src/sys/net/netisr.c:1143
#20 0xffffffff80d5345f in netisr_dispatch (proto=87655424, proto@entry=5, 
    m=0x1, m@entry=0xfffff8002f330900) at /usr/src/sys/net/netisr.c:1234
#21 0xffffffff80d372d9 in ether_input (ifp=<optimized out>, 
    m=0xfffff8002f330900) at /usr/src/sys/net/if_ethersubr.c:828
#22 0xffffffff80ddd9f4 in tcp_lro_flush (lc=lc@entry=0xfffff80003cf5830, 
    le=0xfffffe0103f3f690) at /usr/src/sys/netinet/tcp_lro.c:1375
#23 0xffffffff80dddd3b in tcp_lro_rx_done (lc=0xfffff80003cf5830)
    at /usr/src/sys/netinet/tcp_lro.c:566
#24 tcp_lro_flush_all (lc=lc@entry=0xfffff80003cf5830)
    at /usr/src/sys/netinet/tcp_lro.c:1532
#25 0xffffffff80d4f503 in iflib_rxeof (rxq=<optimized out>, 
    rxq@entry=0xfffff80003cf5800, budget=<optimized out>)
    at /usr/src/sys/net/iflib.c:3058
#26 0xffffffff80d49b22 in _task_fn_rx (context=0xfffff80003cf5800)
    at /usr/src/sys/net/iflib.c:3990
#27 0xffffffff80c67e9d in gtaskqueue_run_locked (
    queue=queue@entry=0xfffff80003ac2000)
    at /usr/src/sys/kern/subr_gtaskqueue.c:371
#28 0xffffffff80c67b12 in gtaskqueue_thread_loop (arg=<optimized out>, 
    arg@entry=0xfffffe00387fb0b0) at /usr/src/sys/kern/subr_gtaskqueue.c:547
#29 0xffffffff80bd8a5e in fork_exit (
    callout=0xffffffff80c67a50 <gtaskqueue_thread_loop>, 
    arg=0xfffffe00387fb0b0, frame=0xfffffe00e0044f40)
    at /usr/src/sys/kern/kern_fork.c:1093
#30 <signal handler called>
#31 mi_startup () at /usr/src/sys/kern/init_main.c:322
Backtrace stopped: Cannot access memory at address 0x1d
(kgdb)
Comment 37 Richard Scheffenegger freebsd_committer freebsd_triage 2022-06-11 21:34:40 UTC
Can you provide the vmcore (and kernel.debug) files?

Or check 
f 10
p *tp
p *tp->sackhint.nexthole
p tp->snd_una +  tp->t_inpcb->inp_socket->so_snd.sb_ccc
p/x tp->t_flags
p/x tp->t_flags2
p *tp->t_timers

So far, this seems to be the signature:
t_state == 8 (LAST_ACK)
snd_max == snd_una + so_snd.sb_ccc + 2 (! should be one to account for the FIN)
snd_fack < sackhole.end (rescue retransmission - this is new in fbsd13)
t_rxtshift == 6..12 (many retransmission timeouts, indicating the client disappeared - temporarily)
t_rtseq == snd_una + so_snd.sb_ccc
snd_recover == snd_max (indicating the double-accounting for the FIN happened prior to entering loss recovery, or "many" packets before the actual panic)

t_flags = 0x20100274 (TF_CONGRECOVERY, TF_FASTRECOVERY, TF_SACK_PERMIT, TF_SENTFIN, TF_REQ_SCALE, TF_RCVD_SCALE, TF_NODELAY, TF_ACKNOW)
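
To make the relationship explicit, here is a minimal, compilable sketch of the invariant these values violate (my illustration of the idea behind the instrumented checks, not the actual D35446 code):

#include <stdbool.h>
#include <stdint.h>

/*
 * With a FIN already sent, snd_max may be at most one sequence number past the
 * unacknowledged data still held in the socket send buffer (snd_una + sb_ccc).
 */
static bool
snd_max_consistent(uint32_t snd_una, uint32_t sb_ccc, uint32_t snd_max, bool fin_sent)
{
	uint32_t limit = snd_una + sb_ccc + (fin_sent ? 1 : 0);

	/* modulo-2^32 comparison, in the spirit of the kernel's SEQ_LEQ() */
	return ((int32_t)(snd_max - limit) <= 0);
}

With the signature above (snd_max two past snd_una + so_snd.sb_ccc while TF_SENTFIN is set), this check fails by exactly one sequence number.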
Comment 38 iron.udjin 2022-06-11 21:41:29 UTC
(In reply to Richard Scheffenegger from comment #35)


I have a problem with the kernel rebuild at the linking stage. I just copied GENERIC to KP and added:

ident           DEBUG
options         TCPPCAP
options         WITNESS
options         INVARIANTS

...and got errors:

linking kernel.full
ld: error: undefined symbol: __mtx_assert
>>> referenced by cam_periph.c:351 (/usr/src/sys/cam/cam_periph.c:351)
>>>               cam_periph.o:(cam_periph_find)
>>> referenced by cam_periph.c:444 (/usr/src/sys/cam/cam_periph.c:444)
>>>               cam_periph.o:(cam_periph_release_locked_buses)
>>> referenced by cam_periph.c:678 (/usr/src/sys/cam/cam_periph.c:678)
>>>               cam_periph.o:(cam_periph_release_locked_buses)
>>> referenced 4252 more times

ld: error: undefined symbol: _sx_assert
>>> referenced by filedesc.h:292 (/usr/src/sys/sys/filedesc.h:292)
>>>               freebsd32_capability.o:(freebsd32_cap_ioctls_get)
>>> referenced by acpi_battery.c:465 (/usr/src/sys/dev/acpica/acpi_battery.c:465)
>>>               acpi_battery.o:(acpi_battery_register)
>>> referenced by acpi_cmbat.c:342 (/usr/src/sys/dev/acpica/acpi_cmbat.c:342)
>>>               acpi_cmbat.o:(acpi_cmbat_get_bix)
>>> referenced 590 more times

ld: error: undefined symbol: __rw_assert
>>> referenced by agp.c:613 (/usr/src/sys/dev/agp/agp.c:613)
>>>               agp.o:(agp_generic_bind_memory)
>>> referenced by tmpfs_subr.c:169 (/usr/src/sys/fs/tmpfs/tmpfs_subr.c:169)
>>>               tmpfs_subr.o:(tmpfs_pager_update_writecount)
>>> referenced by tmpfs_subr.c:185 (/usr/src/sys/fs/tmpfs/tmpfs_subr.c:185)
>>>               tmpfs_subr.o:(tmpfs_pager_release_writecount)
>>> referenced 567 more times

ld: error: undefined symbol: _lockmgr_assert
>>> referenced by msdosfs_fat.c:404 (/usr/src/sys/fs/msdosfs/msdosfs_fat.c:404)
>>>               msdosfs_fat.o:(usemap_free)
>>> referenced by msdosfs_fat.c:762 (/usr/src/sys/fs/msdosfs/msdosfs_fat.c:762)
>>>               msdosfs_fat.o:(clusteralloc)
>>> referenced by msdosfs_fat.c:901 (/usr/src/sys/fs/msdosfs/msdosfs_fat.c:901)
>>>               msdosfs_fat.o:(fillinusemap)
>>> referenced 14 more times

ld: error: undefined symbol: _rm_assert
>>> referenced by kern_rmlock.c:120 (/usr/src/sys/kern/kern_rmlock.c:120)
>>>               kern_rmlock.o:(assert_rm)
>>> referenced by kern_rmlock.c:151 (/usr/src/sys/kern/kern_rmlock.c:151)
>>>               kern_rmlock.o:(unlock_rm)
>>> referenced by kern_rmlock.c:322 (/usr/src/sys/kern/kern_rmlock.c:322)
>>>               kern_rmlock.o:(rm_destroy)
>>> referenced 38 more times

ld: error: undefined symbol: _rangelock_cookie_assert
>>> referenced by uipc_shm.c:743 (/usr/src/sys/kern/uipc_shm.c:743)
>>>               uipc_shm.o:(shm_dotruncate_cookie)
>>> referenced by uipc_shm.c:641 (/usr/src/sys/kern/uipc_shm.c:641)
>>>               uipc_shm.o:(shm_dotruncate_locked)
*** [kernel.full] Error code 1

/etc/make.conf: CFLAGS= -O2 -pipe -march=native
Comment 39 Christos Chatzaras 2022-06-11 21:43:14 UTC
(In reply to Richard Scheffenegger from comment #37)

Hello Richard,

I tried to e-mail you at rscheff@freebsd.org but Office 365 blocked it. Can you e-mail me at chris@cretaforce.gr so I can send you the link to download it?
Comment 40 Christos Chatzaras 2022-06-11 21:56:43 UTC
(In reply to Richard Scheffenegger from comment #37)

It looks similar to this bug, which I reported last year:

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=254725
Comment 41 Richard Scheffenegger freebsd_committer freebsd_triage 2022-06-11 22:09:36 UTC
(In reply to Christos Chatzaras from comment #40)

Indeed. Unlike the other cores, the session is still in FIN_WAIT_1 state (6), not LAST_ACK.

Also, there is an entire chain of 1-byte holes in the SACK scoreboard (which got retransmitted). No sign of a rescue retransmission - snd_fack seems to have covered snd_max (the FIN bit?), but there is a gap of 1 byte in the sequence stream:

snd_max is still off from so_snd by 2 instead of the expected 1.

So not quite the same, but maybe the same root cause.
Comment 42 Michael Tuexen freebsd_committer freebsd_triage 2022-06-11 22:10:25 UTC
(In reply to Christos Chatzaras from comment #39)
Can you also send the link to tuexen@freebsd.org or, if that is blocked, to tuexen@fh-muenster.de?
Comment 43 Christos Chatzaras 2022-06-11 22:12:57 UTC
(In reply to Michael Tuexen from comment #42)

Links sent. Thank you.
Comment 44 Richard Scheffenegger freebsd_committer freebsd_triage 2022-06-11 22:38:06 UTC
(In reply to iron.udjin from comment #38)
Can you also add 

options INVARIANT_SUPPORT
Comment 45 iron.udjin 2022-06-11 22:53:58 UTC
(In reply to Richard Scheffenegger from comment #44)

OK, the kernel has been rebuilt. I'll let you know if anything interesting shows up in dmesg.
Comment 46 Michael Tuexen freebsd_committer freebsd_triage 2022-06-11 22:59:02 UTC
(In reply to iron.udjin from comment #45)
Can you try something like
sudo sysctl net.inet.tcp.tcp_pcap_packets=10
If your system runs out of mbufs, use fewer than 10...
Comment 47 iron.udjin 2022-06-11 22:59:38 UTC
(In reply to Michael Tuexen from comment #46)

Yes, I did it already. Thanks.
Comment 48 Michael Tuexen freebsd_committer freebsd_triage 2022-06-11 23:01:28 UTC
(In reply to iron.udjin from comment #47)
Perfect!
Comment 49 Dmitriy 2022-06-13 08:14:28 UTC
(In reply to Richard Scheffenegger from comment #37)

Good day to all.
We most likely hit exactly the same problem and backtrace.
Here is some information that we hope will be of some help.
If there is anything else we can help with, please just let me know.

13.1-STABLE FreeBSD 13.1-STABLE #0 stable/13-n251001-41ce229505a: Sat Jun 4 19:47:50 EEST 2022

kgdb /usr/lib/debug/boot/kernel/kernel.debug /var/crash/vmcore.0
...
Fatal trap 12: page fault while in kernel mode
cpuid = 2; apic id = 04
fault virtual address   = 0x18
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff806e5569
stack pointer           = 0x28:0xfffffe027ac53690
frame pointer           = 0x28:0xfffffe027ac53700
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 0 (if_io_tqg_2)
trap number             = 12
panic: page fault
cpuid = 2
time = 1655099122
KDB: stack backtrace:
#0 0xffffffff806a0cc5 at kdb_backtrace+0x65
#1 0xffffffff80657a0f at vpanic+0x17f
#2 0xffffffff80657883 at panic+0x43
#3 0xffffffff80a03837 at trap_fatal+0x387
#4 0xffffffff80a0388f at trap_pfault+0x4f
#5 0xffffffff809dcbe8 at calltrap+0x8
#6 0xffffffff807cbee9 at tcp_output+0x1329
#7 0xffffffff807c332b at tcp_do_segment+0x29db
#8 0xffffffff807bfc21 at tcp_input_with_port+0xb61
#9 0xffffffff807c08bb at tcp_input+0xb
#10 0xffffffff807b2118 at ip_input+0x118
#11 0xffffffff807890b9 at netisr_dispatch_src+0xb9
#12 0xffffffff8076d554 at ether_demux+0x144
#13 0xffffffff8076e8b6 at ether_nh_input+0x346
#14 0xffffffff807890b9 at netisr_dispatch_src+0xb9
#15 0xffffffff8076d979 at ether_input+0x69
#16 0xffffffff80780bf1 at _task_fn_rx+0xc31
#17 0xffffffff8069f6dd at gtaskqueue_run_locked+0x15d
Uptime: 8d12h42m8s
Dumping 15966 out of 262007 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%

__curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
55              __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu,
(kgdb) f 10
#10 0xffffffff807c332b in tcp_do_segment (m=0xfffff8020c425400, th=0xfffff8020c42547a, so=0xfffff8179dcc1760, tp=0xfffffe09a5da7950, drop_hdrlen=41,
    tlen=<optimized out>, iptos=0 '\000') at /usr/src/sys/netinet/tcp_input.c:2637
2637                                            (void) tp->t_fb->tfb_tcp_output(tp);
(kgdb) p *tp
$12 = {t_inpcb = 0xfffff81612d9f1f0, t_fb = 0xffffffff80ebe670 <tcp_def_funcblk>, t_fb_ptr = 0x0, t_maxseg = 1440, t_logstate = 0, t_port = 0, t_state = 8, t_idle_reduce = 0, t_delayed_ack = 0, t_fin_is_rst = 0, t_log_state_set = 0, 
  bits_spare = 0, t_flags = 537920116, snd_una = 893181596, snd_max = 893186736, snd_nxt = 893186736, snd_up = 893181596, snd_wnd = 66240, snd_cwnd = 7996, t_peakrate_thr = 0, ts_offset = 0, rfbuf_ts = 736745954, rcv_numsacks = 0, 
  t_tsomax = 0, t_tsomaxsegcount = 0, t_tsomaxsegsize = 0, rcv_nxt = 2834403673, rcv_adv = 2834408281, rcv_wnd = 4608, t_flags2 = 1026, t_srtt = 0, t_rttvar = 4000, ts_recent = 0, snd_scale = 2 '\002', rcv_scale = 9 '\t', 
  snd_limited = 0 '\000', request_r_scale = 9 '\t', last_ack_sent = 2834403673, t_rcvtime = 2883811337, rcv_up = 2834403673, t_segqlen = 0, t_segqmbuflen = 0, t_segq = {tqh_first = 0x0, tqh_last = 0xfffffe09a5da79e0}, t_in_pkt = 0x0, 
  t_tail_pkt = 0x0, t_timers = 0xfffffe09a5da7bf8, t_vnet = 0xfffff801016b1380, snd_ssthresh = 2880, snd_wl1 = 2834403673, snd_wl2 = 893181596, irs = 2834403154, iss = 893181595, t_acktime = 0, t_sndtime = 2883799567, ts_recent_age = 0, 
  snd_recover = 893186735, cl4_spare = 0, t_oobflags = 0 '\000', t_iobc = 0 '\000', t_rxtcur = 64000, t_rxtshift = 7, t_rtttime = 2883799567, t_rtseq = 893186734, t_starttime = 2883628642, t_fbyte_in = 2883628647, 
  t_fbyte_out = 2883628648, t_pmtud_saved_maxseg = 0, t_blackhole_enter = 0, t_blackhole_exit = 0, t_rttmin = 30, t_rttbest = 0, t_softerror = 0, max_sndwnd = 66240, snd_cwnd_prev = 23040, snd_ssthresh_prev = 1073725440, 
  snd_recover_prev = 893181595, t_sndzerowin = 0, t_rttupdated = 0, snd_numholes = 3, t_badrxtwin = 2883629648, snd_holes = {tqh_first = 0xfffff808705bae80, tqh_last = 0xfffff8037d9e6110}, snd_fack = 893186735, sackblks = {{
      start = 2834403672, end = 2834403673}, {start = 0, end = 0}, {start = 0, end = 0}, {start = 0, end = 0}, {start = 0, end = 0}, {start = 0, end = 0}}, sackhint = {nexthole = 0xfffff8037d9e6100, sack_bytes_rexmit = 4250, 
    last_sack_ack = 0, delivered_data = 0, sacked_bytes = 888, recover_fs = 5138, prr_delivered = 6636, prr_out = 20102}, t_rttlow = 0, rfbuf_cnt = 517, tod = 0x0, t_sndrexmitpack = 85, t_rcvoopack = 0, t_toe = 0x0, 
  cc_algo = 0xffffffff80ebb1b0 <newreno_cc_algo>, ccv = 0xfffffe09a5da7d40, osd = 0xfffffe09a5da7d68, t_bytes_acked = 0, t_maxunacktime = 0, t_keepinit = 0, t_keepidle = 0, t_keepintvl = 0, t_keepcnt = 0, t_dupacks = 52, t_lognum = 0, 
  t_loglimit = 5000, t_pacing_rate = -1, t_logs = {stqh_first = 0x0, stqh_last = 0xfffffe09a5da7b68}, t_lin = 0x0, t_lib = 0x0, t_output_caller = 0x0, t_stats = 0x0, t_logsn = 0, gput_ts = 0, gput_seq = 0, gput_ack = 0, 
  t_stats_gput_prev = 0, t_maxpeakrate = 0, t_sndtlppack = 0, t_sndtlpbyte = 0, t_sndbytes = 13753, t_snd_rxt_bytes = 37794, t_tfo_client_cookie_len = 0 '\000', t_end_info_status = 0, t_tfo_pending = 0x0, t_tfo_cookie = {
    client = '\000' <repeats 15 times>, server = 0}, {t_end_info_bytes = "\000\000\000\000\000\000\000", t_end_info = 0}}
(kgdb) p *tp->sackhint.nexthole
$14 = {start = 893186733, end = 893186734, rxmit = 893186733, scblink = {tqe_next = 0x0, tqe_prev = 0xfffff80870696d90}}
(kgdb) p tp->snd_una +  tp->t_inpcb->inp_socket->so_snd.sb_ccc
$15 = 893186733
(kgdb) p/x tp->t_flags
$16 = 0x20100274
(kgdb) p/x tp->t_flags2
$17 = 0x402
(kgdb) p *tp->t_timers
$18 = {tt_rexmt = {c_links = {le = {le_next = 0xfffffe03c8712b18, le_prev = 0xfffffe01a40f36f8}, sle = {sle_next = 0xfffffe03c8712b18}, tqe = {tqe_next = 0xfffffe03c8712b18, tqe_prev = 0xfffffe01a40f36f8}}, c_time = 3165308754918543, 
    c_precision = 17179868000, c_arg = 0xfffffe09a5da7950, c_func = 0xffffffff807dde30 <tcp_timer_rexmt>, c_lock = 0x0, c_flags = 2, c_iflags = 20, c_cpu = 0}, tt_persist = {c_links = {le = {le_next = 0x0, le_prev = 0x0}, sle = {sle_next = 0x0}, tqe = {tqe_next = 0x0, tqe_prev = 0x0}}, c_time = 0, c_precision = 0, c_arg = 0x0, c_func = 0x0, c_lock = 0x0, c_flags = 0, c_iflags = 16, c_cpu = 0}, tt_keep = {c_links = {le = {le_next = 0xfffffe0cfc422408, 
        le_prev = 0xfffffe05c8f73328}, sle = {sle_next = 0xfffffe0cfc422408}, tqe = {tqe_next = 0xfffffe0cfc422408, tqe_prev = 0xfffffe05c8f73328}}, c_time = 3195223522196068, c_precision = 1932735150000, c_arg = 0xfffffe09a5da7950, 
    c_func = 0xffffffff807dd730 <tcp_timer_keep>, c_lock = 0x0, c_flags = 2, c_iflags = 20, c_cpu = 0}, tt_2msl = {c_links = {le = {le_next = 0x0, le_prev = 0x0}, sle = {sle_next = 0x0}, tqe = {tqe_next = 0x0, tqe_prev = 0x0}}, 
    c_time = 0, c_precision = 0, c_arg = 0x0, c_func = 0x0, c_lock = 0x0, c_flags = 0, c_iflags = 16, c_cpu = 0}, tt_delack = {c_links = {le = {le_next = 0x0, le_prev = 0x0}, sle = {sle_next = 0x0}, tqe = {tqe_next = 0x0, 
        tqe_prev = 0x0}}, c_time = 0, c_precision = 0, c_arg = 0x0, c_func = 0x0, c_lock = 0x0, c_flags = 0, c_iflags = 16, c_cpu = 0}, tt_flags = 0, tt_draincnt = 0}
(kgdb)
Comment 50 Dmitriy 2022-06-14 15:34:43 UTC
After applying the patch from comment #34 and building the kernel with
options INVARIANTS
options INVARIANT_SUPPORT
the system panics within 5-40 minutes (tried 3 times, same place every time), with the following trace:

Unread portion of the kernel message buffer:
panic: tcp_output: snd_max beyond so_snd
cpuid = 12
time = 1655213044
KDB: stack backtrace:
#0 0xffffffff808d8f01 at kdb_backtrace+0x71
#1 0xffffffff8086f797 at vpanic+0x227
#2 0xffffffff8086f2be at panic+0x4e
#3 0xffffffff80ab3551 at tcp_output+0x32a1
#4 0xffffffff80aa2722 at tcp_do_segment+0x2e72
#5 0xffffffff80a9ec35 at tcp_input_with_port+0x1be5
#6 0xffffffff80a9f777 at tcp_input+0x27
#7 0xffffffff80a87061 at ip_input+0xdd1
#8 0xffffffff80a4023f at netisr_dispatch_src+0x1df
#9 0xffffffff80a407a1 at netisr_dispatch+0x21
#10 0xffffffff80a11266 at ether_demux+0x306
#11 0xffffffff80a13c10 at ether_input_internal+0x9e0
#12 0xffffffff80a13221 at ether_nh_input+0xb1
#13 0xffffffff80a4023f at netisr_dispatch_src+0x1df
#14 0xffffffff80a407a1 at netisr_dispatch+0x21
#15 0xffffffff80a11b09 at ether_input+0x1a9
#16 0xffffffff80a3a925 at iflib_rxeof+0x895
#17 0xffffffff80a2e4e5 at _task_fn_rx+0xd5
Uptime: 43m43s
Dumping 9369 out of 261999 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%

doadump (textdump=1) at /usr/src/sys/kern/kern_shutdown.c:399
399    dumptid = curthread->td_tid;
(kgdb) bt
#0  doadump (textdump=1) at /usr/src/sys/kern/kern_shutdown.c:399
#1  0xffffffff8086efd3 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:487
#2  0xffffffff8086f84f in vpanic (fmt=0xffffffff80f390c8 "%s: snd_max beyond so_snd", ap=0xfffffe027ac92320) at /usr/src/sys/kern/kern_shutdown.c:920
#3  0xffffffff8086f2be in panic (fmt=0xffffffff80f390c8 "%s: snd_max beyond so_snd") at /usr/src/sys/kern/kern_shutdown.c:844
#4  0xffffffff80ab3551 in tcp_output (tp=0xfffffe04709abca8) at /usr/src/sys/netinet/tcp_output.c:1583
#5  0xffffffff80aa2722 in tcp_do_segment (m=0xfffff801ef8be500, th=0xfffff801ef8be57a, so=0xfffff8061cdc8b10, tp=0xfffffe04709abca8, drop_hdrlen=41, tlen=0, iptos=0 '\000') at /usr/src/sys/netinet/tcp_input.c:2713
#6  0xffffffff80a9ec35 in tcp_input_with_port (mp=0xfffffe027ac929c8, offp=0xfffffe027ac92968, proto=6, port=0) at /usr/src/sys/netinet/tcp_input.c:1400
#7  0xffffffff80a9f777 in tcp_input (mp=0xfffffe027ac929c8, offp=0xfffffe027ac92968, proto=6) at /usr/src/sys/netinet/tcp_input.c:1496
#8  0xffffffff80a87061 in ip_input (m=0x0) at /usr/src/sys/netinet/ip_input.c:839
#9  0xffffffff80a4023f in netisr_dispatch_src (proto=1, source=0, m=0xfffff801ef8be500) at /usr/src/sys/net/netisr.c:1143
#10 0xffffffff80a407a1 in netisr_dispatch (proto=1, m=0xfffff801ef8be500) at /usr/src/sys/net/netisr.c:1234
#11 0xffffffff80a11266 in ether_demux (ifp=0xfffff820816e3800, m=0xfffff801ef8be500) at /usr/src/sys/net/if_ethersubr.c:921
#12 0xffffffff80a13c10 in ether_input_internal (ifp=0xfffff820816e3800, m=0xfffff801ef8be500) at /usr/src/sys/net/if_ethersubr.c:707
#13 0xffffffff80a13221 in ether_nh_input (m=0xfffff801ef8be500) at /usr/src/sys/net/if_ethersubr.c:737
#14 0xffffffff80a4023f in netisr_dispatch_src (proto=5, source=0, m=0xfffff801ef8be500) at /usr/src/sys/net/netisr.c:1143
#15 0xffffffff80a407a1 in netisr_dispatch (proto=5, m=0xfffff801ef8be500) at /usr/src/sys/net/netisr.c:1234
#16 0xffffffff80a11b09 in ether_input (ifp=0xfffff8010650a000, m=0xfffff801ef8be500) at /usr/src/sys/net/if_ethersubr.c:828
#17 0xffffffff80a3a925 in iflib_rxeof (rxq=0xfffffe01b7551080, budget=16) at /usr/src/sys/net/iflib.c:3047
#18 0xffffffff80a2e4e5 in _task_fn_rx (context=0xfffffe01b7551080) at /usr/src/sys/net/iflib.c:3990
#19 0xffffffff808d7427 in gtaskqueue_run_locked (queue=0xfffff80104d7e200) at /usr/src/sys/kern/subr_gtaskqueue.c:371
#20 0xffffffff808d6fad in gtaskqueue_thread_loop (arg=0xfffffe01b71a7128) at /usr/src/sys/kern/subr_gtaskqueue.c:547
#21 0xffffffff808053f2 in fork_exit (callout=0xffffffff808d6f00 <gtaskqueue_thread_loop>, arg=0xfffffe01b71a7128, frame=0xfffffe027ac92f40) at /usr/src/sys/kern/kern_fork.c:1093
#22 <signal handler called>
#23 0xffffffff8129ea18 in periodic_resettodr_sys_init ()
Backtrace stopped: Cannot access memory at address 0x0
(kgdb) fr 4
#4  0xffffffff80ab3551 in tcp_output (tp=0xfffffe04709abca8) at /usr/src/sys/netinet/tcp_output.c:1583


1583        KASSERT(SEQ_LEQ(tp->snd_max, top+1),
(kgdb) p tp->snd_max
$1 = 3141897257
(kgdb) p top
$2 = 3141897255
(kgdb)

No KTLS enabled/used. The adapter is an Intel X710 (if_ixl).
If there is anything else we can help with, please just let me know.
Comment 51 Richard Scheffenegger freebsd_committer freebsd_triage 2022-06-14 17:29:21 UTC
Thanks a lot!

Can you provide that core + the kernel.debug files?

I've extended the logging in a revised patch, if that is easier.
Comment 52 Richard Scheffenegger freebsd_committer freebsd_triage 2022-06-14 17:30:39 UTC
Created attachment 234683 [details]
more logging

extended logging (w/ panic/KASSERT)
Comment 53 Dmitriy 2022-06-14 19:35:43 UTC
(In reply to Richard Scheffenegger from comment #51)

Sent the link in an e-mail to: rscheff@freebsd.org and tuexen@freebsd.org
Comment 54 Michael Tuexen freebsd_committer freebsd_triage 2022-06-14 22:51:19 UTC
(In reply to Dmitriy from comment #53)
Thanks. Did you enable
options TCPPCAP
in the kernel config? It looks like it is not enabled...
Comment 55 Richard Scheffenegger freebsd_committer freebsd_triage 2022-06-15 04:55:42 UTC
Thanks for the core. For stable operation, please use an unpatched kernel, and without net.inet.tcp.rfc6675_pipe=0

The patched cores confirm that during the very final phases, the stack increments one of the variables multiple times for the FIN bit, when it should only have incremented it once. Together with SACK rescue retransmissions (enabled by above sysctl) this leads to an attempt to send out non-existing data.

The TCPPCAP option together with sudo sysctl net.inet.tcp.tcp_pcap_packets=10 would retain the last 10 packets of every session in memory, in order to create a reproduction script, as the timing of the client's packets (relative to retransmission timeouts, persist and keepalive timers) apparently has something to do with this problem.
Comment 56 Dmitriy 2022-06-15 08:30:56 UTC
(In reply to Michael Tuexen from comment #54)

You are right, the option was not enabled.
Added these settings and sent new core files in an e-mail to: rscheff@freebsd.org and tuexen@freebsd.org
Thank you!
Comment 57 Christos Chatzaras 2022-06-15 08:43:55 UTC
(In reply to Richard Scheffenegger from comment #55)

You mean that he has to use "net.inet.tcp.rfc6675_pipe=0" for stable operation, right?
Comment 58 Dmitriy 2022-06-15 09:11:18 UTC
(In reply to Christos Chatzaras from comment #57)
But "net.inet.tcp.rfc6675_pipe: 0"
it was already "0" and never was changed. It remains "0" value when crash occurs.
Comment 59 Christos Chatzaras 2022-06-15 09:25:41 UTC
(In reply to Dmitriy from comment #58)

Yes "0" is the default. So the only workaround is to disable SACK?
Comment 60 Richard Scheffenegger freebsd_committer freebsd_triage 2022-06-15 10:05:35 UTC
(In reply to Christos Chatzaras from comment #57)

Sorry for being unclear:

A plain (unpatched) 13.1-RELEASE kernel with net.inet.tcp.rfc6675_pipe=0 would not exhibit a panic.

The patches added here really are to gather more information - and deliberately panic (when built with INVARIANTS) or retroactively adjust the incorrect TCP state.

Stable operation in a production environment would only need the standard kernel, and it's safe to use SACK as long as rfc6675_pipe is not enabled.

(The root cause - apparent double-accounting for FIN bits, and possibly even sending FIN bits at two different, final sequence numbers - would still be present, but not cause a crash).
Comment 61 Michael Tuexen freebsd_committer freebsd_triage 2022-06-15 12:25:27 UTC
(In reply to Dmitriy from comment #56)
Thank you! I'm trying to recreate the problem locally using packetdrill.
Comment 62 Michael Tuexen freebsd_committer freebsd_triage 2022-06-15 14:50:44 UTC
(In reply to Dmitriy from comment #56)
I can confirm that
options TCPPCAP
was enabled in the kernel. However, it seems you did not execute
sudo sysctl net.inet.tcp.tcp_pcap_packets=10
at least that would explain why no packets were captured.
Can you double check? More than 10 would be useful (like 20 or 30), but I don't know if that would work for your workload.
Comment 63 Dmitriy 2022-06-15 16:47:45 UTC
(In reply to Michael Tuexen from comment #62)

The option was added to sysctl.conf and the system rebooted; after the reboot it was definitely checked:
sysctl net.inet.tcp.tcp_pcap_packets
net.inet.tcp.tcp_pcap_packets: 10

Going to set net.inet.tcp.tcp_pcap_packets=30 to be sure. Waiting for a new core.

Off-topic: in the meantime the kernel was rebuilt without INVARIANTS so that it keeps running and logs instead of panicking. Hope the results will be helpful too:

Jun 15 13:55:02 hostname kernel: tcp_output#1587: snd_max 1422777791 > so_snd+1 1422777790 adjusting.
Jun 15 13:55:47 hostname kernel: tcp_output#1587: snd_max 1422777792 > so_snd+1 1422777790 adjusting.
Jun 15 14:13:29 hostname kernel: tcp_output#1587: snd_max 75162767 > so_snd+1 75162766 adjusting.
Jun 15 14:14:15 hostname kernel: tcp_output#1587: snd_max 75162768 > so_snd+1 75162766 adjusting.
Jun 15 15:06:32 hostname kernel: tcp_output#1587: snd_max 2137549205 > so_snd+1 2137549204 adjusting.
Jun 15 15:15:53 hostname kernel: tcp_output#1587: snd_max 2960832867 > so_snd+1 2960832866 adjusting.
Jun 15 15:16:32 hostname kernel: tcp_output#1587: snd_max 1274542388 > so_snd+1 1274542387 adjusting.
Jun 15 15:16:32 hostname kernel: tcp_output#1587: snd_max 1274542389 > so_snd+1 1274542387 adjusting.
Jun 15 15:30:31 hostname kernel: tcp_output#1587: snd_max 2283463051 > so_snd+1 2283463050 adjusting.
Jun 15 15:35:43 hostname kernel: tcp_output#1587: snd_max 336646110 > so_snd+1 336646109 adjusting.
Jun 15 15:38:35 hostname kernel: tcp_output#1587: snd_max 1227541195 > so_snd+1 1227541194 adjusting.
Jun 15 15:38:35 hostname kernel: tcp_output#1587: snd_max 1227541196 > so_snd+1 1227541194 adjusting.
Jun 15 16:56:46 hostname kernel: tcp_output#1587: snd_max 382752252 > so_snd+1 382752251 adjusting.
Jun 15 16:56:46 hostname kernel: tcp_output#1587: snd_max 382752253 > so_snd+1 382752251 adjusting.
Jun 15 17:08:42 hostname kernel: tcp_output#1587: snd_max 359933951 > so_snd+1 359933950 adjusting.
Jun 15 17:08:42 hostname kernel: tcp_output#1587: snd_max 359933952 > so_snd+1 359933950 adjusting.
Jun 15 17:09:31 hostname kernel: tcp_output#1587: snd_max 205782007 > so_snd+1 205782006 adjusting.
Jun 15 17:09:41 hostname kernel: tcp_output#1587: snd_max 205782008 > so_snd+1 205782006 adjusting.
Jun 15 17:09:53 hostname kernel: tcp_output#1587: snd_max 205782009 > so_snd+1 205782006 adjusting.
Jun 15 17:10:03 hostname kernel: tcp_output#1587: snd_max 205782010 > so_snd+1 205782006 adjusting.
Jun 15 17:22:07 hostname kernel: tcp_output#1587: snd_max 4147118877 > so_snd+1 4147118876 adjusting.
Jun 15 17:36:46 hostname kernel: tcp_output#1587: snd_max 494998134 > so_snd+1 494998133 adjusting.
Jun 15 18:27:14 hostname kernel: tcp_output#1587: snd_max 2387061338 > so_snd+1 2387061337 adjusting.
Jun 15 19:22:28 hostname kernel: tcp_output#1587: snd_max 725541353 > so_snd+1 725541352 adjusting.
Jun 15 19:23:53 hostname kernel: tcp_output#1587: snd_max 607018670 > so_snd+1 607018669 adjusting.
Jun 15 19:24:03 hostname kernel: tcp_output#1587: snd_max 607018671 > so_snd+1 607018669 adjusting.
Jun 15 19:24:14 hostname kernel: tcp_output#1587: snd_max 607018672 > so_snd+1 607018669 adjusting.
Jun 15 19:24:24 hostname kernel: tcp_output#1587: snd_max 607018673 > so_snd+1 607018669 adjusting.
Jun 15 19:24:34 hostname kernel: tcp_output#1587: snd_max 607018674 > so_snd+1 607018669 adjusting.
Jun 15 19:24:44 hostname kernel: tcp_output#1587: snd_max 607018675 > so_snd+1 607018669 adjusting.
Comment 64 Michael Tuexen freebsd_committer freebsd_triage 2022-06-15 20:58:50 UTC
Created attachment 234709 [details]
Fix TCPPCAP
Comment 65 Michael Tuexen freebsd_committer freebsd_triage 2022-06-15 21:02:55 UTC
(In reply to Dmitriy from comment #63)
OK, I tested it locally. To get TCPPCAP working you need
to apply the patch I just added with the name "Fix TCPPCAP"

Can you apply that patch, build and install an updated kernel and provide a kernel dump after that patch has been applied?
Comment 66 Dmitriy 2022-06-16 08:16:14 UTC
(In reply to Michael Tuexen from comment #65)
Done. Please find the link to the new core file in the e-mail.
Thank you!
Comment 67 Michael Tuexen freebsd_committer freebsd_triage 2022-06-16 08:33:12 UTC
(In reply to Dmitriy from comment #66)
Thank you very much. The new core file contains the packets handled by the end point.
Comment 68 Christos Chatzaras 2022-06-20 23:49:32 UTC
Today I had another crash in a different server.

Fatal trap 12: page fault while in kernel mode
cpuid = 7; apic id = 07
fault virtual address   = 0x18
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff80cae31d
stack pointer           = 0x28:0xfffffe01141445c0
frame pointer           = 0x28:0xfffffe0114144630
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 0 (if_io_tqg_7)
trap number             = 12
panic: page fault
cpuid = 7
time = 1655767279
KDB: stack backtrace:
#0 0xffffffff80c69465 at kdb_backtrace+0x65
#1 0xffffffff80c1bb1f at vpanic+0x17f
#2 0xffffffff80c1b993 at panic+0x43
#3 0xffffffff810afdf5 at trap_fatal+0x385
#4 0xffffffff810afe4f at trap_pfault+0x4f
#5 0xffffffff81087528 at calltrap+0x8
#6 0xffffffff80de07c9 at tcp_output+0x1339
#7 0xffffffff80dd7eed at tcp_do_segment+0x2cfd
#8 0xffffffff80dd44b1 at tcp_input_with_port+0xb61
#9 0xffffffff80dd515b at tcp_input+0xb
#10 0xffffffff80dc691f at ip_input+0x11f
#11 0xffffffff80d53089 at netisr_dispatch_src+0xb9
#12 0xffffffff80d36ea8 at ether_demux+0x138
#13 0xffffffff80d38235 at ether_nh_input+0x355
#14 0xffffffff80d53089 at netisr_dispatch_src+0xb9
#15 0xffffffff80d372d9 at ether_input+0x69
#16 0xffffffff80ddeaa5 at tcp_push_and_replace+0x25
#17 0xffffffff80ddd74c at tcp_lro_flush+0x4c
Uptime: 29d3h36m11s
Dumping 4275 out of 65278 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%

__curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
warning: Source file is more recent than executable.
55              __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu,
(kgdb) #0  __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
#1  doadump (textdump=<optimized out>)
    at /usr/src/sys/kern/kern_shutdown.c:399
#2  0xffffffff80c1b71c in kern_reboot (howto=260)
    at /usr/src/sys/kern/kern_shutdown.c:487
#3  0xffffffff80c1bb8e in vpanic (fmt=0xffffffff811b4fb9 "%s",
    ap=<optimized out>) at /usr/src/sys/kern/kern_shutdown.c:920
#4  0xffffffff80c1b993 in panic (fmt=<unavailable>)
    at /usr/src/sys/kern/kern_shutdown.c:844
#5  0xffffffff810afdf5 in trap_fatal (frame=0xfffffe0114144500, eva=24)
    at /usr/src/sys/amd64/amd64/trap.c:944
#6  0xffffffff810afe4f in trap_pfault (frame=0xfffffe0114144500,
    usermode=false, signo=<optimized out>, ucode=<optimized out>)
    at /usr/src/sys/amd64/amd64/trap.c:763
#7  <signal handler called>
#8  m_copydata (m=0x0, m@entry=0xfffff80c219ce500, off=0, len=1,
    cp=<optimized out>) at /usr/src/sys/kern/uipc_mbuf.c:659
#9  0xffffffff80de07c9 in tcp_output (tp=<optimized out>)
    at /usr/src/sys/netinet/tcp_output.c:1081
#10 0xffffffff80dd7eed in tcp_do_segment (m=<optimized out>,
    th=<optimized out>, so=<optimized out>, tp=0xfffffe01990a1000,
    drop_hdrlen=64, tlen=<optimized out>, iptos=0 '\000')
    at /usr/src/sys/netinet/tcp_input.c:2637
#11 0xffffffff80dd44b1 in tcp_input_with_port (mp=<optimized out>,
    offp=<optimized out>, proto=<optimized out>, port=port@entry=0)
    at /usr/src/sys/netinet/tcp_input.c:1400
#12 0xffffffff80dd515b in tcp_input (mp=0xfffff80c219ce500, offp=0x0, proto=1)
    at /usr/src/sys/netinet/tcp_input.c:1496
#13 0xffffffff80dc691f in ip_input (m=0x0)
    at /usr/src/sys/netinet/ip_input.c:839
#14 0xffffffff80d53089 in netisr_dispatch_src (proto=1,
    source=source@entry=0, m=0xfffff80e00395400)
    at /usr/src/sys/net/netisr.c:1143
#15 0xffffffff80d5345f in netisr_dispatch (proto=563930368, m=0x1)
    at /usr/src/sys/net/netisr.c:1234
#16 0xffffffff80d36ea8 in ether_demux (ifp=ifp@entry=0xfffff80004659000,
    m=0x0) at /usr/src/sys/net/if_ethersubr.c:921
#17 0xffffffff80d38235 in ether_input_internal (ifp=0xfffff80004659000, m=0x0)
    at /usr/src/sys/net/if_ethersubr.c:707
#18 ether_nh_input (m=<optimized out>) at /usr/src/sys/net/if_ethersubr.c:737
#19 0xffffffff80d53089 in netisr_dispatch_src (proto=proto@entry=5,
    source=source@entry=0, m=m@entry=0xfffff80e00395400)
    at /usr/src/sys/net/netisr.c:1143
#20 0xffffffff80d5345f in netisr_dispatch (proto=563930368, proto@entry=5,
    m=0x1, m@entry=0xfffff80e00395400) at /usr/src/sys/net/netisr.c:1234
#21 0xffffffff80d372d9 in ether_input (ifp=<optimized out>,
    m=0xfffff80e00395400) at /usr/src/sys/net/if_ethersubr.c:828
#22 0xffffffff80ddeaa5 in tcp_push_and_replace (lc=0xfffff80c219ce500,
    lc@entry=0xfffff80003ef2830, le=le@entry=0xfffffe0158387690,
    m=m@entry=0xfffff80f2b178300) at /usr/src/sys/netinet/tcp_lro.c:923
#23 0xffffffff80ddd74c in tcp_lro_condense (lc=0xfffff80003ef2830,
    le=0xfffffe0158387690) at /usr/src/sys/netinet/tcp_lro.c:1011
#24 tcp_lro_flush (lc=lc@entry=0xfffff80003ef2830, le=0xfffffe0158387690)
    at /usr/src/sys/netinet/tcp_lro.c:1374
#25 0xffffffff80dddd3b in tcp_lro_rx_done (lc=0xfffff80003ef2830)
    at /usr/src/sys/netinet/tcp_lro.c:566
#26 tcp_lro_flush_all (lc=lc@entry=0xfffff80003ef2830)
    at /usr/src/sys/netinet/tcp_lro.c:1532
#27 0xffffffff80d4f503 in iflib_rxeof (rxq=<optimized out>,
    rxq@entry=0xfffff80003ef2800, budget=<optimized out>)
    at /usr/src/sys/net/iflib.c:3058
#28 0xffffffff80d49b22 in _task_fn_rx (context=0xfffff80003ef2800)
    at /usr/src/sys/net/iflib.c:3990
#29 0xffffffff80c67e9d in gtaskqueue_run_locked (
    queue=queue@entry=0xfffff80003cbf000)
    at /usr/src/sys/kern/subr_gtaskqueue.c:371
#30 0xffffffff80c67b12 in gtaskqueue_thread_loop (arg=<optimized out>,
    arg@entry=0xfffffe01142820b0) at /usr/src/sys/kern/subr_gtaskqueue.c:547
#31 0xffffffff80bd8a5e in fork_exit (
    callout=0xffffffff80c67a50 <gtaskqueue_thread_loop>,
    arg=0xfffffe01142820b0, frame=0xfffffe0114144f40)
    at /usr/src/sys/kern/kern_fork.c:1093
#32 <signal handler called>
#33 mi_startup () at /usr/src/sys/kern/init_main.c:322
Backtrace stopped: Cannot access memory at address 0x1d
(kgdb)
Comment 69 Richard Scheffenegger freebsd_committer freebsd_triage 2022-06-21 06:20:18 UTC
Are you using net.inet.tcp.rfc6675_pipe=1? While we are investigating the root cause for snd_max growing beyond the value it should ever reach, disabling SACK rescue retransmissions in the meantime should prevent these panics.

(Note: even though we have one core with the (effectively) full packet trace, the misbehavior could not yet be reproduced properly.)
Comment 70 Richard Scheffenegger freebsd_committer freebsd_triage 2022-06-21 08:23:52 UTC
(In reply to Christos Chatzaras from comment #68)

From the provided core:

snd_una: 2568161219
snd_nxt: 2568161624
snd_max: 2568161624
snd_fack: 2568161624 (!!)

so_snd: 403

snd_una + so_snd: 2568161622

 *tp->sackhint->nexthole
{start = 2568161219, end = 2568161623, rxmit = 2568161622, ...

This indicates that the remote client SACKed the 2nd FIN bit (which apparently was sent at sequence number 2568161624, not 2568161623 as expected).

The SACK for this 2nd FIN created a SACK scoreboard hole that includes the sequence number of the 1st FIN bit - a byte which does not exist in the socket send buffer. The panic happens when the stack tries to retransmit this non-existent byte.
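
For illustration, here is a minimal userland sketch of that arithmetic, using the values from this core (SEQ_SUB is re-derived locally; this is only a model of the offset calculation, not the kernel code):

#include <stdint.h>
#include <stdio.h>

/* Wraparound-safe sequence number difference, modeled after the kernel macro. */
#define SEQ_SUB(a, b)   ((int32_t)((a) - (b)))

int
main(void)
{
        uint32_t snd_una = 2568161219U; /* left edge of the send window */
        uint32_t rxmit   = 2568161622U; /* sackhint.nexthole->rxmit from this core */
        uint32_t sb_len  = 403;         /* bytes actually held in the socket send buffer */

        /* Offset into the send buffer from which retransmission would start. */
        int32_t off = SEQ_SUB(rxmit, snd_una);

        printf("off = %d, send buffer holds %u bytes\n", off, sb_len);
        if (off >= (int32_t)sb_len)
                printf("offset points at the FIN's sequence number, past the "
                    "last data byte: the mbuf walk runs off the chain (m == NULL)\n");
        return (0);
}

With off equal to the number of bytes in the buffer, m_copydata() is asked for one byte the send buffer does not contain, which matches the m=0x0, off=0, len=1 frame in the backtrace above.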
Comment 71 Richard Scheffenegger freebsd_committer freebsd_triage 2022-06-21 08:42:56 UTC
Created attachment 234833 [details]
symptomatic logging patch

This patch symptomatically handles the case where snd_max grows beyond the FIN bit, without panicking. It also logs a few tidbits of data to dmesg.
Comment 72 Christos Chatzaras 2022-06-21 13:45:01 UTC
(In reply to Richard Scheffenegger from comment #69)

1) net.inet.tcp.rfc6675_pipe is disabled.

2) I sent the vmcore + debug symbols to your e-mail.

3) I temporarily disabled SACK.
Comment 73 Christos Chatzaras 2022-06-24 14:22:44 UTC
I see that iron.udjin's loader.conf has wireguard enabled.

I use wireguard too.

Is this possibly related to wireguard?
Comment 74 Richard Scheffenegger freebsd_committer freebsd_triage 2022-06-24 14:35:21 UTC
The offending TCP sessions we've seen so far all seem to be regular HTTPS sessions. For as yet unknown reasons, in rare cases the FIN bit gets accounted for more than once - up to 6 times, according to one of the kernels with the logging patch.

With SACK rescue retransmissions, that can lead to the sequence number of one or more FIN bits being included in the block to be retransmitted - and the panic happens when the stack tries to retransmit the data byte at the sequence number of the FIN bit...

In your case, even though you didn't have SACK rescue retransmissions enabled, the client of the offending session appears to have SACK'ed the 2nd retransmission of the (empty) FIN packet with a "too high" sequence number, in effect resulting in the same issue. This should happen much less frequently than with SACK rescue retransmissions enabled.

At the same time, it appears that double-counting of FIN bits happens quite frequently but is not easy to reproduce. Thus we are currently working on a patch which exposes this (in INVARIANTS kernels via panics, in production kernels by logging the unexpected state and clearing it "on the fly").

See D35446
Comment 75 iron.udjin 2022-06-24 17:48:41 UTC
(In reply to Christos Chatzaras from comment #73)

I don't think it's related to wireguard, because I had similar panics on a server which doesn't have wireguard installed or in use.
Comment 76 Michael Tuexen freebsd_committer freebsd_triage 2022-06-24 18:29:58 UTC
(In reply to iron.udjin from comment #75)
Thanks for the data point. I guess whatever modifications a tunnel or link layer makes unintentionally would look like packet corruption to TCP and should be handled fine. I also guess such an issue would have shown up on a much broader scale.
Comment 77 Christos Chatzaras 2022-06-29 21:37:04 UTC
(In reply to Richard Scheffenegger from comment #70)

I found this old review which I believe was never committed:

https://reviews.freebsd.org/D2970

Does this try to fix my panic?
Comment 78 Michael Tuexen freebsd_committer freebsd_triage 2022-07-11 08:51:09 UTC
From the pcap information we were able to reconstruct the packet sequence, but not the exact timing. Unfortunately, we were not able to reproduce the issue. However, it seems hard to extend the pcap logging.

So I suggest using TCP Black Box Logging instead.

Could you:

* Add
  options TCP_BLACKBOX
  to your kernel configuration file and recompile the kernel.
  You can remove the TCPPCAP option, if you want.

* Add
  net.inet.tcp.bb.log_auto_all=1
  net.inet.tcp.bb.log_auto_ratio=1
  net.inet.tcp.bb.log_auto_mode=1
  net.inet.tcp.bb.log_session_limit=200

This should enable Black Box Logging on all TCP connections and keep the last 200
events per TCP connection stored in the endpoint. Once the system has crashed, please
provide a core. This time we should get not only the packets being sent/received, but
also internal TCP state variables and information about the retransmission timer.
If this information is not enough, we can extend the TCP Black Box Logging ad hoc.
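
If it is more convenient, the same knobs can also be set from a small C program via sysctlbyname(3) instead of an /etc/sysctl.conf entry; this is just a convenience sketch and assumes the four sysctls above take plain integer values:

#include <sys/types.h>
#include <sys/sysctl.h>
#include <err.h>

int
main(void)
{
        /* The Black Box Logging knobs listed above, with the suggested values. */
        struct { const char *name; int value; } knobs[] = {
                { "net.inet.tcp.bb.log_auto_all", 1 },
                { "net.inet.tcp.bb.log_auto_ratio", 1 },
                { "net.inet.tcp.bb.log_auto_mode", 1 },
                { "net.inet.tcp.bb.log_session_limit", 200 },
        };

        for (size_t i = 0; i < sizeof(knobs) / sizeof(knobs[0]); i++)
                if (sysctlbyname(knobs[i].name, NULL, NULL,
                    &knobs[i].value, sizeof(knobs[i].value)) != 0)
                        err(1, "sysctlbyname(%s)", knobs[i].name);
        return (0);
}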
Comment 79 Chad Smith 2022-07-11 14:11:06 UTC
Hello all,

On the few servers where we're running 13.1 we see these trap 12 errors too. The most interesting bit I can contribute is that we only see it on servers utilizing if_bridge.

My guess is the servers using if_bridge are seeing a lot more traffic across the wire and are therefore more likely to experience the packet sequence required to cause the trap 12 errors. When we rank servers by trap 12 error occurrence we see a correlation with bandwidth utilization (more bandwidth utilization, greater trap 12 error frequency).

Happy to help in any way I can.
Comment 80 Richard Scheffenegger freebsd_committer freebsd_triage 2022-07-11 14:57:05 UTC
(In reply to Chad Smith from comment #79)

Do the affected servers report IP-layer output errors? Or low mbuf situations?

(netstat -s after a bit of uptime may show increasing IP output error counters).

Background: As the full packet trace (without timing) by itself is not sufficient to reproduce the issue, one possibility is that very infrequent error handling is involved in the introduction of the problematic state - and this may only happen on quite busy machines where mbuf allocation or packet output issues show up.
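
If it helps, the relevant counters can also be polled programmatically; a minimal sketch, assuming the stock net.inet.ip.stats sysctl and the struct ipstat fields behind the corresponding netstat -s output lines:

#include <sys/types.h>
#include <sys/socket.h>
#include <sys/sysctl.h>
#include <netinet/in.h>
#include <netinet/in_systm.h>
#include <netinet/ip.h>
#include <netinet/ip_var.h>
#include <err.h>
#include <stdint.h>
#include <stdio.h>

int
main(void)
{
        struct ipstat ips;
        size_t len = sizeof(ips);

        if (sysctlbyname("net.inet.ip.stats", &ips, &len, NULL, 0) != 0)
                err(1, "net.inet.ip.stats");
        /* The counters netstat -s reports as IP output errors. */
        printf("%ju output packets dropped due to no bufs\n",
            (uintmax_t)ips.ips_odropped);
        printf("%ju output packets discarded due to no route\n",
            (uintmax_t)ips.ips_noroute);
        return (0);
}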
Comment 81 Michael Tuexen freebsd_committer freebsd_triage 2022-07-11 20:30:40 UTC
(In reply to Richard Scheffenegger from comment #80)
Yes, errors from ip_output() are logged:
https://cgit.freebsd.org/src/tree/sys/netinet/tcp_output.c?h=stable/13#n1670
This is not done the same way as in RACK or BBR, but we can clean that up later.
Comment 82 Dmitriy 2022-07-18 10:36:34 UTC
(In reply to Michael Tuexen from comment #78)
Just sent a link to the new core file by e-mail. The kernel was built with "options TCP_BLACKBOX", without any debug/diagnostic patches.
Thank you!
Comment 83 Michael Tuexen freebsd_committer freebsd_triage 2022-07-18 10:56:14 UTC
(In reply to Dmitriy from comment #82)
Thank you very much for the core. Let me have a look; I'll report my findings.
Comment 84 Chad Smith 2022-07-20 17:51:22 UTC
(In reply to Richard Scheffenegger from comment #80)

I grabbed the netstat -s output from a recent crash file where the system had been up for an hour before it crashed again; I'm also posting some additional info.

We have a test bench set up and are trying to reproduce this behavior on a LAN. It seems that, under perfect network conditions, a massively parallel iperf run across a bridged interface for 24 hours does not trigger it. I'm looking for ideas on how to simulate internet-like network conditions; short of hooking the other end of our test bench up to an internet connection in another city, I'm out of ideas. Open to suggestions.

# uname -a
FreeBSD 13.1-RELEASE FreeBSD 13.1-RELEASE releng/13.1-n250148-fc952ac2212 GENERIC amd64

Fatal trap 12: page fault while in kernel mode
cpuid = 4; apic id = 04
fault virtual address   = 0x10
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff8234a783
stack pointer           = 0x0:0xfffffe00c45a5a50
frame pointer           = 0x0:0xfffffe00c45a5a60
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 0 (if_io_tqg_4)
trap number             = 12
panic: page fault
cpuid = 4
time = 1657212869
KDB: stack backtrace:
#0 0xffffffff80c69465 at kdb_backtrace+0x65
#1 0xffffffff80c1bb1f at vpanic+0x17f
#2 0xffffffff80c1b993 at panic+0x43
#3 0xffffffff810afdf5 at trap_fatal+0x385
#4 0xffffffff810afe4f at trap_pfault+0x4f
#5 0xffffffff81087528 at calltrap+0x8
#6 0xffffffff80d55ec7 at pfil_run_hooks+0x97
#7 0xffffffff8239af37 at bridge_pfil+0x497
#8 0xffffffff8239d5a3 at bridge_forward+0x323
#9 0xffffffff8239cef1 at bridge_input+0x4c1
#10 0xffffffff80d380fd at ether_nh_input+0x21d
#11 0xffffffff80d53089 at netisr_dispatch_src+0xb9
#12 0xffffffff80d372d9 at ether_input+0x69
#13 0xffffffff80d4f4d7 at iflib_rxeof+0xc27
#14 0xffffffff80d49b22 at _task_fn_rx+0x72
#15 0xffffffff80c67e9d at gtaskqueue_run_locked+0x15d
#16 0xffffffff80c67b12 at gtaskqueue_thread_loop+0xc2
#17 0xffffffff80bd8a5e at fork_exit+0x7e

------------------------------------------------------------------------
netstat -s

tcp:
        144916 packets sent
                10016 data packets (765551 bytes)
                0 data packets (0 bytes) retransmitted
                0 data packets unnecessarily retransmitted
                0 resends initiated by MTU discovery
                133061 ack-only packets (306 delayed)
                0 URG only packets
                0 window probe packets
                7 window update packets
                1832 control packets
        275476 packets received
                13046 acks (for 767206 bytes)
                17 duplicate acks
                0 UDP tunneled pkts
                0 UDP tunneled pkt cnt with errors
                0 acks for unsent data
                269667 packets (337208234 bytes) received in-sequence
                17 completely duplicate packets (1304 bytes)
                0 old duplicate packets
                0 packets with some dup. data (0 bytes duped)
                13 out-of-order packets (16952 bytes)
                0 packets (0 bytes) of data after window
                0 window probes
                0 window update packets
                0 packets received after close
                0 discarded for bad checksums
                0 discarded for bad header offset fields
                0 discarded because packet too short
                0 discarded due to full reassembly queue
        177 connection requests
        1561 connection accepts
        0 bad connection attempts
        0 listen queue overflows
        0 ignored RSTs in the windows
        1661 connections established (including accepts)
                6 times used RTT from hostcache
                6 times used RTT variance from hostcache
                0 times used slow-start threshold from hostcache
        1680 connections closed (including 0 drops)
                6 connections updated cached RTT on close
                6 connections updated cached RTT variance on close
                0 connections updated cached ssthresh on close
        3 embryonic connections dropped
        13046 segments updated rtt (of 9912 attempts)
        74 retransmit timeouts
                0 connections dropped by rexmit timeout
        0 persist timeouts
                0 connections dropped by persist timeout
        0 Connections (fin_wait_2) dropped because of timeout
        0 keepalive timeouts
                0 keepalive probes sent
                0 connections dropped by keepalive
        254 correct ACK header predictions
        260118 correct data packet header predictions
        1561 syncache entries added
                0 retransmitted
                0 dupsyn
                0 dropped
                1561 completed
                0 bucket overflow
                0 cache overflow
                0 reset
                0 stale
                0 aborted
                0 badack
                0 unreach
                0 zone failures
        1561 cookies sent
        0 cookies received
        1 hostcache entry added
                0 bucket overflow
        0 SACK recovery episodes
        0 segment rexmits in SACK recovery episodes
        0 byte rexmits in SACK recovery episodes
        0 SACK options (SACK blocks) received
        0 SACK options (SACK blocks) sent
        0 SACK scoreboard overflow
        0 packets with ECN CE bit set
        0 packets with ECN ECT(0) bit set
        0 packets with ECN ECT(1) bit set
        0 successful ECN handshakes
        0 times ECN reduced the congestion window
        0 packets with matching signature received
        0 packets with bad signature received
        0 times failed to make signature due to no SA
        0 times unexpected signature received
        0 times no signature provided by segment
        0 Path MTU discovery black hole detection activations
        0 Path MTU discovery black hole detection min MSS activations
        0 Path MTU discovery black hole detection failures
        154 times connection in TIME-WAIT responded with ACK
        0 times connection in TIME-WAIT was actively recycled
        0 times connection in TIME-WAIT responded with RST
TCP connection count by state:
        0 connections in CLOSED state
        8 connections in LISTEN state
        0 connections in SYN_SENT state
        0 connections in SYN_RCVD state
        6 connections in ESTABLISHED state
        0 connections in CLOSE_WAIT state
        0 connections in FIN_WAIT_1 state
        0 connections in CLOSING state
        0 connections in LAST_ACK state
        0 connections in FIN_WAIT_2 state
        64 connections in TIME_WAIT state
udp:
        288632 datagrams received
        0 with incomplete header
        0 with bad data length field
        0 with bad checksum
        0 with no checksum
        13740 dropped due to no socket
        2422 broadcast/multicast datagrams undelivered
        0 dropped due to full socket buffers
        0 not for hashed pcb
        272470 delivered
        179590 datagrams output
        0 times multicast source filter matched
ip:
        648231 total packets received
        0 bad header checksums
        0 with size smaller than minimum
        0 with data size < data length
        0 with ip length > max ip packet size
        0 with header length < data size
        0 with data length < header length
        0 with bad options
        0 with incorrect version number
        0 fragments received
        0 fragments dropped (dup or out of space)
        0 fragments dropped after timeout
        0 packets reassembled ok
        564854 packets for this host
        75426 packets for unknown/unsupported protocol
        0 packets forwarded (0 packets fast forwarded)
        7951 packets not forwardable
        0 packets received for unknown multicast group
        0 redirects sent
        324597 packets sent from this host
        0 packets sent with fabricated ip header
        0 output packets dropped due to no bufs, etc.
        2 output packets discarded due to no route
        0 output datagrams fragmented
        0 fragments created
        0 datagrams that can't be fragmented
        0 tunneling packets that can't find gif
        0 datagrams with bad address in header
icmp:
        75428 calls to icmp_error
        0 errors not generated in response to an icmp message
        Output histogram:
                echo reply: 80
                destination unreachable: 2
        0 messages with bad code fields
        0 messages less than the minimum length
        0 messages with bad checksum
        0 messages with bad length
        0 multicast echo requests ignored
        0 multicast timestamp requests ignored
        Input histogram:
                destination unreachable: 65
                echo: 80
        80 message responses generated
        0 invalid return addresses
        0 no return routes
arp:
        3 ARP requests sent
        1 ARP request failed to sent
        72 ARP replies sent
        20625 ARP requests received
        773 ARP replies received
        74733 ARP packets received
        0 total packets dropped due to no ARP entry
        0 ARP entrys timed out
        0 Duplicate IPs seen
ip6:
        0 total packets received
        0 with size smaller than minimum
        0 with data size < data length
        0 with bad options
        0 with incorrect version number
        0 fragments received
        0 fragments dropped (dup or out of space)
        0 fragments dropped after timeout
        0 fragments that exceeded limit
        0 atomic fragments
        0 packets reassembled ok
        0 packets for this host
        0 packets forwarded
        0 packets not forwardable
        0 redirects sent
        0 packets sent from this host
        0 packets sent with fabricated ip header
        0 output packets dropped due to no bufs, etc.
        0 output packets discarded due to no route
        0 output datagrams fragmented
        0 fragments created
        0 datagrams that can't be fragmented
        0 packets that violated scope rules
        0 multicast packets which we don't join
        Mbuf statistics:
                0 one mbuf
                0 one ext mbuf
                0 two or more ext mbuf
        0 packets whose headers are not contiguous
        0 tunneling packets that can't find gif
        0 packets discarded because of too many headers
        0 failures of source address selection
icmp6:
        0 calls to icmp6_error
        0 errors not generated in response to an icmp6 message
        0 errors not generated because of rate limitation
        0 messages with bad code fields
        0 messages < minimum length
        0 bad checksums
        0 messages with bad length
        Histogram of error messages to be generated:
                0 no route
                0 administratively prohibited
                0 beyond scope
                0 address unreachable
                0 port unreachable
                0 packet too big
                0 time exceed transit
                0 time exceed reassembly
                0 erroneous header field
                0 unrecognized next header
                0 unrecognized option
                0 redirect
                0 unknown
        0 message responses generated
        0 messages with too many ND options
        0 messages with bad ND options
        0 bad neighbor solicitation messages
        0 bad neighbor advertisement messages
        0 bad router solicitation messages
        0 bad router advertisement messages
        0 bad redirect messages
        0 default routers overflows
        0 prefix overflows
        0 neighbour entries overflows
        0 redirect overflows
        0 messages with invalid hop limit
        0 path MTU changes
rip6:
        0 messages received
        0 checksum calculations on inbound
        0 messages with bad checksum
        0 messages dropped due to no socket
        0 multicast messages dropped due to no socket
        0 messages dropped due to full socket buffers
        0 delivered
        0 datagrams output
pfkey:
        0 requests sent from userland
        0 bytes sent from userland
        0 messages with invalid length field
        0 messages with invalid version field
        0 messages with invalid message type field
        0 messages too short
        0 messages with memory allocation failure
        0 messages with duplicate extension
        0 messages with invalid extension type
        0 messages with invalid sa type
        0 messages with invalid address extension
        0 requests sent to userland
        0 bytes sent to userland
        0 messages toward single socket
        0 messages toward all sockets
        0 messages toward registered sockets
        0 messages with memory allocation failure
Comment 85 Richard Scheffenegger freebsd_committer freebsd_triage 2022-07-20 18:45:01 UTC
Hi Chad,

Thank you very much for your effort. Are you sure about this panic? In the backtrace, it looks like the issue happened in

#6 0xffffffff80d55ec7 at pfil_run_hooks+0x97
#7 0xffffffff8239af37 at bridge_pfil+0x497

(bridging code, not TCP).

In the few instances we have seen in more detail, it looked like a TLS handshake was being performed when - for one reason or another - there was no response from the client for an extended period of time (probably 1-2 min) before ACKs (and SACKs) were seen again.

The IP stats in your case look pretty clean. We suspected that a missing adjustment in a local error-handling path may have something to do with this, but after doing some error injection before and after addressing it, the behavior seems sufficiently different that I'm not convinced that is the culprit either.
Comment 86 Michael Tuexen freebsd_committer freebsd_triage 2022-09-18 13:09:10 UTC
Created attachment 236664 [details]
Fix offset computation
Comment 87 Michael Tuexen freebsd_committer freebsd_triage 2022-09-18 13:13:17 UTC
Hi Dmitriy,

there are several things going on that are unexpected, but one thing I think is definitely wrong is the computation of an offset. Can you try the patch "Fix offset computation" and report whether the kernel still crashes?
Comment 88 Dobri Dobrev 2022-09-18 13:14:12 UTC
Comment on attachment 236664 [details]
Fix offset computation

The patch doesn't match tcp_output.c in stable/13:

        } else {
            len = ((int32_t)ulmin(cwin,
                SEQ_SUB(p->end, p->rxmit)));
        }
        off = SEQ_SUB(p->rxmit, tp->snd_una);
        KASSERT(off >= 0,("%s: sack block to the left of una : %d",
            __func__, off));
        if (len > 0) {
            sack_rxmit = 1;
            sendalot = 1;
            TCPSTAT_INC(tcps_sack_rexmits);
            TCPSTAT_ADD(tcps_sack_rexmit_bytes,
                min(len, tcp_maxseg(tp)));
        }
Comment 89 Michael Tuexen freebsd_committer freebsd_triage 2022-09-18 13:17:41 UTC
Created attachment 236665 [details]
Fix offset computation (stable/13)
Comment 90 Michael Tuexen freebsd_committer freebsd_triage 2022-09-18 13:18:17 UTC
(In reply to Dobri Dobrev from comment #88)
Please use the patch for stable/13.
Comment 91 Michael Tuexen freebsd_committer freebsd_triage 2022-09-18 13:51:37 UTC
Created attachment 236666 [details]
Updated: Fix offset computation (stable/13)
Comment 92 Michael Tuexen freebsd_committer freebsd_triage 2022-09-18 13:53:22 UTC
Created attachment 236667 [details]
Updated: Fix offset computation (releng/13.1)
Comment 93 Michael Tuexen freebsd_committer freebsd_triage 2022-09-18 13:59:45 UTC
I also missed moving a KASSERT, but it shouldn't be critical if you don't move it...
Comment 94 commit-hook freebsd_committer freebsd_triage 2022-09-19 10:50:30 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=6d9e911fbadf3b409802a211c1dae9b47cb5a2b8

commit 6d9e911fbadf3b409802a211c1dae9b47cb5a2b8
Author:     Michael Tuexen <tuexen@FreeBSD.org>
AuthorDate: 2022-09-19 10:42:43 +0000
Commit:     Michael Tuexen <tuexen@FreeBSD.org>
CommitDate: 2022-09-19 10:49:31 +0000

    tcp: fix computation of offset

    Only update the offset if actually retransmitting from the
    scoreboard. If not done correctly, this may result in
    trying to (re)-transmit data not being being in the socket
    buffe and therefore resulting in a panic.

    PR:                     264257
    PR:                     263445
    PR:                     260393
    Reviewed by:            rscheff@
    MFC after:              3 days
    Sponsored by:           Netflix, Inc.
    Differential Revision:  https://reviews.freebsd.org/D36626

 sys/netinet/tcp_output.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
Comment 95 Dobri Dobrev 2022-09-21 03:45:14 UTC
24 hours - no panic yet.
Looks promising.
Comment 96 Michael Tuexen freebsd_committer freebsd_triage 2022-09-21 07:57:41 UTC
(In reply to Dobri Dobrev from comment #95)
How long did it normally take until a panic happened?
Comment 97 Dobri Dobrev 2022-09-22 07:44:35 UTC
(In reply to Michael Tuexen from comment #96)

Last time it took 10-15 minutes; however, I've had cases where it took 24+ hours.
At the moment it's still running without a panic.
Comment 98 Marek Zarychta 2022-09-22 08:23:46 UTC
It looks like setting "net.inet.tcp.rfc6675_pipe=1" was a workaround; could you please confirm?
This issue looks similar to the already fixed bug 254244.
Comment 99 Richard Scheffenegger freebsd_committer freebsd_triage 2022-09-22 08:45:01 UTC
(In reply to Marek Zarychta from comment #98)
> It looks like setting "net.inet.tcp.rfc6675_pipe=1" was a workaround, could you confirm, please?

The workaround was "net.inet.tcp.rfc6675_pipe=0" for bug 254244, to disable "SACK rescue retransmissions".

> This issue looks similar to the already fixed bug 254244.

Similar yes, but not identical.

In bug 254244, it was an oversight on my part in dealing with the final <FIN> bit in the sequence number space when a rescue retransmission is to be done after the stack has already sent this <FIN>.

The implicit assumption is that, at the tail end of a TCP session, only a single <FIN> exists, at one specific sequence number - as it should be.

However, the complex of bug 260393, bug 263445 and this bug 264257 comes from an issue which has existed "forever" but never, or only extremely rarely, manifested itself as a server-side panic. It does have some potential to lead to erroneous bytes at the tail end of a TCP session, which, depending on the application using the data, may lead to client-side data inconsistencies.

That issue is that, under specific circumstances, a TCP session will send multiple <FIN>s at advancing sequence numbers, "overwriting" the previously sent <FIN> with a (not fully) arbitrary data byte.

In one instance, logging showed up to 6 consecutive <FIN>s, each at a new position - but even just two such <FIN>s at consecutive positions are problematic.
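
To make that invariant concrete, here is a small userland model (the helper name fin_position_ok is made up for illustration, and SEQ_GT is re-derived locally rather than taken from the kernel headers):

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Wraparound-safe "greater than" for 32-bit sequence numbers. */
#define SEQ_GT(a, b)    ((int32_t)((a) - (b)) > 0)

/*
 * With a FIN pending, snd_max may reach at most one sequence number past
 * the last byte held in the send buffer; anything beyond that means a
 * second FIN has been accounted at an advanced position.
 */
static bool
fin_position_ok(uint32_t snd_una, uint32_t snd_max, uint32_t sb_bytes,
    bool fin_sent)
{
        uint32_t limit = snd_una + sb_bytes + (fin_sent ? 1 : 0);

        return (!SEQ_GT(snd_max, limit));
}

int
main(void)
{
        uint32_t snd_una = 1000, sb_bytes = 400;

        /* A single FIN right after the last data byte: fine. */
        printf("single FIN: %s\n",
            fin_position_ok(snd_una, 1401, sb_bytes, true) ? "ok" : "broken");

        /* A second FIN one sequence number further out: invariant violated. */
        printf("second FIN: %s\n",
            fin_position_ok(snd_una, 1402, sb_bytes, true) ? "ok" : "broken");
        return (0);
}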

This day-zero issue (most likely affecting, or having affected, all BSD-derived TCP stacks with active SACK) is fixed in FreeBSD with
https://reviews.freebsd.org/D36626

While investigating further, two more bugs showed up:
https://reviews.freebsd.org/D36631
https://reviews.freebsd.org/D36637

but these neither corrupt the tcpcb state nor lead to erroneous transmission of different information at the same sequence number - they only affect when and which segments are sent.

(A regression in MAIN was also found while looking closely at the test cases - one of which would not fully validate the expected state - but that is not directly related to this problem.)
Comment 100 commit-hook freebsd_committer freebsd_triage 2022-09-22 10:17:50 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=5ae83e0d871bc7cbe4dcc9a33d37eb689e631efe

commit 5ae83e0d871bc7cbe4dcc9a33d37eb689e631efe
Author:     Michael Tuexen <tuexen@FreeBSD.org>
AuthorDate: 2022-09-22 10:12:11 +0000
Commit:     Michael Tuexen <tuexen@FreeBSD.org>
CommitDate: 2022-09-22 10:12:11 +0000

    tcp: send ACKs when requested

    When doing Limited Transmit send an ACK when needed by the protocol
    processing (like sending ACKs with a DSACK block).

    PR:                     264257
    PR:                     263445
    PR:                     260393
    Reviewed by:            rscheff@
    MFC after:              3 days
    Sponsored by:           Netflix, Inc.
    Differential Revision:  https://reviews.freebsd.org/D36631

 sys/netinet/tcp_input.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
Comment 101 commit-hook freebsd_committer freebsd_triage 2022-09-22 11:31:11 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=a743fc8826fa348b09d219632594c537f8e5690e

commit a743fc8826fa348b09d219632594c537f8e5690e
Author:     Richard Scheffenegger <rscheff@FreeBSD.org>
AuthorDate: 2022-09-22 10:55:25 +0000
Commit:     Richard Scheffenegger <rscheff@FreeBSD.org>
CommitDate: 2022-09-22 11:28:43 +0000

    tcp: fix cwnd restricted SACK retransmission loop

    While doing the initial SACK retransmission segment while heavily cwnd
    constrained, tcp_ouput can erroneously send out the entire sendbuffer
    again. This may happen after an retransmission timeout, which resets
    snd_nxt to snd_una while the SACK scoreboard is still populated.

    Reviewed By:            tuexen, #transport
    PR:                     264257
    PR:                     263445
    PR:                     260393
    MFC after:              3 days
    Sponsored by:           NetApp, Inc.
    Differential Revision:  https://reviews.freebsd.org/D36637

 sys/netinet/tcp_output.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
Comment 102 Marek Zarychta 2022-09-22 11:37:52 UTC
(In reply to Richard Scheffenegger from comment #99)
I asked because I have never experienced such a panic on the systems I am maintaining, and since net.inet.tcp.rfc6675_pipe=1 is set almost everywhere, I was guessing that maybe it was a remedy.

Thank you for an exhaustive explanation of these intricacies.
Comment 103 Richard Scheffenegger freebsd_committer freebsd_triage 2022-09-22 11:52:51 UTC
(In reply to Marek Zarychta from comment #102)

This setting got enabled by default in D28702 (https://reviews.freebsd.org/rGd1de2b05a001d3d80f633f576f4909c2686dda3d).

Prior to this, RFC6675 (SACK loss recovery) was implemented more fully with D18985 (https://reviews.freebsd.org/rG3c40e1d52cd86168779cf99dbabe58df465d7e3f).

This combination of enhanced features with a very long-lingering issue in handling rare corner cases of the SACK loss recovery and retransmission timeout (RTO) interaction led to the much higher incidence rate, and the actual panics, reported in these three associated bugs.

As the sequence of events which leads to this issue includes non-responsive clients or network paths with extreme packet loss rates, this effect is unlikely to be observed in stable, low-loss environments with responsive clients.

Nevertheless, there is still a discussion point as to whether there should be a (binary) patch for this, or whether providing a source patch and including it in STABLE-13 / STABLE-12 is sufficient. So far, most affected deployments have been able to include the patches.
Comment 104 commit-hook freebsd_committer freebsd_triage 2022-09-25 10:05:41 UTC
A commit in branch stable/12 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=3651c4f42285644938e2f5bc924ab8c7ed857f83

commit 3651c4f42285644938e2f5bc924ab8c7ed857f83
Author:     Richard Scheffenegger <rscheff@FreeBSD.org>
AuthorDate: 2022-09-22 10:55:25 +0000
Commit:     Richard Scheffenegger <rscheff@FreeBSD.org>
CommitDate: 2022-09-25 08:52:56 +0000

    tcp: fix cwnd restricted SACK retransmission loop

    While doing the initial SACK retransmission segment while heavily cwnd
    constrained, tcp_ouput can erroneously send out the entire sendbuffer
    again. This may happen after an retransmission timeout, which resets
    snd_nxt to snd_una while the SACK scoreboard is still populated.

    Reviewed By:            tuexen, #transport
    PR:                     264257
    PR:                     263445
    PR:                     260393
    MFC after:              3 days
    Sponsored by:           NetApp, Inc.
    Differential Revision:  https://reviews.freebsd.org/D36637

    (cherry picked from commit a743fc8826fa348b09d219632594c537f8e5690e)

 sys/netinet/tcp_output.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
Comment 105 commit-hook freebsd_committer freebsd_triage 2022-09-25 10:05:46 UTC
A commit in branch stable/12 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=9e69e009c86f259653610f3c337253b79381c7a7

commit 9e69e009c86f259653610f3c337253b79381c7a7
Author:     Michael Tuexen <tuexen@FreeBSD.org>
AuthorDate: 2022-09-22 10:12:11 +0000
Commit:     Richard Scheffenegger <rscheff@FreeBSD.org>
CommitDate: 2022-09-25 08:46:54 +0000

    tcp: send ACKs when requested

    When doing Limited Transmit send an ACK when needed by the protocol
    processing (like sending ACKs with a DSACK block).

    PR:                     264257
    PR:                     263445
    PR:                     260393
    Reviewed by:            rscheff@
    MFC after:              3 days
    Sponsored by:           Netflix, Inc.
    Differential Revision:  https://reviews.freebsd.org/D36631

    (cherry picked from commit 5ae83e0d871bc7cbe4dcc9a33d37eb689e631efe)

 sys/netinet/tcp_input.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
Comment 106 commit-hook freebsd_committer freebsd_triage 2022-09-25 10:05:52 UTC
A commit in branch stable/12 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=26370413d43bfd65500270ff331ae6bdf0f54133

commit 26370413d43bfd65500270ff331ae6bdf0f54133
Author:     Michael Tuexen <tuexen@FreeBSD.org>
AuthorDate: 2022-09-19 10:42:43 +0000
Commit:     Richard Scheffenegger <rscheff@FreeBSD.org>
CommitDate: 2022-09-25 08:41:54 +0000

    tcp: fix computation of offset

    Only update the offset if actually retransmitting from the
    scoreboard. If not done correctly, this may result in
    trying to (re)-transmit data not being being in the socket
    buffe and therefore resulting in a panic.

    PR:                     264257
    PR:                     263445
    PR:                     260393
    Reviewed by:            rscheff@
    MFC after:              3 days
    Sponsored by:           Netflix, Inc.
    Differential Revision:  https://reviews.freebsd.org/D36626

    (cherry picked from commit 6d9e911fbadf3b409802a211c1dae9b47cb5a2b8)

 sys/netinet/tcp_output.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
Comment 107 commit-hook freebsd_committer freebsd_triage 2022-09-25 10:05:57 UTC
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=c1f9a81e7bfe354dfa4f191d5180426f76bc514b

commit c1f9a81e7bfe354dfa4f191d5180426f76bc514b
Author:     Richard Scheffenegger <rscheff@FreeBSD.org>
AuthorDate: 2022-09-22 10:55:25 +0000
Commit:     Richard Scheffenegger <rscheff@FreeBSD.org>
CommitDate: 2022-09-25 08:56:28 +0000

    tcp: fix cwnd restricted SACK retransmission loop

    While doing the initial SACK retransmission segment while heavily cwnd
    constrained, tcp_ouput can erroneously send out the entire sendbuffer
    again. This may happen after an retransmission timeout, which resets
    snd_nxt to snd_una while the SACK scoreboard is still populated.

    Reviewed By:            tuexen, #transport
    PR:                     264257
    PR:                     263445
    PR:                     260393
    MFC after:              3 days
    Sponsored by:           NetApp, Inc.
    Differential Revision:  https://reviews.freebsd.org/D36637

    (cherry picked from commit a743fc8826fa348b09d219632594c537f8e5690e)

 sys/netinet/tcp_output.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
Comment 108 commit-hook freebsd_committer freebsd_triage 2022-09-25 10:06:00 UTC
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=f9edad0054652e020b8214f61c0e454fd48101a6

commit f9edad0054652e020b8214f61c0e454fd48101a6
Author:     Michael Tuexen <tuexen@FreeBSD.org>
AuthorDate: 2022-09-22 10:12:11 +0000
Commit:     Richard Scheffenegger <rscheff@FreeBSD.org>
CommitDate: 2022-09-25 08:55:41 +0000

    tcp: send ACKs when requested

    When doing Limited Transmit send an ACK when needed by the protocol
    processing (like sending ACKs with a DSACK block).

    PR:                     264257
    PR:                     263445
    PR:                     260393
    Reviewed by:            rscheff@
    MFC after:              3 days
    Sponsored by:           Netflix, Inc.
    Differential Revision:  https://reviews.freebsd.org/D36631

    (cherry picked from commit 5ae83e0d871bc7cbe4dcc9a33d37eb689e631efe)

 sys/netinet/tcp_input.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
Comment 109 commit-hook freebsd_committer freebsd_triage 2022-09-25 10:06:04 UTC
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=0612d3000b974f31de15c90c77bf43f121fc8656

commit 0612d3000b974f31de15c90c77bf43f121fc8656
Author:     Michael Tuexen <tuexen@FreeBSD.org>
AuthorDate: 2022-09-19 10:42:43 +0000
Commit:     Richard Scheffenegger <rscheff@FreeBSD.org>
CommitDate: 2022-09-25 08:54:18 +0000

    tcp: fix computation of offset

    Only update the offset if actually retransmitting from the
    scoreboard. If not done correctly, this may result in
    trying to (re)-transmit data not being being in the socket
    buffe and therefore resulting in a panic.

    PR:                     264257
    PR:                     263445
    PR:                     260393
    Reviewed by:            rscheff@
    MFC after:              3 days
    Sponsored by:           Netflix, Inc.
    Differential Revision:  https://reviews.freebsd.org/D36626

    (cherry picked from commit 6d9e911fbadf3b409802a211c1dae9b47cb5a2b8)

 sys/netinet/tcp_output.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
Comment 110 Dmitriy 2022-09-30 13:41:12 UTC
(In reply to Michael Tuexen from comment #87)
Hello,
after applying the patch named "Fix offset computation", the system currently has 12 days of uptime with the same workload that, before the patch, caused a crash after 5-8 days of uptime. So it seems that was the root cause and it has been fixed!
Thank you!
Comment 111 Christos Chatzaras 2022-09-30 13:45:46 UTC
Hello,

Is it possible that this patch fixes the issue in comment #36?
Comment 112 Richard Scheffenegger freebsd_committer freebsd_triage 2022-09-30 15:21:42 UTC
(In reply to Christos Chatzaras from comment #111)
Yes, this is very likely - the "fix offset" patch addresses the problematic sequence of events leading to this off-by-one calculation.

MAIN: https://reviews.freebsd.org/rG6d9e911fbadf3b409802a211c1dae9b47cb5a2b8
STABLE-13: https://reviews.freebsd.org/rG0612d3000b974f31de15c90c77bf43f121fc8656
STABLE-12: https://reviews.freebsd.org/rG26370413d43bfd65500270ff331ae6bdf0f54133

The change is effectively identical in all variants.

(Without the other two fixes, the exact timing of packets being sent may not fully conform to the TCP specs, but this doesn't materially impact stability or consistency.)
Comment 113 Michael Tuexen freebsd_committer freebsd_triage 2022-10-12 06:56:34 UTC
Closing this as it seems to be fixed. If the problem still exists, please reopen.
Comment 114 Timothy Pearson 2023-03-26 01:34:16 UTC
We're still seeing a very similar panic on a FreeBSD 13 router system that includes this patch in the kernel:

Tracing command kernel pid 0 tid 100012 td 0xfffffe001ed50720 (CPU 1)
kdb_enter() at kdb_enter+0x37/frame 0xfffffe001bd3f5b0
vpanic() at vpanic+0x1b0/frame 0xfffffe001bd3f600
panic() at panic+0x43/frame 0xfffffe001bd3f660
trap_fatal() at trap_fatal+0x385/frame 0xfffffe001bd3f6c0
trap_pfault() at trap_pfault+0x4f/frame 0xfffffe001bd3f720
calltrap() at calltrap+0x8/frame 0xfffffe001bd3f720
--- trap 0xc, rip = 0xffffffff80d3d250, rsp = 0xfffffe001bd3f7f0, rbp = 0xfffffe001bd3f850 ---
m_copym() at m_copym+0x30/frame 0xfffffe001bd3f850
ip_fragment() at ip_fragment+0x24f/frame 0xfffffe001bd3f8f0
ip_output() at ip_output+0x13d5/frame 0xfffffe001bd3fa30
ip_forward() at ip_forward+0x3cf/frame 0xfffffe001bd3fae0
ip_input() at ip_input+0x79e/frame 0xfffffe001bd3fb70
netisr_dispatch_src() at netisr_dispatch_src+0xb9/frame 0xfffffe001bd3fbc0
ether_demux() at ether_demux+0x138/frame 0xfffffe001bd3fbf0
ether_nh_input() at ether_nh_input+0x355/frame 0xfffffe001bd3fc50
netisr_dispatch_src() at netisr_dispatch_src+0xb9/frame 0xfffffe001bd3fca0
ether_input() at ether_input+0x69/frame 0xfffffe001bd3fd00
iflib_rxeof() at iflib_rxeof+0xc27/frame 0xfffffe001bd3fe00
_task_fn_rx() at _task_fn_rx+0x72/frame 0xfffffe001bd3fe40
gtaskqueue_run_locked() at gtaskqueue_run_locked+0x15d/frame 0xfffffe001bd3fec0
gtaskqueue_thread_loop() at gtaskqueue_thread_loop+0xc2/frame 0xfffffe001bd3fef0
fork_exit() at fork_exit+0x7e/frame 0xfffffe001bd3ff30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe001bd3ff30
--- trap 0, rip = 0xffffffff80c313af, rsp = 0, rbp = 0x3 ---
mi_startup() at mi_startup+0xdf/frame 0x3

We will try the workaround to see if the issue still appears. Note that it can be several weeks between panics, so we won't know whether the workaround is a solution until the end of April.
Comment 115 Richard Scheffenegger freebsd_committer freebsd_triage 2023-03-26 03:40:03 UTC
The stack trace here looks entirely different, with the local system's TCP stack not being involved (as this is a router, forwarded traffic doesn't normally get processed by the TCP stack - only by the IP layer, as is visible in the stack trace).

I suggest opening a separate bug for this, and also providing the core dumps if possible (note that the core dumps may contain sensitive information in the mbuf chains).