Bug 263445 - [tcp] Fatal trap 12: page fault while in kernel mode // supervisor read data, page not present // 13.1-RC3
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 13.1-RELEASE
Hardware: amd64 Any
Importance: --- Affects Only Me
Assignee: freebsd-net (Nobody)
URL:
Keywords:
Duplicates: 264534
Depends on:
Blocks:
 
Reported: 2022-04-21 11:12 UTC by Igor Valkov
Modified: 2022-10-12 06:48 UTC

Description Igor Valkov 2022-04-21 11:12:08 UTC
# kgdb -c vmcore.0 /boot/kernel/kernel   
GNU gdb (GDB) 11.2 [GDB v11.2 for FreeBSD]
Copyright (C) 2022 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-portbld-freebsd13.1".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /boot/kernel/kernel...
Reading symbols from /usr/lib/debug//boot/kernel/kernel.debug...

Unread portion of the kernel message buffer:


Fatal trap 12: page fault while in kernel mode
cpuid = 4; apic id = 24
fault virtual address   = 0x18
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff806bb6dd
stack pointer           = 0x28:0xfffffe0295a174b0
frame pointer           = 0x28:0xfffffe0295a17520
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 0 (if_io_tqg_4)
trap number             = 12
panic: page fault
cpuid = 4
time = 1650132610
KDB: stack backtrace:
#0 0xffffffff80676825 at kdb_backtrace+0x65
#1 0xffffffff806292df at vpanic+0x17f
#2 0xffffffff80629153 at panic+0x43
#3 0xffffffff809758e5 at trap_fatal+0x385
#4 0xffffffff8097593f at trap_pfault+0x4f
#5 0xffffffff8094ea68 at calltrap+0x8
#6 0xffffffff807a5ad9 at tcp_output+0x1339
#7 0xffffffff8079d1fd at tcp_do_segment+0x2cfd
#8 0xffffffff807997c1 at tcp_input_with_port+0xb61
#9 0xffffffff8079a46b at tcp_input+0xb
#10 0xffffffff8078bc2f at ip_input+0x11f
#11 0xffffffff8075f589 at netisr_dispatch_src+0xb9
#12 0xffffffff80744278 at ether_demux+0x138
#13 0xffffffff80745605 at ether_nh_input+0x355
#14 0xffffffff8075f589 at netisr_dispatch_src+0xb9
#15 0xffffffff807446a9 at ether_input+0x69
#16 0xffffffff80744261 at ether_demux+0x121
#17 0xffffffff80745605 at ether_nh_input+0x355
Uptime: 18h21m19s
Dumping 10524 out of 229348 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%

__curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
55              __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu,
Comment 1 Igor Valkov 2022-04-21 11:12:48 UTC
(kgdb) bt
#0  __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
#1  doadump (textdump=<optimized out>) at /usr/src/sys/kern/kern_shutdown.c:399
#2  0xffffffff80628edc in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:487
#3  0xffffffff8062934e in vpanic (fmt=0xffffffff809f0adc "%s", ap=<optimized out>) at /usr/src/sys/kern/kern_shutdown.c:920
#4  0xffffffff80629153 in panic (fmt=<unavailable>) at /usr/src/sys/kern/kern_shutdown.c:844
#5  0xffffffff809758e5 in trap_fatal (frame=0xfffffe0295a173f0, eva=24) at /usr/src/sys/amd64/amd64/trap.c:944
#6  0xffffffff8097593f in trap_pfault (frame=0xfffffe0295a173f0, usermode=false, signo=<optimized out>, ucode=<optimized out>) at /usr/src/sys/amd64/amd64/trap.c:763
#7  <signal handler called>
#8  m_copydata (m=0x0, m@entry=0xfffff80905c79d00, off=0, len=1, cp=<optimized out>) at /usr/src/sys/kern/uipc_mbuf.c:659
#9  0xffffffff807a5ad9 in tcp_output (tp=<optimized out>) at /usr/src/sys/netinet/tcp_output.c:1081
#10 0xffffffff8079d1fd in tcp_do_segment (m=<optimized out>, th=<optimized out>, so=<optimized out>, tp=0xfffffe044ba66438, drop_hdrlen=41, tlen=<optimized out>, iptos=0 '\000')
    at /usr/src/sys/netinet/tcp_input.c:2637
#11 0xffffffff807997c1 in tcp_input_with_port (mp=<optimized out>, offp=<optimized out>, proto=<optimized out>, port=port@entry=0) at /usr/src/sys/netinet/tcp_input.c:1400
#12 0xffffffff8079a46b in tcp_input (mp=0xfffff80905c79d00, offp=0x0, proto=1) at /usr/src/sys/netinet/tcp_input.c:1496
#13 0xffffffff8078bc2f in ip_input (m=0x0) at /usr/src/sys/netinet/ip_input.c:839
#14 0xffffffff8075f589 in netisr_dispatch_src (proto=1, source=source@entry=0, m=0xfffff801b4c0bd00) at /usr/src/sys/net/netisr.c:1143
#15 0xffffffff8075f95f in netisr_dispatch (proto=96967936, m=0x1) at /usr/src/sys/net/netisr.c:1234
#16 0xffffffff80744278 in ether_demux (ifp=ifp@entry=0xfffff81828629000, m=0x0) at /usr/src/sys/net/if_ethersubr.c:921
#17 0xffffffff80745605 in ether_input_internal (ifp=0xfffff81828629000, m=0x0) at /usr/src/sys/net/if_ethersubr.c:707
#18 ether_nh_input (m=<optimized out>) at /usr/src/sys/net/if_ethersubr.c:737
#19 0xffffffff8075f589 in netisr_dispatch_src (proto=proto@entry=5, source=source@entry=0, m=m@entry=0xfffff801b4c0bd00) at /usr/src/sys/net/netisr.c:1143
#20 0xffffffff8075f95f in netisr_dispatch (proto=96967936, proto@entry=5, m=0x1, m@entry=0xfffff801b4c0bd00) at /usr/src/sys/net/netisr.c:1234
#21 0xffffffff807446a9 in ether_input (ifp=<optimized out>, m=0xfffff801b4c0bd00) at /usr/src/sys/net/if_ethersubr.c:828
#22 0xffffffff80744261 in ether_demux (ifp=ifp@entry=0xfffff80102df8800, m=0x0) at /usr/src/sys/net/if_ethersubr.c:874
#23 0xffffffff80745605 in ether_input_internal (ifp=0xfffff80102df8800, m=0x0) at /usr/src/sys/net/if_ethersubr.c:707
#24 ether_nh_input (m=<optimized out>) at /usr/src/sys/net/if_ethersubr.c:737
#25 0xffffffff8075f589 in netisr_dispatch_src (proto=proto@entry=5, source=source@entry=0, m=m@entry=0xfffff801b4c0bd00) at /usr/src/sys/net/netisr.c:1143
#26 0xffffffff8075f95f in netisr_dispatch (proto=96967936, proto@entry=5, m=0x1, m@entry=0xfffff801b4c0bd00) at /usr/src/sys/net/netisr.c:1234
#27 0xffffffff807446a9 in ether_input (ifp=<optimized out>, m=0xfffff801b4c0bd00) at /usr/src/sys/net/if_ethersubr.c:828
#28 0xffffffff807a2d04 in tcp_lro_flush (lc=lc@entry=0xfffffe019e015d30, le=0xfffffe019edb3690) at /usr/src/sys/netinet/tcp_lro.c:1375
#29 0xffffffff807a304b in tcp_lro_rx_done (lc=0xfffffe019e015d30) at /usr/src/sys/netinet/tcp_lro.c:566
#30 tcp_lro_flush_all (lc=lc@entry=0xfffffe019e015d30) at /usr/src/sys/netinet/tcp_lro.c:1532
#31 0xffffffff8075ba03 in iflib_rxeof (rxq=<optimized out>, rxq@entry=0xfffffe019e015d00, budget=<optimized out>) at /usr/src/sys/net/iflib.c:3058
#32 0xffffffff80756022 in _task_fn_rx (context=0xfffffe019e015d00) at /usr/src/sys/net/iflib.c:3990
#33 0xffffffff8067525d in gtaskqueue_run_locked (queue=queue@entry=0xfffff801029b8200) at /usr/src/sys/kern/subr_gtaskqueue.c:371
#34 0xffffffff80674ed2 in gtaskqueue_thread_loop (arg=<optimized out>, arg@entry=0xfffffe019df0e068) at /usr/src/sys/kern/subr_gtaskqueue.c:547
#35 0xffffffff805e621e in fork_exit (callout=0xffffffff80674e10 <gtaskqueue_thread_loop>, arg=0xfffffe019df0e068, frame=0xfffffe0295a17f40) at /usr/src/sys/kern/kern_fork.c:1093
#36 <signal handler called>
#37 mi_startup () at /usr/src/sys/kern/init_main.c:322
Backtrace stopped: Cannot access memory at address 0x14
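
The fault address 0x18 (eva=24 in frame #5) together with m=0x0 in frame #8 is consistent with m_copydata reading a field through a NULL mbuf pointer. A minimal sketch of that arithmetic, assuming the leading fields of struct mbuf on amd64 are m_next, m_nextpkt and m_data (8 bytes each) followed by a 32-bit m_len. That layout is an assumption and should be checked against sys/sys/mbuf.h in the matching source tree; MbufHead and fault_address are hypothetical names used only here.

```python
import ctypes


class MbufHead(ctypes.Structure):
    """Hypothetical model of the first fields of struct mbuf on amd64.

    The layout is an assumption, not copied from the kernel headers.
    Pointers are modeled as 64-bit integers so the result does not
    depend on the machine that runs this script.
    """
    _fields_ = [
        ("m_next", ctypes.c_uint64),     # next mbuf in the chain
        ("m_nextpkt", ctypes.c_uint64),  # next packet in the queue
        ("m_data", ctypes.c_uint64),     # pointer to the data
        ("m_len", ctypes.c_int32),       # amount of data in this mbuf
    ]


def fault_address(base, field):
    """Address touched when reading `field` through pointer `base`."""
    return base + getattr(MbufHead, field).offset


if __name__ == "__main__":
    # Reading m_len through a NULL pointer touches address 0x18.
    print(hex(fault_address(0, "m_len")))
```

Under that assumed layout, the faulting read is m->m_len with m == NULL, which would mean the mbuf chain ended before the requested byte was reached.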
Comment 2 Marek Zarychta 2022-04-21 15:36:06 UTC
Looks similar to PR260393
Comment 3 Richard Scheffenegger freebsd_committer freebsd_triage 2022-04-21 15:46:20 UTC
Igor, 

Can you provide the core dump of this panic, or reproduce the issue?

We'd need a better understanding of the circumstances leading to this panic, having a full core would certainly help.
Comment 5 Igor Valkov 2022-04-21 18:39:58 UTC
(In reply to Richard Scheffenegger from comment #3)
https://cloud.mediatoday.ru/d/0368a517063047758a0b/
Comment 6 Igor Valkov 2022-05-31 09:45:09 UTC
13.1-RELEASE - the same.

last crashes

-rw-------  1 root  wheel   9841647616 May 15 16:51 vmcore.0
-rw-------  1 root  wheel  12582248448 May 19 10:31 vmcore.1
-rw-------  1 root  wheel  10298535936 May 20 01:04 vmcore.2
-rw-------  1 root  wheel  11421458432 May 20 15:20 vmcore.3
-rw-------  1 root  wheel  12387786752 May 30 20:17 vmcore.4
-rw-------  1 root  wheel   9590677504 May 30 23:12 vmcore.5
Comment 7 Igor Valkov 2022-05-31 09:45:56 UTC
Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 21
fault virtual address   = 0x18
fault code      = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff806bb69d
stack pointer           = 0x28:0xfffffe0295a084b0
frame pointer           = 0x28:0xfffffe0295a08520
code segment        = base rx0, limit 0xfffff, type 0x1b
            = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags    = interrupt enabled, resume, IOPL = 0
current process     = 0 (if_io_tqg_1)
trap number     = 12
panic: page fault
cpuid = 1
time = 1653941306
KDB: stack backtrace:
#0 0xffffffff806767e5 at kdb_backtrace+0x65
#1 0xffffffff806292df at vpanic+0x17f
#2 0xffffffff80629153 at panic+0x43
#3 0xffffffff80975895 at trap_fatal+0x385
#4 0xffffffff809758ef at trap_pfault+0x4f
#5 0xffffffff8094ea18 at calltrap+0x8
#6 0xffffffff807a5a89 at tcp_output+0x1339
#7 0xffffffff8079cdda at tcp_do_segment+0x292a
#8 0xffffffff80799771 at tcp_input_with_port+0xb61
#9 0xffffffff8079a41b at tcp_input+0xb
#10 0xffffffff8078bbdf at ip_input+0x11f
#11 0xffffffff8075f539 at netisr_dispatch_src+0xb9
#12 0xffffffff80744228 at ether_demux+0x138
#13 0xffffffff807455b5 at ether_nh_input+0x355
#14 0xffffffff8075f539 at netisr_dispatch_src+0xb9
#15 0xffffffff80744659 at ether_input+0x69
#16 0xffffffff80744211 at ether_demux+0x121
#17 0xffffffff807455b5 at ether_nh_input+0x355
Comment 8 Richard Scheffenegger freebsd_committer freebsd_triage 2022-05-31 11:26:15 UTC
Hi Igor,

Sorry for the delay. Can you verify that in all your cores the TCP t_state (p *tp in the first frame of tcp_do_segment) is TCPS_LAST_ACK (8)?

And that you have some unacknowledged SACK information where one byte is outstanding?

p *tp->sackhint.nexthole

f 10
p tp->t_state
p *tp->sackhint.nexthole
$4 = {start = 3327712881, end = 3327714341, rxmit = 3327714340, scblink = {tqe_next = 0x0, tqe_prev = 0xfffffe044ba66578}}

It seems that a small amount of data is being sent and, before all of it is acked by the client, the application closes the socket. The sender then only receives a SACK for the FIN (?), while the two prior data packets are still outstanding, and an off-by-one error probably happens during SACK processing...

As a stopgap measure, you can disable SACK (net.inet.tcp.sack.enable=0) or disable PRR (net.inet.tcp.do_prr=0). PRR is a new SACK-related feature, but it should only affect timing (when to send, NOT what to send).
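
The "one byte is outstanding" reading of the hole can be checked with plain sequence-space arithmetic. A small sketch, assuming the usual 32-bit wraparound of TCP sequence numbers; seq_diff and hole_summary are hypothetical helper names, and the values are the ones printed by kgdb above.

```python
SEQ_MOD = 1 << 32  # TCP sequence numbers are 32-bit and wrap around


def seq_diff(a, b):
    """Return (a - b) in 32-bit sequence space."""
    return (a - b) % SEQ_MOD


def hole_summary(start, end, rxmit):
    """Return (hole length, bytes already retransmitted, bytes left)."""
    return seq_diff(end, start), seq_diff(rxmit, start), seq_diff(end, rxmit)


if __name__ == "__main__":
    # Hole from the kgdb output above: start, end, rxmit.
    length, resent, left = hole_summary(3327712881, 3327714341, 3327714340)
    print(length, resent, left)  # 1460 1459 1
```

The hole spans 1460 bytes, of which 1459 are marked as retransmitted, leaving a single byte. That matches the len=1 argument of m_copydata in frame #8 of the backtrace.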
Comment 9 Michael Tuexen freebsd_committer freebsd_triage 2022-06-02 11:59:00 UTC
(In reply to Richard Scheffenegger from comment #8)
Hi Richard,

I'm looking at the tracefile provided by Igor and see:

(kgdb) f 10
#10 0xffffffff80dd7eed in tcp_do_segment (m=<optimized out>, th=<optimized out>, so=<optimized out>, tp=0xfffffe025fb86518, drop_hdrlen=52,
    tlen=<optimized out>, iptos=0 '\000') at /usr/src/sys/netinet/tcp_input.c:2637
2637						(void) tp->t_fb->tfb_tcp_output(tp);
(kgdb) p tp->t_state
$10 = 6
(kgdb) p *tp->sackhint.nexthole
$11 = {start = 1529400226, end = 1529409856, rxmit = 1529409855, scblink = {tqe_next = 0x0, tqe_prev = 0xfffffe025fb86658}}
(kgdb)

Do you really see 8 as the state (which is TCPS_LAST_ACK)? I see 6 (which is TCPS_FIN_WAIT_1).
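
For reading t_state values out of a core, a lookup table helps. The thread confirms only 6 (TCPS_FIN_WAIT_1) and 8 (TCPS_LAST_ACK); the other values below are listed as I recall them from netinet/tcp_fsm.h and should be treated as an assumption to verify against the source tree in use. TcpState and describe are hypothetical names.

```python
from enum import IntEnum


class TcpState(IntEnum):
    """TCP FSM states; only 6 and 8 are confirmed by this report."""
    CLOSED = 0
    LISTEN = 1
    SYN_SENT = 2
    SYN_RECEIVED = 3
    ESTABLISHED = 4
    CLOSE_WAIT = 5
    FIN_WAIT_1 = 6
    CLOSING = 7
    LAST_ACK = 8
    FIN_WAIT_2 = 9
    TIME_WAIT = 10


def describe(t_state):
    """Map a numeric tp->t_state to its TCPS_* name."""
    return "TCPS_" + TcpState(t_state).name


if __name__ == "__main__":
    print(describe(6))
    print(describe(8))
```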
Comment 10 Michael Tuexen freebsd_committer freebsd_triage 2022-06-02 16:04:01 UTC
(In reply to Michael Tuexen from comment #9)
The confusion was on my part: I used a 13.1-RELEASE kernel instead of a 13.1-RC3 kernel.
Comment 11 Michael Tuexen freebsd_committer freebsd_triage 2022-06-02 16:06:22 UTC
(In reply to Igor A. Valkov from comment #6)
Hi Igor,
we discussed this bug on today's transport call. Two questions:
1. Can you also make core files from 13.1-RELEASE available?
2. Would you be willing and able to run a custom kernel? We might want to provide a kernel that lets us gather more data on the situation in which the problem occurs.
Comment 12 Igor Valkov 2022-06-04 22:32:22 UTC
(In reply to Michael Tuexen from comment #11)
1. Core dumps from 13.1-RELEASE with debuginfo are available here: https://cloud.mediatoday.ru/d/0368a517063047758a0b/

2. By custom kernel, do you mean a 13.1-RELEASE GENERIC?
Comment 13 Michael Tuexen freebsd_committer freebsd_triage 2022-06-05 09:08:10 UTC
(In reply to Igor A. Valkov from comment #12)
Thanks for providing the core files. We want to see if there is a pattern.

Regarding "custom build kernel": We were wondering if you can compile a kernel with specific options turned on (like INVARIANTS and other things) and potentially with some modifications to the source code. The intention is to get more information about what is going on...

How long does a server need to run until it crashes? Are the servers running at high load?
Comment 14 Igor Valkov 2022-06-05 09:31:00 UTC
(In reply to Michael Tuexen from comment #13)

> How long does a server need to run until it crashes?
Sometimes several hours but sometimes several days.
Here are the timings of the incidents:

May 11 21:00
May 13 15:02
May 14 08:05
May 15 15:10
May 15 16:51
May 19 10:31 vmcore.1
May 20 01:04 vmcore.2
May 20 15:20 vmcore.3
May 30 20:17 vmcore.4
May 30 23:12 vmcore.5
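
The spread between incidents can be worked out from the list above. A short sketch, assuming all timestamps are in the same local time zone and in 2022 (the year is taken from the comment date, not from the list itself); parse and gaps are hypothetical helper names.

```python
from datetime import datetime

# Crash times copied from the list above. The year is an assumption
# taken from the date of the comment (2022).
CRASH_TIMES = [
    "May 11 21:00", "May 13 15:02", "May 14 08:05", "May 15 15:10",
    "May 15 16:51", "May 19 10:31", "May 20 01:04", "May 20 15:20",
    "May 30 20:17", "May 30 23:12",
]


def parse(stamp, year=2022):
    """Parse a 'Mon DD HH:MM' stamp into a datetime in the given year."""
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M")


def gaps(stamps):
    """Return the time between each pair of consecutive crashes."""
    times = [parse(s) for s in stamps]
    return [later - earlier for earlier, later in zip(times, times[1:])]


if __name__ == "__main__":
    intervals = gaps(CRASH_TIMES)
    print("shortest:", min(intervals))
    print("longest:", max(intervals))
```

The shortest gap is 1 hour 41 minutes (May 15) and the longest is a little over 10 days (May 20 to May 30).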


> Are the servers running at high load?
Yes. Web serving on a 10Gb channel (Intel X550T) with nginx. 64 cores of Opteron 6380.

> Regarding "custom build kernel": We were wondering if you can compile a kernel with specific options turned on (like INVARIANTS and other things) and potentially have some modifications to the source code? This intention is to get more information about what is going on...

Yes. I can.
Comment 15 Igor Valkov 2022-06-05 09:38:33 UTC
now

% uptime
12:33  up 5 days, 13:22, 1 user, load averages: 18,27 20,72 20,07

% zpool status
  pool: vol
 state: ONLINE
config:

        NAME        STATE     READ WRITE CKSUM
        vol         ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            ada0p3  ONLINE       0     0     0
            ada1p3  ONLINE       0     0     0

% cat /etc/sysctl.conf 
vfs.zfs.min_auto_ashift=12
vfs.zfs.arc_min=8589934592
vfs.zfs.arc_max=68719476736
vfs.zfs.txg.timeout=30

kern.maxdsiz=274877906944
kern.dfldsiz=274877906944
kern.maxtsiz=274877906944

net.inet.tcp.fastopen.server_enable=0
net.inet.tcp.fastopen.client_enable=0

kern.ipc.shm_use_phys=1
kern.ipc.maxsockbuf=157286400
kern.ipc.soacceptqueue=16384

kern.ipc.tls.enable=1
kern.ipc.tls.cbc_enable=1

net.route.netisr_maxqlen=2048
net.inet.ip.intr_queue_maxlen=2048

#net.inet.tcp.functions_default=bbr
#net.inet.tcp.functions_inherit_listen_socket_stack=0

net.inet.tcp.rfc6675_pipe=1
net.inet.tcp.mssdflt=1460
net.inet.tcp.minmss=536
net.inet.tcp.abc_l_var=44
net.inet.tcp.initcwnd_segments=44

net.inet.tcp.recvbuf_max=4194304
net.inet.tcp.recvspace=1048576

net.inet.tcp.sendbuf_inc=65536
net.inet.tcp.sendbuf_max=4194304
net.inet.tcp.sendspace=1048576

net.inet.tcp.finwait2_timeout=15000

kern.corefile=/export/coredumps/%N.core
Comment 16 Michael Tuexen freebsd_committer freebsd_triage 2022-06-06 16:06:59 UTC
Can you try if the problem also occurs if you disable KTLS?
Comment 17 Richard Scheffenegger freebsd_committer freebsd_triage 2022-06-07 07:01:52 UTC
While we don't yet understand how the TCPCB ends up in the peculiar state it is in when the panic happens, there appears to be a symptomatic treatment: ignoring invalid SACK scoreboard state, which is nevertheless in approximately the correct sequence space (so not fully random memory contents).

See https://reviews.freebsd.org/D35387
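
To illustrate what "ignoring invalid SACK scoreboard state" could look like, here is a rough sketch of a hole consistency check in 32-bit sequence space. It is not the code from D35387; the specific conditions, and the names hole_is_sane and next_retransmit, are assumptions made for illustration only.

```python
SEQ_MOD = 1 << 32
HALF = 1 << 31


def seq_lt(a, b):
    """True if a precedes b in 32-bit sequence space."""
    return ((a - b) % SEQ_MOD) >= HALF


def seq_leq(a, b):
    """True if a precedes or equals b in 32-bit sequence space."""
    return ((a - b) % SEQ_MOD) == 0 or seq_lt(a, b)


def hole_is_sane(start, end, rxmit, snd_una, snd_max):
    """Hypothetical consistency check for one scoreboard hole."""
    return (
        seq_lt(start, end)            # the hole is not empty
        and seq_leq(snd_una, start)   # not below already acked data
        and seq_leq(end, snd_max)     # not beyond what was ever sent
        and seq_leq(start, rxmit)     # retransmit mark inside the hole
        and seq_leq(rxmit, end)
    )


def next_retransmit(holes, snd_una, snd_max):
    """Return the first hole with data left to resend.

    Returns None (bail out, send nothing from the scoreboard) as soon
    as an inconsistent hole is found, instead of acting on it.
    """
    for start, end, rxmit in holes:
        if not hole_is_sane(start, end, rxmit, snd_una, snd_max):
            return None
        if seq_lt(rxmit, end):
            return (start, end, rxmit)
    return None
```

With made-up values snd_una=1000 and snd_max=3000, a hole (1500, 2000, 1999) is returned for retransmission, while a hole ending at 3001 is rejected because it reaches past snd_max.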
Comment 18 Igor Valkov 2022-06-07 07:12:56 UTC
(In reply to Michael Tuexen from comment #16)
Ok. Now rebuilding kernel without options KERN_TLS.

In the previous configuration, uptime is:
10:11  up 7 days, 11 hrs, 2 users, load averages: 17,39 16,85 17,08
:-) crashes are very random.
Comment 19 commit-hook freebsd_committer freebsd_triage 2022-06-07 07:41:02 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=91d6afe6e2a912fd5059fc11dbeffc85474897af

commit 91d6afe6e2a912fd5059fc11dbeffc85474897af
Author:     Richard Scheffenegger <rscheff@FreeBSD.org>
AuthorDate: 2022-06-07 07:07:09 +0000
Commit:     Richard Scheffenegger <rscheff@FreeBSD.org>
CommitDate: 2022-06-07 07:38:16 +0000

    tcp: Sanity check of SACK holes on retransmissions

    Adding a few KASSERT() to validate sanity of sack holes, and
    bail out if sack hole is inconsistent, to avoid panicking non-invariant builds.

    Reviewed By:    hselasky, glebius
    PR:             263445
    MFC after:      1 week
    Sponsored by:   NetApp, Inc.
    Differential Revision:  https://reviews.freebsd.org/D35387

 sys/netinet/tcp_sack.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)
Comment 20 commit-hook freebsd_committer freebsd_triage 2022-06-07 18:47:01 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=231e0dd5d1fb7778b1cb285e5ebee5502d5ad253

commit 231e0dd5d1fb7778b1cb285e5ebee5502d5ad253
Author:     Richard Scheffenegger <rscheff@FreeBSD.org>
AuthorDate: 2022-06-07 16:16:54 +0000
Commit:     Richard Scheffenegger <rscheff@FreeBSD.org>
CommitDate: 2022-06-07 16:18:42 +0000

    tcp: skip sackhole checks on NULL

    Inadvertently introduced NULL pointer dereference during
    sackhole sanity check in D35387.

    Reviewed By:    glebius
    PR:             263445
    MFC after:      1 week
    Sponsored by:   NetApp, Inc.
    Differential Revision: https://reviews.freebsd.org/D35423

 sys/netinet/tcp_sack.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)
Comment 21 commit-hook freebsd_committer freebsd_triage 2022-06-08 07:40:10 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=ce2525c8108a830d08d75771621d1bc580edd82c

commit ce2525c8108a830d08d75771621d1bc580edd82c
Author:     Richard Scheffenegger <rscheff@FreeBSD.org>
AuthorDate: 2022-06-08 07:14:16 +0000
Commit:     Richard Scheffenegger <rscheff@FreeBSD.org>
CommitDate: 2022-06-08 07:18:32 +0000

    tcp: remove goto and address another NULL deref in SACK

    Missed another NULL dereference during KASSERTS after traversing
    the scoreboard. While at it, scratch the goto by making the
    traversal conditional, and remove duplicate checks using an
    unconditional loop with all checks inside.

    Reviewed By:    hselasky
    PR:             263445
    MFC after:      1 week
    Sponsored by:   NetApp, Inc.
    Differential Revision: https://reviews.freebsd.org/D35428

 sys/netinet/tcp_sack.c | 16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)
Comment 22 Marek Zarychta 2022-06-08 09:27:56 UTC
*** Bug 264534 has been marked as a duplicate of this bug. ***
Comment 23 commit-hook freebsd_committer freebsd_triage 2022-06-08 12:54:07 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=57317c8971df76bd6faeb7dfdc4379097d004caf

commit 57317c8971df76bd6faeb7dfdc4379097d004caf
Author:     Richard Scheffenegger <rscheff@FreeBSD.org>
AuthorDate: 2022-06-08 12:21:28 +0000
Commit:     Richard Scheffenegger <rscheff@FreeBSD.org>
CommitDate: 2022-06-08 12:51:31 +0000

    tcp: exclude KASSERTS when rescue retransmissions are in play.

    The KASSERT criteria needs to be checked against the
    sendbuffer so_snd in a subsequent version.

    Reviewed By:    tuexen, #transport
    PR:             263445
    MFC after:      1 week
    Sponsored by:   NetApp, Inc.
    Differential Revision: https://reviews.freebsd.org/D35431

 sys/netinet/tcp_sack.c | 22 ++++++++++++----------
 1 file changed, 12 insertions(+), 10 deletions(-)
Comment 24 Richard Scheffenegger freebsd_committer freebsd_triage 2022-06-10 22:18:37 UTC
The current thinking is that SACK rescue retransmissions (in FBSD13 this is gated by net.inet.tcp.rfc6675_pipe=1) very rarely create an entry which apparently is beyond the valid data range.

While under most common circumstances, a final FIN bit in the sequence space is taken care of, it seems that there may be some double-counting for the FIN bit.

In most of the inspected cores, we found:

TCP state: LAST_ACK (FIN received and also FIN sent)
SACK loss recovery triggered
A cumulative ACK before all outstanding data was received
The remote client "disappears" for a significant amount of time (7 to 12 retransmission timeouts), but may re-appear again just prior.
snd_max consistently 2 counts above the last data, instead of the expected 1 (for the FIN bit).

However, it is still unclear under what circumstances this double-counting happens, possibly when the persist timer triggers, and a few other conditions are also fulfilled - maybe a race condition between normal packet processing and a timer firing.

In short: disabling rfc6675 enhanced SACK features (more correct pipeline accounting, rescue retransmissions) should address the cause of the panic, while not addressing the root cause of when/why there is the double-accounting of the FIN bit...

Would you be willing to run an instrumented kernel, which either panics (full core dump), or spews out various state, when inconsistencies are detected in this space - while ignoring/addressing them "on the fly" without panicking?
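
The double-counting described above amounts to snd_max sitting one further than data plus FIN would explain. A small sketch of that check, with hypothetical helper names and made-up sequence numbers (not values taken from the cores):

```python
SEQ_MOD = 1 << 32  # TCP sequence numbers wrap at 2**32


def expected_snd_max(last_data_end, fin_sent):
    """Sequence number just past everything sent.

    last_data_end is the sequence number just past the last data byte;
    a FIN, if sent, occupies one more unit of sequence space.
    """
    return (last_data_end + (1 if fin_sent else 0)) % SEQ_MOD


def fin_overcount(snd_max, last_data_end, fin_sent):
    """How far snd_max lies beyond the expected value (0 means consistent)."""
    return (snd_max - expected_snd_max(last_data_end, fin_sent)) % SEQ_MOD


if __name__ == "__main__":
    # Made-up numbers: data ends at 5000 and a FIN was sent.
    print(fin_overcount(5001, 5000, True))  # consistent accounting
    print(fin_overcount(5002, 5000, True))  # the pattern seen in the cores
```

An instrumented kernel could log or correct the state whenever such a check reports a non-zero excess, which is the kind of "on the fly" handling asked about above.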
Comment 25 commit-hook freebsd_committer freebsd_triage 2022-09-19 10:50:27 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=6d9e911fbadf3b409802a211c1dae9b47cb5a2b8

commit 6d9e911fbadf3b409802a211c1dae9b47cb5a2b8
Author:     Michael Tuexen <tuexen@FreeBSD.org>
AuthorDate: 2022-09-19 10:42:43 +0000
Commit:     Michael Tuexen <tuexen@FreeBSD.org>
CommitDate: 2022-09-19 10:49:31 +0000

    tcp: fix computation of offset

    Only update the offset if actually retransmitting from the
    scoreboard. If not done correctly, this may result in
    trying to (re)-transmit data not being in the socket
    buffer and therefore resulting in a panic.

    PR:                     264257
    PR:                     263445
    PR:                     260393
    Reviewed by:            rscheff@
    MFC after:              3 days
    Sponsored by:           Netflix, Inc.
    Differential Revision:  https://reviews.freebsd.org/D36626

 sys/netinet/tcp_output.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
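
The invariant behind this commit message can be sketched as follows: the offset into the send buffer comes from the scoreboard only when a hole is actually being retransmitted, and the resulting byte range must lie inside the socket buffer. This is an illustration of the idea, not the code from D36626; send_offset, copy_is_safe and the numbers in the usage note are hypothetical.

```python
SEQ_MOD = 1 << 32  # TCP sequence numbers wrap at 2**32


def send_offset(snd_una, snd_nxt, sack_rxmit_seq=None):
    """Offset into the send buffer of the first byte to transmit.

    The buffer starts at snd_una. Normally sending continues at
    snd_nxt; only when a SACK hole is being retransmitted is the
    hole's retransmit mark used instead.
    """
    seq = snd_nxt if sack_rxmit_seq is None else sack_rxmit_seq
    return (seq - snd_una) % SEQ_MOD


def copy_is_safe(off, length, sb_len):
    """True if [off, off + length) lies inside a buffer of sb_len bytes."""
    return length >= 0 and off + length <= sb_len


if __name__ == "__main__":
    # Made-up example: 1460 bytes buffered, request one byte past the end.
    print(copy_is_safe(1460, 1, 1460))
    # The same request one byte earlier stays inside the buffer.
    print(copy_is_safe(1459, 1, 1460))
```

A request that fails copy_is_safe corresponds to the situation in the backtrace, where the copy routine walks off the end of the mbuf chain.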
Comment 26 commit-hook freebsd_committer freebsd_triage 2022-09-22 10:17:53 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=5ae83e0d871bc7cbe4dcc9a33d37eb689e631efe

commit 5ae83e0d871bc7cbe4dcc9a33d37eb689e631efe
Author:     Michael Tuexen <tuexen@FreeBSD.org>
AuthorDate: 2022-09-22 10:12:11 +0000
Commit:     Michael Tuexen <tuexen@FreeBSD.org>
CommitDate: 2022-09-22 10:12:11 +0000

    tcp: send ACKs when requested

    When doing Limited Transmit send an ACK when needed by the protocol
    processing (like sending ACKs with a DSACK block).

    PR:                     264257
    PR:                     263445
    PR:                     260393
    Reviewed by:            rscheff@
    MFC after:              3 days
    Sponsored by:           Netflix, Inc.
    Differential Revision:  https://reviews.freebsd.org/D36631

 sys/netinet/tcp_input.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
Comment 27 commit-hook freebsd_committer freebsd_triage 2022-09-22 11:31:07 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=a743fc8826fa348b09d219632594c537f8e5690e

commit a743fc8826fa348b09d219632594c537f8e5690e
Author:     Richard Scheffenegger <rscheff@FreeBSD.org>
AuthorDate: 2022-09-22 10:55:25 +0000
Commit:     Richard Scheffenegger <rscheff@FreeBSD.org>
CommitDate: 2022-09-22 11:28:43 +0000

    tcp: fix cwnd restricted SACK retransmission loop

    While doing the initial SACK retransmission segment while heavily cwnd
    constrained, tcp_output can erroneously send out the entire sendbuffer
    again. This may happen after a retransmission timeout, which resets
    snd_nxt to snd_una while the SACK scoreboard is still populated.

    Reviewed By:            tuexen, #transport
    PR:                     264257
    PR:                     263445
    PR:                     260393
    MFC after:              3 days
    Sponsored by:           NetApp, Inc.
    Differential Revision:  https://reviews.freebsd.org/D36637

 sys/netinet/tcp_output.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
Comment 28 commit-hook freebsd_committer freebsd_triage 2022-09-25 10:05:48 UTC
A commit in branch stable/12 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=9e69e009c86f259653610f3c337253b79381c7a7

commit 9e69e009c86f259653610f3c337253b79381c7a7
Author:     Michael Tuexen <tuexen@FreeBSD.org>
AuthorDate: 2022-09-22 10:12:11 +0000
Commit:     Richard Scheffenegger <rscheff@FreeBSD.org>
CommitDate: 2022-09-25 08:46:54 +0000

    tcp: send ACKs when requested

    When doing Limited Transmit send an ACK when needed by the protocol
    processing (like sending ACKs with a DSACK block).

    PR:                     264257
    PR:                     263445
    PR:                     260393
    Reviewed by:            rscheff@
    MFC after:              3 days
    Sponsored by:           Netflix, Inc.
    Differential Revision:  https://reviews.freebsd.org/D36631

    (cherry picked from commit 5ae83e0d871bc7cbe4dcc9a33d37eb689e631efe)

 sys/netinet/tcp_input.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
Comment 29 commit-hook freebsd_committer freebsd_triage 2022-09-25 10:05:51 UTC
A commit in branch stable/12 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=26370413d43bfd65500270ff331ae6bdf0f54133

commit 26370413d43bfd65500270ff331ae6bdf0f54133
Author:     Michael Tuexen <tuexen@FreeBSD.org>
AuthorDate: 2022-09-19 10:42:43 +0000
Commit:     Richard Scheffenegger <rscheff@FreeBSD.org>
CommitDate: 2022-09-25 08:41:54 +0000

    tcp: fix computation of offset

    Only update the offset if actually retransmitting from the
    scoreboard. If not done correctly, this may result in
    trying to (re)-transmit data not being in the socket
    buffer and therefore resulting in a panic.

    PR:                     264257
    PR:                     263445
    PR:                     260393
    Reviewed by:            rscheff@
    MFC after:              3 days
    Sponsored by:           Netflix, Inc.
    Differential Revision:  https://reviews.freebsd.org/D36626

    (cherry picked from commit 6d9e911fbadf3b409802a211c1dae9b47cb5a2b8)

 sys/netinet/tcp_output.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
Comment 30 commit-hook freebsd_committer freebsd_triage 2022-09-25 10:05:55 UTC
A commit in branch stable/12 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=3651c4f42285644938e2f5bc924ab8c7ed857f83

commit 3651c4f42285644938e2f5bc924ab8c7ed857f83
Author:     Richard Scheffenegger <rscheff@FreeBSD.org>
AuthorDate: 2022-09-22 10:55:25 +0000
Commit:     Richard Scheffenegger <rscheff@FreeBSD.org>
CommitDate: 2022-09-25 08:52:56 +0000

    tcp: fix cwnd restricted SACK retransmission loop

    While doing the initial SACK retransmission segment while heavily cwnd
    constrained, tcp_output can erroneously send out the entire sendbuffer
    again. This may happen after a retransmission timeout, which resets
    snd_nxt to snd_una while the SACK scoreboard is still populated.

    Reviewed By:            tuexen, #transport
    PR:                     264257
    PR:                     263445
    PR:                     260393
    MFC after:              3 days
    Sponsored by:           NetApp, Inc.
    Differential Revision:  https://reviews.freebsd.org/D36637

    (cherry picked from commit a743fc8826fa348b09d219632594c537f8e5690e)

 sys/netinet/tcp_output.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
Comment 31 commit-hook freebsd_committer freebsd_triage 2022-09-25 10:05:59 UTC
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=c1f9a81e7bfe354dfa4f191d5180426f76bc514b

commit c1f9a81e7bfe354dfa4f191d5180426f76bc514b
Author:     Richard Scheffenegger <rscheff@FreeBSD.org>
AuthorDate: 2022-09-22 10:55:25 +0000
Commit:     Richard Scheffenegger <rscheff@FreeBSD.org>
CommitDate: 2022-09-25 08:56:28 +0000

    tcp: fix cwnd restricted SACK retransmission loop

    While doing the initial SACK retransmission segment while heavily cwnd
    constrained, tcp_output can erroneously send out the entire sendbuffer
    again. This may happen after a retransmission timeout, which resets
    snd_nxt to snd_una while the SACK scoreboard is still populated.

    Reviewed By:            tuexen, #transport
    PR:                     264257
    PR:                     263445
    PR:                     260393
    MFC after:              3 days
    Sponsored by:           NetApp, Inc.
    Differential Revision:  https://reviews.freebsd.org/D36637

    (cherry picked from commit a743fc8826fa348b09d219632594c537f8e5690e)

 sys/netinet/tcp_output.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
Comment 32 commit-hook freebsd_committer freebsd_triage 2022-09-25 10:06:02 UTC
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=f9edad0054652e020b8214f61c0e454fd48101a6

commit f9edad0054652e020b8214f61c0e454fd48101a6
Author:     Michael Tuexen <tuexen@FreeBSD.org>
AuthorDate: 2022-09-22 10:12:11 +0000
Commit:     Richard Scheffenegger <rscheff@FreeBSD.org>
CommitDate: 2022-09-25 08:55:41 +0000

    tcp: send ACKs when requested

    When doing Limited Transmit send an ACK when needed by the protocol
    processing (like sending ACKs with a DSACK block).

    PR:                     264257
    PR:                     263445
    PR:                     260393
    Reviewed by:            rscheff@
    MFC after:              3 days
    Sponsored by:           Netflix, Inc.
    Differential Revision:  https://reviews.freebsd.org/D36631

    (cherry picked from commit 5ae83e0d871bc7cbe4dcc9a33d37eb689e631efe)

 sys/netinet/tcp_input.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
Comment 33 commit-hook freebsd_committer freebsd_triage 2022-09-25 10:06:05 UTC
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=0612d3000b974f31de15c90c77bf43f121fc8656

commit 0612d3000b974f31de15c90c77bf43f121fc8656
Author:     Michael Tuexen <tuexen@FreeBSD.org>
AuthorDate: 2022-09-19 10:42:43 +0000
Commit:     Richard Scheffenegger <rscheff@FreeBSD.org>
CommitDate: 2022-09-25 08:54:18 +0000

    tcp: fix computation of offset

    Only update the offset if actually retransmitting from the
    scoreboard. If not done correctly, this may result in
    trying to (re)-transmit data not being in the socket
    buffer and therefore resulting in a panic.

    PR:                     264257
    PR:                     263445
    PR:                     260393
    Reviewed by:            rscheff@
    MFC after:              3 days
    Sponsored by:           Netflix, Inc.
    Differential Revision:  https://reviews.freebsd.org/D36626

    (cherry picked from commit 6d9e911fbadf3b409802a211c1dae9b47cb5a2b8)

 sys/netinet/tcp_output.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
Comment 34 Richard Scheffenegger freebsd_committer freebsd_triage 2022-10-12 06:42:00 UTC
I believe we can close this bug, as we haven't had any reports of issues from those affected after updating/patching.
Comment 35 Michael Tuexen freebsd_committer freebsd_triage 2022-10-12 06:48:41 UTC
Closing this, as I think it is fixed. Please reopen if the problem still exists.