Bug 254725 - [tcp] 13.0-RC4 crash tcp_lro
Summary: [tcp] 13.0-RC4 crash tcp_lro
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: kern (show other bugs)
Version: 13.0-STABLE
Hardware: Any Any
: --- Affects Only Me
Assignee: freebsd-net (Nobody)
URL:
Keywords: panic
Depends on:
Blocks:
 
Reported: 2021-04-02 16:37 UTC by Christos Chatzaras
Modified: 2021-04-05 09:03 UTC (History)
7 users (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Christos Chatzaras 2021-04-02 16:37:18 UTC
I run 13.0-RC4 for few days and I got this panic:

Fatal trap 12: page fault while in kernel mode
cpuid = 7; apic id = 07
fault virtual address   = 0x18
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff80c9b7d8
stack pointer           = 0x0:0xfffffe00357a51c0
frame pointer           = 0x0:0xfffffe00357a5230
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 0 (if_io_tqg_7)
trap number             = 12
panic: page fault
cpuid = 7
time = 1617377524
KDB: stack backtrace:
#0 0xffffffff80c57345 at kdb_backtrace+0x65
#1 0xffffffff80c09d21 at vpanic+0x181
#2 0xffffffff80c09b93 at panic+0x43
#3 0xffffffff8108a187 at trap_fatal+0x387
#4 0xffffffff8108a1df at trap_pfault+0x4f
#5 0xffffffff8108983d at trap+0x27d
#6 0xffffffff81061768 at calltrap+0x8
#7 0xffffffff80dc8a33 at tcp_output+0x10b3
#8 0xffffffff80dc0fcb at tcp_do_segment+0x301b
#9 0xffffffff80dbd1ee at tcp_input+0xabe
#10 0xffffffff80dafbe5 at ip_input+0x125
#11 0xffffffff80d3f2ca at netisr_dispatch_src+0xca
#12 0xffffffff80d23a58 at ether_demux+0x148
#13 0xffffffff80d24ddc at ether_nh_input+0x34c
#14 0xffffffff80d3f2ca at netisr_dispatch_src+0xca
#15 0xffffffff80d23ea9 at ether_input+0x69
#16 0xffffffff80dc6a61 at tcp_flush_out_le+0x221
#17 0xffffffff80dc67fd at tcp_lro_flush+0x2ad
Uptime: 2d15h58m1s
Dumping 2453 out of 32505 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%

__curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
55              __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu,
(kgdb) #0  __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
#1  doadump (textdump=<optimized out>)
    at /usr/src/sys/kern/kern_shutdown.c:399
#2  0xffffffff80c09916 in kern_reboot (howto=260)
    at /usr/src/sys/kern/kern_shutdown.c:486
#3  0xffffffff80c09d90 in vpanic (fmt=<optimized out>, ap=<optimized out>)
    at /usr/src/sys/kern/kern_shutdown.c:919
#4  0xffffffff80c09b93 in panic (fmt=<unavailable>)
    at /usr/src/sys/kern/kern_shutdown.c:843
#5  0xffffffff8108a187 in trap_fatal (frame=0xfffffe00357a5100, eva=24)
    at /usr/src/sys/amd64/amd64/trap.c:915
#6  0xffffffff8108a1df in trap_pfault (frame=frame@entry=0xfffffe00357a5100,
    usermode=false, signo=<optimized out>, signo@entry=0x0,
    ucode=<optimized out>, ucode@entry=0x0)
    at /usr/src/sys/amd64/amd64/trap.c:732
#7  0xffffffff8108983d in trap (frame=0xfffffe00357a5100)
    at /usr/src/sys/amd64/amd64/trap.c:398
#8  <signal handler called>
#9  m_copydata (m=m@entry=0x0, off=0, len=1, cp=<optimized out>)
    at /usr/src/sys/kern/uipc_mbuf.c:656
#10 0xffffffff80dc8a33 in tcp_output (tp=0xfffffe013eac04d8)
    at /usr/src/sys/netinet/tcp_output.c:1068
#11 0xffffffff80dc0fcb in tcp_do_segment (m=0xfffff804e393ca00,
    th=<optimized out>, so=<optimized out>, tp=0xfffffe013eac04d8,
    drop_hdrlen=64, tlen=<optimized out>, iptos=0 '\000')
    at /usr/src/sys/sys/libkern.h:91
#12 0xffffffff80dbd1ee in tcp_input (mp=<optimized out>,
    offp=<optimized out>, proto=<optimized out>)
    at /usr/src/sys/netinet/tcp_input.c:1382
#13 0xffffffff80dafbe5 in ip_input (m=0x0)
    at /usr/src/sys/netinet/ip_input.c:829
#14 0xffffffff80d3f2ca in netisr_dispatch_src (proto=1,
    source=<optimized out>, source@entry=0, m=0xfffff801e35a659c)
    at /usr/src/sys/net/netisr.c:1143
#15 0xffffffff80d3f5bf in netisr_dispatch (proto=0, m=0x1)
    at /usr/src/sys/net/netisr.c:1234
#16 0xffffffff80d23a58 in ether_demux (ifp=ifp@entry=0xfffff80004075000,
    m=0x0) at /usr/src/sys/net/if_ethersubr.c:923
#17 0xffffffff80d24ddc in ether_input_internal (ifp=0xfffff80004075000, m=0x0)
    at /usr/src/sys/net/if_ethersubr.c:709
#18 ether_nh_input (m=<optimized out>) at /usr/src/sys/net/if_ethersubr.c:739
#19 0xffffffff80d3f2ca in netisr_dispatch_src (proto=proto@entry=5,
    source=<optimized out>, source@entry=0, m=0xfffff801e35a659c,
    m@entry=0xfffff804e393ca00) at /usr/src/sys/net/netisr.c:1143
#20 0xffffffff80d3f5bf in netisr_dispatch (proto=0, proto@entry=5, m=0x1,
    m@entry=0xfffff804e393ca00) at /usr/src/sys/net/netisr.c:1234
#21 0xffffffff80d23ea9 in ether_input (ifp=<optimized out>,
    m=0xfffff804e393ca00) at /usr/src/sys/net/if_ethersubr.c:830
#22 0xffffffff80dc6a61 in tcp_flush_out_le (tp=0x0,
    lc=lc@entry=0xfffff8000405f830, le=le@entry=0xfffffe0104118498, locked=0)
    at /usr/src/sys/netinet/tcp_lro.c:569
#23 0xffffffff80dc67fd in tcp_lro_flush (lc=lc@entry=0xfffff8000405f830,
    le=0xfffffe0104118498) at /usr/src/sys/netinet/tcp_lro.c:978
#24 0xffffffff80dc6bab in tcp_lro_rx_done (lc=0xfffff8000405f830)
    at /usr/src/sys/netinet/tcp_lro.c:356
#25 tcp_lro_flush_all (lc=lc@entry=0xfffff8000405f830)
    at /usr/src/sys/netinet/tcp_lro.c:1123
#26 0xffffffff80d3ba22 in iflib_rxeof (rxq=<optimized out>,
    rxq@entry=0xfffff8000405f800, budget=<optimized out>)
    at /usr/src/sys/net/iflib.c:3017
#27 0xffffffff80d35d32 in _task_fn_rx (context=0xfffff8000405f800)
    at /usr/src/sys/net/iflib.c:3949
#28 0xffffffff80c55dad in gtaskqueue_run_locked (
    queue=queue@entry=0xfffff80003988800)
    at /usr/src/sys/kern/subr_gtaskqueue.c:371
#29 0xffffffff80c55a4c in gtaskqueue_thread_loop (arg=<optimized out>,
    arg@entry=0xfffffe00387e40b0) at /usr/src/sys/kern/subr_gtaskqueue.c:547
#30 0xffffffff80bc7c5e in fork_exit (
    callout=0xffffffff80c559a0 <gtaskqueue_thread_loop>,
    arg=0xfffffe00387e40b0, frame=0xfffffe00357a5b00)
    at /usr/src/sys/kern/kern_fork.c:1069
#31 <signal handler called>
(kgdb)
Comment 2 Hans Petter Selasky freebsd_committer 2021-04-02 18:06:29 UTC
This may be related to some known issues being worked on.

Are you using the "TCPHPTS" option in your kernel configuration file?

--HPS
Comment 3 Christos Chatzaras 2021-04-02 18:08:41 UTC
(In reply to Hans Petter Selasky from comment #2)

I use a GENERIC configuration file and didn't find anything related to TCPHPTS there.
Comment 4 Sergey V. Dyatko 2021-04-02 18:38:56 UTC
(In reply to Christos Chatzaras from comment #3)
It seems that with net.inet.tcp.sack.enable  disabled panics is gone
Comment 5 Hans Petter Selasky freebsd_committer 2021-04-02 18:39:13 UTC
This issue may look similar to:
https://reviews.freebsd.org/D29315

I see the fix is not in RC4 yet.

--HPS
Comment 6 Christos Chatzaras 2021-04-02 18:44:16 UTC
(In reply to Sergey V. Dyatko from comment #4)

"sysctl net.inet.tcp.sack.enable=0" is the first think I did after server came back online. I also disable LRO.

What version of FreeBSD you use now? I see you had the same or similar panic with 13-CURRENT 2 months ago.
Comment 7 Christos Chatzaras 2021-04-02 18:59:34 UTC
(In reply to Hans Petter Selasky from comment #5)

I am not wrong to land in releng/13.0 it needs permission from re@FreeBSD.org , and then @rscheff can commit it.
Comment 8 Richard Scheffenegger freebsd_committer 2021-04-02 19:12:16 UTC
https://reviews.freebsd.org/D18985 (RFC6675 SACK rescue retransmission) is not in 13.0-RC4, thus https://reviews.freebsd.org/D29315 (fixing a off-by one, NULL pointer free, introduced above) should not play a role here...

https://reviews.freebsd.org/D29083 could also lead to a panic in tcp_output, but should be in RC4... (checked in Mar08, RC4 was Mar29).

Why libkern.h:92 for tcp_do_segment in frame 11?

Any possibility to get the core? Or the full tcpcb of frame 11?
Comment 9 Christos Chatzaras 2021-04-02 19:25:46 UTC
(In reply to Richard Scheffenegger from comment #8)

Hello Richard,

I tried to contact you to rscheff at freebsd.org which forward the message to your private address and blocked: "A custom mail flow rule created by an admin at xxxxxxx.onmicrosoft.com has blocked your message.

Do you want to send you the vmcore (2.4GB) using wetransfer or you want to give you SSH access to a server and upload there the core? The server has /usr/src from RC4 and kgdb.
Comment 10 Richard Scheffenegger freebsd_committer 2021-04-02 22:12:17 UTC
received the core and kernel.debug symbols.

The panic is likely a off-by-one related to the FIN bit, but not with the
rescue retransmission, but rather PRR.

The TCP state indicates, that only the last data byte, and the final FIN bit
are unacknowledged. 

frame 11:
(kgdb) p *tp->sackhint.nexthole
$48 = {start = 935342315, end = 935342316, rxmit = 935342315, scblink = {tqe_next = 0x0, tqe_prev = 0xfffffe013eac0618}}
(kgdb) p tp->snd_max
$49 = 935342317
(kgdb) p tp->snd_una
$50 = 935342315

(kgdb) p/x tp->t_flags
$6 = 0x603003f4  ====> TF_SENTFIN is set.

Now, the FIN bit occupys the last Seq# (..317).
The SACK hole should therefore be a valid 1 byte hole, which hasn't been retransmitted...

The incoming SACK appears to be SACKing the FIN bit?

(kgdb) p/x *(struct sackblk *)to.to_sacks
$53 = {start = 0xec30c037, end = 0xed30c037}
(kgdb) p 0x37c030ec
$54 = 935342316
(kgdb) p 0x37c030ed
$55 = 935342317
Comment 11 Richard Scheffenegger freebsd_committer 2021-04-02 23:26:57 UTC
Two more question: Have you observed this type of panic once, or multiple times?

how are net.inet.tcp.rfc6675_pipe and net.inet.tcp.do_prr set?

If this is repeatable in your environment, you may want to reenable sack, but disable PRR (which is the new mechanism in 13). 

If this is repeatable, would you be willing to enable blackbox logging, or alternatively a packet capture of what leads up to this event?
Comment 12 Christos Chatzaras 2021-04-03 07:14:31 UTC
(In reply to Richard Scheffenegger from comment #11)

I have RC3-p1 / RC4 in 100+ servers. It happen only once, but only 5 days pass after the upgrade.

---------

sysctl values:

net.inet.tcp.rfc6675_pipe: 0

net.inet.tcp.do_prr: 1
Comment 13 Christos Chatzaras 2021-04-03 08:16:32 UTC
(In reply to Richard Scheffenegger from comment #11)

I enable SACK and LRO and disable PRR.

-----------

(In reply to Sergey V. Dyatko from comment #4)

In your kgdb output I see that PRR is mentioned:

https://lists.freebsd.org/pipermail/freebsd-current/2021-January/078553.html

Can you enable net.inet.tcp.sack.enable and disable net.inet.tcp.do_prr ? Also how often did this panic happened to your servers?
Comment 14 Richard Scheffenegger freebsd_committer 2021-04-05 09:03:36 UTC
Extracted a more complete set of packet headers belonging to the problematic session from the privately provided core.

The session is ECN-enabled

At the time of the panic, SENTFIN was set

Based on the Timestamp option of the incoming ACKs, serious reordering and spurious retransmissions were going on.

The final packet with FIN originally has a payload of 1 byte. (TSopt val ..5625), but that is apparently lost and not received by the client.

Susequently (based on TSopt val), just the FIN is retransmitted twice, with TSopt val ..5861 and ..5979 (e.g. when a transmission opportunity would be there, but no new data is available).

The RTT appears to be nearly 100ms from the very last round, sRTT is averaged at 275ms

At the panic, TSval would have been ..5988

This is for retransmitting the final payload byte, as the client only SACKed the 1st FIN retransmission. 

However, for some reason that byte is no longer available in the send socket buffer, causing the crash.


Srv -> clnt F. 9999:10000(1) //dropped
Clnt -> Srv E. 1:1(0) ack -26seg <sack -25seg:9999>
(unobserved retransmission Srv->Clnt)
Clnt -> Srv E. 1:1(0) ack 9999
Srv -> cnt F. 10000:10000(0)
Clnt -> Srv E. 1:1(0) ack 9999 <sack 10000:10001>
attempt to retransmit 10000:10001(1) -> crash.

However, current attempts to recreate this misbehavior were unsuccessful in recreating the panic.