Bug 253848

Summary:	[tcp] panic: sackhint bytes rtx >= 0
Product:	Base System	Reporter:	Richard Scheffenegger <rscheff>
Component:	kern	Assignee:	Richard Scheffenegger <rscheff>
Status:	Closed FIXED
Severity:	Affects Only Me	CC:	avg, transport, tuexen
Priority:	---
Version:	13.0-STABLE
Hardware:	Any
OS:	Any

Description Richard Scheffenegger freebsd_committer

2021-02-25 15:31:12 UTC

-----Ursprüngliche Nachricht-----
From: Andriy Gapon
To: FreeBSD Current <current@FreeBSD.org>; net@FreeBSD.org
Subject: panic: sackhint bytes rtx >= 0


Got this panic on 13.0-STABLE 4b2a20dfde9c using a custom kernel with INVARIANTS enabled.
Below is some information from the crash dump.
If anyone has any clues, suggestions, etc, please help.
I will try to help you to help me the best I can.

#0  doadump (textdump=textdump@entry=1)
    at /usr/devel/git/trant/sys/kern/kern_shutdown.c:399
#1  0xffffffff808396b2 in kern_reboot (howto=260)
    at /usr/devel/git/trant/sys/kern/kern_shutdown.c:486
#2  0xffffffff80839d07 in vpanic (
    fmt=0xffffffff80cbd551 "sackhint bytes rtx >= 0", ap=0xfffffe0120b9e6d0)
    at /usr/devel/git/trant/sys/kern/kern_shutdown.c:919
#3  0xffffffff808398b3 in panic (fmt=<unavailable>)
    at /usr/devel/git/trant/sys/kern/kern_shutdown.c:843
#4  0xffffffff8098a82c in tcp_sack_doack (tp=<optimized out>,
    tp@entry=0xfffff807548187f0, to=<optimized out>,
    to@entry=0xfffffe0120b9e780, th_ack=<optimized out>)
    at /usr/devel/git/trant/sys/netinet/tcp_sack.c:691
#5  0xffffffff80983699 in tcp_do_segment (m=0xfffff8029868ca00,
    m@entry=<error reading variable: value is not available>,
    th=<optimized out>,
    th@entry=<error reading variable: value is not available>,
    so=0xfffff804e7359b10,
    so@entry=<error reading variable: value is not available>,
    tp=0xfffff807548187f0,
    tp@entry=<error reading variable: value is not available>,
    drop_hdrlen=60,
    drop_hdrlen@entry=<error reading variable: value is not available>,
    tlen=<optimized out>,
    tlen@entry=<error reading variable: value is not available>,
    iptos=72 'H',
    iptos@entry=<error reading variable: value is not available>)
    at /usr/devel/git/trant/sys/netinet/tcp_input.c:2497
#6  0xffffffff80980d97 in tcp_input (mp=<optimized out>,
    mp@entry=<error reading variable: value is not available>,
    offp=<optimized out>,
    offp@entry=<error reading variable: value is not available>,
    proto=<optimized out>)
    at /usr/devel/git/trant/sys/netinet/tcp_input.c:1381
#7  0xffffffff80976eb7 in ip_input (m=0x0)
    at /usr/devel/git/trant/sys/netinet/ip_input.c:833
#8  0xffffffff8094c78f in netisr_dispatch_src (proto=1,
    source=source@entry=0, m=0xfffff8029868ca00)
    at /usr/devel/git/trant/sys/net/netisr.c:1143
#9  0xffffffff8094cb0e in netisr_dispatch (proto=<unavailable>,
    m=<unavailable>) at /usr/devel/git/trant/sys/net/netisr.c:1234
#10 0xffffffff80943345 in ether_demux (ifp=ifp@entry=0xfffff80008c75000,
    m=<unavailable>) at /usr/devel/git/trant/sys/net/if_ethersubr.c:923
#11 0xffffffff809446c1 in ether_input_internal (ifp=0xfffff80008c75000,
    m=<unavailable>) at /usr/devel/git/trant/sys/net/if_ethersubr.c:709
#12 0xffffffff809443d0 in ether_nh_input (m=<unavailable>)
    at /usr/devel/git/trant/sys/net/if_ethersubr.c:739
#13 0xffffffff8094c78f in netisr_dispatch_src (proto=proto@entry=5,
    source=source@entry=0, m=m@entry=0xfffff8029868ca00)
    at /usr/devel/git/trant/sys/net/netisr.c:1143
#14 0xffffffff8094cb0e in netisr_dispatch (proto=<unavailable>,
    proto@entry=5, m=<unavailable>, m@entry=0xfffff8029868ca00)
    at /usr/devel/git/trant/sys/net/netisr.c:1234
#15 0xffffffff8094383d in ether_input (ifp=0xfffff80008c75000,
    ifp@entry=<error reading variable: value is not available>,
    m=0xfffff8029868ca00,
    m@entry=<error reading variable: value is not available>)
    at /usr/devel/git/trant/sys/net/if_ethersubr.c:830
#16 0xffffffff8066da13 in re_rxeof (sc=sc@entry=0xfffffe00391a5000,
    rx_npktsp=0x0) at /usr/devel/git/trant/sys/dev/re/if_re.c:2388
#17 0xffffffff8066b3ab in re_intr_msi (xsc=0xfffffe00391a5000)
    at /usr/devel/git/trant/sys/dev/re/if_re.c:2684
#18 0xffffffff80801bcf in intr_event_execute_handlers (
    p=p@entry=0xfffff80006a94a50, ie=ie@entry=0xfffff80008c57800)
    at /usr/devel/git/trant/sys/kern/kern_intr.c:1168
#19 0xffffffff80801942 in ithread_execute_handlers (
    p=p@entry=0xfffff80006a94a50, ie=ie@entry=0xfffff80008c57800)
    at /usr/devel/git/trant/sys/kern/kern_intr.c:1181
#20 0xffffffff80801758 in ithread_loop (arg=arg@entry=0xfffff80008c20c40)
    at /usr/devel/git/trant/sys/kern/kern_intr.c:1269
#21 0xffffffff807fe0a9 in fork_exit (
    callout=0xffffffff808016c0 <ithread_loop>, arg=0xfffff80008c20c40,
    frame=0xfffffe0120b9ed40)
    at /usr/devel/git/trant/sys/kern/kern_fork.c:1069


(kgdb) fr 4
#4  0xffffffff8098a82c in tcp_sack_doack (tp=<optimized out>, tp@entry=0xfffff807548187f0, to=<optimized out>, to@entry=0xfffffe0120b9e780, th_ack=<optimized out>) at /usr/devel/git/trant/sys/netinet/tcp_sack.c:691
691                     KASSERT(tp->sackhint.sack_bytes_rexmit >= 0,
(kgdb) i loc
sack_blocks = {{start = 3846350860, end = 3846396940}, {start = 3846398380, end = 3846415660}, {start = 0, end = 0}, {start = 1417775088, end = 4294965255}, {start = 549054304, end = 4294966785}} left_edge_delta = <optimized out> delivered_data = 0 sack_changed = <optimized out> num_sack_blks = <optimized out> i = <optimized out> j = <optimized out> sack = {start = <optimized out>, end = <optimized out>} sblkp = 0xfffffe0120b9e700 temp = <optimized out> cur = 0xfffff8013da5a320
(kgdb) p *cur
$1 = {start = 3846347980, end = 3846352300, rxmit = 3846352300, scblink = {tqe_next = 0xfffff8013da5a220, tqe_prev = 0xfffff80754818930}}
(kgdb) p 3846352300 - 3846347980
$2 = 4320
(kgdb) p tp->sackhint.sack_bytes_rexmit
value has been optimized out
(kgdb) p tp@entry->sackhint.sack_bytes_rexmit
$3 = -1440
(kgdb) p 4320 - 1440
$4 = 2880
(kgdb) p *sblkp
$5 = {start = 3846350860, end = 3846396940}

(kgdb) p *tp@entry
$7 = {t_inpcb = 0xfffff80640ca65b8, t_fb = 0xffffffff80ef4700 <tcp_def_funcblk>, t_fb_ptr = 0x0, t_maxseg = 1440, t_logstate = 0, t_port = 0, t_state = 4, t_idle_reduce = 0, t_delayed_ack = 0, t_fin_is_rst = 0,
  t_log_state_set = 0, bits_spare = 0, t_flags = 1613759072, snd_una = 3846347980, snd_max = 3846417100, snd_nxt = 3846417100, snd_up = 3846347980, snd_wnd = 237568, snd_cwnd = 4320, t_peakrate_thr = 0, ts_offset = 71449215,
  rfbuf_ts = 162235772, rcv_numsacks = 0, t_tsomax = 0, t_tsomaxsegcount = 0, t_tsomaxsegsize = 0, rcv_nxt = 2281639092, rcv_adv = 2281705332, rcv_wnd = 66240, t_flags2 = 1030, t_srtt = 876, t_rttvar = 33, ts_recent = 0,
  snd_scale = 8 '\b', rcv_scale = 6 '\006', snd_limited = 2 '\002', request_r_scale = 6 '\006', last_ack_sent = 2281639092, t_rcvtime = 2309118641, rcv_up = 2281639092, t_segqlen = 0, t_segqmbuflen = 0, t_segq = {
    tqh_first = 0x0, tqh_last = 0xfffff80754818880}, t_in_pkt = 0x0, t_tail_pkt = 0x0, t_timers = 0xfffff80754818a78, t_vnet = 0x0, snd_ssthresh = 31680,
snd_wl1 = 2281639092, snd_wl2 = 3846347980, irs = 2281631223,
  iss = 3840447913, t_acktime = 0, t_sndtime = 2309118613, ts_recent_age = 0, snd_recover = 3846415660, cl4_spare = 0, t_oobflags = 0 '\000', t_iobc = 0 '\000', t_rxtcur = 270, t_rxtshift = 1, t_rtttime = 2309118613,
  t_rtseq = 3846415660, t_starttime = 2309086941, t_fbyte_in = 2309087188, t_fbyte_out = 2309087159, t_pmtud_saved_maxseg = 0, t_blackhole_enter = 0, t_blackhole_exit = 0, t_rttmin = 30, t_rttbest = 845, t_softerror = 0,
  max_sndwnd = 237568, snd_cwnd_prev = 64800, snd_ssthresh_prev = 8640, snd_recover_prev = 3846347980, t_sndzerowin = 0, t_rttupdated = 368, snd_numholes = 2, t_badrxtwin = 0, snd_holes = {tqh_first = 0xfffff8013da5a320,
    tqh_last = 0xfffff8013da5a230}, snd_fack = 3846415660, sackblks = {{start = 2281632180, end = 2281632690}, {start = 0, end = 0}, {start = 0, end = 0}, {start = 0, end = 0}, {start = 0, end = 0}, {start = 0, end = 0}},
  sackhint = {nexthole = 0xfffff8013da5a220, sack_bytes_rexmit = -1440, last_sack_ack = 3846415660, delivered_data = 1440, sacked_bytes = 61920, recover_fs = 67680, prr_delivered = 1440, _pad = {0}}, t_rttlow = 25,
  rfbuf_cnt = 0, tod = 0x0, t_sndrexmitpack = 520, t_rcvoopack = 0, t_toe = 0x0, cc_algo = 0xffffffff80ef2530 <newreno_cc_algo>, ccv = 0xfffff80754818bc0, osd = 0x0, t_bytes_acked = 11520, t_maxunacktime = 0, t_keepinit = 0,
  t_keepidle = 0, t_keepintvl = 0, t_keepcnt = 0, t_dupacks = 4, t_lognum = 0, t_loglimit = 0, t_pacing_rate = -1, t_logs = {stqh_first = 0x0, stqh_last = 0x0}, t_lin = 0x0, t_lib = 0x0, t_output_caller = 0x0, t_stats = 0x0,
  t_logsn = 0, gput_ts = 0, gput_seq = 0, gput_ack = 0, t_stats_gput_prev = 0, t_tfo_client_cookie_len = 0 '\000', t_end_info_status = 0, t_tfo_pending = 0x0, t_tfo_cookie = {client = '\000' <repeats 15 times>, server = 0}, {
    t_end_info_bytes = "\000\000\000\000\000\000\000", t_end_info = 0}}

(kgdb) p *tp@entry->sackhint.nexthole
$8 = {start = 3846396940, end = 3846398380, rxmit = 3846398380, scblink = {tqe_next = 0x0, tqe_prev = 0xfffff8013da5a330}}

--
Andriy Gapon

Comment 1 Richard Scheffenegger freebsd_committer

2021-02-25 15:33:14 UTC

Hi Andriy,

I guess I am currently the person who has the most recent knowledge about that part of the base stack...

Do you happen to have more (preceding) information about this, or a way to reproduce this?

Are you running any special stack (RACK, BBR) which may have switched back to the base stack in the middle of a loss recovery (I suspected at one point that this may cause issues, potentially)?

Or was something done with the ipfw that may have temporarily impacted a tcp session?


The accounting with sack_bytes_rexmit is rather old, and not touched recently (but the sackhint struct was changed recently, and other/additional scoreboard accounting was added).


(kgdb) p *cur
$1 = {start = 3846347980, end = 3846352300, rxmit = 3846352300, scblink = {tqe_next = 0xfffff8013da5a220, tqe_prev = 0xfffff80754818930}}

This indicates, that the current hole in the SACK scoreboard (3 segments of size 1440 bytes) were retransmitted  (rxmit == end), before the current acknowledgement came back.

Thus the expectation is, that sackhint.sack_bytes_rexmit also has a value of at least that number of bytes (4320). It is increased in tcp_output() for each packet leaving while performing a retransmission.

But this is the peculiar part:
(kgdb) p tp@entry->sackhint.sack_bytes_rexmit<mailto:tp@entry-%3esackhint.sack_bytes_rexmit>
$3 = -1440

Indicating negative one packet had been retransmitted before (thus subtracting the hole, which was previously retransmitted violates the invariant). And the only piece of code decrementing it appears to be in tcp_output() during non-permanent error handling...


All updates to sackhint should be protected by the INPLOCK, so even if the rx and tx paths are running on different core, the sack_bytes_rexmit should never become negative.


The sack blocks returned indicate that (with snd.una as zero baseline, in segments) the client knows about segments 2..34 and 35..47.

The first hole has shrunk from the right (unusual; possible when two retransmissions were lost again, or the 3 segment originally sent, delayed by ~50 segments (unlikely).


Sorry to not being able to spot something obvious right away...

Comment 2 Richard Scheffenegger freebsd_committer

2021-02-25 15:34:29 UTC

Some more peculiar observations:

T_dupacks is 4 .. so this appears to be the 2nd ack, where we should be responding with more data to.

At t_dupacks 3, SACK should have retransmitted only the first packet.

However, the two holes (sackhint.nexthole for the final one, and *cur for the previous one, which we know is the first, as snd_una == cur->start ) both claim to have been retransmitted already (->rxmit == ->end), for a total of 4 segments...

Comment 3 Richard Scheffenegger freebsd_committer

2021-02-25 15:35:00 UTC

One change somewhat related to SACK loss recovery, which was added to stable/13, was PRR (D18892).

Since early Dec, there was one reported instance of poor performance, and on closer inspection, there are a few possibilities, where unexpected negative values, or overflows can occur - but those "should" only expose themselves as packets either not sent when expected, or in larger bursts (3 instead of 2) packets, not a panic like you observed.

There are a few diffs to improve that behavior, which I would like to MFC to stable/13 and 13.0 in the coming days:

D28780 - harness to ensure PRR doesn't end with fully closed window (performance)
D28813 - fix an overflow and errors due to mixup of "min"/"imin" (implicit typecast)
(D28816) uncritical really, use updated pipe calculation
D28819 - fix unintended double-accounting
(D28822) enhanced heuristic, per standards-track draft of PRR

I can not immediately see, how that may affect the sack_bytes_rexmit numbers, but PRR is one change in the loss recovery space added recently - so the panic may be somehow related.

Comment 4 Richard Scheffenegger freebsd_committer

2021-02-25 15:39:14 UTC

In todays #transport call, we discussed shortly about this. There may be a 2nd instance of this kind of panic, thus documenting it as a PR.

Further, in order to allow the core to contain more information about past events (including TCP headers), if this happens in a reproducable way, the Blackbox Logging feature may sched more light as to how the variable ended up in this way.

Comment 5 commit-hook freebsd_committer

2021-03-05 17:19:10 UTC

A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=4a8f3aad37dd35c905e34b64b022d60844ba8d01

commit 4a8f3aad37dd35c905e34b64b022d60844ba8d01
Author:     Richard Scheffenegger <rscheff@FreeBSD.org>
AuthorDate: 2021-03-05 16:45:23 +0000
Commit:     Richard Scheffenegger <rscheff@FreeBSD.org>
CommitDate: 2021-03-05 16:45:54 +0000

    tcp: remove incorrect reset of SACK variable in PRR

    Reviewed By:   #transport, rrs, tuexen
    PR:            253848
    MFC after:     3 days
    Sponsored By:  NetApp, Inc.
    Differential Revision: https://reviews.freebsd.org/D29083

 sys/netinet/tcp_input.c | 1 -
 1 file changed, 1 deletion(-)

Comment 6 Richard Scheffenegger freebsd_committer

2021-03-05 17:35:20 UTC

It turned out that only under a specific set of circumstances - RTO during a SACK loss recovery, while the client had not yet fully reported back all SACK blocks - a unnecessary initialization of sack_bytes_rexmit caused this panic. 

In mundane environments this was virtually not reproduceable, and even in exposed environments, tcp sessions apparently need to suffer from sudden RTT jumps / path changes to elicit the panic.

Thanks to rrs@ and gallatin@ for helping by providing crucial timing and packet trace/tcp state information around a few occurances.

Comment 7 commit-hook freebsd_committer

2021-03-08 11:22:34 UTC

A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=d90bba73a2e43fa12ea19425d101df4c488c1070

commit d90bba73a2e43fa12ea19425d101df4c488c1070
Author:     Richard Scheffenegger <rscheff@FreeBSD.org>
AuthorDate: 2021-03-05 16:45:23 +0000
Commit:     Richard Scheffenegger <rscheff@FreeBSD.org>
CommitDate: 2021-03-08 10:43:49 +0000

    tcp: remove incorrect reset of SACK variable in PRR

    Reviewed By:   #transport, rrs, tuexen
    PR:            253848
    MFC after:     3 days
    Sponsored By:  NetApp, Inc.
    Differential Revision: https://reviews.freebsd.org/D29083

    (cherry picked from commit 4a8f3aad37dd35c905e34b64b022d60844ba8d01)

 sys/netinet/tcp_input.c | 1 -
 1 file changed, 1 deletion(-)

Comment 8 commit-hook freebsd_committer

2021-03-08 14:18:02 UTC

A commit in branch releng/13.0 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=bb60a68985c8902f4061f1e3338a74e5f7d36a5d

commit bb60a68985c8902f4061f1e3338a74e5f7d36a5d
Author:     Richard Scheffenegger <rscheff@FreeBSD.org>
AuthorDate: 2021-03-05 16:45:23 +0000
Commit:     Richard Scheffenegger <rscheff@FreeBSD.org>
CommitDate: 2021-03-08 14:17:12 +0000

    tcp: remove incorrect reset of SACK variable in PRR
    tcp: Add prr_out in preparation for PRR/nonSACK and LRD

    Reviewed by:   #transport, rrs, tuexen, kbowling
    Approved by:   #re (gjb)
    PR:            253848
    MFC after:     3 days
    Sponsored by:  NetApp, Inc.
    Differential Revision: https://reviews.freebsd.org/D29083
    Differential Revision: https://reviews.freebsd.org/D29058

    (cherry picked from commit 4a8f3aad37dd35c905e34b64b022d60844ba8d01)
    (cherry picked from commit d90bba73a2e43fa12ea19425d101df4c488c1070)

    (cherry picked from commit e53138694aa41c24c17847afe959225ce0eeff91)
    (cherry picked from commit 71cc98ba735283b8720c9031eb87810f3d3d96a0)

 sys/netinet/tcp_input.c  | 22 ++++++++--------------
 sys/netinet/tcp_output.c |  8 ++++++++
 sys/netinet/tcp_var.h    |  2 +-
 3 files changed, 17 insertions(+), 15 deletions(-)