Bug 273615 - infinite loop in sbflush_internal()
Summary: infinite loop in sbflush_internal()
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: CURRENT
Hardware: Any
OS: Any
Importance: --- Affects Some People
Assignee: Gleb Smirnoff
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-09-07 14:29 UTC by Greg Becker
Modified: 2023-09-10 17:05 UTC
CC List: 3 users

See Also:


Attachments
patch to prevent infinite loop in sbflush_internal() (1.45 KB, patch)
2023-09-07 14:29 UTC, Greg Becker
no flags

Description Greg Becker 2023-09-07 14:29:56 UTC
Created attachment 244698
patch to prevent infinite loop in sbflush_internal()

If I run iperf3 in a loop over a 100GbE network (via cxgbe), eventually either the client or the server gets stuck in sbflush_internal().  This happens because sb->sb_ccc is an unsigned int holding a value greater than INT_MAX: when sb->sb_ccc is cast to an int and passed into sbcut_internal(), the now-negative length causes sbcut_internal() to consume no data and return NULL, so sbflush_internal() keeps retrying the operation without making any progress.
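
For illustration, here is a minimal userland sketch (not kernel code; the sb_ccc value below is just a stand-in in the same ballpark as what I see here) of how a u_int above INT_MAX turns into a negative length once it is narrowed to int:

#include <limits.h>
#include <stdio.h>

int
main(void)
{
	/* Stand-in for sb->sb_ccc: a u_int just above INT_MAX. */
	unsigned int sb_ccc = 3100000000u;

	/* The narrowing cast described above. */
	int len = (int)sb_ccc;

	printf("INT_MAX = %d, sb_ccc = %u, len = %d\n", INT_MAX, sb_ccc, len);

	/* A negative len means nothing gets cut, so the caller spins. */
	return (len < 0 ? 1 : 0);
}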

There is a KASSERT in sbcut_internal() to catch this, but I'm not able to reproduce the problem with INVARIANTS enabled.  See commit b5b023b91eee30bacb28684c7aa467bf547f7c0d for additional information about this problem.

Note that I haven't yet been able to reproduce this on a 10GbE network.

I've attached a patch that addresses the immediate problem, but given the mixed usage of ssize_t, u_int, and int by callers of sbflush() and related interfaces, a larger-scoped cleanup seems desirable.  Note that when the bug arises, sb_ccc is always around 3.1 billion, which makes me wonder whether my patch is merely a band-aid for a symptom of a deeper problem.
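
One way to avoid the immediate problem (I'm not reproducing the attached patch here, and this is not it) is to never hand sbcut_internal() a length that has gone negative.  Below is a toy, userland-only model of that idea; toy_cut() merely stands in for the "returns NULL / makes no progress on a bad length" behavior described above, and none of this is the actual sockbuf code:

#include <limits.h>
#include <stdio.h>

/* Toy stand-in: refuses a non-positive or oversized length, else "consumes" len bytes. */
static int
toy_cut(unsigned int *ccc, int len)
{
	if (len <= 0 || (unsigned int)len > *ccc)
		return (0);		/* no progress, like the NULL return */
	*ccc -= (unsigned int)len;
	return (1);
}

int
main(void)
{
	unsigned int ccc = 3100000000u;	/* ~3.1 billion, as seen when the bug hits */

	/* Drain in chunks that always fit in an int, so the length never goes negative. */
	while (ccc != 0) {
		int len = (ccc > INT_MAX) ? INT_MAX : (int)ccc;

		if (!toy_cut(&ccc, len))
			break;
	}
	printf("remaining: %u\n", ccc);
	return (0);
}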

Here's the iperf3 loop I use to catch the problem:

while : ; do sudo iperf3 -c 172.16.10.200 --bidir || break; sleep 1; done
Comment 1 Greg Becker 2023-09-08 01:05:12 UTC
It turns out that if I remove all customizations from loader.conf and sysctl.conf, I am unable to reproduce the problem.  However, if I make this single change (kern.ipc.maxsockbuf=614400000, as per Calomel's tuning guide), the problem crops up very soon afterwards on the iperf3 server side.

When it occurs, sb_acc and sb_ccc are equal (typically around 3.4 billion).
Comment 2 Greg Becker 2023-09-08 11:33:19 UTC
I should also note that when the bug occurs on cxgbe, we appear to start leaking mbuf_jumbo_page mbufs, but I don't see this with mlx4.  It's possible this is related to my patch, but that would imply a follow-on bug in sbcut_internal(), which seems unlikely.
Comment 3 Greg Becker 2023-09-10 17:05:44 UTC
Here's an interesting data point:  I am unable to trigger the bug for any value of kern.ipc.maxsockbuf less than or equal to 536862720 (i.e., 512M - 8K).
However, the bug is easily triggered for any value above that, even by adding just 1 (e.g., 536862721).
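
For reference, the arithmetic behind that threshold: 512 * 1024 * 1024 - 8 * 1024 = 536870912 - 8192 = 536862720, so the cutoff really is exactly 512M minus 8K.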

I arranged to call sbcheck() from sbcut_internal() if (len < 0), and it panics with the following message:


Unread portion of the kernel message buffer:
acc 3484065128/3484065128 ccc 3484065128/3484065128 mbcnt 4311772416/16805120
tlscc 0/0 dcc 0
panic: sbcheck from /usr/src/sys/kern/uipc_sockbuf.c:1557


Now, it's not clear to me whether calling sbcheck() in this context is valid, as it seems to be called only from functions in uipc_ktls.c.


FWIW, here's the dump of the socket buffer passed into sbcut_internal():

(kgdb) f 13
#13 0xffffffff80a083dc in sbcut_internal (sb=sb@entry=0xfffff8025ac44580, len=-810902168) at /usr/src/sys/kern/uipc_sockbuf.c:1557
1557			sbcheck(sb, __FILE__, __LINE__);

(kgdb) p *sb
$1 = {sb_sel = 0xfffff8025ac443e8, sb_state = 32, sb_flags = 2048, sb_acc = 3484065128, sb_ccc = 3484065128, sb_mbcnt = 16805120, 
  sb_ctl = 0, sb_hiwat = 2097152, sb_lowat = 1, sb_mbmax = 16777216, sb_timeo = 0, sb_upcall = 0x0, sb_upcallarg = 0x0, sb_aiojobq = {
    tqh_first = 0x0, tqh_last = 0xfffff8025ac445c0}, sb_aiotask = {ta_link = {stqe_next = 0x0}, ta_pending = 0, ta_priority = 0 '\000', 
    ta_flags = 0 '\000', ta_func = 0xffffffff809dd640 <soaio_rcv>, ta_context = 0xfffff8025ac443c0}, {{sb_mtx = 0xfffff8025ac44560, 
      sb_mb = 0xfffff80752644900, sb_mbtail = 0xfffff80752651900, sb_lastrecord = 0xfffff80752644900, sb_sndptr = 0x0, sb_fnrdy = 0x0, 
      sb_sndptroff = 0, sb_tlscc = 0, sb_tlsdcc = 0, sb_mtls = 0x0, sb_mtlstail = 0x0, sb_tls_seqno = 0, sb_tls_info = 0x0}, {uxdg_mb = {
        stqh_first = 0xfffff8025ac44560, stqh_last = 0xfffff80752644900}, uxdg_peeked = 0xfffff80752651900, {uxdg_conns = {
          tqh_first = 0xfffff80752644900, tqh_last = 0x0}, uxdg_clist = {tqe_next = 0xfffff80752644900, tqe_prev = 0x0}}, uxdg_cc = 0, 
      uxdg_ctl = 0, uxdg_mbcnt = 0}}}
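
As a sanity check on the truncation theory: narrowing the observed sb_ccc to int gives 3484065128 - 4294967296 = -810902168, which matches the len=-810902168 shown in frame 13 above.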