Created attachment 244698 [details]
patch to prevent infinite loop in sbflush_internal()

If I run iperf3 in a loop over a 100GbE network (via cxgbe), eventually either the client or the server gets stuck in sbflush_internal(). This happens because sb->sb_ccc is an unsigned int holding a value greater than INT_MAX. When sb->sb_ccc is cast to an int and passed into sbcut_internal(), the resulting length is negative, so sbcut_internal() fails to consume any data and returns NULL. sbflush_internal() then keeps retrying the operation, having made no progress. There is a KASSERT in sbcut_internal() that would catch this, but I'm not able to reproduce the problem with INVARIANTS enabled. See commit b5b023b91eee30bacb28684c7aa467bf547f7c0d for additional background on this problem.

Note that I haven't yet been able to reproduce this on a 10GbE network.

I've attached a patch which addresses the immediate problem, but given the mixed usage of ssize_t, u_int, and int by callers of sbflush() and related interfaces, a larger-scoped cleanup seems desirable. Also note that when the bug arises, sb_ccc is always around 3.1 billion, which makes me wonder whether my patch is merely a band-aid over a symptom of a deeper problem.

Here's the iperf3 loop I use to catch the problem:

while : ; do sudo iperf3 -c 172.16.10.200 --bidir || break; sleep 1; done
It turns out that if I remove all customizations from loader.conf and sysctl.conf, I am unable to reproduce the problem. However, if I make this single change (kern.ipc.maxsockbuf=614400000, per Calomel's tuning guide), the problem crops up very soon afterward on the iperf3 server side. When it occurs, sb_acc and sb_ccc are equal (typically around 3.4 billion).
I should also note that when the bug occurs on cxgbe, we appear to start leaking mbuf_jumbo_page mbufs, but I don't see this with mlx4. It's possible this is related to my patch, but that would imply there's a follow-on bug in sbcut_internal(), which seems unlikely.
Here's an interesting data point: I am unable to trigger the bug for any value of kern.ipc.maxsockbuf less than or equal to 536862720 (i.e., 512M - 8K). However, the bug is easily triggered by any value above that, even by adding just 1 (e.g., 536862721).

I arranged to call sbcheck() from sbcut_internal() if (len < 0), and it panics with the following message:

Unread portion of the kernel message buffer:
acc 3484065128/3484065128 ccc 3484065128/3484065128 mbcnt 4311772416/16805120 tlscc 0/0 dcc 0
panic: sbcheck from /usr/src/sys/kern/uipc_sockbuf.c:1557

Now, it's not clear to me whether calling sbcheck() in this context is valid, as it otherwise seems only to be called from functions in uipc_ktls.c. FWIW, here's the dump of the socket buffer passed into sbcut_internal():

(kgdb) f 13
#13 0xffffffff80a083dc in sbcut_internal (sb=sb@entry=0xfffff8025ac44580, len=-810902168)
    at /usr/src/sys/kern/uipc_sockbuf.c:1557
1557            sbcheck(sb, __FILE__, __LINE__);
(kgdb) p *sb
$1 = {sb_sel = 0xfffff8025ac443e8, sb_state = 32, sb_flags = 2048, sb_acc = 3484065128,
  sb_ccc = 3484065128, sb_mbcnt = 16805120, sb_ctl = 0, sb_hiwat = 2097152, sb_lowat = 1,
  sb_mbmax = 16777216, sb_timeo = 0, sb_upcall = 0x0, sb_upcallarg = 0x0,
  sb_aiojobq = {tqh_first = 0x0, tqh_last = 0xfffff8025ac445c0},
  sb_aiotask = {ta_link = {stqe_next = 0x0}, ta_pending = 0, ta_priority = 0 '\000',
    ta_flags = 0 '\000', ta_func = 0xffffffff809dd640 <soaio_rcv>,
    ta_context = 0xfffff8025ac443c0},
  {{sb_mtx = 0xfffff8025ac44560, sb_mb = 0xfffff80752644900, sb_mbtail = 0xfffff80752651900,
    sb_lastrecord = 0xfffff80752644900, sb_sndptr = 0x0, sb_fnrdy = 0x0, sb_sndptroff = 0,
    sb_tlscc = 0, sb_tlsdcc = 0, sb_mtls = 0x0, sb_mtlstail = 0x0, sb_tls_seqno = 0,
    sb_tls_info = 0x0},
   {uxdg_mb = {stqh_first = 0xfffff8025ac44560, stqh_last = 0xfffff80752644900},
    uxdg_peeked = 0xfffff80752651900,
    {uxdg_conns = {tqh_first = 0xfffff80752644900, tqh_last = 0x0},
     uxdg_clist = {tqe_next = 0xfffff80752644900, tqe_prev = 0x0}},
    uxdg_cc = 0, uxdg_ctl = 0, uxdg_mbcnt = 0}}}