While running the internal tests we regularly get panic. panic: sbsndptr_noadv: sb_mb is NULL #1 doadump (textdump=textdump@entry=1) at /usr/kernel_git_main/sys/kern/kern_shutdown.c:403 #2 0xffffffff80b3de30 in kern_reboot (howto=260) at /usr/kernel_git_main/sys/kern/kern_shutdown.c:521 #3 0xffffffff80b3e333 in vpanic (fmt=0xffffffff8113018d "%s: sb_mb is NULL", ap=ap@entry=0xfffffe00e3afdb60) at /usr/kernel_git_main/sys/kern/kern_shutdown.c:973 #4 0xffffffff80b3e183 in panic (fmt=<unavailable>) at /usr/kernel_git_main/sys/kern/kern_shutdown.c:889 #5 0xffffffff80bede41 in sbsndptr_noadv (sb=<optimized out>, off=<optimized out>, moff=<unavailable>) at /usr/kernel_git_main/sys/kern/uipc_sockbuf.c:1694 #6 0xffffffff80d3ab4e in tcp_default_output (tp=0xfffff801cad14000, tp@entry=<error reading variable: value is not available>) at /usr/kernel_git_main/sys/netinet/tcp_output.c:1069 #7 0xffffffff80d4d17f in tcp_output_locked (tp=0xfffff801cad14000) at /usr/kernel_git_main/sys/netinet/tcp_timer.c:298 #8 tcp_timer_rexmt (tp=0xfffff801cad14000) at /usr/kernel_git_main/sys/netinet/tcp_timer.c:805 #9 0xffffffff80d4c71e in tcp_timer_enter (xtp=0xfffff801cad14000) at /usr/kernel_git_main/sys/netinet/tcp_timer.c:880 #10 0xffffffff80b5cd56 in softclock_call_cc (c=0xfffff801cad14198, cc=cc@entry=0xffffffff821e4300, direct=direct@entry=0) at /usr/kernel_git_main/sys/kern/kern_timeout.c:714 #11 0xffffffff80b5e376 in softclock_thread (arg=arg@entry=0xffffffff821e4300) at /usr/kernel_git_main/sys/kern/kern_timeout.c:846 #12 0xffffffff80af3b42 in fork_exit (callout=0xffffffff80b5e2b0 <softclock_thread>, arg=0xffffffff821e4300, frame=0xfffffe00e3afdf40) at /usr/kernel_git_main/sys/kern/kern_fork.c:1157 debug kernel and vmcore available, and we can reproduce the problem on damand
How can we access the core? How do you reproduce it?
Daniel, If you can deterministically reproduce this, that this is probably what Gleb was looking for. It is likely that this is an instance where the retransmit timer is active on a closed (or closing) session, where the socket buffers were already freed... The BT certainly looks like this - the open question is, why not all outstanding timers get cancelled when a session is closing - one codepath seems to be missing the relevant cleanups. (This appears to have been exposed by my recent change of not discarding the SACK scoreboard on an RTO - SACK retransmissions do re-arm the RTO timer.) If you are looking for a quick workaround, we have one, but would really appreciate your help into understanding the actual root cause (codepath where session closure is not properly cleaning up).
Its happening during IPsec testing by sending TCP traffic using iperf tool Please download the core from here and let me know when I can remove it from my drive https://drive.google.com/file/d/1LxoaCE9QoZuwGYKGFqKYeVt7ozVS4aXm/view?usp=sharing
Daniel, yes the core looks the same as several I already have. As Richard said we have workarounds against this panic and we know what commit opened it up. The commit itself isn't a mistake, though. Instead of committing any of our workarounds we are looking to find the root cause of the panic, a path that would flush socket send buffer but leave sack scoreboard. If you have any clue how reproduce the bug, please share. You mentioned certain iperf action/configuration.
Hi I was able to reproduce the bug by running the following test in a loop ~20 times test flow (ran it ~20 times till reproduced): run in loop 8 esp\IKE alg (twice): 1. set IPsec configuration 2. start ipsec - ipsec start 3. send IPerf stress traffic ~10 sec iperf params: -P 100 -t 20 --format g -p 17123 4. stop iperf traffic stop iperf traffic after ~10 sec - killall iperf 5. stop ipsec - ipsec stop 6. change IKE and esp alg 7. return to stage 1. IKE and ESP algorithms I am running in a loop 1. aes256-sha2_256-modp1024! 2. aes256-sha256-modp4096 3. aes256gcm16-prfsha384-ecp384 4. aes256gcm16-sha256-modp1024 5. aes128-sha2_256-modp1024! 6. aes128gcm16-prfsha256-ecp256 7. aes128-sha256-ecp256 8. aes128gcm16-sha256-modp1024 IPSEC version FreeBSD strongSwan U5.9.10/K15.0-CURRENT iperf version: 2.1.9
Last TCP / IPSec bug I looked at was an oversight in error handling (on the TCP side), when ip_output() would return ENOBUFS (which has a higher likelyhood apparently when IPSec is in play). Maybe something along these lines plays a role here as well (just speculating, didn't really check all the codepaths for closing a session, and what/if any potential error is always properly handled...)
Can I remove the core from my drive? Any update regarding the issue?
(In reply to Daniel from comment #8) No update from my side right now. Trying to reproduce it locally. But I copied your code, so you can remove it from your drive, I think.
A commit in branch main references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=62d47d73b7eb01f3b0a37541df5e7aaa36f54335 commit 62d47d73b7eb01f3b0a37541df5e7aaa36f54335 Author: Richard Scheffenegger <rscheff@FreeBSD.org> AuthorDate: 2024-02-10 09:28:42 +0000 Commit: Richard Scheffenegger <rscheff@FreeBSD.org> CommitDate: 2024-02-10 09:30:00 +0000 tcp: stop timers and clean scoreboard in tcp_close() Stop timers when in tcp_close() instead of doing that in tcp_discardcb(). A connection in CLOSED state shall not need any timers. Assert that no timer is rescheduled after that in tcp_timer_activate() and verfiy that this is also the expected state in tcp_discardcb(). PR: 276761 Reviewed By: glebius, tuexen, #transport Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D43792 sys/netinet/tcp_subr.c | 4 ++-- sys/netinet/tcp_timer.c | 1 + 2 files changed, 3 insertions(+), 2 deletions(-)
A commit in branch main references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=a8e817cf5c9c6e34357e0c078a256e2526b9da53 commit a8e817cf5c9c6e34357e0c078a256e2526b9da53 Author: Richard Scheffenegger <rscheff@FreeBSD.org> AuthorDate: 2024-02-10 09:24:10 +0000 Commit: Richard Scheffenegger <rscheff@FreeBSD.org> CommitDate: 2024-02-10 09:25:02 +0000 tcp: stop doing superfluous work after sending RST When sending a RST control segment in tcp_output() it means we are in TCPS_CLOSED state, called from tcp_drop(). Once the RST is sent, don't call tcp_timer_activate() or update anything in tcpcb, since that will go away shortly. PR: 276761 Provided by: glebius Reviewed By: glebius, tuexen, #transport Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D43808 sys/netinet/tcp_output.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-)
A commit in branch main references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=3eeb22cb819409b49296ecb0acbd453671168313 commit 3eeb22cb819409b49296ecb0acbd453671168313 Author: Richard Scheffenegger <rscheff@FreeBSD.org> AuthorDate: 2024-02-10 09:16:08 +0000 Commit: Richard Scheffenegger <rscheff@FreeBSD.org> CommitDate: 2024-02-10 09:20:00 +0000 tcp: clean scoreboard when releasing the socket buffer The SACK scoreboard is conceptually an extention of the socket buffer. Remove it when the socket buffer goes away with soisdisconnected(). Verify that this is also the expected state in tcp_discardcb(). PR: 276761 Reviewed by: glebius, tuexen, #transport Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D43805 sys/netinet/tcp_input.c | 1 + sys/netinet/tcp_subr.c | 5 ++--- sys/netinet/tcp_timewait.c | 1 + sys/netinet/tcp_usrreq.c | 1 + 4 files changed, 5 insertions(+), 3 deletions(-)
Just a summary of our findings: Under rare circumstances, tcp_close() is called without being followed up quickly - prior of any timer going off - by tcp_discardcb(). In tcp_close() the socket buffers are relinquished, but the clean up of associated state would only be done later in tcp_discardcb(). The change exposing this was to no longer discard the SACK scoreboard immediately with a Retransmission timeout, in order to avoid unneccessary retransmissions in such an event. The SACK scoreboard is used for transmission selection during loss recovery or retransmission timeouts. Additionally, timers could be restarted after having sent a RST packet from tcp_drop() - while the session was already closed. In summary, it appears that there always existed the problem of inconsistent state in between tcp_close() and tcp_discardcb(), where a timer could trigger. However, as an RTO would previously clean the SACK scoreboard state, this would not lead to external visible misbehavior or a panic. The three commits address these issues individually - clean any SACK state together with releasing the socket buffer - stop re-arming any timers in the case of sending a RST (while in CLOSED state) - stop timers in tcp_close() instead of waiting for the final clean-up of tcpcb state in tcp_discardcb().
Created attachment 248358 [details] Panic stack Hi, I ran over the latest kernel (including the fix you pushed) and I got the following panic (please see attached file), may it be related to your fix?
(In reply to Daniel from comment #14) It is a consequence of https://cgit.FreeBSD.org/src/commit/?id=62d47d73b7eb01f3b0a37541df5e7aaa36f54335 We are aware of it, since it was catched by fuzz testing: https://syzkaller.appspot.com/bug?extid=9a9aa434a14a2b35c3ba and https://syzkaller.appspot.com/bug?extid=e82856782410e895bae7 However, we don't have a reproducer yet. Do you have an idea how to reproduce it?
Yes, I suspect that TT_2MSL is allowed to be active. Would you be willing to test these checks instead of a catch-all if any timer is active: diff --git a/sys/netinet/tcp_subr.c b/sys/netinet/tcp_subr.c index 90e1496a822c..8acb03d8e3d1 100644 --- a/sys/netinet/tcp_subr.c +++ b/sys/netinet/tcp_subr.c @@ -2383,7 +2383,11 @@ tcp_discardcb(struct tcpcb *tp) #endif INP_WLOCK_ASSERT(inp); - MPASS(!callout_active(&tp->t_callout)); + /* only allow TT_2MSL for TIMEWAIT */ + MPASS(!tcp_timer_active(tp, TT_REXMT)); + MPASS(!tcp_timer_active(tp, TT_PERSIST)); + MPASS(!tcp_timer_active(tp, TT_KEEP)); + MPASS(!tcp_timer_active(tp, TT_DELACK)); MPASS(TAILQ_EMPTY(&tp->snd_holes)); /* free the reassembly queue, if any */ Syzcaller found this, but without a reproducer (or core). The above, if it's another timer than TT_2MSL, which would be expected for TIME-WAIT, would yield a dedicated panic line...
I'm getting this panic whilst simply building ports in poudriere, probably from regular downloading of source tarballs from IPv4 hosts. Have had to revert this locally to keep things moving.
*** Bug 277009 has been marked as a duplicate of this bug. ***
(In reply to Charlie Li from comment #17) To clarify, specifically base 62d47d73b7eb01f3b0a37541df5e7aaa36f54335 / review D43792; the other commits seem fine.
Daniel since the partial revert are you still seeing these errors?
The root cause was addressed with commit https://reviews.freebsd.org/rGe34ea0196f4497d3f3939025aff3a89ee77b652e