Bug 293127 - Kernel panic __mtx_lock_sleep due to nfsd?
Summary: Kernel panic __mtx_lock_sleep due to nfsd?
Status: Open
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 15.0-STABLE
Hardware: Any Any
Importance: --- Affects Only Me
Assignee: Rick Macklem
URL:
Keywords: crash
Depends on:
Blocks:
 
Reported: 2026-02-12 01:47 UTC by yuan.mei
Modified: 2026-04-09 22:45 UTC (History)
6 users

See Also:


Attachments
kgdb session log (778.57 KB, text/plain)
2026-02-12 01:47 UTC, yuan.mei
no flags Details
core dump log txt (108.83 KB, text/plain)
2026-02-22 21:15 UTC, yuan.mei
no flags Details
svc_dg.c: Take an additional refcnt for the socket (888 bytes, patch)
2026-02-28 22:26 UTC, Rick Macklem
no flags Details | Diff
core.txt.6 (105.54 KB, text/plain)
2026-03-08 17:30 UTC, yuan.mei
no flags Details
core.txt.8 (98.24 KB, text/plain)
2026-03-15 06:35 UTC, yuan.mei
no flags Details
core.txt.9 (105.35 KB, text/plain)
2026-03-15 06:37 UTC, yuan.mei
no flags Details
tcp_usrreq.c: Re-introduce a check for INP_DROPPED (583 bytes, patch)
2026-03-15 13:48 UTC, Rick Macklem
no flags Details | Diff
core.txt.0 (97.21 KB, text/plain)
2026-03-19 04:37 UTC, yuan.mei
no flags Details
svc.c: Fix a race with svc_checkidle() (918 bytes, patch)
2026-03-20 22:31 UTC, Rick Macklem
no flags Details | Diff
core.txt.3 (99.93 KB, text/plain)
2026-03-26 19:58 UTC, yuan.mei
no flags Details
log of crash with debug output (997.55 KB, text/plain)
2026-03-29 18:30 UTC, Lenore Gilbert
no flags Details
kernel log with debug output (19.24 KB, text/plain)
2026-03-31 15:17 UTC, Dillon Kass
no flags Details
Core.txt going with dgilbert's post. (462.53 KB, text/plain)
2026-04-08 19:56 UTC, David Gilbert
no flags Details
net.patch (5.52 KB, patch)
2026-04-09 18:47 UTC, yuan.mei
no flags Details | Diff

Description yuan.mei 2026-02-12 01:47:35 UTC
Created attachment 267991 [details]
kgdb session log

$ uname -a
FreeBSD maynas 15.0-STABLE FreeBSD 15.0-STABLE stable/15-n281999-4d36d18253b0 MAYNASKERNEL amd64

Feb 11 09:24:26 maynas kernel: Fatal trap 12: page fault while in kernel mode
Feb 11 09:24:26 maynas kernel: cpuid = 7; apic id = 0e
Feb 11 09:24:26 maynas kernel: fault virtual address    = 0x488
Feb 11 09:24:26 maynas kernel: fault code               = supervisor read data, page not present
Feb 11 09:24:26 maynas kernel: instruction pointer      = 0x20:0xffffffff80bb65d9
Feb 11 09:24:26 maynas kernel: stack pointer            = 0x28:0xfffffe00d5c86470
Feb 11 09:24:26 maynas kernel: frame pointer            = 0x28:0xfffffe00d5c864f0
Feb 11 09:24:26 maynas kernel: code segment             = base rx0, limit 0xfffff, type 0x1b
Feb 11 09:24:27 maynas kernel:                  = DPL 0, pres 1, long 1, def32 0, gran 1
Feb 11 09:24:27 maynas kernel: processor eflags = interrupt enabled, resume, IOPL = 0
Feb 11 09:24:27 maynas kernel: current process          = 2316 (nfsd: master)
Feb 11 09:24:27 maynas kernel: rdi: fffff80053f5f9f8 rsi: 0000000000000004 rdx: 0000000000000000
Feb 11 09:24:27 maynas kernel: rcx: fffff80053f5f9e0  r8: fffffe00d5c86590  r9: fffffe00d5c865a4
Feb 11 09:24:27 maynas kernel: rax: 0000000000000000 rbx: fffffe00d5c86500 rbp: fffffe00d5c864f0
Feb 11 09:32:00 maynas syslogd: kernel boot file is /boot/kernel/kernel
Feb 11 09:32:00 maynas kernel: ---<<BOOT>>---

This bug causes the system to reboot every 3 days or so.  It appears that the trap is caused by nfsd.  I keep nfsd enabled with the following configuration:

rpcbind_enable="YES"
nfs_server_enable="YES"
mountd_flags="-l -r"
mountd_enable="YES"
nfs_client_enable="YES"
rpc_lockd_enable="YES"
rpc_statd_enable="YES"
autofs_enable="YES"

But the trap condition persists regardless of whether there is a client or not.

I also attach a kgdb session log.
Comment 1 Gleb Smirnoff freebsd_committer freebsd_triage 2026-02-20 17:34:40 UTC
Backtrace cleansed of non-printable characters:
(kgdb) bt
#0  __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:57
#1  doadump (textdump=<optimized out>) at /usr/src/sys/kern/kern_shutdown.c:399
#2  0xffffffff80bdad89 in kern_reboot (howto=260)
    at /usr/src/sys/kern/kern_shutdown.c:519
#3  0xffffffff80bdb297 in vpanic (fmt=0xffffffff8125779c "%s",
    ap=ap@entry=0xfffffe00d5c86330) at /usr/src/sys/kern/kern_shutdown.c:974
#4  0xffffffff80bdb0c3 in panic (fmt=<unavailable>)
    at /usr/src/sys/kern/kern_shutdown.c:887
#5  0xffffffff810f1f4f in trap_fatal (frame=<optimized out>,
    eva=<optimized out>) at /usr/src/sys/amd64/amd64/trap.c:969
#6  0xffffffff810f1f4f in trap_pfault (frame=0xfffffe00d5c863b0,
    usermode=false, signo=<optimized out>, ucode=<optimized out>)
#7  <signal handler called>
#8  __mtx_lock_sleep (c=c@entry=0xfffff80053f5f9f8, v=<optimized out>)
    at /usr/src/sys/kern/kern_mutex.c:614
#9  0xffffffff80ef965d in svc_vc_recv (xprt=0xfffff80067c6e000,
    msg=0xfffffe00d5c865f0, addrp=0xfffff80035f82080, mp=0xfffffe00d5c866a8)
    at /usr/src/sys/rpc/svc_vc.c:835
#10 0xffffffff80ef6252 in svc_getreq (xprt=0xfffff80067c6e000,
    rqstp_ret=<optimized out>) at /usr/src/sys/rpc/svc.c:935
#11 svc_run_internal (grp=grp@entry=0xfffff800084dc100,
    ismaster=ismaster@entry=1) at /usr/src/sys/rpc/svc.c:1279
#12 0xffffffff80ef5ca7 in svc_run (pool=0xfffff800084dc000)
    at /usr/src/sys/rpc/svc.c:1408
#13 0xffffffff80adc986 in nfsrvd_nfsd (td=td@entry=0xfffff8001a326780,
    args=args@entry=0xfffffe00d5c869a0)
    at /usr/src/sys/fs/nfsserver/nfs_nfsdkrpc.c:641
#14 0xffffffff80af9808 in nfssvc_nfsd (td=0xfffff8001a326780,
    uap=<optimized out>) at /usr/src/sys/fs/nfsserver/nfs_nfsdport.c:4102
#15 0xffffffff80e47d18 in sys_nfssvc (td=<optimized out>, uap=<optimized out>)
    at /usr/src/sys/nfs/nfs_nfssvc.c:107
#16 0xffffffff810f2886 in syscallenter (td=0xfffff8001a326780)
    at /usr/src/sys/amd64/amd64/../../kern/subr_syscall.c:193
#17 amd64_syscall (td=0xfffff8001a326780, traced=0)
    at /usr/src/sys/amd64/amd64/trap.c:1208
Comment 2 Mark Johnston freebsd_committer freebsd_triage 2026-02-20 17:41:43 UTC
This looks similar to 292884.  If you're able to test a patch, there's one attached to that report which may help.
Comment 3 Rick Macklem freebsd_committer freebsd_triage 2026-02-21 00:05:45 UTC
(In reply to Mark Johnston from comment #2)
Yep. Looks the same to me.
So Gleb, do you use bridge and/or epair on
the system that crashes?

If you look at 292884, you'll see that something
is deref'ng the socket prematurely.
This appears to have shown up in 15.0.

You can try the patch, which acquires an extra
reference on the socket. (It shouldn't be necessary,
but unless someone can figure out what is deref'ng the
socket, it should be an adequate workaround.)
Comment 4 Gleb Smirnoff freebsd_committer freebsd_triage 2026-02-21 01:32:53 UTC
Rick, this isn't my panic. I just pulled the backtrace into the bug track out from the attachment, where it was slightly mangled.
Comment 5 Rick Macklem freebsd_committer freebsd_triage 2026-02-21 01:58:48 UTC
(In reply to Gleb Smirnoff from comment #4)
Oops. So hopefully the reporter will try the
patch.

Also, Yuan Mei, do you use either a bridge
or epair in your network configuration?

And you can look at Bugzilla PR#292884,
where you will find a patch that might stop
the crashes. Please try the patch to confirm
if it is the same problem.

The patch is harmless, but it doesn't fix the
underlying problem, which might be caused by some
change to the bridge or epair drivers that makes
sockets get deref'd prematurely.
Comment 6 yuan.mei 2026-02-22 21:15:03 UTC
Created attachment 268276 [details]
core dump log txt
Comment 7 yuan.mei 2026-02-22 21:20:28 UTC
(In reply to Rick Macklem from comment #5)
I tried the patch in PR#292884 but it didn't stop the crash.  See the new attachment #268276 [details]

There is no bridge or epair in the network configuration.
Comment 8 Rick Macklem freebsd_committer freebsd_triage 2026-02-22 22:04:12 UTC
(In reply to yuan.mei from comment #7)
Well, the crash looks like it is caused by the
same underlying bug somewhere in socket handling
or network fabric as 292884.

Since the patch doesn't stop the crashes, that
suggests that it isn't just a socket refcnt problem.

Can you give us everything you can think of
related to your network configuration.
(We now have three people reporting this, but we
do not know what is common between them that is
causing this.)

Note that this krpc code has not changed in at
least a decade, so I cannot see how the bug could
be in the krpc code.
--> All we seem to know is that it was introduced
    in FreeBSD 15.
Comment 9 Mark Johnston freebsd_committer freebsd_triage 2026-02-22 22:30:29 UTC
Is it possible to test a GENERIC-DEBUG kernel on this system?  Your kernel doesn't have any debugging assertions enabled, so turning those on might make it easier to see what's going on.
Comment 10 Rick Macklem freebsd_committer freebsd_triage 2026-02-23 01:08:54 UTC
(In reply to yuan.mei from comment #7)
This crash is very similar, but occurs in svc_dg.c
(the UDP handling code).

Could you try adding
nfs_server_flags="-t"
to your /etc/rc.conf when
running a kernel with the patch from PR#292884.
(I am wondering if enabling UDP allows the UDP
socket to somehow "go away" prematurely, as well.)
Comment 11 yuan.mei 2026-02-25 00:56:22 UTC
(In reply to Mark Johnston from comment #9)
I ran GENERIC-DEBUG kernel for a few days.  The machine went into a deadlock state (had to hard reset) and did not generate a dump.  However, in an old dmesg save, there's the following message.  Not sure if this is relevant.

lock order reversal:
 1st 0xfffffe00c4489a00 tcphash (tcphash, sleep mutex) @ /usr/src/sys/netinet/tcp_usrreq.c:1529
 2nd 0xffffffff822e8ae0 in6_ifaddr_lock (in6_ifaddr_lock, rm) @ /usr/src/sys/netinet6/in6_src.c:293
lock order tcphash -> in6_ifaddr_lock attempted at:
#0 0xffffffff80c0eb21 at witness_checkorder+0xbe1
#1 0xffffffff80b8dc79 at _rm_rlock_debug+0x129
#2 0xffffffff80de61fd at in6_selectsrc+0x3fd
#3 0xffffffff80de5d9d at in6_selectsrc_socket+0x6d
#4 0xffffffff80de3861 at in6_pcbconnect+0x291
#5 0xffffffff80dc6877 at tcp6_connect+0xb7
#6 0xffffffff80dc41f5 at tcp6_usr_connect+0x2f5
#7 0xffffffff80c4f500 at soconnectat+0xc0
#8 0xffffffff80c577d1 at kern_connectat+0xe1
#9 0xffffffff80c576c1 at sys_connect+0x81
#10 0xffffffff810eb979 at amd64_syscall+0x169
#11 0xffffffff810bd36b at fast_syscall_common+0xf8
Limiting icmp ping response from 193 to 192 packets/sec


The machine has been rebooted and running the GENERIC-DEBUG kernel again.  Will see what happens.
Comment 12 Mark Johnston freebsd_committer freebsd_triage 2026-02-25 00:58:26 UTC
(In reply to yuan.mei from comment #11)
Thank you.  I'm pretty sure that warning is unrelated.
Comment 13 yuan.mei 2026-02-25 01:07:18 UTC
(In reply to Mark Johnston from comment #12)
OK.  Going through /var/log/messages, I can see when the machine went into deadlock

Feb 24 11:39:59 maynas kernel: Fatal trap 9: general protection fault while in kernel mode
Feb 24 11:39:59 maynas kernel: cpuid = 6; apic id = 0c
Feb 24 11:39:59 maynas kernel: instruction pointer      = 0x20:0xffffffff80edc979
Feb 24 11:39:59 maynas kernel: stack pointer            = 0x28:0xfffffe00d6f875d0
Feb 24 11:39:59 maynas kernel: frame pointer            = 0x28:0xfffffe00d6f875d0
Feb 24 11:39:59 maynas kernel: code segment             = base rx0, limit 0xfffff, type 0x1b
Feb 24 11:39:59 maynas kernel:                  = DPL 0, pres 1, long 1, def32 0, gran 1
Feb 24 11:39:59 maynas kernel: processor eflags = interrupt enabled, resume, IOPL = 0
Feb 24 11:39:59 maynas kernel: current process          = 2270 (nfsd: master)
Feb 24 11:39:59 maynas kernel: rdi: fffff80006593e00 rsi: fffff8002933c640 rdx: deadc0dedeadc0de
Feb 24 11:39:59 maynas kernel: rcx: fffff80229388400  r8: fffff8002f1577f8  r9: 0000000001a10010
Feb 24 11:39:59 maynas kernel: rax: 0000000000000002 rbx: fffff80006593e00 rbp: fffffe00d6f875d0
Feb 24 11:39:59 maynas kernel: r10: 0000000000000000 r11: 0000000000000001 r12: fffff800291b8a50

Surely it is still triggered by nfsd, but unfortunately, there's no dump.  For this run, there's no nfs client at all.
Comment 14 Mark Johnston freebsd_committer freebsd_triage 2026-02-25 01:18:29 UTC
(In reply to yuan.mei from comment #13)
Oh, interesting. Is there a stack trace?

An nfsd thread faulted on 0xdeadc0de, which implies a use-after-free somewhere.  If you're willing to try it, you could test a GENERIC-KASAN kernel, which should help catch the UAF as it happens.  Note that such kernels are quite slow, and aren't very well-tested on hardware, so you might run into unrelated problems.
Comment 15 Rick Macklem freebsd_committer freebsd_triage 2026-02-28 22:26:34 UTC
Created attachment 268432 [details]
svc_dg.c: Take an additional refcnt for the socket

This patch does what the patch in PR#292884 does
for TCP sockets, except for UDP sockets.
I know you said that the patch in PR#292884 did
not stop the crashes, but at least one crash showed
a failure w.r.t. a UDP socket, so I am hoping that
this patch plus the one in PR#292884 will stop the
crashes?

If you can test with both this patch and the one in
PR#292884 applied, that would be appreciated.
(Setting nfs_server_flags="-t" so it is not using any
UDP sockets should also have the same effect as this
patch, if I am correct in my assumption that this is
related to the sockets being sorele()'d prematurely.)
Comment 16 yuan.mei 2026-03-01 00:48:21 UTC
(In reply to Rick Macklem from comment #15)
Running with both patches applied now.

In the past week I ran GENERIC-DEBUG twice but the machine entered a deadlocked state both times (had to hard reset).  No idea why.  This time, I just enabled

makeoptions     DEBUG=-g
options         KDB
options         GDB

Hopefully symbols become available in the dump should a crash still happen.
Comment 17 yuan.mei 2026-03-08 17:30:48 UTC
Created attachment 268635 [details]
core.txt.6
Comment 18 yuan.mei 2026-03-08 17:37:52 UTC
(In reply to Rick Macklem from comment #15)
I applied both patches yet the crash still happens.  See the crash dump log attachment #268635 [details]

In order to get this, kernel options KDB and GDB must not be set.  Otherwise, the machine would seize up without giving a dump or rebooting.
Comment 19 Rick Macklem freebsd_committer freebsd_triage 2026-03-10 23:24:44 UTC
(In reply to yuan.mei from comment #18)
This crash is somewhat different.
(It looks like it might be because so_vnet
isn't set correctly?)

I have a couple of questions..
- Does your kernel configuration have
  options VIMAGE
  in it?

- Are you running the nfsd in any jails?
Comment 20 yuan.mei 2026-03-11 02:04:33 UTC
(In reply to Rick Macklem from comment #19)
No, I have neither.

My kernel config is:

include         GENERIC
ident           MAYNASKERNEL

options         DUMMYNET
options         KDTRACE_FRAME

device          netmap
device          cxgbe
Comment 21 Rick Macklem freebsd_committer freebsd_triage 2026-03-11 02:51:28 UTC
(In reply to yuan.mei from comment #20)
Since you include GENERIC, you do have VIMAGE.

I was actually hoping that you didn't have VIMAGE,
since that doesn't get tested much.

Still a mystery...
Comment 22 Gleb Smirnoff freebsd_committer freebsd_triage 2026-03-11 03:02:30 UTC
Yuan, if you don't use any VIMAGE features, can you please disable it in your kernel config and try to reproduce the panic? Add this to your kernel config:

nooptions VIMAGE

Is it possible for you to write the kernel core and share it? You can put the core somewhere available for download, but to prevent sharing any sensitive information you can encrypt it with PGP key, that is available here: https://docs.freebsd.org/en/articles/pgpkeys/#_gleb_smirnoff_glebiusfreebsd_org
Comment 23 Rick Macklem freebsd_committer freebsd_triage 2026-03-12 23:04:56 UTC
(In reply to Rick Macklem from comment #21)
I noticed that the patch in PR#292884 did not
set the current vnet before calling sorele().

This might have caused your most recent crash.

I have updated the patch in PR#292884 with
attachment #268761 [details], which is fixed for this.

Maybe you could try using this patch instead of
the one you already tried from PR#292884?
Comment 24 yuan.mei 2026-03-13 00:46:09 UTC
(In reply to Rick Macklem from comment #23)
OK, I see the difference is

			CURVNET_SET(xprt->xp_socket->so_vnet);
			sorele(xprt->xp_socket);
			CURVNET_RESTORE();

Right now I'm running the previous patch and with

nooptions VIMAGE

suggested in comment #22.  It has not crashed yet.  After that test is done, I'll try this latest patch.

But to be sure: should I enable VIMAGE or not here?
Comment 25 Rick Macklem freebsd_committer freebsd_triage 2026-03-13 01:10:20 UTC
(In reply to yuan.mei from comment #24)
To test this patch, you do need to set
options VIMAGE. (CURVNET_SET()/CURVNET_RESTORE()
are no-ops unless "options VIMAGE" is specified.)

Thanks for doing this testing, rick
Comment 26 yuan.mei 2026-03-15 06:35:39 UTC
Created attachment 268814 [details]
core.txt.8

(In reply to Rick Macklem from comment #22)

With nooptions VIMAGE, the crash and reboot still happened.
Comment 27 yuan.mei 2026-03-15 06:37:35 UTC
Created attachment 268815 [details]
core.txt.9

(In reply to Rick Macklem from comment #22)

With nooptions VIMAGE, it crashed and rebooted again.

Next, I am re-enabling VIMAGE and will try the patch in attachment #268761 [details] to see what happens.
Comment 28 Rick Macklem freebsd_committer freebsd_triage 2026-03-15 13:48:42 UTC
Created attachment 268822 [details]
tcp_usrreq.c: Re-introduce a check for INP_DROPPED

I spotted this yesterday and haven't heard
back from the networking folk yet, but a
commit at around 15.0 dropped a check for
INP_DROPPED in tcp_usr_shutdown().  I think
this check is needed, since tcp_usrclosed()
can set INP_DROPPED.

This simple patch puts the check back in.

Please apply it, either instead of the patch
from PR#292884 or along with it.
(There is a chance this is the underlying cause,
since I think that, without the patch, tcp_close()
can be called multiple times.)
Comment 29 Rick Macklem freebsd_committer freebsd_triage 2026-03-15 13:51:09 UTC
(In reply to yuan.mei from comment #27)
Since CURVNET_SET() is a no-op without
VIMAGE, I doubt testing this case will
help.

Please try the patch I just attached as
tcp_usrreq.c: Re-introduce a check for INP_DROPPED

Thanks for doing all these tests, since I
do not know how to reproduce the problem.
Comment 30 yuan.mei 2026-03-19 04:37:24 UTC
Created attachment 268917 [details]
core.txt.0

(In reply to Rick Macklem from comment #28)

I applied all 3 patches simultaneously.  But the crash still occurs.  See the latest core dump .0 attached.
Comment 31 Rick Macklem freebsd_committer freebsd_triage 2026-03-20 22:31:35 UTC
Created attachment 268963 [details]
svc.c: Fix a race with svc_checkidle()

Here's another patch you can try.
It fixes a race against svc_checkidle().

I'll admit I have not come up with an
explanation w.r.t. how this race could
cause serious problems like your crashes.
(This race has been in the code for a long
time and has not caused problems, afaik.)
Comment 32 yuan.mei 2026-03-25 20:51:56 UTC
(In reply to Rick Macklem from comment #31)
With all patches applied, it has been running for 5 days this time, the longest ever without a crash.  But I haven't seen anything related to

printf("svc_run_internal...

in the log.  Will continue to monitor.
Comment 33 Lenore Gilbert 2026-03-26 16:58:50 UTC
Just wanted to add that I've also been seeing this crash every ~24h on the latest patchlevel of 15.0-RELEASE.

Some things of note:
* I'm using NFSv4
* There is a single client of this NFS share, which is an Ubuntu Linux client
* This worked without stability issues in 13.x-RELEASE and 14.x-RELEASE prior to upgrading to 15.0-RELEASE

I first tried the patch in a similar bug here with no change in behavior:
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=292884

However, after applying the 3 patches from this bug (in addition to the patch in the bug linked above) I've now had 48h of stability for the first time since updating to 15.0-RELEASE.

Let me know if there is any additional information I can provide and thanks for the fix!
Comment 34 yuan.mei 2026-03-26 19:58:04 UTC
Created attachment 269133 [details]
core.txt.3

(In reply to Rick Macklem from comment #31)

Unfortunately, the crash still happened.  And I didn't see any svc printf output in the kernel log messages.
Comment 35 Lenore Gilbert 2026-03-27 00:51:44 UTC
(In reply to yuan.mei from comment #34)
I had written my prior comment too soon...

It kernel panicked again at the 50h mark.

Something else I wanted to mention is that I'm using zfs share for the NFSv4 exports in question.

Given that I can't keep this system from crashing very frequently, this is currently a blocker for using NFS in my environment.

Please let me know if there is anything else I can test or assist with.

Warmly,
Lenore
Comment 36 Rick Macklem freebsd_committer freebsd_triage 2026-03-27 23:02:51 UTC
(In reply to Lenore Gilbert from comment #35)
At this time, the only known workaround is
falling back to FreeBSD-14 (and I'm not sure
if an up-to-date FreeBSD-14 system is also
susceptible).

Hopefully someone can run with attachment #269151 [details]
from PR#292884 and get a crash, since the stuff
logged might provide useful information.
Comment 37 Lenore Gilbert 2026-03-27 23:26:49 UTC
(In reply to Rick Macklem from comment #36)
I've applied all the debugging from the attachment to the crashing machine and can confirm seeing the added debug output in the log already after boot and some basic NFS usage. Things are working for now, once they crash again I'll grab the logs and attach here.

Thanks Rick!
Comment 38 Lenore Gilbert 2026-03-29 18:30:05 UTC
Created attachment 269209 [details]
log of crash with debug output

Most recent 1k of logs leading up to panic with requested debug output/printfs merged. 

Includes all patches from this thread and 292884.
Comment 39 Lenore Gilbert 2026-03-29 18:39:34 UTC
(In reply to Rick Macklem from comment #36)
I just attached the requested log! This was the most recent 1k leading up to and including the panic.  I can try to provide a complete log since boot, but it will be several MB compressed and probably 40MB+ uncompressed; let me know if you need this.

Also saw your request for additional information in 292884 so putting this below as well. Let me know if there is anything else you need. Thanks!

SERVER:
root@<REDACTED>:/home/<REDACTED> # cat /etc/rc.conf
sshd_enable="YES"
ntpd_enable="YES"
powerd_enable="YES"
dumpdev="AUTO"
zfs_enable="YES"
linux_enable="YES"
sendmail_enable="NONE"
#fuse_enable="YES"
named_enable="YES"
ezjail_enable="YES"
ntpdate_flags="pool.ntp.org"
ntpdate_enable="YES"
gateway_enable="YES"
miniupnpd_enable="YES"

dhcpd_enable="YES"
pf_enable="YES"
pflog_enable="YES"
mountd_flags="-r -n"
mountd_enable="YES"
powerd_enable="YES"

hostname="<REDACTED>"

ifconfig_ix0="dhcp"

ifconfig_ix1="inet 10.0.0.1 netmask 255.255.255.0"
ifconfig_ix1_alias0="inet <REDACTED> netmask 255.255.255.255"
ifconfig_ix1_alias1="inet <REDACTED> netmask 255.255.255.255"
ifconfig_ix1_alias2="inet <REDACTED> netmask 255.255.255.255"
ifconfig_ix1_alias3="inet <REDACTED> netmask 255.255.255.255"
ifconfig_ix1_alias4="inet <REDACTED> netmask 255.255.255.255"

smartd_enable="YES"
sshd_flags="-o UseBlacklist=yes"
blacklistd_enable="YES"
blacklistd_flags="-r"

#NFS V4 ZFS
nfs_server_enable="YES"
nfsv4_server_enable="YES"
nfsuserd_enable="YES"

CLIENT:
root@<REDACTED>:/home/<REDACTED># nfsstat -m
/fs-2025-2/data from 10.0.0.1:/fs-2025-2/data
 Flags: rw,relatime,vers=4.2,rsize=131072,wsize=131072,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=<REDACTED>,local_lock=none,addr=10.0.0.1

NOTES:
- Do your clients mount/umount frequently? 
-- No, they are servers that mount automatically on boot.
- Do your clients use an automounter?
-- Clients mount with /etc/fstab
- Do you get messages like "nfsrv_cache_session: no session..".
-- None seen

Client is Ubuntu Linux and is the only client of this NFS share. I noticed in the logs that the actual crash happened about 20 minutes after much heavier NFS access from the client, during some lighter access in a relatively "quiet" period.

Server is using NFS shares via zfs share with only a basic global root v4 export in /etc/exports.
Comment 40 Rick Macklem freebsd_committer freebsd_triage 2026-03-29 20:40:26 UTC
(In reply to Lenore Gilbert from comment #39)
So, the last lines of the log definitely show
a problem.
The bad news is that the problem is somewhere
else in the network fabric.

Here's the log entries:
Mar 29 01:51:54 citadel kernel: svc_vc_recv: so=0xfffff80181bbe800 src=2
Mar 29 01:51:54 citadel kernel: svc_vc_recv: aft err=0 so=0xfffff80181bbe800 src=2
Mar 29 01:52:09 citadel kernel: svc_freereq: xprt=0xfffff80a89b73800 xrc=3 so=0xfffff80181bbe800 src=1
*** Here's the problem. src is the socket's reference count (so_count).
    You can see it has dropped from 2 -> 1.

The bad news for me is that the krpc code does nothing to the socket
(no calls of any kind) between the "svc_vc_recv: aft err=0 .." printf() and the
"svc_freereq: xprt.." printf().

So, what drops so_count from 2 -> 1?

The other thing of note is that there is no
"svc_run_internal: xprt=0xfffff802255b0e00 xrc=3 so=0xfffff80184794000 src=2"
line between "svc_vc_recv: aft err=0 .." and "svc_freereq: .." which tells
me that the RPC message was somehow bogus (unless the log is incomplete, but
it looks ok?).

Mar 29 01:52:09 citadel kernel: svc_vc_recv: so=0xfffff80181bbe800 src=1
Mar 29 01:52:09 citadel kernel: svc_vc_recv: aft err=54 so=0xfffff80181bbe800 src=1
*** This is an ECONNRESET which results in the socket being soclose()'d
    and sorele()'d, which crashes because so_count is only 1 and not 2.

Mar 29 01:52:09 citadel kernel: xprt_unregister: xprt=0xfffff80a89b73800 xrc=2 so=0xfffff80181bbe800 src=1
Mar 29 01:52:09 citadel kernel: svc_run_internal2: xprt=0xfffff80a89b73800 xrc=1 so=0xfffff80181bbe800 src=1
Mar 29 01:52:09 citadel kernel: svc_vc_destroy: xprt=0xfffff80a89b73800 xrc=0 so=0xfffff80181bbe800 src=1

Which still leaves us with the mystery of what sorele()'s the socket
to cause so_count to drop from 2 --> 1?
Comment 41 Rick Macklem freebsd_committer freebsd_triage 2026-03-29 20:49:26 UTC
(In reply to Lenore Gilbert from comment #39)
root@<REDACTED>:/home/<REDACTED> # cat /etc/rc.conf
sshd_enable="YES"
ntpd_enable="YES"
powerd_enable="YES"
dumpdev="AUTO"
zfs_enable="YES"
linux_enable="YES"
sendmail_enable="NONE"
#fuse_enable="YES"
named_enable="YES"
ezjail_enable="YES"
ntpdate_flags="pool.ntp.org"
ntpdate_enable="YES"
gateway_enable="YES"
miniupnpd_enable="YES"

dhcpd_enable="YES"
pf_enable="YES"
pflog_enable="YES"
mountd_flags="-r -n"
mountd_enable="YES"
powerd_enable="YES"

*** There is a bunch of stuff here that I would never
    run in an NFS server. With my limited networking
    expertise, I'd say that there are a couple of things
    that might affect the so_count of a socket?
gateway_enable="YES"
    If this is playing with SO_SPLICE, it might be the
    culprit, since SO_SPLICE uses soref()/sorele() a
    bunch.
pf_enable="YES"
    I don't know if the packet filter could mess up a
    socket's so_count?

It would be interesting to see if the other reporters
of crashes have either of these (or any other network
fabric option) in common.
(I'll copy this onto the other PR, to see what they have.)

hostname="<REDACTED>"

ifconfig_ix0="dhcp"

ifconfig_ix1="inet 10.0.0.1 netmask 255.255.255.0"
ifconfig_ix1_alias0="inet <REDACTED> netmask 255.255.255.255"
ifconfig_ix1_alias1="inet <REDACTED> netmask 255.255.255.255"
ifconfig_ix1_alias2="inet <REDACTED> netmask 255.255.255.255"
ifconfig_ix1_alias3="inet <REDACTED> netmask 255.255.255.255"
ifconfig_ix1_alias4="inet <REDACTED> netmask 255.255.255.255"

smartd_enable="YES"
sshd_flags="-o UseBlacklist=yes"
blacklistd_enable="YES"
blacklistd_flags="-r"

#NFS V4 ZFS
nfs_server_enable="YES"
nfsv4_server_enable="YES"
nfsuserd_enable="YES"
Comment 42 Dillon Kass 2026-03-31 15:17:33 UTC
Created attachment 269259 [details]
kernel log with debug output
Comment 43 Dillon Kass 2026-03-31 15:20:42 UTC
I've attached a log with the debug output.

My config is very different. 
No gateway_enable, no pf (but yes ipfw), I am doing nat for a jail, rudimentary nfsd config.

- Do your clients mount/umount frequently?
- Do your clients use an automounter?
  Yes, I have only a few clients. Two Macs using autofs to automount on demand and an apple tv/wife's ipad using the Infuse7 app to mount all of my legally purchased and ripped blurays. I've never had it panic while interacting with a mount it's always when I'm asleep or afk. Not to say the clients aren't doing anything at that time.

I do have multiple interfaces and a public ip on a vlan interface with ipfw blocking the internet from accessing nfs. 

nfs_server_enable="YES"
rpcbind_enable="YES"
rpcbind_flags="-h 192.168.3.2 -h PUBIP"
mountd_enable="YES"
mountd_flags="-l -n -p 851" 

zfs_enable="YES"
sshd_enable="YES"
ntpd_enable="YES"
ntpd_sync_on_start="YES"
local_unbound_enable="YES"
powerd_enable="YES"
moused_nondefault_enable="NO"
blocklistd_enable=yes
blocklistd_flags="-r"
rsyncd_enable="YES"

jail_enable="YES"
jail_list="unifi httpd"
jail_set_hostname_allow="NO"
jail_socket_unixiproute_only="NO"
jail_sysvipc_allow="YES"
jail_parallel_start="YES"

firewall_enable="YES"
firewall_script="/etc/ipfw.conf"

ifconfig_igb0="inet 192.168.3.2 netmask 255.255.255.0"
ifconfig_igb0_alias1="inet 192.168.3.6 netmask 255.255.255.255"

ifconfig_lo0_alias0="inet 127.0.0.2 netmask 255.255.255.255"

cloned_interfaces="vlan0 vlan1"
ifconfig_vlan0="inet <PUBIP> netmask 255.255.255.248 vlan 2 vlandev igb0"
ifconfig_vlan0_alias0="inet <PUBIP2> netmask 255.255.255.255"

ifconfig_vlan1="inet 192.168.9.3 netmask 255.255.255.0 vlan 6 vlandev igb0"
Comment 44 Rick Macklem freebsd_committer freebsd_triage 2026-03-31 22:20:53 UTC
(In reply to Dillon Kass from comment #43)
Your log is a little different, but still seems to
indicate something in the network fabric is sorele()'ng
a socket.

If we look at the last lines before the crash..
Mar 30 23:31:20 ketchupnsketti kernel: svc_run_internal: xprt=0xfffff80001296600 xrc=4 so=0xfffff80047f7c800 src=1
Mar 30 23:31:20 ketchupnsketti kernel: svc_freereq: xprt=0xfffff80001296600 xrc=3 so=0xfffff80047f7c800 svc_run_internal: xprt=0xfffff80001296600 xrc=3 so=0xfffff80047f7c800 src=1
Mar 30 23:31:20 ketchupnsketti kernel: src=1
Mar 30 23:35:46 ketchupnsketti syslogd: kernel boot file is /boot/kernel/kernel
Mar 30 23:35:46 ketchupnsketti kernel: svc_freereq: xprt=0xfffff80001296600 xrc=3 so=0xfffff80047f7c800 src=0
*** Here the socket structure's refcnt (so_count) has dropped to 0, which
    basically means the socket structure has been free'd. (Note that xrc is
    still 3, so the krpc structure is still valid and would not have been
    destroyed.)

Mar 30 23:35:46 ketchupnsketti kernel: svc_run_internal: xprt=0xfffff80001296600 xrc=3 so=0xfffff80047f7c800 src=0

In your case, the socket is a datagram/udp one, as seen by the crash in
svc_dg_destroy().

I assume this isn't the whole log. If you have the whole log,
could you please search for other entries for this socket
(so=0xfffff80047f7c800) and see if any of them have src=0 (or any
value other than 1)?

I didn't put a printf() where datagram/udp sockets are set up,
but that should only happen once, when the NFS server is started.

But it still looks like the others, in that something made src=0 on
the socket structure, causing it to be free'd and it wasn't the krpc
code, since it only calls soclose() and only when svc_dg_destroy()
happens, once the xrc (ref cnt for the xprt structure) goes to 0.
Comment 45 Dillon Kass 2026-04-01 11:38:12 UTC
Only 13 lines so I'll just paste it. No entries other than src=1 until the end. First entry appears 3 hours after boot (Mar 28 10:30)

Mar 28 13:43:37 ketchupnsketti kernel: svc_run_internal: xprt=0xfffff80001296600 xrc=4 so=0xfffff80047f7c800 src=1
Mar 28 13:43:37 ketchupnsketti kernel: svc_freereq: xprt=0xfffff80001296600 xrc=3 so=0xfffff80047f7c800 svc_run_internal: xprt=0xfffff80001296600 xrc=3 so=0xfffff80047f7c800 src=1
Mar 29 12:12:03 ketchupnsketti kernel: svc_run_internal: xprt=0xfffff80001296600 xrc=4 so=0xfffff80047f7c800 src=1
Mar 29 12:12:03 ketchupnsketti kernel: svc_freereq: xprt=0xfffff80001296600 xrc=3 so=0xfffff80047f7c800 svc_run_internal: xprt=0xfffff80001296600 xrc=3 so=0xfffff80047f7c800 src=1
Mar 30 21:18:29 ketchupnsketti kernel: svc_freereq: xprt=0xfffff803d4734e00 xrc=115 so=0xfffff8031fe33c00 svc_run_internal: xprt=0xfffff80001296600 xrc=4 so=0xfffff80047f7c800 src=1
Mar 30 21:18:29 ketchupnsketti kernel: svc_freereq: xprt=0xfffff80001296600 xrc=3 so=0xfffff80047f7c800 svc_freereq: xprt=0xfffff803d4734e00 xrc=114 so=0xfffff8031fe33c00 svc_vc_recv: aft err=0 so=0xfffff8031fe33c00 src=1
Mar 30 21:18:29 ketchupnsketti kernel: svc_run_internal: xprt=0xfffff80001296600 xrc=3 so=0xfffff80047f7c800 src=1
Mar 30 21:18:37 ketchupnsketti kernel: svc_run_internal: xprt=0xfffff80001296600 xrc=4 so=0xfffff80047f7c800 src=1
Mar 30 21:18:37 ketchupnsketti kernel: svc_freereq: xprt=0xfffff80001296600 xrc=3 so=0xfffff80047f7c800 svc_run_internal: xprt=0xfffff80001296600 xrc=3 so=0xfffff80047f7c800 src=1
Mar 30 23:31:20 ketchupnsketti kernel: svc_run_internal: xprt=0xfffff80001296600 xrc=4 so=0xfffff80047f7c800 src=1
Mar 30 23:31:20 ketchupnsketti kernel: svc_freereq: xprt=0xfffff80001296600 xrc=3 so=0xfffff80047f7c800 svc_run_internal: xprt=0xfffff80001296600 xrc=3 so=0xfffff80047f7c800 src=1
Mar 30 23:35:46 ketchupnsketti kernel: svc_freereq: xprt=0xfffff80001296600 xrc=3 so=0xfffff80047f7c800 src=0
Mar 30 23:35:46 ketchupnsketti kernel: svc_run_internal: xprt=0xfffff80001296600 xrc=3 so=0xfffff80047f7c800 src=0
Comment 46 David Gilbert 2026-04-08 19:55:02 UTC
Has this made it into any of the patch levels?  I have a slightly different but similar crash.  The system is:

FreeBSD vr.home.dclg.ca 15.0-RELEASE-p5 FreeBSD 15.0-RELEASE-p5 releng/15.0-n281018-0730d5233286 GENERIC amd64

I'm going to upload core.txt, but here's the backtrace inline so you don't have to click through.

__curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:57
57              __asm("movq %%gs:%c1,%0" : "=r" (td)
(kgdb) #0  __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:57
        td = <optimized out>
#1  doadump (textdump=<optimized out>) at /usr/src/sys/kern/kern_shutdown.c:399
        error = 0
        coredump = <optimized out>
#2  0xffffffff80b710f9 in kern_reboot (howto=260)
    at /usr/src/sys/kern/kern_shutdown.c:519
        once = 0
#3  0xffffffff80b71607 in vpanic (fmt=0xffffffff811d12d5 "%s", 
    ap=ap@entry=0xfffffe00d04d9830) at /usr/src/sys/kern/kern_shutdown.c:974
        buf = "page fault", '\000' <repeats 245 times>
        __pc = 0x0
        __pc = 0x0
        __pc = 0x0
        other_cpus = {__bits = {18446744069414584319, 0 <repeats 15 times>}}
        td = 0xfffff810c2501780
        bootopt = <unavailable>
        newpanic = <optimized out>
#4  0xffffffff80b71433 in panic (fmt=<unavailable>)
    at /usr/src/sys/kern/kern_shutdown.c:887
        ap = {{gp_offset = 16, fp_offset = 48, 
            overflow_arg_area = 0xfffffe00d04d9860, 
            reg_save_area = 0xfffffe00d04d9800}}
#5  0xffffffff81079f69 in trap_fatal (frame=<optimized out>, 
    eva=<optimized out>) at /usr/src/sys/amd64/amd64/trap.c:969
        type = <optimized out>
        handled = <optimized out>
#6  0xffffffff81079f69 in trap_pfault (frame=0xfffffe00d04d98b0, 
    usermode=false, signo=<optimized out>, ucode=<optimized out>)
        __pc = 0x0
        __pc = 0x0
        __pc = 0x0
        td = <optimized out>
        p = <optimized out>
        eva = <optimized out>
        map = <optimized out>
        ftype = <optimized out>
        rv = <optimized out>
#7  <signal handler called>
No locals.
#8  __mtx_lock_sleep (c=c@entry=0xfffff803d371e9f8, v=<optimized out>)
    at /usr/src/sys/kern/kern_mutex.c:614
        __pc = 0x0
        __pc = 0x0
        __pc = 0x0
        __pc = 0x0
        __pc = 0x0
        __pc = 0x0
        __pc = 0x0
        lda = {config = 0xffffffff818000a0 <locks_delay>, delay = 1, 
          spin_cnt = 1}
        sleep_cnt = 0
        sleep_time = 0
        all_time = 0
        doing_lockprof = <optimized out>
        td = 0xfffff810c2501780
        tid = 18446735349596034944
        m = 0xfffff803d371e9e0
        owner = 0x0
        ts = <optimized out>
#9  0xffffffff80e8202d in svc_vc_recv (xprt=0xfffff808a88fe400, 
    msg=0xfffffe00d04d9af0, addrp=0xfffff80f6ea18080, mp=0xfffffe00d04d9ba8)
    at /usr/src/sys/rpc/svc_vc.c:835
        _tid = 18446735349596034944
        _v = 0
        saved_vnet = <optimized out>
        uio = {uio_iov = 0x0, uio_iovcnt = 0, uio_offset = 0, 
          uio_resid = 1000000000, uio_segflg = (unknown: 0xa88fe400), 
          uio_rw = (unknown: 0xfffff808), uio_td = 0xfffff810c2501780}
        m = 0x0
        ctrl = 0x0
        xdrs = {x_op = (unknown: 0xd371e800), x_ops = 0xfffffe00d04d9aa0, 
          x_public = 0xfffff803d56d5800 "", x_private = 0xfffff803d56d5800, 
          x_base = 0x0, x_handy = 3260028800}
        reterr = 4294965256
        xid_plus_direction = {2173916016, 4294967295}
        rcvflag = <optimized out>
        cd = 0xfffff808e7018ee0
        so = 0xfffff803d371e800
        cmsg = <optimized out>
        error = <optimized out>
        ret = <optimized out>
        tgr = <optimized out>
#10 0xffffffff80e7ec22 in svc_getreq (xprt=0xfffff808a88fe400, 
    rqstp_ret=<optimized out>) at /usr/src/sys/rpc/svc.c:935
        msg = {rm_xid = 2679714883, rm_direction = CALL, ru = {RM_cmb = {
              cb_rpcvers = 2, cb_prog = 100000, cb_vers = 4, cb_proc = 0, 
              cb_cred = {oa_flavor = 7, oa_base = 0xfffff80f6ea180a4 "", 
                oa_length = 0}, cb_verf = {oa_flavor = 0, 
                oa_base = 0xfffff80f6ea18234 "", oa_length = 0}}, RM_rmb = {
              rp_stat = (unknown: 0x2), ru = {RP_ar = {ar_verf = {
                    oa_flavor = 4, 
                    oa_base = 0x7 <error: Cannot access memory at address 0x7>, oa_length = 1856077988}, ar_stat = SUCCESS, ru = {AR_versions = {low = 0, 
                      high = 0}, AR_results = {where = 0x0, 
                      proc = 0xfffff80f6ea18234}}}, RP_dr = {
                  rj_stat = (unknown: 0x4), ru = {RJ_versions = {low = 0, 
                      high = 7}, RJ_why = AUTH_OK}}}}}}
        args = 0xfffff8048f40a400
        pool = 0xfffff8022abc6000
        r = 0xfffff80f6ea18000
        stat = <optimized out>
        s = <optimized out>
        _size = <optimized out>
        _malloc_item = <optimized out>
        why = <optimized out>
        repmsg = <optimized out>
        repbody = <optimized out>
        rs = <optimized out>
        optval = <optimized out>
#11 svc_run_internal (grp=grp@entry=0xfffff8022abc6100, 
    ismaster=ismaster@entry=1) at /usr/src/sys/rpc/svc.c:1279
        pool = 0xfffff8022abc6000
        st = 0xfffff80609857b00
        xprt = 0xfffff808a88fe400
        rqstp = 0x0
        stat = <optimized out>
        stpref = <optimized out>
        error = <optimized out>
        sz = <optimized out>
        p = <optimized out>
#12 0xffffffff80e7e677 in svc_run (pool=pool@entry=0xfffff8022abc6000)
    at /usr/src/sys/rpc/svc.c:1408
        td = 0xfffff810c2501780
        p = <optimized out>
        g = 1
        grp = <optimized out>
        i = <optimized out>
#13 0xffffffff80dd63ec in nlm_server_main (addrs=0x26562f844000, 
    addr_count=<optimized out>) at /usr/src/sys/nlm/nlm_prot_impl.c:1652
        portlow = 2
        sin6 = {sin6_len = 28 '\034', sin6_family = 28 '\034', sin6_port = 0, 
          sin6_flowinfo = 0, sin6_addr = {__u6_addr = {
              __u6_addr8 = '\000' <repeats 15 times>, "\001", __u6_addr16 = {
                0, 0, 0, 0, 0, 0, 0, 256}, __u6_addr32 = {0, 0, 0, 
                16777216}}}, sin6_scope_id = 0}
        sin = {sin_len = 128 '\200', sin_family = 23 '\027', sin_port = 49744, 
          sin_addr = {s_addr = 4294965264}, 
          sin_zero = "\000\000\000\000\000\000\000"}
        smstat = {state = 1553}
        timo = {tv_sec = 25, tv_usec = 0}
        opt = {sopt_dir = SOPT_SET, sopt_level = 41, sopt_name = 14, 
          sopt_val = 0xfffffe00d04d9d20, sopt_valsize = 4, sopt_rights = 0x0, 
          sopt_td = 0x0}
        id = {my_name = 0xffffffff8116d6b1 "NFS NLM", my_prog = 0, 
          my_vers = 0, my_proc = 0}
        td = 0xfffff810c2501780
        pool = 0xfffff8022abc6000
        error = <optimized out>
        stat = <optimized out>
        old_nfs_advlock = 0x0
        old_nfs_reclaim = 0x0
        nw = <optimized out>
        host = <optimized out>
        nhost = <optimized out>
        err = <optimized out>
        _v = <optimized out>
        _tid = <optimized out>
        _v = <optimized out>
        _v = <optimized out>
        _tid = <optimized out>
        _v = <optimized out>
#14 sys_nlm_syscall (td=<optimized out>, uap=<optimized out>)
    at /usr/src/sys/nlm/nlm_prot_impl.c:1716
        error = <optimized out>
        saved_vnet = 0x0
#15 0xffffffff8107a8a6 in syscallenter (td=0xfffff810c2501780)
    at /usr/src/sys/amd64/amd64/../../kern/subr_syscall.c:193
        se = 0xffffffff818bc360 <sysent+4928>
        p = 0xfffffe03c0481018
        sa = <optimized out>
        error = <optimized out>
        sy_thr_static = true
        traced = <optimized out>
        _tid = <optimized out>
        _v = <optimized out>
        _v = <optimized out>
        _audit_entered = <optimized out>
        _tid = <optimized out>
        _v = <optimized out>
        _v = <optimized out>
        _tid = <optimized out>
        _v = <optimized out>
        _v = <optimized out>
#16 amd64_syscall (td=0xfffff810c2501780, traced=0)
    at /usr/src/sys/amd64/amd64/trap.c:1208
        ksi = {ksi_link = {tqe_next = 0xffffffff81079994 <trap+2068>, 
            tqe_prev = 0xfffffe00d04d9ed0}, ksi_info = {
            si_signo = -1034938496, si_errno = -2032, si_code = 0, si_pid = 0, 
            si_uid = 70, si_status = 0, si_addr = 0xfffffe00b3645400, 
            si_value = {sival_int = -1034938496, 
              sival_ptr = 0xfffff810c2501780, sigval_int = -1034938496, 
              sigval_ptr = 0xfffff810c2501780}, _reason = {_fault = {
                _trapno = -800219520}, _timer = {_timerid = -800219520, 
                _overrun = -512}, _mesgq = {_mqd = -800219520}, _poll = {
                _band = -2195528507776}, _capsicum = {_syscall = -800219520}, 
              __spare__ = {__spare1__ = -2195528507776, __spare2__ = {
                  -2135978756, -1, -1069200712, -509, 70, 0, 0}}}}, 
          ksi_flags = 7, ksi_sigq = 0xfffffe00d04d9ec0}
#17 <signal handler called>
No locals.
#18 0x00000a18b23cc54a in ?? ()
No symbol table info available.
Backtrace stopped: Cannot access memory at address 0xa18b118d328
Comment 47 David Gilbert 2026-04-08 19:56:17 UTC
Created attachment 269515 [details]
Core.txt going with dgilbert's post.
Comment 48 yuan.mei 2026-04-09 18:47:55 UTC
Created attachment 269561 [details]
net.patch

(In reply to Rick Macklem from comment #36)

The attached patch seems to stop the crashes in both the tcp and udp paths.  Comparing version 15 against 14, commit 5bba2728079e stood out as a major restructuring of the TCP shutdown path.  I put some of the guards back, and they appear to have stopped the crash.  UDP needed additional guards in svc_dg as well.

I don't know if this is a proper fix, but for now it works for my system.
Comment 49 Gleb Smirnoff freebsd_committer freebsd_triage 2026-04-09 20:38:37 UTC
I was following this bug, but was never able to find time to look into it. Now that you are pointing at my commit, I feel obliged to join.

The most recent patch seems to change a lot of not really related things. From a very brief look, I would guess it is the extra soref/sorele changes that constitute the actual fix. I don't think the shutdown(2) changes are relevant.

I will try to find time to dive deep into this bug next week.
Comment 50 Rick Macklem freebsd_committer freebsd_triage 2026-04-09 22:45:27 UTC
(In reply to yuan.mei from comment #48)
Good work!

I was going to dig into the soref/sorele
changes this weekend, but it sounds like
glebius@ (who'll know the code) is going
to take a look at it.

I agree that I didn't see any evidence
that soshutdown() was causing the crashes.
All the evidence seems to point to a
premature sorele().

It would be nice to have this fixed for
FreeBSD-15.1.