Bug 260664 - ZFS/NFS: Intermittent hangs and crashes after a period of time in: nfscl_hasexpired || dbuf_write_done || zio_execute
Summary: ZFS/NFS: Intermittent hangs and crashes after a period of time in: nfscl_hase...
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern (show other bugs)
Version: 13.0-RELEASE
Hardware: amd64 Any
: --- Affects Some People
Assignee: freebsd-fs (Nobody)
URL:
Keywords: crash, needs-qa
Depends on:
Blocks:
 
Reported: 2021-12-24 14:41 UTC by Pader
Modified: 2022-05-20 22:24 UTC (History)
3 users (show)

See Also:
koobs: maintainer-feedback? (fs)
koobs: maintainer-feedback? (rmacklem)


Attachments
dmesg.boot file (43.75 KB, text/plain)
2021-12-27 02:04 UTC, Pader
no flags Details
pciconf output (4.73 KB, text/plain)
2021-12-27 02:06 UTC, Pader
no flags Details
ps axHl (57.33 KB, text/plain)
2021-12-27 02:48 UTC, Pader
no flags Details
procstat -a -kk (99.56 KB, text/plain)
2021-12-27 02:51 UTC, Pader
no flags Details
netstat -a (16.71 KB, text/plain)
2021-12-27 03:33 UTC, Pader
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Pader 2021-12-24 14:41:54 UTC
Hello everyone, I have a server running on FreeBSD 13.0, the server randomly freeze after some days or a month.

Here is the phenomenon:
1. Unable to connect to the ssh, when input ssh command, no any response.
2. Alot of services can not be visit, some simple service like static nginx page can be opened in a short time, but if you refreshed page some times, the page will be stuck, and have no response, some other services is the same.
3. Another server has a always logged ssh to FreeBSD Server, and opened a top command, when FreeBSD freeze, this ssh can still visit and top command can refresh and output system status,the memory is normal, cpu usage is normal, ZFS ARC is normal, swap is normal, clock is normal, looks like anything is normal. Any hot key for top can use, but when press q to quit top, and type other command, like "systat -ifstat", the command stuck, no any output, Ctrl + Z or C no response.
4. Ping server always normal.
5. The redis-server on freebsd is normal, because redis service can response and very good.
6. Unable to login from console, when type username and password, press enter, no any output.

Environment:
FreeBSD 13.0,Intel Xeon 4Core + 16GB Memory, Two 2T Disk, ZFS Mirror, Root on ZFS. It's a new machine, it's been less than half a year since we bought it.
Main system only running sshguard+ipfw, mount a nfs and use nullfs to a jail, jail file system running on zfs dataset clone, services all running in this jail.

Server has two bge network interface, one for lan, one for wan, the services is network heavy service.
In jail, running nginx, php-fpm, php cli server, mysql, redis-server, there is alot of nfs write, read by php.

Some try:
At first it was suspected to be a ZFS ARC problem, and I set arc max to 2G, but in top ARC is very normal..
When look at dmesg, or any log by system or services, every log stopped record when system freeze, means there is no any abnormal log.. but looks like some service that no need read or write file is normal.

Some try 2:
Before configure kern.ipc.somaxconn, when system hang, I can't login system, can't do any operate. But after change somaxconn, when system hang (worker processes freeze), I can still login to system, and do some operate that not touch the NFS mountpoint, I found that freeze because of nfs mount is dead.

I found some same problems:
https://emby.media/community/index.php?/topic/74175-freebsd-jail-with-nfsv4-share-causes-system-to-hang/
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=251347
https://redmine.ixsystems.com/issues/2068

Some try 3:
I have move my program that read/write nfs out of jail, add intr to nfs mount options, more system crash happened, and every crash stack trace is different, sometimes is arc_write, or nfscl, sometime is zio_execute, I now suspect that the possibility of hardware failure is quite high.


- uname -a

FreeBSD ppbsd 13.0-RELEASE-p4 FreeBSD 13.0-RELEASE-p4 #0: Tue Aug 24 07:33:27 UTC 2021     root@amd64-builder.daemonology.net:/usr/obj/usr/src/amd64.amd64/sys/GENERIC  amd64

- loader.conf

kern.geom.label.disk_ident.enable="0"
kern.geom.label.gptid.enable="0"
cryptodev_load="YES"
zfs_load="YES"
coretemp_load="YES"
net.inet.ip.fw.default_to_accept=1
vfs.zfs.arc_max="2G"

# Increase dmesg buffer to fit longer boot output.
kern.msgbufsize="524288"

- sysctl.conf

# $FreeBSD$
#
#  This file is read when going to multi-user and its contents piped thru
#  ``sysctl'' to adjust kernel values.  ``man 5 sysctl.conf'' for details.
#

# Uncomment this to prevent users from seeing information about processes that
# are being run under another UID.
#security.bsd.see_other_uids=0
#vfs.zfs.min_auto_ashift=12
kern.ipc.somaxconn=4096

Here is the crash log 1:

Dec 17 00:47:02 ppbsd kernel: Fatal trap 12: page fault while in kernel mode
Dec 17 00:47:02 ppbsd kernel: cpuid = 2; apic id = 04
Dec 17 00:47:02 ppbsd kernel: fault virtual address     = 0x28
Dec 17 00:47:02 ppbsd kernel: fault code                = supervisor read data, page not present
Dec 17 00:47:02 ppbsd kernel: instruction pointer       = 0x20:0xffffffff821495f8
Dec 17 00:47:02 ppbsd kernel: stack pointer             = 0x0:0xfffffe010fae48d0
Dec 17 00:47:02 ppbsd kernel: frame pointer             = 0x0:0xfffffe010fae48d0
Dec 17 00:47:02 ppbsd kernel: code segment              = base rx0, limit 0xfffff, type 0x1b
Dec 17 00:47:02 ppbsd kernel:                   = DPL 0, pres 1, long 1, def32 0, gran 1
Dec 17 00:47:02 ppbsd kernel: processor eflags  = interrupt enabled, resume, IOPL = 0
Dec 17 00:47:02 ppbsd kernel: current process           = 0 (z_wr_int_3)
Dec 17 00:47:02 ppbsd kernel: trap number               = 12
Dec 17 00:47:02 ppbsd kernel: panic: page fault
Dec 17 00:47:02 ppbsd kernel: cpuid = 2
Dec 17 00:47:02 ppbsd kernel: time = 1639673010
Dec 17 00:47:02 ppbsd kernel: KDB: stack backtrace:
Dec 17 00:47:02 ppbsd kernel: #0 0xffffffff80c574c5 at kdb_backtrace+0x65
Dec 17 00:47:02 ppbsd kernel: #1 0xffffffff80c09ea1 at vpanic+0x181
Dec 17 00:47:02 ppbsd kernel: #2 0xffffffff80c09d13 at panic+0x43
Dec 17 00:47:02 ppbsd kernel: #3 0xffffffff8108b1b7 at trap_fatal+0x387
Dec 17 00:47:02 ppbsd kernel: #4 0xffffffff8108b20f at trap_pfault+0x4f
Dec 17 00:47:02 ppbsd kernel: #5 0xffffffff8108a86d at trap+0x27d
Dec 17 00:47:02 ppbsd kernel: #6 0xffffffff81061958 at calltrap+0x8
Dec 17 00:47:02 ppbsd kernel: #7 0xffffffff821a4d3e at dbuf_write_done+0x9e
Dec 17 00:47:02 ppbsd kernel: #8 0xffffffff82190c5c at arc_write_done+0x33c
Dec 17 00:47:02 ppbsd kernel: #9 0xffffffff822f920d at zio_done+0xd9d
Dec 17 00:47:02 ppbsd kernel: #10 0xffffffff822f2d5c at zio_execute+0x3c
Dec 17 00:47:02 ppbsd kernel: #11 0xffffffff80c6b161 at taskqueue_run_locked+0x181
Dec 17 00:47:02 ppbsd kernel: #12 0xffffffff80c6c47c at taskqueue_thread_loop+0xac
Dec 17 00:47:02 ppbsd kernel: #13 0xffffffff80bc7dde at fork_exit+0x7e
Dec 17 00:47:02 ppbsd kernel: #14 0xffffffff810629de at fork_trampoline+0xe
Dec 17 00:47:02 ppbsd kernel: Uptime: 1h44m58s
Dec 17 00:47:02 ppbsd kernel: Dumping 2511 out of 16190 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%---<<BOOT>>---

Crash log 2:

Dec 21 14:09:50 ppbsd syslogd: kernel boot file is /boot/kernel/kernel
Dec 21 14:09:50 ppbsd kernel: 
Dec 21 14:09:50 ppbsd syslogd: last message repeated 1 times
Dec 21 14:09:50 ppbsd kernel: Fatal trap 12: page fault while in kernel mode
Dec 21 14:09:50 ppbsd kernel: cpuid = 2; apic id = 04
Dec 21 14:09:50 ppbsd kernel: fault virtual address	= 0x0
Dec 21 14:09:50 ppbsd kernel: fault code		= supervisor write data, page not present
Dec 21 14:09:50 ppbsd kernel: instruction pointer	= 0x20:0xffffffff80ac9e26
Dec 21 14:09:50 ppbsd kernel: stack pointer	        = 0x28:0xfffffe011bf165b0
Dec 21 14:09:50 ppbsd kernel: frame pointer	        = 0x28:0xfffffe011bf165f0
Dec 21 14:09:50 ppbsd kernel: code segment		= base rx0, limit 0xfffff, type 0x1b
Dec 21 14:09:50 ppbsd kernel: 			= DPL 0, pres 1, long 1, def32 0, gran 1
Dec 21 14:09:50 ppbsd kernel: processor eflags	= interrupt enabled, resume, IOPL = 0
Dec 21 14:09:50 ppbsd kernel: current process		= 4541 (newnfs 3)
Dec 21 14:09:50 ppbsd kernel: trap number		= 12
Dec 21 14:09:50 ppbsd kernel: panic: page fault
Dec 21 14:09:50 ppbsd kernel: cpuid = 2
Dec 21 14:09:50 ppbsd kernel: time = 1640066765
Dec 21 14:09:50 ppbsd kernel: KDB: stack backtrace:
Dec 21 14:09:50 ppbsd kernel: #0 0xffffffff80c574c5 at kdb_backtrace+0x65
Dec 21 14:09:50 ppbsd kernel: #1 0xffffffff80c09ea1 at vpanic+0x181
Dec 21 14:09:50 ppbsd kernel: #2 0xffffffff80c09d13 at panic+0x43
Dec 21 14:09:50 ppbsd kernel: #3 0xffffffff8108b1b7 at trap_fatal+0x387
Dec 21 14:09:50 ppbsd kernel: #4 0xffffffff8108b20f at trap_pfault+0x4f
Dec 21 14:09:50 ppbsd kernel: #5 0xffffffff8108a86d at trap+0x27d
Dec 21 14:09:50 ppbsd kernel: #6 0xffffffff81061958 at calltrap+0x8
Dec 21 14:09:50 ppbsd kernel: #7 0xffffffff80acc5d9 at nfscl_hasexpired+0x709
Dec 21 14:09:50 ppbsd kernel: #8 0xffffffff80add066 at nfsrpc_read+0x316
Dec 21 14:09:50 ppbsd kernel: #9 0xffffffff80aee349 at ncl_readrpc+0x89
Dec 21 14:09:50 ppbsd kernel: #10 0xffffffff80b01443 at ncl_doio+0xe3
Dec 21 14:09:50 ppbsd kernel: #11 0xffffffff80b03b32 at nfssvc_iod+0x232
Dec 21 14:09:50 ppbsd kernel: #12 0xffffffff80bc7dde at fork_exit+0x7e
Dec 21 14:09:50 ppbsd kernel: #13 0xffffffff810629de at fork_trampoline+0xe
Dec 21 14:09:50 ppbsd kernel: Uptime: 2d17h24m52s
Dec 21 14:09:50 ppbsd kernel: Dumping 3243 out of 16190 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%---<<BOOT>>---

Crash log 3:

Dec 22 16:30:19 ppbsd syslogd: kernel boot file is /boot/kernel/kernel
Dec 22 16:30:19 ppbsd kernel: 
Dec 22 16:30:19 ppbsd syslogd: last message repeated 1 times
Dec 22 16:30:19 ppbsd kernel: Fatal trap 12: page fault while in kernel mode
Dec 22 16:30:19 ppbsd kernel: cpuid = 2; apic id = 04
Dec 22 16:30:19 ppbsd kernel: fault virtual address	= 0x2000
Dec 22 16:30:19 ppbsd kernel: fault code		= supervisor read instruction, page not present
Dec 22 16:30:19 ppbsd kernel: instruction pointer	= 0x20:0x2000
Dec 22 16:30:19 ppbsd kernel: stack pointer	        = 0x28:0xfffffe010f9a4998
Dec 22 16:30:19 ppbsd kernel: frame pointer	        = 0x28:0xfffffe010f9a49d0
Dec 22 16:30:19 ppbsd kernel: code segment		= base rx0, limit 0xfffff, type 0x1b
Dec 22 16:30:19 ppbsd kernel: 			= DPL 0, pres 1, long 1, def32 0, gran 1
Dec 22 16:30:19 ppbsd kernel: processor eflags	= interrupt enabled, resume, IOPL = 0
Dec 22 16:30:19 ppbsd kernel: current process		= 0 (z_wr_int_2)
Dec 22 16:30:19 ppbsd kernel: trap number		= 12
Dec 22 16:30:19 ppbsd kernel: panic: page fault
Dec 22 16:30:19 ppbsd kernel: cpuid = 2
Dec 22 16:30:19 ppbsd kernel: time = 1640161595
Dec 22 16:30:19 ppbsd kernel: KDB: stack backtrace:
Dec 22 16:30:19 ppbsd kernel: #0 0xffffffff80c574c5 at kdb_backtrace+0x65
Dec 22 16:30:19 ppbsd kernel: #1 0xffffffff80c09ea1 at vpanic+0x181
Dec 22 16:30:19 ppbsd kernel: #2 0xffffffff80c09d13 at panic+0x43
Dec 22 16:30:19 ppbsd kernel: #3 0xffffffff8108b1b7 at trap_fatal+0x387
Dec 22 16:30:19 ppbsd kernel: #4 0xffffffff8108b20f at trap_pfault+0x4f
Dec 22 16:30:19 ppbsd kernel: #5 0xffffffff8108a86d at trap+0x27d
Dec 22 16:30:19 ppbsd kernel: #6 0xffffffff81061958 at calltrap+0x8
Dec 22 16:30:19 ppbsd kernel: #7 0xffffffff822e2d5c at zio_execute+0x3c
Dec 22 16:30:19 ppbsd kernel: #8 0xffffffff80c6b161 at taskqueue_run_locked+0x181
Dec 22 16:30:19 ppbsd kernel: #9 0xffffffff80c6c47c at taskqueue_thread_loop+0xac
Dec 22 16:30:19 ppbsd kernel: #10 0xffffffff80bc7dde at fork_exit+0x7e
Dec 22 16:30:19 ppbsd kernel: #11 0xffffffff810629de at fork_trampoline+0xe
Dec 22 16:30:19 ppbsd kernel: Uptime: 23h2m59s
Dec 22 16:30:19 ppbsd kernel: Dumping 3138 out of 16190 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%---<<BOOT>>---
Comment 1 Kubilay Kocak freebsd_committer freebsd_triage 2021-12-24 22:35:13 UTC
@Reporter Please include additional information:

- /var/run/dmesg.boot (as an attachment)
- pciconf -lv output (as an attachment)
Comment 2 Rick Macklem freebsd_committer freebsd_triage 2021-12-25 00:27:24 UTC
I'll make a few random comments.
1 - Since the crashes occur at different places, I agree with
    you that it might be a hardware problem.
    It would be nice if you could set up another machine with
    exactly the same software/usage and see if it crashes as well.

2 - You should never use "intr" nor "soft" on NFSv4 mounts. This is
    mentioned in the BUGS section at the end of "man mount_nfs".
    To get rid of a hung NFS mount, use "umount -N <mnt_path>.
    (It can take a couple of minutes, but normally succeeds. Note
     that any file writing that was in progress when you do this
     will get lost.)

3 - If you can still log in and do these when "hung", capture the
    output of:
    # ps axHl
    # procstat -a -kk
    # netstat -a
    # ping <nfs-server>

4 - nfscl_hasexpired() only gets called when the NFS client does not
    get a response from the server for minutes and then gets a
    NFSERR_EXPIRED reply from the server.

You did not mention what NFS server you are using. If it is a FreeBSD 13.0
server, then see PR#256280.

If you get the output from #3, please post it here.
Comment 3 Pader 2021-12-27 02:04:44 UTC
Created attachment 230434 [details]
dmesg.boot file
Comment 4 Pader 2021-12-27 02:06:36 UTC
Created attachment 230435 [details]
pciconf output
Comment 5 Pader 2021-12-27 02:13:39 UTC
(In reply to Kubilay Kocak from comment #1)
Thank, I have add the attachments of those two files.
Comment 6 Pader 2021-12-27 02:34:43 UTC
(In reply to Rick Macklem from comment #2)
NFS Server is two linux servers, one is an old serve running almost 10 years, another is a new server we set recently.

I think the NFS servers are normal because of there is about 6 linux servers mount these NFS servers and running same program, and they did not freeze or crash.

It just so happened that the system crashed today, I will add some output file attachments as you said.
Comment 7 Pader 2021-12-27 02:48:45 UTC
Created attachment 230436 [details]
ps axHl
Comment 8 Pader 2021-12-27 02:51:04 UTC
Created attachment 230437 [details]
procstat -a -kk
Comment 9 Pader 2021-12-27 03:33:12 UTC
Created attachment 230438 [details]
netstat -a
Comment 10 Rick Macklem freebsd_committer freebsd_triage 2021-12-27 15:57:07 UTC
Would it be possible for you to upgrade your kernel from
the sources for stable/13?

There is a patch in stable/13 that fixed a problem where
open/create would defer to an exclusive lock request and
cause a deadlock.
It is commit 701eb03cc0dc in stable/13, but I do not know
if it will apply to 13.0 kernel sources cleanly?

I thought this could only occur when delegations were being
issued by the server and, since you are not running nfscbd(8),
delegations should never be issued to the client, even for
non-FreeBSD servers.

You never mentioned what NFS server you are using. If it is
a FreeBSD one, make sure delegations are not enabled.
The sysctl vfs.nfsd.enable_delegations should be set to 0
on the server.
Comment 11 Rick Macklem freebsd_committer freebsd_triage 2021-12-27 17:20:04 UTC
Ok, it does look like the commit in stable/13 would fix this hang.
It happens because read calls nfscl_hasexpired(), which tries to
acquire the exclusive lock (similar to delegation return cases).

However, calling nfscl_hasexpired() should *almost never* happen.
It happens when the client has been partitioned from the NFSv4
server for at least a minute.
For the FreeBSD NFSv4 server (which is what is called a courteous
server), the expired only happens when a conflicting open/lock
request is done by another client or when open/lock resources
become exhausted.
When recovery from "expired" is done, all byte range locks are lost,
so getting the client/server into this state is to be avoiding if
at all possible.

If your NFSv4 server is a Linux one, expired will happen when the
client is network partitioned from the server for over 60sec.

In other words, I think you have some sort of network connectivity
problem to the NFS server.

As an alternative to upgrading to stable/13, you could switch to
using NFSv3 mounts to avoid the hang.
Comment 12 Rick Macklem freebsd_committer freebsd_triage 2021-12-28 04:33:18 UTC
Oh, and try disabling TSO. It's often broken for
various net drivers/chips.
Comment 13 Pader 2021-12-28 07:47:35 UTC
(In reply to Rick Macklem from comment #11)
Thank you, I 'm now trying mount nfs as nfsv3.

One of server mount as nfsv3 is ok.
Another one report:

RPCPROG_NFS: RPC: Program not registered

Do showmount to that server:

showmount -e 192.168.1.2
RPC: Program not registered
showmount: can't do exports rpc

However, this nfs is not strongly dependent, so I don't need to mount it for now, continue to observe for a while.
Comment 14 Rick Macklem freebsd_committer freebsd_triage 2021-12-28 15:30:50 UTC
If you are going to do an NFSv3 mount, the NFS server
must be configured for that. I can't comment further,
since I do not know what kind of NFS server you are using.

For the client, rpcbind must be running for NFSv3 mounts.
Comment 15 Pader 2022-01-24 07:29:04 UTC
(In reply to Rick Macklem from comment #10)

It's been 28 days ago after I switch to NFSv3, everything's normal.

Server is very stable.

I think that is the problem.
Comment 16 Pader 2022-02-15 03:54:54 UTC
It's been almost 49 days ago with NFSv3, everything's normal.
Comment 17 Rick Macklem freebsd_committer freebsd_triage 2022-05-20 22:24:35 UTC
Since I believe this is fixed in the 13.1 release, close the PR.