Bug 219457 - ZFS ARC eviction & system hangup
Summary: ZFS ARC eviction & system hangup
Status: Closed Overcome By Events
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 11.1-RELEASE
Hardware: Any Any
Importance: --- Affects Only Me
Assignee: freebsd-fs (Nobody)
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-05-22 14:52 UTC by Anton Saietskii
Modified: 2017-08-08 22:07 UTC
CC: 4 users

See Also:


Attachments
output of "zfs-stats -a" when ARC reaches minimum (10.05 KB, text/plain)
2017-05-22 14:54 UTC, Anton Saietskii
output of "vmstat -z" when ARC reaches minimum (16.88 KB, text/plain)
2017-05-22 14:54 UTC, Anton Saietskii
output of "procstat -kka" when ARC reaches minimum (120.89 KB, text/plain)
2017-05-22 14:56 UTC, Anton Saietskii
output of "truss tar cvf /dev/null /usr/ports" that starts ARC eviction (14.09 KB, text/plain)
2017-05-22 14:57 UTC, Anton Saietskii
output of "zfs-stats -a" when ARC reaches minimum (w/o swap) (9.79 KB, text/plain)
2017-05-26 21:10 UTC, Anton Saietskii
output of "vmstat -z" when ARC reaches minimum (w/o swap) (16.36 KB, text/plain)
2017-05-26 21:10 UTC, Anton Saietskii
output of "procstat -kka" when ARC reaches minimum (w/o swap) (112.32 KB, text/plain)
2017-05-26 21:11 UTC, Anton Saietskii
output of "sysctl vm" when ARC reaches minimum (w/o swap) (6.52 KB, text/plain)
2017-05-26 21:11 UTC, Anton Saietskii

Description Anton Saietskii 2017-05-22 14:52:43 UTC
I have a machine with 256 GiB of RAM (249 GiB managed) that serves files over plain HTTP with nginx & AIO.
After system startup the ARC grows to its maximum size of ~233 GiB (so about ten gigs are always free), then drops slightly to ~228 GiB. Only after that, if I start some processes:
1. They immediately hang in "D" state;
2. pagedaemon/uma enters the clearing state;
3. The ARC starts to evict down to its minimum size;
4. When the ARC reaches its minimum, the entire system becomes unresponsive after a delay (anywhere from 5 minutes to 8 hours).

Some examples of hanging processes:
1. conftest when building devel/m4 (PR in "See Also");
2. tar on any directory, e.g. tar cvf /dev/null /usr/ports.
Comment 1 Anton Saietskii 2017-05-22 14:54:14 UTC
Created attachment 182804 [details]
output of "zfs-stats -a" when ARC reaches minimum
Comment 2 Anton Saietskii 2017-05-22 14:54:50 UTC
Created attachment 182805 [details]
output of "vmstat -z" when ARC reaches minimum
Comment 3 Anton Saietskii 2017-05-22 14:56:00 UTC
Created attachment 182806 [details]
output of "procstat -kka" when ARC reaches minimum
Comment 4 Anton Saietskii 2017-05-22 14:57:01 UTC
Created attachment 182807 [details]
output of "truss tar cvf /dev/null /usr/ports" that starts ARC eviction
Comment 5 Anton Saietskii 2017-05-22 15:07:28 UTC
(In reply to Anton Sayetsky from comment #4)

It's better to duplicate the last lines of the truss and procstat output here:

===== truss output =====
clock_gettime(13,{1495402129.000000000 })	 = 0 (0x0)
openat(0xffffff9c,0x80245b0a0,0x100601,0x1b6,0x7fffffffd580,0x801d13b20) = 3 (0x3)
fcntl(3,F_GETFD,)				 = 1 (0x1)
fstat(3,{ mode=crw-rw-rw- ,inode=8,size=0,blksize=4096 }) = 0 (0x0)
openat(0xffffff9c,0x8008bc804,0x100000,0x0,0xffff80080245c7d7,0x0) = 4 (0x4)
fcntl(4,F_GETFD,)				

===== procstat output related to tar =====
75044 101901 bsdtar           -                mi_switch+0xbe sleepq_wait+0x3a _cv_wait+0x14d vmem_xalloc+0x568 vmem_alloc+0x3d kmem_malloc+0x33 uma_large_malloc+0x46 malloc+0x40 fdgrowtable+0x5b fdalloc+0x6c do_dup+0x18f kern_fcntl+0x6dc kern_fcntl_freebsd+0xae amd64_syscall+0x307 Xfast_syscall+0xfb
Comment 6 Fabian Keil 2017-05-23 10:26:44 UTC
The procstat output suggests that you might be using geli for the swap device.

This is known to cause deadlocks under memory pressure:
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=209759

You could reduce vfs.zfs.deadman_synctime_ms to more quickly get
a panic when the system becomes unresponsive.
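
Concretely, that could look like the following sketch (the values shown are illustrative, not taken from this report):

===== lowering the deadman timeout (sketch) =====
# check the current value (the stock default is on the order of 1000000 ms)
sysctl vfs.zfs.deadman_synctime_ms
# lower it so a hung I/O triggers the deadman panic sooner, e.g. after 60 s
sysctl vfs.zfs.deadman_synctime_ms=60000
# persist across reboots
echo 'vfs.zfs.deadman_synctime_ms=60000' >> /etc/sysctl.conf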

It would probably help to see the counters in vm_cnt.
Comment 7 karl 2017-05-24 14:07:09 UTC
There have been a number of changes made to the ZFS code since 10.3-RELEASE; there is a version of a patch that I have been running which *should* apply against 10.3 in the following bug thread (I'm currently on 11 with the version for it in production here):

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=187594
Comment 8 Anton Saietskii 2017-05-26 21:10:04 UTC
Created attachment 182943 [details]
output of "zfs-stats -a" when ARC reaches minimum (w/o swap)
Comment 9 Anton Saietskii 2017-05-26 21:10:46 UTC
Created attachment 182944 [details]
output of "vmstat -z" when ARC reaches minimum (w/o swap)
Comment 10 Anton Saietskii 2017-05-26 21:11:14 UTC
Created attachment 182945 [details]
output of "procstat -kka" when ARC reaches minimum (w/o swap)
Comment 11 Anton Saietskii 2017-05-26 21:11:44 UTC
Created attachment 182946 [details]
output of "sysctl vm" when ARC reaches minimum (w/o swap)
Comment 12 Anton Saietskii 2017-05-26 21:22:58 UTC
(In reply to Fabian Keil from comment #6)

> The procstat output suggests that you might be using geli for the swap device.
Yes, you're right. I'm using GELI (AES-256-XTS/SHA256/onetime) over gmirror of 2 gpt partitions.

> This is known to cause deadlocks under memory pressure:
> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=209759
Disabled GELI swap & stopped relevant gmirror -- still got ARC eviction after running tar...

> You could reduce vfs.zfs.deadman_synctime_ms to more quickly get
> a panic when the system becomes unresponsive.
Unfortunately, I cannot see any panics (and thus no stack traces). The system just hangs without any output to logs or console, and all I can do is reset or power cycle through the IPMI interface. I'm thinking about compiling a kernel with KDB/DDB and collecting a coredump via NMI.

> It would probably help to see the counters in vm_cnt.
Attached relevant sysctl output & similar diagnostics as before, but w/o swap.
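
The KDB/DDB plus NMI approach could be sketched roughly as follows (the kernel options and sysctl are standard FreeBSD knobs; treat the exact setup as an assumption, not a tested recipe):

===== kernel debugging setup (sketch) =====
# kernel config additions, then rebuild and install the kernel
options KDB    # kernel debugger framework
options DDB    # interactive in-kernel debugger
# have an IPMI-triggered NMI panic the box instead of being ignored
sysctl machdep.panic_on_nmi=1
# configure a dump device so the panic produces a core, e.g. in /etc/rc.conf:
dumpdev="AUTO"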
Comment 14 Andriy Gapon freebsd_committer freebsd_triage 2017-05-29 06:48:32 UTC
Anton,

I suspect that you could be running into a bug in fdalloc / fdgrowtable code that causes an attempt to allocate an insane amount of memory.  The ARC is just the first victim.

Could you please try to use kgdb (preferably from devel/gdb) and check arguments and local variables in the relevant stack frames?
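
A minimal session could look like the following sketch (paths and thread/frame numbers are purely illustrative; the commands are standard gdb/kgdb ones):

===== kgdb session (sketch) =====
# against a crash dump (for a live system, point kgdb at /dev/mem instead)
kgdb /boot/kernel/kernel /var/crash/vmcore.0
(kgdb) info threads    # find the stuck bsdtar thread
(kgdb) thread 42       # switch to it (number illustrative)
(kgdb) bt              # should show the fdgrowtable -> vmem_xalloc chain
(kgdb) frame 9         # select the fdgrowtable frame (number illustrative)
(kgdb) info args
(kgdb) info locals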
Comment 15 Anton Saietskii 2017-05-30 16:47:27 UTC
(In reply to Andriy Gapon from comment #14)

Running devel/gdb is possible, but I need some instructions because I have almost no experience with it.
Comment 16 Anton Saietskii 2017-08-08 22:07:12 UTC
Looks like I can't reproduce this anymore after updating to releng/11.1.
I can still observe the ARC being evicted down to its minimum size, but at least the system doesn't hang now.
So it's time to try the patches from #187594 & D7538 again.