Created attachment 148728 [details]
ASUS P8H61-M LE/USB3 dmesg.boot
After almost 3 years of running 9.x on this machine with only manual reboots to install updates, I find that I am unable to keep 10.1 running for more than a day.
After some trial and error I have found that two simultaneous writes to disk cause a memory issue that ends with a hang once wired memory usage hits 7GB (8GB installed); at that stage the reset button is the only thing that still works.
Under light usage this appears to build up over the course of a day, while forced heavy usage causes a hang within a minute. Running two copies of the attached test.sh while watching top looks normal for a while, then the wired memory amount can jump from 3GB to 7GB within 6 seconds. Running the two instances of the script simultaneously on a USB memstick and the ZFS filesystem, or both on the ZFS filesystem, produces the same problem.
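For reference, test.sh is a simple disk-write loop; a minimal sketch of the idea is below (the attached script may differ, and the dd block size and count here are placeholders only):

#!/bin/sh
# Repeatedly write and remove a large file with dd to keep the pool busy.
# (Sketch only; bs and count values are assumptions, not the attached script.)
i=0
while [ $i -lt 100 ]; do
    dd if=/dev/zero of=./stress.$$.$i bs=1m count=2048 > /dev/null 2>&1
    rm -f ./stress.$$.$i
    i=$((i + 1))
done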
I have tested this by booting into single user mode, mounting the filesystem and using screen to run top and two scripts at the same time with the same result.
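The single user mode procedure is roughly the following (the exact mount step may vary depending on how the datasets are set up):

# after booting to single user mode:
zfs mount -a        # mount the pool's datasets
screen              # so top and the two scripts can run side by side
# then, in separate screen windows:
top
sh test.sh
sh test.sh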
The machine is an ASUS P8H61-M LE/USB3, Core i5, 8GB RAM, with 3x 2TB Seagate drives in raidz. I think the pool was created with 9.2 (possibly with 9.1), and the pool and bootcode were updated after installing 10.1 just before RC2 was tagged.
FreeBSD leader.local 10.1-RC3 FreeBSD 10.1-RC3 #16 r273471: Thu Oct 23 06:32:33 ACDT 2014 email@example.com:/usr/obj/usr/src/sys/GENERIC amd64
In single user mode the following modules were loaded -
Created attachment 148729 [details]
forced stress script
This is limited to the zpool I have 10.1 installed on. I can boot from 10.1-RC3-amd64-disc1.iso and import the zpool to repeat the issue.
I also have a single-disk zpool (external USB3 drive) that is a version 28 zpool - I can import this and have no issue.
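Repeating the issue from the install media is just a matter of importing the pool under an alternate root, along the lines of (the -R path here is arbitrary):

zpool import -f -R /mnt zrpleader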
Properties of the zpool (from zpool get all) are -
zrpleader size 5.41T -
zrpleader capacity 86% -
zrpleader altroot - default
zrpleader health ONLINE -
zrpleader guid 7653467844531205029 default
zrpleader version - default
zrpleader bootfs zrpleader local
zrpleader delegation on default
zrpleader autoreplace off default
zrpleader cachefile - default
zrpleader failmode wait default
zrpleader listsnapshots off default
zrpleader autoexpand off default
zrpleader dedupditto 0 default
zrpleader dedupratio 1.00x -
zrpleader free 761G -
zrpleader allocated 4.66T -
zrpleader readonly off -
zrpleader comment - default
zrpleader expandsize 0 -
zrpleader freeing 0 default
zrpleader fragmentation 28% -
zrpleader leaked 0 default
zrpleader feature@async_destroy enabled local
zrpleader feature@empty_bpobj active local
zrpleader feature@lz4_compress active local
zrpleader feature@multi_vdev_crash_dump enabled local
zrpleader feature@spacemap_histogram active local
zrpleader feature@enabled_txg active local
zrpleader feature@hole_birth active local
zrpleader feature@extensible_dataset enabled local
zrpleader feature@embedded_data active local
zrpleader feature@bookmarks enabled local
zrpleader feature@filesystem_limits enabled local
I've now updated to RC4 - while there has been an improvement, the issue is not totally resolved.
FreeBSD leader.local 10.1-RC4 FreeBSD 10.1-RC4 #19 r273922: Sat Nov 1 16:36:48 ACDT 2014 firstname.lastname@example.org:/usr/obj/usr/src/sys/GENERIC amd64
Compression would appear to be a factor. I disabled compression and installed the new world; in single user mode wired memory increased more slowly and, with swap enabled, rose to 6.8G and stayed there for an hour of uptime. At 26 minutes uptime the wired amount jumped from 3.8G to 6.5G. Without swap enabled, processes were terminated after the drop in free RAM.
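Compression was toggled on the pool's top-level dataset, roughly:

# disable compression (re-enable with compression=on or compression=lz4)
zfs set compression=off zrpleader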
With compression enabled the wired amount jumped from 421M to 6.7G in 4 seconds at 3 minutes uptime.
Back in multi-user mode I was able to run two copies of the script for several hours. I saw the wired amount rise over 7G a few times, and while the system slowed down it remained responsive. Unfortunately the damage was done and the machine was of limited use. I could start simple things like man and ls, but ps, top and su failed to start. This extended to X apps: I could start an xterm instance but not gnome-terminal, firefox or chrome, leaving me to hit the reset button.
I think my troubles may be related to ZFS, and the arc_max setting may play a part.
Booting into single user mode and running two instances of my test script, I get varying results with different arc_max settings:
vfs.zfs.arc_max=2G    - the test hangs after about 10 minutes
vfs.zfs.arc_max=2560M - the tests were still running after an hour
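Both values were set as loader tunables, e.g. in /boot/loader.conf:

# /boot/loader.conf
vfs.zfs.arc_max="2560M"    # 2G for the first test above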
I have a second machine (Pentium E2140 with 1GB of RAM) set up with 3 disks in raidz. I have been unable to recreate this issue on that machine. After installing 10.1 didn't reproduce the issue, I went back to 9.1 and created the zpool, wrote some test data, upgraded to 9.2, wrote some data, enabled compression, wrote some data, upgraded to 10.1, and it still didn't break. Either something in my zpool is amiss or the amount of RAM makes the difference.
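The test pool on that machine was rebuilt roughly like this (device names are placeholders):

# under 9.1:
zpool create testpool raidz ada1 ada2 ada3
# write test data, upgrade the OS to 9.2, write more data, then:
zfs set compression=on testpool
# write more data, upgrade to 10.1 and repeat the tests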
Created attachment 149896 [details]
disk writing test
Reduced the count values to reduce disk space used during tests.
Updated 10.1-BETA and 10.1-RC versioned bugs to 10.1-STABLE.
I have been running 10-STABLE for a while, and after updating to r278305 on 7th Feb I have not seen this issue in two weeks under normal load.
While simultaneously running two copies of the sample script shows a slow response in releasing wired RAM, it does get released without any issue. Under my normal load I have not noticed this issue.
While the wired memory accumulates more slowly, the issue is still present.
Can you verify if the issue persists on 10.2-PRERELEASE?
I am currently running 10.2-PRERELEASE #13 r285123
Testing in single user mode, I still see the wired allocation jump from 1500M to 7300M within a few seconds, but there are still a few hundred MB left free.
While I think there could be improvement on the sudden wired allocation, I don't see it locking up in this situation.