Created attachment 148728 [details]
ASUS P8H61-M LE/USB3 dmesg.boot
After almost 3 years of running 9.x on this machine with only manual reboots to install updates, I find that I am unable to keep 10.1 running for more than a day.
After some trial and error I have found that two simultaneous writes to disk cause a memory issue that ends with a hang once wired memory usage hits 7GB (8GB installed); at that stage the reset button is the only thing that still works.
Under light usage this appears to build up over the course of a day, while forced heavy usage causes a hang within a minute. Running two copies of the attached test.sh while watching top looks normal for a while, then the wired memory amount can jump from 3GB to 7GB within 6 seconds. Running the two instances of the script simultaneously on a USB memstick and the ZFS filesystem, or both on the ZFS filesystem, produces the same problem.
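For reference, test.sh is a simple disk-write loop; a minimal sketch of the idea is below (the attached script may differ, and the dd block size and count here are placeholders only):

#!/bin/sh
# Repeatedly write and remove a large file with dd to keep the pool busy.
# (Sketch only; bs and count values are assumptions, not the attached script.)
i=0
while [ $i -lt 100 ]; do
    dd if=/dev/zero of=./stress.$$.$i bs=1m count=2048 > /dev/null 2>&1
    rm -f ./stress.$$.$i
    i=$((i + 1))
done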
I have tested this by booting into single user mode, mounting the filesystem and using screen to run top and two scripts at the same time with the same result.
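The single user mode procedure is roughly the following (the exact mount step may vary depending on how the datasets are set up):

# after booting to single user mode:
zfs mount -a        # mount the pool's datasets
screen              # so top and the two scripts can run side by side
# then, in separate screen windows:
top
sh test.sh
sh test.sh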
The machine is an ASUS P8H61-M LE/USB3, Core i5, 8GB RAM, with 3x 2TB Seagate drives in raidz. I think the pool was created with 9.2 (possibly with 9.1), and the pool and bootcode were updated after installing 10.1 just before RC2 was tagged.
FreeBSD leader.local 10.1-RC3 FreeBSD 10.1-RC3 #16 r273471: Thu Oct 23 06:32:33 ACDT 2014 email@example.com:/usr/obj/usr/src/sys/GENERIC amd64
In single user mode the following modules were loaded -
Created attachment 148729 [details]
forced stress script
This is limited to the zpool I have 10.1 installed on. I can boot from 10.1-RC3-amd64-disc1.iso and import the zpool to repeat the issue.
I also have a single-disk zpool (external USB3 drive) that is a version 28 zpool - I can import this and have no issue.
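Repeating the issue from the install media is just a matter of importing the pool under an alternate root, along the lines of (the -R path here is arbitrary):

zpool import -f -R /mnt zrpleader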
Properties of the zpool (from zpool get all) are -
zrpleader size 5.41T -
zrpleader capacity 86% -
zrpleader altroot - default
zrpleader health ONLINE -
zrpleader guid 7653467844531205029 default
zrpleader version - default
zrpleader bootfs zrpleader local
zrpleader delegation on default
zrpleader autoreplace off default
zrpleader cachefile - default
zrpleader failmode wait default
zrpleader listsnapshots off default
zrpleader autoexpand off default
zrpleader dedupditto 0 default
zrpleader dedupratio 1.00x -
zrpleader free 761G -
zrpleader allocated 4.66T -
zrpleader readonly off -
zrpleader comment - default
zrpleader expandsize 0 -
zrpleader freeing 0 default
zrpleader fragmentation 28% -
zrpleader leaked 0 default
zrpleader feature@async_destroy enabled local
zrpleader feature@empty_bpobj active local
zrpleader feature@lz4_compress active local
zrpleader feature@multi_vdev_crash_dump enabled local
zrpleader feature@spacemap_histogram active local
zrpleader feature@enabled_txg active local
zrpleader feature@hole_birth active local
zrpleader feature@extensible_dataset enabled local
zrpleader feature@embedded_data active local
zrpleader feature@bookmarks enabled local
zrpleader feature@filesystem_limits enabled local
I've now updated to RC4 - while there has been an improvement, the issue is not totally resolved.
FreeBSD leader.local 10.1-RC4 FreeBSD 10.1-RC4 #19 r273922: Sat Nov 1 16:36:48 ACDT 2014 firstname.lastname@example.org:/usr/obj/usr/src/sys/GENERIC amd64
Compression would appear to be a factor. I disabled compression and installed the new world; in single user mode wired memory increased more slowly and, with swap enabled, rose to 6.8G and stayed there for an hour of uptime. At 26 minutes uptime the wired amount jumped from 3.8G to 6.5G. Without swap enabled, processes were terminated after the drop in free RAM.
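Compression was toggled on the pool's top-level dataset, roughly:

# disable compression (re-enable with compression=on or compression=lz4)
zfs set compression=off zrpleader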
With compression enabled the wired amount jumped from 421M to 6.7G in 4 seconds at 3 minutes uptime.
Back in multi-user mode I was able to run two copies of the script for several hours. I saw the wired amount rise over 7G a few times, and while the system slowed down it remained responsive. Unfortunately the damage was done and the machine was of limited use. I could start simple things like man and ls, but ps, top and su failed to start. This extended to X apps: I could start an xterm instance but not gnome-terminal, firefox or chrome, leaving me to hit the reset button.
I think my troubles may be related to ZFS, and the arc_max setting may play a part.
Booting into single user mode and running two instances of my test script, I get varying results with different arc_max settings:
vfs.zfs.arc_max=2G    - the test hangs after about 10 minutes
vfs.zfs.arc_max=2560M - the tests were still running after an hour
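Both values were set as loader tunables, e.g. in /boot/loader.conf:

# /boot/loader.conf
vfs.zfs.arc_max="2560M"    # 2G for the first test above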
I have a second machine (Pentium E2140 with 1GB of RAM) set up with 3 disks in raidz. I have been unable to recreate this issue on that machine. After installing 10.1 didn't reproduce the issue, I went back to 9.1 and created the zpool, wrote some test data, upgraded to 9.2, wrote some data, enabled compression, wrote some data, upgraded to 10.1, and it still didn't break. Either something in my zpool is amiss or the amount of RAM makes the difference.
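The test pool on that machine was rebuilt roughly like this (device names are placeholders):

# under 9.1:
zpool create testpool raidz ada1 ada2 ada3
# write test data, upgrade the OS to 9.2, write more data, then:
zfs set compression=on testpool
# write more data, upgrade to 10.1 and repeat the tests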
Created attachment 149896 [details]
disk writing test
Reduced the count values to reduce disk space used during tests.
Updated 10.1-BETA and 10.1-RC versioned bugs to 10.1-STABLE.
I have been running 10-STABLE for a while, and after updating to r278305 on 7th Feb I have not seen this issue in two weeks under normal load.
While simultaneously running two copies of the sample script shows a slow response in releasing wired RAM, it does get released without any issue. Under my normal load I have not noticed this issue.
While the wired memory accumulates more slowly, the issue is still present.
Can you verify if the issue persists on 10.2-PRERELEASE?
I am currently running 10.2-PRERELEASE #13 r285123
Testing in single user mode, I still see the wired allocation jump from 1500M to 7300M within a few seconds, but there are still a few hundred MB left free.
While I think there could be improvement on the sudden wired allocation, I don't see it locking up in this situation.