| Summary: | 10.1-RC3 hangs with simultaneous writes |
|---|---|
| Product: | Base System |
| Component: | misc |
| Version: | 10.1-STABLE |
| Hardware: | amd64 |
| OS: | Any |
| Status: | Closed (Unable to Reproduce) |
| Severity: | Affects Only Me |
| Priority: | --- |
| Reporter: | Shane <FreeBSD> |
| Assignee: | freebsd-bugs (Nobody) <bugs> |
| CC: | emaste |
| Attachments: | forced stress script (148729), disk writing test (149896) |
Description

Shane 2014-10-28 09:36:51 UTC

Created attachment 148729 [details]: forced stress script
This is limited to the zpool I have 10.1 installed on. I can boot from 10.1-RC3-amd64-disc1.iso and import the zpool to reproduce the issue. I also have a single-disk zpool (an external USB 3 drive) at zpool version 28; I can import it with no issue.

Properties of the zpool are:

| NAME | PROPERTY | VALUE | SOURCE |
|---|---|---|---|
| zrpleader | size | 5.41T | - |
| zrpleader | capacity | 86% | - |
| zrpleader | altroot | - | default |
| zrpleader | health | ONLINE | - |
| zrpleader | guid | 7653467844531205029 | default |
| zrpleader | version | - | default |
| zrpleader | bootfs | zrpleader | local |
| zrpleader | delegation | on | default |
| zrpleader | autoreplace | off | default |
| zrpleader | cachefile | - | default |
| zrpleader | failmode | wait | default |
| zrpleader | listsnapshots | off | default |
| zrpleader | autoexpand | off | default |
| zrpleader | dedupditto | 0 | default |
| zrpleader | dedupratio | 1.00x | - |
| zrpleader | free | 761G | - |
| zrpleader | allocated | 4.66T | - |
| zrpleader | readonly | off | - |
| zrpleader | comment | - | default |
| zrpleader | expandsize | 0 | - |
| zrpleader | freeing | 0 | default |
| zrpleader | fragmentation | 28% | - |
| zrpleader | leaked | 0 | default |
| zrpleader | feature@async_destroy | enabled | local |
| zrpleader | feature@empty_bpobj | active | local |
| zrpleader | feature@lz4_compress | active | local |
| zrpleader | feature@multi_vdev_crash_dump | enabled | local |
| zrpleader | feature@spacemap_histogram | active | local |
| zrpleader | feature@enabled_txg | active | local |
| zrpleader | feature@hole_birth | active | local |
| zrpleader | feature@extensible_dataset | enabled | local |
| zrpleader | feature@embedded_data | active | local |
| zrpleader | feature@bookmarks | enabled | local |
| zrpleader | feature@filesystem_limits | enabled | local |

I've now updated to RC4; while there has been an improvement, the issue is not totally resolved.

FreeBSD leader.local 10.1-RC4 FreeBSD 10.1-RC4 #19 r273922: Sat Nov 1 16:36:48 ACDT 2014 root@leader.local:/usr/obj/usr/src/sys/GENERIC amd64

Compression appears to be a factor. I disabled compression and installed the new world. In single-user mode, wired memory increased more slowly; with swap enabled it rose to 6.8G and stayed there for an hour of uptime. At 26 minutes uptime the wired amount jumped from 3.8G to 6.5G.
Without swap enabled, processes were terminated after the drop in free RAM. With compression enabled, the wired amount jumped from 421M to 6.7G in 4 seconds at 3 minutes uptime.

Back in multi-user mode I was able to run two copies of the script for several hours. I saw the wired amount rise over 7G a few times, and while the system slowed down it remained responsive. Unfortunately the damage was done and the machine was of limited use. I could start simple things like man and ls, but ps, top and su failed to start. This extended to X apps: I could start an xterm instance but not gnome-terminal, firefox or chrome, leaving me to hit the reset button.

I think my troubles may be related to ZFS, and the arc_max setting may play a part. Booting into single-user mode and running two instances of my test script, I get varying results with different arc_max settings:

- vfs.zfs.arc_max=2G: test hangs after about 10 minutes
- vfs.zfs.arc_max=2560M: tests still running after an hour

I have a second machine (Pentium E2140 with 1GB RAM) set up with 3 disks in raidz. I have been unable to recreate this issue on that machine. After installing 10.1 didn't reproduce the issue, I went back to 9.1 and created the zpool, wrote some test data, upgraded to 9.2, wrote some data, enabled compression, wrote some data, and upgraded to 10.1, and it still didn't break. Either something in my zpool is amiss or the amount of RAM makes the difference.

Created attachment 149896 [details]: disk writing test

Reduced the count values to reduce the disk space used during tests.
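The attached scripts are not reproduced in this report. As a rough sketch of the kind of simultaneous-write load described above (the function name, file names, and use of dd are my own illustration, not the actual attachment):

```shell
#!/bin/sh
# Hypothetical sketch of a "forced stress" test: several dd writers
# running simultaneously against the pool. Not the actual attachment.
#
# stress_write DIR WRITERS COUNT: start WRITERS background dd
# processes, each writing COUNT 1 MiB blocks to its own file in DIR,
# then wait for all of them to finish.
stress_write() {
    dir=$1 writers=$2 count=$3
    w=0
    while [ "$w" -lt "$writers" ]; do
        # bs=1048576 (1 MiB) is spelled out numerically so it works
        # with both BSD dd (bs=1m) and GNU dd (bs=1M)
        dd if=/dev/zero of="$dir/stress.$w" bs=1048576 count="$count" 2>/dev/null &
        w=$((w + 1))
    done
    wait    # on the affected pool, the hang shows up around here
}
```

On the affected pool, running two copies of a test of this shape at once was enough to make wired memory balloon; reducing the count values only limits the disk space each run consumes.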
Updated 10.1-BETA and 10.1-RC versioned bugs to 10.1-STABLE.

I have been running 10-STABLE for a while, and after updating to r278305 on 7th Feb I have not seen this issue in two weeks under normal load. Simultaneously running two copies of the sample script shows a slow response in releasing wired RAM, but it does get released without any issue.

Under my normal load I have not noticed this issue. While the wired amount accumulates more slowly, the issue is still present.

Can you verify whether the issue persists on 10.2-PRERELEASE?

I am currently running 10.2-PRERELEASE #13 r285123. Testing in single-user mode I still see the wired allocation jump from 1500M to 7300M within a few seconds, but there are still a few hundred MB left free. While I think the sudden wired allocation could be improved, I don't see the system locking up in this situation.
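A timestamped log makes sudden jumps like the 1500M-to-7300M one above easier to pin down. This is a hypothetical helper added for illustration, not part of the report; it assumes the FreeBSD sysctls hw.pagesize and vm.stats.vm.v_wire_count, and falls back to reporting 0 on systems without them:

```shell
#!/bin/sh
# sample_wired N INTERVAL: print N timestamped samples of wired memory.
# vm.stats.vm.v_wire_count is FreeBSD's count of wired pages; on a
# system without that sysctl the fallback below just reports 0 MiB.
sample_wired() {
    n=${1:-60} interval=${2:-1}
    pagesize=$(sysctl -n hw.pagesize 2>/dev/null)
    [ -n "$pagesize" ] || pagesize=4096   # assume 4 KiB pages if unknown
    i=0
    while [ "$i" -lt "$n" ]; do
        if [ "$i" -gt 0 ]; then sleep "$interval"; fi
        wired=$(sysctl -n vm.stats.vm.v_wire_count 2>/dev/null)
        [ -n "$wired" ] || wired=0
        # wired pages * page size, reported in MiB
        printf '%s wired=%dMiB\n' "$(date +%T)" "$(( wired * pagesize / 1048576 ))"
        i=$((i + 1))
    done
}
```

Running something like `sample_wired 600 1 > wired.log` in the background while the stress scripts run would put a timestamp on each jump in wired memory.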