It generally happens while using GIMP: it crashes, suddenly disappearing from the screen. Then, over some time, the other open applications vanish too, until the screen is black, no panel, nothing but my Konsole, which I always keep in the corner. From there I can run "shutdown now", and it works, but it hangs on a stalled block cursor at the first character, top left.

The thing is, I have enough memory, and I have enough swap space, which is barely used. GIMP's GEGL cache file, which has no limit set, is also in moderate use, between 300 MB and 1 GB.

Here is what I get in the messages log; sometimes more programs are listed, sometimes fewer:

Jul 19 07:43:27 hm-fbsd kernel: pid 1551 (chrome), jid 0, uid 1002, was killed: out of swap space
Jul 19 07:43:27 hm-fbsd kernel: pid 1685 (gimp-2.10), jid 0, uid 1002, was killed: out of swap space
Jul 19 07:43:27 hm-fbsd kernel: pid 1546 (chrome), jid 0, uid 1002, was killed: out of swap space
Jul 19 07:43:27 hm-fbsd kernel: pid 1574 (chrome), jid 0, uid 1002, was killed: out of swap space
Jul 19 07:43:27 hm-fbsd kernel: pid 1283 (plasmashell), jid 0, uid 1002, was killed: out of swap space
Jul 19 07:43:27 hm-fbsd kernel: pid 1548 (chrome), jid 0, uid 1002, was killed: out of swap space
Jul 19 07:43:27 hm-fbsd kernel: pid 1786 (chrome), jid 0, uid 1002, was killed: out of swap space
Jul 19 07:43:27 hm-fbsd kernel: pid 1193 (Xorg), jid 0, uid 0, was killed: out of swap space
Jul 19 07:43:33 hm-fbsd console-kit-daemon[1189]: GLib-GObject-WARNING: instance of invalid non-instantiatable type '(null)'
Jul 19 07:43:33 hm-fbsd console-kit-daemon[1189]: GLib-GObject-CRITICAL: g_signal_handlers_disconnect_matched: assertion 'G_TYPE_CHECK_INSTANCE (instance)' failed
Jul 19 07:43:33 hm-fbsd console-kit-daemon[1189]: GLib-GObject-WARNING: instance of invalid non-instantiatable type '(null)'
Jul 19 07:43:33 hm-fbsd console-kit-daemon[1189]: GLib-GObject-CRITICAL: g_signal_handlers_disconnect_matched: assertion 'G_TYPE_CHECK_INSTANCE (instance)' failed
Jul 19 07:43:33 hm-fbsd console-kit-daemon[1189]: GLib-GObject-WARNING: instance of invalid non-instantiatable type '(null)'
Jul 19 07:43:33 hm-fbsd console-kit-daemon[1189]: GLib-GObject-CRITICAL: g_signal_handlers_disconnect_matched: assertion 'G_TYPE_CHECK_INSTANCE (instance)' failed
Jul 19 07:43:41 hm-fbsd devd[663]: check_clients: dropping disconnected client
Jul 19 07:43:41 hm-fbsd syslogd: last message repeated 1 times

Some irrelevant lines later, this:

Jul 19 07:44:04 hm-fbsd dbus-daemon[1122]: [system] Rejected send message, 2 matched rules; type="method_call", sender=":1.100" (uid=1002 pid=1904 comm="/usr/local/lib/libexec/org_kde_powerdevil ") interface="org.freedesktop.ConsoleKit.Manager" member="CanSuspendThenHibernate" error name="(unset)" requested_reply="0" destination="org.freedesktop.ConsoleKit" (uid=0 pid=1189 comm="/usr/local/sbin/console-kit-daemon --no-daemon ")
Jul 19 07:47:23 hm-fbsd chrome[2001]: [2001:101263:0719/074723.261261:ERROR:browser_dm_token_storage_linux.cc(93)] Error: /etc/machine-id contains 0 characters (32 were expected).
sudo swapinfo -h
Device          Size     Used    Avail Capacity
/dev/md99       8.0G       0B     8.0G     0%

ll /zroot/gimpswap/
-rwxr-x---  1 hm  wheel  3298597664 21 jul 10:59 gegl-swap-3266-0-shared*

The GIMP swap is not very big, as it seems; I work on 24-megapixel images in 32-bit mode. I can reproduce the issue easily by repeating a memory-intensive function 3 or 4 times, and bang. So if you can tell me a better debug method, I can run it quite quickly on the same machine. I have a Linux disk with the same setup and have never had any problem. This disk is a factory ZFS-root install, FreeBSD 13, on an AMD CPU. GIMP has a setting to enable or disable hardware acceleration, and it makes no difference.
Kills are to be expected when swap space is exhausted. We need to determine what causes the exhaustion in your case.

(In reply to Michael from comment #0)
> Jul 19 07:43:27 hm-fbsd kernel: pid 1551 (chrome), jid 0, uid 1002,
> was killed: out of swap space

If the first kill was of Chromium, then give thought to what might cause Chromium to use too much memory. Monitor usage. Utilities such as these might help:

sysutils/htop
sysutils/gkrellm2 (I currently use the twilite theme, YMMV)
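For instance, presumably installed from packages and run in a spare terminal (a hedged sketch; pick whichever utility suits you):

# pkg install htop gkrellm2
$ htop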
(In reply to Graham Perrin from comment #1)

Yes, I know, but it is saying "out of swap space". We could suspect that memory ran out, but it didn't, and neither did swap. For monitoring purposes I always have the terminal (Konsole) open, especially since these crashes started occurring. I believe that GIMP is the first trigger, because so far it has only happened while I was working with GIMP. Then, I believe, the Plasma desktop goes away, and with it everything that is open. Those two are my primary suspects, but it still makes no sense, because there is not much swap use at all and there is always free memory available. That is the point I need to figure out.
(In reply to Graham Perrin from comment #1)

Various "out of swap space" kill messages are misnomers, unfortunately. The indications are correct only if there were also the following sorts of messages:

swap_pager: out of swap space
swp_pager_getswapspace(3): failed

Other causes of "out of swap space" kill messages include:

Sustained low free RAM (via 1 or more stays-runnable processes).
A sufficiently delayed pageout.
The swap blk uma zone was exhausted.
The swap pctrie uma zone was exhausted.

The first two have loader tunables that make the system tolerate the conditions for longer, potentially much longer, before starting kills. (The values shown are generally only examples that were sufficient for some specific context.) I recommend trying the following sorts of thing in, say, /boot/loader.conf and booting with the settings in place:

#
# Delay when persistent low free RAM leads to
# Out Of Memory killing of processes:
vm.pageout_oom_seq=120
#
# For plenty of swap/paging space (will not
# run out), avoid pageout delays leading to
# Out Of Memory killing of processes:
vm.pfault_oom_attempts=-1
#
# For possibly insufficient swap/paging space
# (might run out), increase the pageout delay
# that leads to Out Of Memory killing of
# processes (showing defaults at the time):
#vm.pfault_oom_attempts=3
#vm.pfault_oom_wait=10
# (The multiplication is the total, but there
# are other potential tradeoffs in the factors
# multiplied, even for nearly the same total.)

I'm not aware of anything to adjust for either of:

The swap blk uma zone was exhausted.
The swap pctrie uma zone was exhausted.
(In reply to Mark Millard from comment #3)

By the way, for reference:

# sysctl -d vm.pageout_oom_seq
vm.pageout_oom_seq: back-to-back calls to oom detector to start OOM

(As I remember, the default is 12 for the above.)

# sysctl -d vm.pfault_oom_attempts
vm.pfault_oom_attempts: Number of page allocation attempts in page fault handler before it triggers OOM handling

(-1 for the above disables the pfault OOM handling.)

# sysctl -d vm.pfault_oom_wait
vm.pfault_oom_wait: Number of seconds to wait for free pages before retrying the page fault handler

All 3 of those show up under both sysctl -T and sysctl -W.
(In reply to Mark Millard from comment #3)

My apologies:

> … Various "out of swap space" kill messages are misnomers, unfortunately. …

I was previously unaware of this.

> Only if there were also the following sorts of messages are
> the indications correct:
>
> swap_pager: out of swap space
> swp_pager_getswapspace(3): failed

(To the best of my knowledge, I never encountered an "out of swap space" killing without those indicators.)
(In reply to Graham Perrin from comment #5)

Just for context . . .

Folks trying buildworld buildkernel on small armv7 and aarch64 boards with only, say, 1 or 2 GiBytes of RAM tend to see such kills from long-running, compute/memory-bound llvm compiles and links, even when configured with RAM+swap sufficient so that swap does not run out. The classic solution for such folks has been the use of something like:

vm.pageout_oom_seq=120
vm.pfault_oom_attempts=-1

I build my own kernels, with sufficient messaging added to indicate which of the 4 conditions initiated the kill:

Sustained low free RAM (via 1 or more stays-runnable processes).
A sufficiently delayed pageout.
The swap blk uma zone was exhausted.
The swap pctrie uma zone was exhausted.

(Those are not the messages themselves, just summaries.)

I do that on everything from small arm boards to a ThreadRipper 1950X that I have access to. If I end up with such a kill, I want to know which condition led to it, in order to figure out what to do in the future. I also check, of course, whether "swap_pager: out of swap space" or "swp_pager_getswapspace(. . .): failed" messages also happened.

So far as I know, actually running out of swap space and getting the kills involves at least one of the 4 conditions as well: in what I found in the kernel, there is no separate out-of-swap-space condition that initiates a kill.

Hopefully these sorts of notes are of some use to Michael in getting control of the problem.
(In reply to Michael from comment #0)

> sudo swapinfo -h
> Device          Size     Used    Avail Capacity
> /dev/md99       8.0G       0B     8.0G     0%
>
> ll /zroot/gimpswap/
> -rwxr-x---  1 hm  wheel  3298597664 21 jul 10:59 gegl-swap-3266-0-shared*
>
> the gimpswap is not very big as it seems, I work
> on 24 megapixel images in 32bit mode

I expect that this arrangement for swap suffers from the issues identified in comments #7 and #8 of:

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=206048

Those in turn are just reports of what Konstantin Belousov reported on the lists.
I've read every comment, and it seems the simplest and most effective thing to try is Mark Millard's hint:

vm.pageout_oom_seq=120
vm.pfault_oom_attempts=-1

As you have seen, I created an md swap device because I had the initial impression that GIMP wasn't working well with the ARC. I set different min and max sizes, but nothing got me anywhere regarding the crash. I don't know if disabling the ARC would give me any advantage.

I have some core dumps from such a crash, from the almighty DrKonqi and from klauncher. Unfortunately I have problems getting GIMP debugging going to create a crash trace or similar. So if somebody can read these, here are the links:

drkonqi.core
https://drive.google.com/file/d/1qAmMAikYKq1yqnOnrv1SkbmoEIdUbhZ4/view?usp=sharing

klauncher.core
https://drive.google.com/file/d/1LYC2x_herlHGmZYdrXfN0TlJoEj7ieg9/view?usp=sharing

Thank you all so far for the help.
(In reply to Michael from comment #8)

Just for the record: are you seeing any messages that involve text like the following?

swap_pager: out of swap space
swp_pager_getswapspace(. . .): failed
(In reply to Mark Millard from comment #9)

Never. After I saw that comment I checked the logs again, but nothing.

On the other hand, these two:

vm.pageout_oom_seq=120
vm.pfault_oom_attempts=-1

seem to work. I got to the moment where the active window freezes and the mouse pointer is dead ... but! It recovered. That was a quick test I did; I will check more.
In the end it didn't help; it crashed, but only GIMP, and the other parts recovered. The log is different now, too:

swap_pager: indefinite wait buffer: bufobj: 0, blkno: 136718, size: 8192
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 134239, size: 8192
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 136718, size: 8192
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 134239, size: 8192
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 135417, size: 16384
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 136718, size: 8192
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 134239, size: 8192
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 64673, size: 8192
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 136765, size: 20480
pid 1954 (gimp-2.10), jid 0, uid 1000, was killed: out of swap space

But memory and cache weren't exhausted:

last pid:  2427;  load averages:  0,36,  0,40,  0,55    up 0+02:11:54  07:10:43
66 processes:  1 running, 65 sleeping
CPU:  0,2% user,  0,0% nice,  0,2% system,  0,2% interrupt, 99,4% idle
Mem: 513M Active, 593M Inact, 112K Laundry, 975M Wired, 100M Buf, 1818M Free
ARC: 407M Total, 173M MFU, 202M MRU, 9216B Anon, 2829K Header, 28M Other
     303M Compressed, 707M Uncompressed, 2,34:1 Ratio
Swap: 8192M Total, 337M Used, 7855M Free, 4% Inuse

I have the two core dumps, drkonqi and klauncher, and one more, bsdisks.core. If someone wants it, I will try compressing it, because it appears as "unreadable".
(In reply to Michael from comment #11)

Here is the bsdisks.core:
https://drive.google.com/file/d/1eZJsjwQdg7BF3RD7d-g_NplUnyeyX3an/view?usp=sharing
(In reply to Michael from comment #12)

And another one showed up:
https://drive.google.com/file/d/1PTWOTR90MSE2IzaXm5UC2vMgw7P7311t/view?usp=sharing
(In reply to Michael from comment #11)

What I've been told in the past about each message like:

swap_pager: indefinite wait buffer: bufobj: ???, blkno: ???, size: ???

is "It took more than 30s for the IO to complete." Also: "The swap pager is complaining. It is used for things other than pure swapping to a swap partition...". (Both are from Warner L.)

(It is not certain that the kill is driven by those messages.)

Was your top output from before the:

pid 1954 (gimp-2.10), jid 0, uid 1000, was killed: out of swap space

showed up? Or after? Output from after need not reflect the failing conditions that drove the kill any more. You might have to record the sequence of top outputs over the time frame and look back at the outputs from before the message.

Since this happened with:

vm.pfault_oom_attempts=-1

one thing I know to do to investigate is to use a kernel that produces messages similar to what my builds would produce, so we know exactly, and for sure, which of the 4 conditions happened. (I wish such messaging were standard in FreeBSD.)

Another option might be to watch or record gstat -spod output over the time frame, to get a better handle on what the I/O is like; a minimal recording sketch follows. (If the "indefinite wait buffer" status is driving the kills, anyway.)

I will say that, even if "indefinite wait buffer" is not driving the kills, if your I/O system takes that long to do the I/O, that of itself could be considered a significant usability problem.
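For concreteness, a minimal sketch of such a recording loop, assuming POSIX sh and a hypothetical log path under /var/tmp:

#!/bin/sh
# Periodically append timestamped gstat snapshots, so the I/O picture
# from just before a kill can be examined afterwards.
LOG=/var/tmp/gstat.log          # hypothetical path; adjust as needed
while true; do
    date >> "$LOG"
    gstat -spodb >> "$LOG"      # -b: batch mode, print one snapshot and exit
    sleep 5
done

Running it from a plain console, outside the X session, presumably improves the odds that the log survives the kills.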
(In reply to Mark Millard from comment #14)

> (It is not certain that the kill is driven by those messages.)

Fixing some clumsy wording: not "those messages", but by whatever leads to those messages being reported.
(In reply to Mark Millard from comment #14)

The large I/O times may well be mostly wait time in a queue, with a large backlog of pending I/Os built up. So: the system generating pages to write after than the I/O system is writing them might be what is going on. (A large accumulated set of pending reads may be less likely to build up; that is why I picked writing out pages as the example.) Again, I'm not sure the long I/O times are driving things.

I also made a silly assumption, and there is another experiment: increasing vm.pageout_oom_seq further. I know someone used something like:

vm.pageout_oom_seq=4000

I've no clue if there is a figure large enough to run into numeric overflows or other issues, but this figure can likely be rather large. The 120 value was enough to allow -j4 buildworld buildkernel to complete on low-end armv7 and aarch64 hardware. Mixed with vm.pfault_oom_attempts=-1, it was enough for someone using a microSD card, as I remember. (My I/O context was better suited for the purpose in such a context.)
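For what it's worth, since comment #4 showed vm.pageout_oom_seq under sysctl -W, the larger value can presumably also be tried at runtime, without a reboot:

# sysctl vm.pageout_oom_seq=4000

with the same assignment placed in /boot/loader.conf to make it persist across boots.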
Created attachment 226642 [details]
releng/13 (-p3) patch for reporting the OOM condition explicitly

The patch was copy/pasted and might have whitespace issues.
(In reply to Mark Millard from comment #17)

I do not know if it is reasonable for you to build and boot a variant of releng/13's kernel based on the patch or not. If nothing else, it shows where in the code the 4 conditions lead to kills.
(In reply to Mark Millard from comment #17)

Mark, I guess I am already on that patch level:

FreeBSD hm-fbsd 13.0-RELEASE-p3 FreeBSD 13.0-RELEASE-p3

but if something comes up, it is easy for me to compile a new kernel.
(In reply to Mark Millard from comment #15)

These poor things:

vm.pageout_oom_seq=120
vm.pfault_oom_attempts=-1

couldn't hold it up in the end; there was a new crash.

To answer your question: so far it is a mix of executables that crash. As it looks to me, it happens while I am working with GIMP, but the first message does not come from GIMP. It is a mix of KDE pieces; in first place it is plasmashell or klauncher, then everything else that was open is listed. I cannot say if that is the real order.

Certainly it is not really kernel-related, because the underlying OS is still working fine. Sometimes, but not always, I can switch to another terminal with Ctrl+Alt+Fx. Konsole is also the last man standing, meaning in most cases I can type "shutdown now" and it goes to single-user mode; a Ctrl+D starts SDDM and X, and I can keep working after login.

What I am just noticing is that there was a bigger update of the KDE packages, I believe from Plasma 5.22 to 5.23, and before that I didn't have this problem.

As to your question, "Was your top output from before the:

pid 1954 (gimp-2.10), jid 0, uid 1000, was killed: out of swap space"

all messages I pasted were copied in their original sequence, starting with the first occurrence in the log.

I hope I have answered everything, and I'll go bump the OOM seq up now. Thanks.
(In reply to Michael from comment #19)

The patch is intended to be applied to FreeBSD 13.0-RELEASE-p3. FreeBSD 13.0-RELEASE-p3 does not already contain the changes.
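For reference, a hedged sketch of the usual procedure for applying such a patch and rebuilding, assuming releng/13.0 sources in /usr/src and a GENERIC kernel config; the patch filename is illustrative, not the attachment's real name:

# cd /usr/src
# patch < /path/to/attachment-226642.patch   # filename is illustrative
# make -j4 buildkernel KERNCONF=GENERIC
# make installkernel KERNCONF=GENERIC
# shutdown -r now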
(In reply to Michael from comment #20)

> certainly it is not really kernel related because the underlying OS is working still fine

No. The kills are initiated by the kernel, and only by the kernel. The OS is doing the kills because the conditions for allowing continued operation of all the processes are failing. The kernel is also what reports the "swap_pager: indefinite wait buffer:" messages.

I would like the patched kernel to be used in order to find out exactly, and for sure, which condition in the kernel is failing to allow continued operation of all processes. I've no objection to combining such with an increased vm.pageout_oom_seq value assigned at boot:

A) If it still does a kill, we learn from the messages what the condition was.
B) If it no longer does a kill, we then know that the condition that had been failing was the test involving vm.pageout_oom_seq.
(In reply to Michael from comment #20)

> all messages I pasted, I copied in original sequence starting with first occurrence in log

Of which exact type of message? If it was a kill message, then it was already too late to see the RAM use from just before the kill was done.

One of the problems with trying to monitor the system is that, for example, large changes in the amount of memory use being attempted (and RAM use) can occur multiple times per second. When that happens, it is difficult to observe usefully, even just to detect that such is the type of context in play. Some folks try having top run in a loop, sleeping between runs, logging to a file, so there is at least a history sequence (presuming this does not end up killed before the file system updates); a sketch of such a loop follows this comment. A similar point goes for gstat output. But these also end up competing with the paging/swapping activity for I/O resources.

So far as I can tell, the best next evidence we could get is the patched-in messaging about exactly which condition initiated each kill. The patch does not attempt to prevent the kills or make things work for you; it just reports which condition in the kernel led to each.
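A minimal sketch of such a top-logging loop, assuming POSIX sh; the log path, interval, and process count are only illustrative:

#!/bin/sh
# Append timestamped top snapshots, so memory use from shortly before
# a kill can be reviewed afterwards.
LOG=/var/tmp/top.log            # hypothetical path; adjust as needed
while true; do
    date >> "$LOG"
    top -b -o res 20 >> "$LOG"  # batch mode, top 20 by resident memory size
    sleep 2
done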
(In reply to Mark Millard from comment #16)

I wrote:

> the system
> generating pages to write after than the I/O system is
> writing them might be what is going on

Wrong word: "after". It should have been "faster". So:

. . . the system generating pages to write faster than the I/O system is writing them might be what is going on
(In reply to Mark Millard from comment #16)

I guessed wrong; in swap_pager_getpages_locked there is:

	while ((ma[0]->oflags & VPO_SWAPINPROG) != 0) {
		ma[0]->oflags |= VPO_SWAPSLEEP;
		VM_CNT_INC(v_intrans);
		if (VM_OBJECT_SLEEP(object, &object->handle, PSWP,
		    "swread", hz * 20)) {
			printf(
"swap_pager: indefinite wait buffer: bufobj: %p, blkno: %jd, size: %ld\n",
			    bp->b_bufobj, (intmax_t)bp->b_blkno, bp->b_bcount);
		}
	}

So the "swap_pager: indefinite wait buffer:" messages are only for reads, not for writes. It also looks like the time is 20 seconds before such a message is reported, not the 30 seconds I'd been told. Part of the issue might be that write activity delays pending read activity in the queue.
(In reply to Michael from comment #12)

I'll note that when the kernel kills a process, that process might leave behind a *.core file as a consequence. But it is not the *.core that caused the kill; it was the kill that led to the *.core. Things were bad at the system level before the *.core happened. For a:

kernel: pid ??? (???), jid ???, uid ??, was killed: out of swap space

the *.core produced (if any) is likely not the thing of direct interest for evidence about the overall system-level status that led to the kill.

The kill sequence goes after bigger processes first, working toward smaller processes later (not that the process sizes are static during this). So the sequence:

chrome, gimp-2.10, chrome, chrome, plasmashell, chrome, chrome, Xorg

in the Description is suggestive of the relative sizes of the processes around the time of each kill. That the chrome subsequence had pids of (in order) 1551, 1546, 1574, 1548, 1786 shows that the (roughly decreasing?) size was not in order from oldest (1546) to newest (1786).
(In reply to Mark Millard from comment #26)

The "bigger process" vs. smaller ones that I refer to are really in terms of the size from:

	size = vmspace_swap_count(vm);
	if (shortage == VM_OOM_MEM || shortage == VM_OOM_MEM_PF)
		size += vm_pageout_oom_pagecount(vm);
(In reply to Mark Millard from comment #26)

> … kill sequence …

I recall this thread, which spanned two months in the archives:

The out-of-swap killer makes poor choices
<https://lists.freebsd.org/pipermail/freebsd-hackers/2021-February/thread.html#57017>
<https://lists.freebsd.org/pipermail/freebsd-hackers/2021-March/thread.html#57045>
(In reply to Mark Millard from comment #23)

The messages I'm referring to are to be seen in the order I pasted them here, which is also the order they appeared in my logs; anything before them is not related. At the end I may have cut where the same message repeated forever, but where I cut, the next line is just three dots.

The rest is read, and I hope I find some time to build a kernel with the right patch you mentioned. Thanks so far.
In some way my PC became unusable; it's like slow motion. All reactions to mouse or keyboard come after minutes (no exaggeration). The bare FreeBSD is normal and runs fine. I have created another user, but it's the same, so it is not the KDE configuration. I deleted all installed packages and reinstalled them from the CLI, and that went fast as always. My Linux partition is also normal.

What is left? I would say the nvidia-driver 390.143, which was updated these days, and all former versions (390.141) have vanished. The FreeBSD drivers from NVIDIA do not compile, and the FreeBSD nv driver does not recognize my video card, so maybe I have to wait until nvidia-driver 390.144 is in the repo. Wonderful, isn't it? Maybe the complete story was caused by the video driver?

This morning I found this when going back to single-user mode:

pid 1357 (drkonqi), jid 0, uid 1000: exited on signal 6 (core dumped)
pid 1354 (klauncher), jid 0, uid 1000: exited on signal 6 (core dumped)
pid 1356 (plasmashell), jid 0, uid 1000: exited on signal 6 (core dumped)
pid 864 (bsdisks), jid 0, uid 0: exited on signal 11 (core dumped)

The same message, but no GIMP, because it wasn't open; in other words, nothing in this list is a user program. So it seems we mounted the wrong horse ...

This is also not working from loader.conf:

Solaris: WARNING: ignoring tunable zfs_arc_min (using 0 instead)
Solaris: WARNING: ignoring tunable zfs_arc_min (using 0 instead)

Maybe I'll try it from sysctl.conf later.
(In reply to Michael from comment #30)

> in some way my pC became unusable, it's like
> slow motion all reactions on mouse or keyboard
> come after minutes (no exageration)

This is normal if the free RAM stays low and the system is paging extensively to media that is not fast for the purpose. Those are the kinds of conditions that can also eventually lead to the kills. One of the tunables that I've indicated delays the kills, making the conditions last longer in order to avoid the kills.

But until I can see the output lines of the patched kernel, I cannot tell which of the 4 conditions is occurring. Without that output I've basically no chance of being of more help. Also, as I've indicated before, we would need to see top output from shortly before the kills happen, to see which processes are using what memory at the time. Looking at top output after the kills start does no good.

As for:

Solaris: WARNING: ignoring tunable zfs_arc_min (using 0 instead)

There is:

	/* Valid range: 32M - <arc_c_max> */
	if ((zfs_arc_min) && (zfs_arc_min != arc_c_min) &&
	    (zfs_arc_min >= 2ULL << SPA_MAXBLOCKSHIFT) &&
	    (zfs_arc_min <= arc_c_max)) {
		arc_c_min = zfs_arc_min;
		arc_c = MAX(arc_c, arc_c_min);
	}
	WARN_IF_TUNING_IGNORED(zfs_arc_min, arc_c_min, verbose);

As far as I can tell, this happens from having too little RAM for the overall configuration, so that 32M is not even reasonable. (But I'm no ZFS tuning expert.) With defaults, as I remember, the standard recommendation is to have at least 8 GiBytes of RAM for ZFS. For less, some ZFS tuning expertise is needed (which I do not have).

If you have done some ZFS tuning, you should add notes to the bugzilla about what your settings are and what the machine has for RAM and such. Someone with ZFS tuning expertise might comment based on such information.
Dear Mark, I have said more than once that there is enough memory left in all situations, never less than 2 GiB free ... there is something wrong, and it isn't in my hands. It can only be this miserable nvidia driver.
(In reply to Michael from comment #32) You also report that you looked only after the kills started, not before. Looking after the kills start is too late to know what was true before the kills start. We need the report from the kernel of exactly which condition initiates the kills. (For some conditions, the messages report additional information.)
(In reply to Mark Millard from comment #33)

I should have said (adding "just"): you also report that you looked only after the kills started, not just before.

In case it is not clear: the purpose of the kills is to free memory. By the time the kills have gotten a start at their activity, there is normally less memory in use than there was when the kernel tests indicated to start the kills.

Frankly, I do not see much point in continuing to comment until/unless the reports from the modified kernel are provided (along with the values of vm.pageout_oom_seq and vm.pfault_oom_attempts that were in use at the time).
(In reply to Mark Millard from comment #34)

So finally I found the culprit, and my interchangeable zpool is ready and running fine. You led me in the wrong direction, Mark; I don't say that in an unfriendly manner, we just had different ideas. When you posted the ...4000 sysctl parameter, I woke up and deleted all the changes made so far from my system. After a cool reboot, and monitoring what was happening, I found that the ZIL was disabled in loader.conf. I don't know why I did this, maybe my shadow, haha. And suddenly everything came back to normal.

So I thank you for the time you spent on me, thank you nevertheless. But another issue: these messages are wrong; they do not represent the real problem. They also come up in the logs when the system kills the apps for any other reason, for example a halt.
(In reply to Michael from comment #35)

Cool. Glad you found what controlled the issue. I had no clue that you had been adjusting such things, and I would have been very unlikely to guess such a change. (And my kernel patch's output would not have helped with that.)

I do not know the notation that was in loader.conf that you have changed. You might want to document the before-and-after for that notation, in case someone looks at this bugzilla because of a problem they are having.

Also, with the problem being a misconfiguration, you should change the Status of this bugzilla from New to Closed as Invalid. It is not actually a bug. (That is not something most folks can do with your submittal, including me.)
(In reply to Mark Millard from comment #36)

Along with closing as Invalid, you might want to adjust the subject line to indicate the configuration error that was involved, possibly by adding something like:

(problem was: zil disabled)

at the end of the subject.
(In reply to Michael from comment #30)

> … this is alos not working from loader.conf
> Solaris: WARNING: ignoring tunable zfs_arc_min (using 0 instead)
> Solaris: WARNING: ignoring tunable zfs_arc_min (using 0 instead)
>
> may be I try it from sysctl.conf latert

Note: . not _

root@mowa219-gjp4-8570p:~ # sysctl vfs.zfs.arc.min
vfs.zfs.arc.min: 0
root@mowa219-gjp4-8570p:~ #
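For completeness, a hedged example of setting it at boot with the dotted OpenZFS-style name; the value here is purely illustrative, not a recommendation:

# in /boot/loader.conf, dots rather than underscores:
vfs.zfs.arc.min="536870912"   # 512 MiB ARC floor; choose a value suited to your RAM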
(In reply to Graham Perrin from comment #38)

# sysctl -T vfs.zfs.arc.min
vfs.zfs.arc.min: 0

Note:
-T      Display only variables that are settable via loader (CTLFLAG_TUN).

# sysctl -W vfs.zfs.arc.min
vfs.zfs.arc.min: 0

Note:
-W      Display only writable variables that are not statistical. Useful for determining the set of runtime tunable sysctls.

So vfs.zfs.arc.min looks to be both a load-time tunable (-T) and a later-writable variable (-W).

To illustrate what happens for -T and -W output when the name is loader-tunable but not later a writable variable (as an example):

# sysctl -W kern.maxproc
# sysctl -T kern.maxproc
kern.maxproc: 70308

I'll note that I also see the 0 value for vfs.zfs.arc.min, but I do no tuning of ZFS (I use defaults), and the system used for the above commands has 64 GiBytes of RAM. My usage context is very different from chrome/gimp/plasmashell/Xorg, so my not seeing OOM activity does not mean much for this bugzilla report.
(In reply to Graham Perrin from comment #38)

That Solaris appears to answer is one of the oddities; the second is this weird message; and the third is that it says "using 0 instead". You see, the value was set soon enough ... This ZFS is still a lot of voodoo: a thousand tunables which do not do so much, because I see a self-tuning mechanism here. The user just has to care about having enough physical memory in the box, keep his hands in his pockets, and everything runs smooth and fast.

The out-of-memory message is also misplaced, because it shows up when you enter a classic reboot (typing the word and pressing enter).

Anyway, just to remember, Mark: I wasn't tuning my ZFS system, I was trying to get my cache right, not the ZFS cache. Soooo, a lot of idea exchange, which is good; now we are almost ZFS specialists :)

FYI, I had quite a moderate work day, and I used this machine with only 8 GB of RAM, with everything regarding ZFS untouched, and it worked fine.

So thanks for your help! I'm going to close this issue for now.