Bug 257314 - FBSD 13 crash after some KDE parts crash supposing out of swap space
Summary: FBSD 13 crash after some KDE parts crash supposing out of swap space
Status: Closed Not A Bug
Alias: None
Product: Base System
Classification: Unclassified
Component: kern (show other bugs)
Version: 13.0-RELEASE
Hardware: Any Any
: --- Affects Only Me
Assignee: freebsd-bugs (Nobody)
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-07-21 15:13 UTC by Michael
Modified: 2021-07-29 21:01 UTC (History)
2 users (show)

See Also:


Attachments
releng/13 (-p3) patch for reporting OOM condition explicitly (4.21 KB, patch)
2021-07-23 19:30 UTC, Mark Millard
no flags

Description Michael 2021-07-21 15:13:30 UTC
It happens generally while using GIMP, which suddenly crashes and disappears from the screen. Then, over some time, the other apps that were open vanish too, until the screen is black: no panel, nothing but my konsole, which I always keep in the corner. From there I can run "shutdown now" and it works, but hangs on a stalled block cursor at the first character, top left.



The thing here is that I have enough memory, and I have enough swap space, which is barely used; also GIMP's gegl cache file, which has no limit, is in moderate use, between 300 MB and 1 GB.



Here is what I get in messages; sometimes more programs are listed, sometimes fewer:



Jul 19 07:43:27 hm-fbsd kernel: pid 1551 (chrome), jid 0, uid 1002, was killed: out of swap space
Jul 19 07:43:27 hm-fbsd kernel: pid 1685 (gimp-2.10), jid 0, uid 1002, was killed: out of swap space
Jul 19 07:43:27 hm-fbsd kernel: pid 1546 (chrome), jid 0, uid 1002, was killed: out of swap space
Jul 19 07:43:27 hm-fbsd kernel: pid 1574 (chrome), jid 0, uid 1002, was killed: out of swap space
Jul 19 07:43:27 hm-fbsd kernel: pid 1283 (plasmashell), jid 0, uid 1002, was killed: out of swap space
Jul 19 07:43:27 hm-fbsd kernel: pid 1548 (chrome), jid 0, uid 1002, was killed: out of swap space
Jul 19 07:43:27 hm-fbsd kernel: pid 1786 (chrome), jid 0, uid 1002, was killed: out of swap space
Jul 19 07:43:27 hm-fbsd kernel: pid 1193 (Xorg), jid 0, uid 0, was killed: out of swap space
Jul 19 07:43:33 hm-fbsd console-kit-daemon[1189]: GLib-GObject-WARNING: instance of invalid non-instantiatable type '(null)'
Jul 19 07:43:33 hm-fbsd console-kit-daemon[1189]: GLib-GObject-CRITICAL: g_signal_handlers_disconnect_matched: assertion 'G_TYPE_CHECK_INSTANCE (instance)' failed
Jul 19 07:43:33 hm-fbsd console-kit-daemon[1189]: GLib-GObject-WARNING: instance of invalid non-instantiatable type '(null)'
Jul 19 07:43:33 hm-fbsd console-kit-daemon[1189]: GLib-GObject-CRITICAL: g_signal_handlers_disconnect_matched: assertion 'G_TYPE_CHECK_INSTANCE (instance)' failed
Jul 19 07:43:33 hm-fbsd console-kit-daemon[1189]: GLib-GObject-WARNING: instance of invalid non-instantiatable type '(null)'
Jul 19 07:43:33 hm-fbsd console-kit-daemon[1189]: GLib-GObject-CRITICAL: g_signal_handlers_disconnect_matched: assertion 'G_TYPE_CHECK_INSTANCE (instance)' failed
Jul 19 07:43:41 hm-fbsd devd[663]: check_clients:  dropping disconnected client
Jul 19 07:43:41 hm-fbsd syslogd: last message repeated 1 times


Some irrelevant lines later, this:

Jul 19 07:44:04 hm-fbsd dbus-daemon[1122]: [system] Rejected send message, 2 matched rules; type="method_call", sender=":1.100" (uid=1002 pid=1904 comm="/usr/local/lib/libexec/org_kde_powerdevil ") interface="org.freedesktop.ConsoleKit.Manager" member="CanSuspendThenHibernate" error name="(unset)" requested_reply="0" destination="org.freedesktop.ConsoleKit" (uid=0 pid=1189 comm="/usr/local/sbin/console-kit-daemon --no-daemon ")
Jul 19 07:47:23 hm-fbsd chrome[2001]: [2001:101263:0719/074723.261261:ERROR:browser_dm_token_storage_linux.cc(93)] Error: /etc/machine-id contains 0 characters (32 were expected).




sudo swapinfo -h 
Device              Size     Used    Avail Capacity
/dev/md99           8.0G       0B     8.0G     0%



ll /zroot/gimpswap/
-rwxr-x---  1 hm  wheel  3298597664 21 jul 10:59 gegl-swap-3266-0-shared*

The gimpswap is not very big, as it seems; I work on 24-megapixel images in 32-bit mode.


I can reproduce the issue easily by repeating a memory-intensive function 3 or 4 times, and bang. So if you can tell me a better debug method, I can do it quite fast.

On the same machine I have a Linux disk with the same setup and have never had any problem.

This disk is a stock ZFS-root install of FreeBSD 13, on an AMD CPU.

GIMP has a setting to enable or disable hardware acceleration; it makes no difference either way.
Comment 1 Graham Perrin 2021-07-21 18:15:15 UTC
Killings are to be expected when swap space is exhausted.

We need to tell what causes the exhaustion in your case.

(In reply to Michael from comment #0)

> Jul 19 07:43:27 hm-fbsd kernel: pid 1551 (chrome), jid 0, uid 1002, 
> was killed: out of swap space

If the first killing was of Chromium, then give thought to what might cause Chromium to use too much memory. 

Monitor usage. Utilities such as these might help: 

sysutils/htop

sysutils/gkrellm2 – I currently use the twilite theme, YMMV (see <>).
Comment 2 Michael 2021-07-21 19:58:54 UTC
(In reply to Graham Perrin from comment #1)

Yes, I know, but it says "out of swap space". We could suspect that memory ran out, but it didn't, and neither did swap.

For monitoring purposes I have the terminal (konsole) always open, especially since these crashes started occurring.

I believe GIMP is the first trigger, because so far it has only happened while I was working with GIMP. Then, I believe, the Plasma desktop goes away and with it everything that is open ... those two are my primary suspects, but it still makes no sense, because there is not much swap use at all and there is always free memory available ... that's the point I need to figure out.
Comment 3 Mark Millard 2021-07-21 23:44:08 UTC
(In reply to Graham Perrin from comment #1)

Various "out of swap space" kill messages are misnomers,
unfortunately.

Only if there were also the following sorts of messages are
the indications correct:

swap_pager: out of swap space
swp_pager_getswapspace(3): failed

Other causes of "out of swap space" kill messages include:

Sustained low free RAM (via 1 or more stays-runnable processes).
A sufficiently delayed pageout.
The swap blk uma zone was exhausted.
The swap pctrie uma zone was exhausted.

The first two have loader tunables that make the system
tolerate the conditions for longer, potentially much longer,
before starting kills. (The values shown are generally
only examples that were sufficient for some specific
context.)

I recommend trying the following sorts of thing in, say,
/boot/loader.conf and booting with the settings used:

#
# Delay when persistent low free RAM leads to
# Out Of Memory killing of processes:
vm.pageout_oom_seq=120
#
# For plenty of swap/paging space (will not
# run out), avoid pageout delays leading to
# Out Of Memory killing of processes:
vm.pfault_oom_attempts=-1
#
# For possibly insufficient swap/paging space
# (might run out), increase the pageout delay
# that leads to Out Of Memory killing of
# processes (showing defaults at the time):
#vm.pfault_oom_attempts=3
#vm.pfault_oom_wait=10
# (The multiplication is the total, but there
# are other potential tradeoffs in the factors
# multiplied, even for nearly the same total.)


I'm not aware of anything to adjust for either
of:

The swap blk uma zone was exhausted.
The swap pctrie uma zone was exhausted.
Comment 4 Mark Millard 2021-07-22 00:08:06 UTC
(In reply to Mark Millard from comment #3)

By the way, for reference:

# sysctl -d vm.pageout_oom_seq
vm.pageout_oom_seq: back-to-back calls to oom detector to start OOM

(As I remember, the default is 12 for the above.)

# sysctl -d vm.pfault_oom_attempts
vm.pfault_oom_attempts: Number of page allocation attempts in page fault
handler before it triggers OOM handling

(-1 for the above disables the pfault OOM handling.)

# sysctl -d vm.pfault_oom_wait
vm.pfault_oom_wait: Number of seconds to wait for free pages before retrying
the page fault handler


All 3 of those show up under both sysctl -T and sysctl -W .
Comment 5 Graham Perrin 2021-07-22 07:00:25 UTC
My apologies, 

(In reply to Mark Millard from comment #3)

> … Various "out of swap space" kill messages are misnomers, unfortunately. …

– I was previously unaware of this. 


> Only if there were also the following sorts of messages are
> the indications correct:
> 
> swap_pager: out of swap space
> swp_pager_getswapspace(3): failed

(To the best of my knowledge, I never encountered an "out of swap space" killing without those indicators.)
Comment 6 Mark Millard 2021-07-22 07:58:08 UTC
(In reply to Graham Perrin from comment #5)

Just for context . . .

Folks trying buildworld buildkernel on small armv7
and aarch64 boards with only, say, 1 or 2 GiBytes
of RAM tend to see such kills from long-running,
compute/memory-bound llvm compiles and links,
even when configured with RAM+swap being sufficient
so that swap does not run out.

The classic solution for such folks has been the
use of something like:

vm.pageout_oom_seq=120
vm.pfault_oom_attempts=-1

I build my own kernels, with sufficient messaging
added to indicate which of the 4 conditions initiated
the kill:

Sustained low free RAM (via 1 or more stays-runnable processes).
A sufficiently delayed pageout.
The swap blk uma zone was exhausted.
The swap pctrie uma zone was exhausted.

(Those are not the messages themselves, just the
summaries.)

I do that on everything from small arm boards to
a ThreadRipper 1950X that I have access to. If I
end up with such a kill, I want to know which
condition led to it in order to figure out what
to do in the future. I also, of course, check
whether "swap_pager: out of swap space" or
"swp_pager_getswapspace(. . .): failed" messages
also happened.

So far as I know, actually running out of swap space
and getting the kills involves at least one of the 4
conditions as well: there is no separate condition
for out of swap space that initiates a kill in what I
found in the kernel.

Hopefully these sorts of notes are of some use to
Michael in getting control of the problem.
Comment 7 Mark Millard 2021-07-22 08:04:58 UTC
(In reply to Michael from comment #0)

QUOTE
sudo swapinfo -h 
Device              Size     Used    Avail Capacity
/dev/md99           8.0G       0B     8.0G     0%

ll /zroot/gimpswap/
-rwxr-x---  1 hm  wheel  3298597664 21 jul 10:59 gegl-swap-3266-0-shared*

The gimpswap is not very big, as it seems; I work on 24-megapixel images in 32-bit mode.
END QUOTE

I expect that this arrangement for swap suffers from the
issues identified in:

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=206048

comments #7 and #8. Those in turn are just reports of
what Konstantin Belousov reported on the lists.
Comment 8 Michael 2021-07-22 13:18:46 UTC
I've read every comment, and it seems the simplest and most effective thing to try is Mark Millard's hint:

vm.pageout_oom_seq=120
vm.pfault_oom_attempts=-1

As you have seen, I created an MD swap partition because I had the initial impression that GIMP wasn't working well with ARC. I set different min and max sizes, but nothing got me anywhere regarding the crash. I don't know if disabling ARC would give me any advantage ...

I have some core dumps from such a crash, the almighty drkonqi and klauncher; unfortunately I have problems getting GIMP debugging going to create a crash trace or anything.

So if somebody can read these, here are the links:

drkonki.core
https://drive.google.com/file/d/1qAmMAikYKq1yqnOnrv1SkbmoEIdUbhZ4/view?usp=sharing

klauncher.core
https://drive.google.com/file/d/1LYC2x_herlHGmZYdrXfN0TlJoEj7ieg9/view?usp=sharing


thank you all so far for the help
Comment 9 Mark Millard 2021-07-22 19:57:24 UTC
(In reply to Michael from comment #8)

Just for the record: Are you seeing any messages that involved
text like:

swap_pager: out of swap space
swp_pager_getswapspace(. . .): failed

?
Comment 10 Michael 2021-07-22 21:23:40 UTC
(In reply to Mark Millard from comment #9)

Never. After I saw that comment I checked the logs again, but nothing.

On the other hand, these two:

vm.pageout_oom_seq=120
vm.pfault_oom_attempts=-1

seem to work. I got to the moment where the active window freezes and the mouse pointer is dead ... but! It recovered. That was a quick test; I will check more.
Comment 11 Michael 2021-07-23 10:17:55 UTC
In the end it didn't help; it crashed, but only GIMP, and the other parts recovered. Also the log is different now:

swap_pager: indefinite wait buffer: bufobj: 0, blkno: 136718, size: 8192
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 134239, size: 8192
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 136718, size: 8192
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 134239, size: 8192
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 135417, size: 16384
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 136718, size: 8192
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 134239, size: 8192
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 64673, size: 8192
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 136765, size: 20480
pid 1954 (gimp-2.10), jid 0, uid 1000, was killed: out of swap space

But neither memory nor cache was exhausted:

last pid:  2427;  load averages:  0,36,  0,40,  0,55                                                      up 0+02:11:54  07:10:43
66 processes:  1 running, 65 sleeping
CPU:  0,2% user,  0,0% nice,  0,2% system,  0,2% interrupt, 99,4% idle
Mem: 513M Active, 593M Inact, 112K Laundry, 975M Wired, 100M Buf, 1818M Free
ARC: 407M Total, 173M MFU, 202M MRU, 9216B Anon, 2829K Header, 28M Other
     303M Compressed, 707M Uncompressed, 2,34:1 Ratio
Swap: 8192M Total, 337M Used, 7855M Free, 4% Inuse


I have the two core dumps, drkonqi and klauncher, and one more, bsdisks.core; if someone wants it I will try compacting it, because it appears as "unreadable".
Comment 12 Michael 2021-07-23 10:25:53 UTC
(In reply to Michael from comment #11)
Here is the bsdisks.core:


https://drive.google.com/file/d/1eZJsjwQdg7BF3RD7d-g_NplUnyeyX3an/view?usp=sharing
Comment 13 Michael 2021-07-23 10:32:18 UTC
(In reply to Michael from comment #12)
and another one showed up 

https://drive.google.com/file/d/1PTWOTR90MSE2IzaXm5UC2vMgw7P7311t/view?usp=sharing
Comment 14 Mark Millard 2021-07-23 18:48:16 UTC
(In reply to Michael from comment #11)

What I've been told in the past about each message like:

swap_pager: indefinite wait buffer: bufobj: ???, blkno: ???, size: ???

is "It took more than 30s for the IO to complete." Also:
"The swap pager is complaining. It is used for things
other than pure swapping to a swap partition...". (Both
are from Warner L.)

(It is not certain that the kill is driven by those messages.)


Was your top output from before the:

pid 1954 (gimp-2.10), jid 0, uid 1000, was killed: out of swap space

showed up, or after? Output from after need not reflect the
failing conditions that drove the kill any more. You might have
to record the sequence of top outputs over the time frame
and look back at the various outputs from before the
message.


Since this happened with:

vm.pfault_oom_attempts=-1

one thing I know to do to investigate is to use a kernel that
produces messages similar to what my builds would produce so
we know exactly which of the 4 conditions happened for sure.
(I wish such was standard in FreeBSD.)

Another might be to be watching or recording gstat -spod output
over the time frame to get a better handle on what the I/O is
like. (If the "indefinite wait buffer" status is driving the
kills, anyway.)


I will say that, even if "indefinite wait buffer" is not driving
the kills, if your I/O system takes that long to do the I/O that
of itself could be considered a significant usability problem.
Comment 15 Mark Millard 2021-07-23 18:50:24 UTC
(In reply to Mark Millard from comment #14)

"(It is not certain that the kill is driven by those messages.)"

Fixing a dumb wording: not "those messages" but by what leads
to those message being reported.
Comment 16 Mark Millard 2021-07-23 19:04:59 UTC
(In reply to Mark Millard from comment #14)

The large I/O times may well be mostly wait-time in a queue
with a large queue built up of pending I/Os. So: the system
generating pages to write after than the I/O system is
writing them might be what is going on.

(A large accumulated set of pending reads may not be so
likely to be generated. That is why I picked writing
out pages as an example.)

Again, I'm not sure the long I/O times are driving things.

I also made a silly assumption and there is another experiment:
increasing vm.pageout_oom_seq. I know someone used something
like:

vm.pageout_oom_seq=4000

I've no clue if there is a figure large enough to have numeric
overflows involved or other issues. But this figure likely can
be rather large.

The 120 value was enough to allow -j4 buildworld buildkernel to
complete on low-end armv7 and aarch64 hardware. Mixed with
vm.pfault_oom_attempts=-1 it was enough for someone using a
microsd card, as I remember. (My I/O context was better
suited for the purpose.)
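Putting the earlier settings together with the larger value mentioned here, the combined /boot/loader.conf experiment might look like the following sketch; the values are only the examples from this thread, not universal recommendations:

```
# Tolerate sustained low free RAM much longer before the
# OOM detector starts killing processes (default: 12):
vm.pageout_oom_seq=4000
# Disable page-fault-path OOM handling (appropriate only
# when swap/paging space will not actually run out):
vm.pfault_oom_attempts=-1
```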
Comment 17 Mark Millard 2021-07-23 19:30:52 UTC
Created attachment 226642 [details]
releng/13 (-p3) patch for reporting OOM condition explicitly

The patch was copy/pasted and might have whitespace issues.
Comment 18 Mark Millard 2021-07-23 19:33:43 UTC
(In reply to Mark Millard from comment #17)

I do not know if it is reasonable for you to build
and boot a variant of releng/13 's kernel based on
the patch or not.

If nothing else, it shows where in the code the 4
conditions lead to kills.
Comment 19 Michael 2021-07-23 19:41:00 UTC
(In reply to Mark Millard from comment #17)
Mark, I guess I already have the patch:

FreeBSD hm-fbsd 13.0-RELEASE-p3 FreeBSD 13.0-RELEASE-p3

but if there comes something up it is easy for me compiling a new kernel
Comment 20 Michael 2021-07-23 19:56:42 UTC
(In reply to Mark Millard from comment #15)

These poor things:

vm.pageout_oom_seq=120
vm.pfault_oom_attempts=-1

in the end couldn't hold it up; new crash.

to answer your question

it is a mix of executables which crash;
so far, as it looks to me, it happens while I am working with GIMP

but the first message does not come from GIMP

it is a mix of KDE pieces: in first place it is plasmashell or klauncher, then anything else that was open is listed; I cannot say whether that is the real order

certainly it is not really kernel related, because the underlying OS still works fine; sometimes, but not always, I can switch to another terminal with Ctrl+Alt+Fx, but konsole is the last man standing, meaning in most cases I can type "shutdown now" and it goes to single-user mode, and a Ctrl+D starts sddm and X and I can keep working after login

What I am just noticing is that there was a bigger update of the KDE packages, I believe from Plasma 5.22 to 5.23, and before it I did not have this problem.

your question

"Was your top output from before the:
pid 1954 (gimp-2.10), jid 0, uid 1000, was killed: out of swap space"

All the messages I pasted were copied in their original sequence, starting with the first occurrence in the log.

I hope I have answered everything; I'll go bump the OOM seq up now.

thanks
Comment 21 Mark Millard 2021-07-23 20:08:36 UTC
(In reply to Michael from comment #19)

The patch is intended to be applied to: FreeBSD 13.0-RELEASE-p3

FreeBSD 13.0-RELEASE-p3 does not contain the changes.
Comment 22 Mark Millard 2021-07-23 20:20:02 UTC
(In reply to Michael from comment #20)

> certainly it is not really kernel related, because the underlying OS still works fine

No. The kills are initiated by the kernel --and only by
the kernel. The OS does the kills because the conditions
for allowing continued operation of all the processes
have failed.

The kernel is also what reports: "swap_pager: indefinite wait
buffer:" messages.

I would like the patched kernel to be used in order to find
out exactly --and for sure-- which condition in the kernel
is failing to allow continued operation of all processes.

I've no objection to combining such with an increased
vm.pageout_oom_seq value assigned at boot:

A) If it still does a kill, we learn from the messages
   what the condition was.

B) If it no longer does a kill, we then know that the
   condition that had been failing was the test
   involving vm.pageout_oom_seq .
Comment 23 Mark Millard 2021-07-23 20:56:49 UTC
(In reply to Michael from comment #20)

> All the messages I pasted were copied in their original sequence, starting with the first occurrence in the log.

Of which exact type of message? If it was a kill message, then it
was already too late to see the RAM use around the time just
before the kill was done.

One of the problems with trying to monitor the system is that,
for example, large changes in attempted memory use (and RAM
use) could occur multiple times per second. When that happens,
it is difficult to observe usefully enough even to detect that
this is the type of context involved.

Some folks try having top running in a loop, sleeping between
runs, logging to a file so there is at least a history-sequence
(presuming this does not end up killed before the file system
updates). A similar point goes for gstat output. But these also
end up competing with the paging/swapping activity for I/O
resource use.
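A minimal sketch of that kind of history logger, as a plain /bin/sh loop; the function name, file paths, and parameter values here are illustrative, not anything from this bug:

```shell
#!/bin/sh
# Append timestamped snapshots of a command's output to a log file,
# so the system state shortly *before* an OOM kill can be inspected
# afterwards. All names and values are illustrative.
log_snapshots() {
    cmd=$1; logfile=$2; interval=$3; count=$4
    i=0
    while [ "$i" -lt "$count" ]; do
        printf '=== %s ===\n' "$(date)" >>"$logfile"
        # $cmd is intentionally unquoted so its arguments are split.
        $cmd >>"$logfile" 2>&1
        i=$((i + 1))
        sleep "$interval"
    done
}

# On FreeBSD one might run, for example:
#   log_snapshots 'top -b' /var/tmp/oom-top.log 5 720 &
#   log_snapshots 'gstat -spod -b' /var/tmp/oom-gstat.log 5 720 &
```

As noted above, the logger itself competes with paging for I/O, and its final writes may be lost if the shell is killed, so it only gives a best-effort history.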


So far as I can tell, the best next evidence that we could get is
the patched-in messaging about exactly which condition initiated
each kill.

The patch does not attempt to prevent the kills or make things
work for you; it just reports which condition in the kernel led
to each.
Comment 24 Mark Millard 2021-07-23 21:34:43 UTC
(In reply to Mark Millard from comment #16)

I wrote:

> the system
> generating pages to write after than the I/O system is
> writing them might be what is going on

Wrong word: "after". Should have been: "faster". So:

. . . the system generating pages to write faster than the
I/O system is writing them might be what is going on
Comment 25 Mark Millard 2021-07-23 22:14:42 UTC
(In reply to Mark Millard from comment #16)

I guessed wrong, in swap_pager_getpages_locked there is:

        while ((ma[0]->oflags & VPO_SWAPINPROG) != 0) {
                ma[0]->oflags |= VPO_SWAPSLEEP;
                VM_CNT_INC(v_intrans);
                if (VM_OBJECT_SLEEP(object, &object->handle, PSWP,
                    "swread", hz * 20)) {
                        printf(
"swap_pager: indefinite wait buffer: bufobj: %p, blkno: %jd, size: %ld\n",
                            bp->b_bufobj, (intmax_t)bp->b_blkno, bp->b_bcount);
                }
        }

So the "swap_pager: indefinite wait buffer:" are only for
reads and not for writes.

It also looks like the time is 20 seconds before such a
message is reported, not the 30 sec that I'd been told.

Part of the issue might be if write activity delays
pending read activity in the queue.
Comment 26 Mark Millard 2021-07-24 00:55:01 UTC
(In reply to Michael from comment #12)

I'll note that when the kernel kills a process, that process
might leave behind a *.core file as a consequence. But it is
not the *.core that caused the kill; it was the kill that led
to the *.core: things were bad at the system level before the
*.core happened.

For a:

kernel: pid ??? (???), jid ???, uid ??, was killed: out of swap space

the *.core produced (if any) is likely not the thing of direct
interest for evidence about the overall system level status that
lead to the kill.

The kill sequence goes after bigger processes first, working
toward smaller processes later (not that the process sizes are
static during this). So the sequence:

chrome, gimp-2.10, chrome, chrome, plasmashell, chrome, chrome,
Xorg

in the Description is suggestive of the relative sizes of the
processes around the time of each kill. That the chrome
subsequence had pids of (in order) 1551, 1546, 1574, 1548,
1786 shows that the (roughly decreasing?) size was not in the
order oldest (1546) to newest (1786).
Comment 27 Mark Millard 2021-07-24 03:56:23 UTC
(In reply to Mark Millard from comment #26)

The "bigger process" vs. smaller ones that I refer to
are really in terms of size from:

                size = vmspace_swap_count(vm);
                if (shortage == VM_OOM_MEM || shortage == VM_OOM_MEM_PF)
                        size += vm_pageout_oom_pagecount(vm);
Comment 28 Graham Perrin 2021-07-24 08:47:39 UTC
(In reply to Mark Millard from comment #26)

> … kill sequence …

I recall this thread, which spanned two months in the archives: 

The out-of-swap killer makes poor choices
<https://lists.freebsd.org/pipermail/freebsd-hackers/2021-February/thread.html#57017>
<https://lists.freebsd.org/pipermail/freebsd-hackers/2021-March/thread.html#57045>
Comment 29 Michael 2021-07-24 19:01:53 UTC
(In reply to Mark Millard from comment #23)

The messages I'm referring to are shown in the order I pasted them here, which is also the order they appeared in my logs; anything before them is unrelated. At the end I might cut where the same message repeats forever, but then the next line is just three dots.

The rest is read, and I hope I find some time to build a kernel with the patch you mentioned.

thanks so far
Comment 30 Michael 2021-07-25 16:07:10 UTC
Somehow my PC became unusable; it's like slow motion, all reactions to mouse or keyboard come after minutes (no exaggeration).

The naked FreeBSD is normal and runs fine. I created another user, but it's the same, so it is not the KDE configuration. I deleted all installed packages and reinstalled them from the CLI, which went fast as always. My Linux partition is also normal.

What is left? I would say the nvidia-driver-390-143, which was updated these days, while all former 390-141 versions have vanished; the FreeBSD drivers from NVIDIA do not compile.

The FreeBSD nv driver does not recognize my video card.

So maybe I have to wait until nvidia-driver-390-144 is in the repo.

Wonderful, isn't it?

Maybe the complete story was caused by the video driver?
This morning I found this when going back to single-user mode:

pid 1357 (drkonqi), jid 0, uid 1000: exited on signal 6 (core dumped)
pid 1354 (klauncher), jid 0, uid 1000: exited on signal 6 (core dumped)
pid 1356 (plasmashell), jid 0, uid 1000: exited on signal 6 (core dumped)
pid 864 (bsdisks), jid 0, uid 0: exited on signal 11 (core dumped)

Same messages, but no GIMP because it wasn't open; in other words, nothing in this list is a user program.

So it seems we backed the wrong horse ...

This is also not working from loader.conf:
Solaris: WARNING: ignoring tunable zfs_arc_min (using 0 instead)
Solaris: WARNING: ignoring tunable zfs_arc_min (using 0 instead)

Maybe I'll try it from sysctl.conf later.
Comment 31 Mark Millard 2021-07-25 16:46:16 UTC
(In reply to Michael from comment #30)

> Somehow my PC became unusable; it's like
> slow motion, all reactions to mouse or keyboard
> come after minutes (no exaggeration)

This is normal if the free RAM stays low and the
system is paging extensively to media that is not
fast for the purpose. Those are kinds of conditions
that can also eventually lead to the kills. One of
the tunables that I've indicated delays the kills,
making the conditions last longer in order to avoid
the kills.

But, until I can see the output lines of the
patched kernel I can not tell which of the 4
conditions are occurring. So, without that output
I've basically no chance to be of more help.

Also, as I've indicated before, we would need to
see top output from shortly before the kills happen
to see which processes are using what memory at the
time. Looking at top output after the kills start
does no good.

As for:

Solaris: WARNING: ignoring tunable zfs_arc_min (using 0 instead)
Solaris: WARNING: ignoring tunable zfs_arc_min (using 0 instead)

There is:

        /* Valid range: 32M - <arc_c_max> */
        if ((zfs_arc_min) && (zfs_arc_min != arc_c_min) &&
            (zfs_arc_min >= 2ULL << SPA_MAXBLOCKSHIFT) &&
            (zfs_arc_min <= arc_c_max)) {
                arc_c_min = zfs_arc_min;
                arc_c = MAX(arc_c, arc_c_min);
        }
        WARN_IF_TUNING_IGNORED(zfs_arc_min, arc_c_min, verbose);

As far as I can tell this happens from having too little RAM
for the overall configuration, so that 32M is not even reasonable.
(But I'm no ZFS tuning expert.) With defaults, as I remember, the
standard recommendation is to have at least 8 GiBytes of RAM for
ZFS. For less, some ZFS tuning expertise is needed (that I do not
have).

If you have done some zfs tuning, you should add notes to the
bugzilla about what your settings are and what the machine
has for RAM and such. Someone with ZFS tuning expertise might
comment based on such information.
Comment 32 Michael 2021-07-25 18:14:59 UTC
dear Mark, I told not only once that there is enough memory left in all situations, never less than 2Gib left ...

there is something wrong and doesn't is in my hands, it only can by this miserable nvidia driver
Comment 33 Mark Millard 2021-07-25 18:51:29 UTC
(In reply to Michael from comment #32)

You also report that you looked only after the
kills started, not before. Looking after the
kills start is too late to know what was true
before the kills start.

We need the report from the kernel of exactly
which condition initiates the kills. (For some
conditions, the messages report additional
information.)
Comment 34 Mark Millard 2021-07-25 19:07:28 UTC
(In reply to Mark Millard from comment #33)

I should have said (adding "just"):

You also report that you looked only after the
kills started, not just before.



In case it is not clear:

The purpose of the kills is to free memory. By
the time the kills have gotten a start at their
activity, there is normally less memory in use
than there was when the kernel tests indicated
to start the kills.



Frankly, I do not see much point in continuing
to comment until/unless the reports from the
modified kernel are reported (along with the
values in use for vm.pageout_oom_seq and
vm.pfault_oom_attempts that were in use at the
time).
Comment 35 Michael 2021-07-28 17:03:32 UTC
(In reply to Mark Millard from comment #34)
So finally I found the culprit, and I have my interchangeable zpool ready and running fine.

You led me in the wrong direction, Mark; I don't say it in an unfriendly manner, we just had different ideas. When you posted the ...4000 sysctl parameter I woke up and deleted all the changes made so far from my system.

After a cold reboot and monitoring what was happening, I found that the ZIL was disabled in loader.conf; I don't know why I did this, maybe my shadow, haha. Suddenly everything came back to normal.

So I thank you for the time you spent on me; thank you!

Nevertheless, another issue: these messages are wrong; they do not represent the real problem. They also come up in the logs when the system kills the apps for any other reason, such as a halt, for example.
Comment 36 Mark Millard 2021-07-28 17:54:48 UTC
(In reply to Michael from comment #35)

Cool. Glad you found what controlled the issue.

I had no clue that you had been adjusting such things
and would be very unlikely to have guessed such a change.
(And my kernel patch's output would not have helped with
that.)

I do not know the notation that was in loader.conf that you
have changed. You might want to document the before-and-after
for that notation in case someone looks at this bugzilla
because of a problem that they are having.

Also, with the problem being misconfiguration, you should
change the Status for this bugzilla from New to Closed as
Invalid. It is not actually a bug. (It is not something most
folks can do with your submittal, including me.)
Comment 37 Mark Millard 2021-07-28 18:10:26 UTC
(In reply to Mark Millard from comment #36)

Along with closing as Invalid, you might want to adjust the
subject line to indicate the configuration error that was
involved, possibly something like adding:

(problem was: zil disabled)

at the end of the subject.
Comment 38 Graham Perrin 2021-07-28 21:08:44 UTC
(In reply to Michael from comment #30)

> … This is also not working from loader.conf:
> Solaris: WARNING: ignoring tunable zfs_arc_min (using 0 instead)
> Solaris: WARNING: ignoring tunable zfs_arc_min (using 0 instead)
> 
> Maybe I'll try it from sysctl.conf later.

Note, . not _

root@mowa219-gjp4-8570p:~ # sysctl vfs.zfs.arc.min
vfs.zfs.arc.min: 0
root@mowa219-gjp4-8570p:~ #
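To spell out the ". not _" point: the OpenZFS module parameter is named zfs_arc_min, but the corresponding FreeBSD name in both /boot/loader.conf and /etc/sysctl.conf uses dots. A sketch, with a purely illustrative value, not a recommendation:

```
# /boot/loader.conf (load-time tunable):
vfs.zfs.arc.min="536870912"

# /etc/sysctl.conf (runtime, same dotted name):
vfs.zfs.arc.min=536870912
```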
Comment 39 Mark Millard 2021-07-28 21:34:02 UTC
(In reply to Graham Perrin from comment #38)

# sysctl -T vfs.zfs.arc.min
vfs.zfs.arc.min: 0

Note:      -T      Display only variables that are settable via loader
             (CTLFLAG_TUN).

# sysctl -W vfs.zfs.arc.min
vfs.zfs.arc.min: 0

Note:      -W      Display only writable variables that are not statistical.  Useful
             for determining the set of runtime tunable sysctls.

So vfs.zfs.arc.min looks to be both a load-time tunable (-T) and a
later writeable variable (-W).

To illustrate what happens for -T and -W output when
the name is loader-tunable but not later a writable
variable (as an example):

# sysctl -W kern.maxproc
# sysctl -T kern.maxproc
kern.maxproc: 70308

I'll note that I also see the 0 value for vfs.zfs.arc.min
but I do no tuning of ZFS (I use defaults) and the system
used for the above commands has 64 GiByte of RAM. My usage
context is very different from chrome/gimp/plasmashell/Xorg
and so my lack of seeing OOM activity does not mean much
for the bugzilla report.
Comment 40 Michael 2021-07-29 20:58:09 UTC
(In reply to Graham Perrin from comment #38)
Overlooking that "Solaris" appears to be the one answering is one of the oddities; the second is this weird message; and the third is that it says it is using zero instead, when you can see the value was set ...

This ZFS is still a lot of voodoo: a thousand tunables which do not do very much, because I see a self-tuning mechanism here. The user just has to care about having enough physical memory in the box and keeping his hands in his pockets, and everything runs smooth and fast.

The out-of-memory message is also misplaced, because it shows up when you enter a classic reboot (typing the word and pressing Enter).

Anyway, just as a reminder, Mark: I wasn't tuning my ZFS system, I was trying to get my cache right, not the ZFS cache. So, a lot of idea exchange, which is good; now we are almost ZFS specialists :)

FYI, I had quite a moderate work day and used this machine with only 8 GB of RAM, everything regarding ZFS untouched, and it worked fine.
  
so thanks for your help! 
I'm going to close this issue for now