Bug 280846 - Low memory freezes / OOM: a thread waited too long to allocate a page
Summary: Low memory freezes / OOM: a thread waited too long to allocate a page
Status: Open
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 14.1-RELEASE
Hardware: Any
OS: Any
Importance: --- Affects Only Me
Assignee: freebsd-bugs (Nobody)
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2024-08-16 04:32 UTC by Henrich Hartzer
Modified: 2024-10-29 21:13 UTC
CC: 3 users

See Also:


Attachments
proposed patch (2.48 KB, patch)
2024-10-04 15:02 UTC, Mark Johnston
no flags Details | Diff
Proposed patch modified for 14.1-RELEASE (1.70 KB, patch)
2024-10-05 15:01 UTC, Henrich Hartzer
no flags Details | Diff
accounting patch (3.57 KB, patch)
2024-10-14 13:44 UTC, Mark Johnston
no flags Details | Diff
Three patches in one for 14.1-RELEASE (5.88 KB, patch)
2024-10-14 17:54 UTC, Henrich Hartzer
no flags Details | Diff

Description Henrich Hartzer 2024-08-16 04:32:01 UTC
I've been getting these "a thread waited too long to allocate a page" OOM-related freezes and crashes for at least 3 major releases. 12, 13, and 14.

I see it mostly with Firefox, and possibly Gimp. Certain websites seem to make it much more likely to happen. The system will lock up for 30 seconds to 5 minutes until it kills something; sometimes it kills nothing and starts acting fine again.

I have default sysctls in regards to OOM. Are there some settings I can use to make it kill faster, or more conservatively allow allocations?

I'm on x86_64 with 16GB of memory. I can be pushing 50 tabs when this happens -- sometimes fewer, sometimes more. It seems much more likely to happen on bloated websites, like HomeDepot, than on lean websites like freebsd.org.

I assume Firefox is partly to blame, but I feel like FreeBSD should be able to tame it to some degree.

Thank you!
Comment 1 Mark Millard 2024-08-16 06:37:15 UTC
Can you specify more context?

ZFS vs. UFS vs. a mix?
Type(s) of storage media in use?
Any use of tmpfs or the like?
SWAP space configuration: how much SWAP?

Output analogous to:
# dmesg -a | grep " memory "
real memory  = 2066735104 (1970 MB)
avail memory = 1990144000 (1897 MB)

And to:
# swapinfo -m
Device          1M-blocks     Used    Avail Capacity
/dev/gpt/PkgBaseSwp3p5      3584       18     3565     1%

Video hardware (and how it is configured)?

Can you tell that it is leading up to an
"a thread waited too long to allocate a page"
notice before it actually happens and before
things actually hang up? Or is the change
sudden, going from normal to hung up, only
later reporting "a thread waited too long to
allocate a page"?

Can you leave "top -Sazores" running someplace where
you can then look at or monitor top's output if you
start to notice the symptoms? What do the 2 lines
like:

Mem: 2344K Active, 1302M Inact, 404K Laundry, 320M Wired, 194M Buf, 275M Free
Swap: 3584M Total, 18M Used, 3565M Free

show? What about the process list with the larger
RES(ident) RAM use figures (top area of the process
list, via the "ores" part of the arguments to the
top command)?

Similarly, what does the "gstat -spod" output
generally look like?

How long (seconds, minutes, hours, days, weeks, or
months) can a firefox session run before the problem
occurs? (I do not know the general scale involved
from the descriptions so far.) Is there any common
context across the cases with a shorter time to the
problem -- one that is not involved in the longer
times to having the problem?

Note that OOM kills do not necessarily kill what you
would like them to kill. They can kill the process(es)
that allow you to control the machine in a normal way.

So how few "just HomeDepot" tabs can you create and
use and get the problem, in your experience so far? How
many can you create and use without getting the problem?

Are you using X11? Something else? What window manager?
And so on?
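The monitoring suggested above can be made hands-off with a small script. The following is a minimal POSIX-sh sketch (not part of the original report): it assumes FreeBSD's batch-mode invocations `top -bSazores` and `gstat -bspod` (`-b` selects batch output), and uses `uname -s` below only as a portable stand-in command.

```shell
#!/bin/sh
# Sketch: append timestamped snapshots of monitoring commands to a log,
# so the state leading up to a hang survives the hang itself.
# Assumed FreeBSD invocations: "top -bSazores" and "gstat -bspod";
# "uname -s" below is only a stand-in command for illustration.
LOG="${LOG:-/tmp/oom-monitor.log}"

snapshot() {
    {
        printf '=== %s ===\n' "$(date -u '+%Y-%m-%d %H:%M:%S')"
        "$@" 2>&1
    } >>"$LOG"
}

# A real run would loop, e.g.:
#   while :; do snapshot top -bSazores; snapshot gstat -bspod; sleep 10; done
snapshot uname -s
```

Reviewing the tail of the log after a freeze should then show the last few memory and disk states before the hang.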
Comment 2 Mark Millard 2024-08-16 06:50:43 UTC
(In reply to Mark Millard from comment #1)

What is the analogous output on your system for:

# sysctl vm.pfault_oom_attempts vm.pfault_oom_wait
vm.pfault_oom_attempts: 3
vm.pfault_oom_wait: 10

I'll note that:

# sysctl vm.pfault_oom_attempts=-1
vm.pfault_oom_attempts: 3 -> -1

prevents executing the sequence:

        printf("vm_fault_allocate_oom: proc %d (%s) failed to alloc page on fault, starting OOM\n",
                curproc->p_pid, curproc->p_comm);

        vm_pageout_oom(VM_OOM_MEM_PF);

but it does so by being willing to deadlock/livelock
the system instead. Also other forms of OOM activity
could still potentially happen and such a change
might make that more likely.
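For reference, a sketch of how these tunables could be persisted across reboots in /etc/sysctl.conf. The values shown are illustrative examples, not recommendations from this report; as noted above, -1 disables this OOM path entirely at the risk of deadlocking/livelocking the system.

```
# /etc/sysctl.conf sketch -- example values, not a recommendation.
# More attempts and a longer wait delay the VM_OOM_MEM_PF kill;
# vm.pfault_oom_attempts=-1 would disable this OOM path entirely,
# at the risk of deadlock/livelock under a sustained page shortage.
vm.pfault_oom_attempts=5
vm.pfault_oom_wait=15
```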
Comment 3 Mark Millard 2024-08-16 07:03:50 UTC
(In reply to Mark Millard from comment #2)

FYI: I should have noted that vm_pageout_oom(VM_OOM_MEM_PF)
leads to the notice: "a thread waited too long to allocate
a page".

Also the message that I quoted in #2 is my variant of what
is instead under: "if (bootverbose)" in standard FreeBSD
code.

So if you do a verbose boot, you should see messages
based on:

       if (bootverbose)
               printf(
           "proc %d (%s) failed to alloc page on fault, starting OOM\n",
                   curproc->p_pid, curproc->p_comm);

which would report the exact process that failed to
allocate a page in a timely manner while handling a
page fault. (This presumes you are not using
vm.pfault_oom_attempts=-1 .)
Comment 4 Henrich Hartzer 2024-08-29 23:20:07 UTC
Sorry for the wait! Thank you for all of the helpful info and things to try/check.

I'm using ZFS without swap. I use an encrypted rootfs, set up the way the installer does it -- I think with geli? I wonder if this is partly to blame.

I'm using an SSD.

real memory  = 17179869184 (16384 MB)
avail memory = 16441286656 (15679 MB)

My swapinfo -m is blank.

No tmpfs.

Video is i915kms. I've had this problem on a Thinkpad x200 and now on a x230.

The system will usually start to act up before the hang, but the last one (a few minutes ago) gave pretty much no warning at all.

I started running top and gstat with those flags and will report back. I might try to reproduce it, we'll see.

Wayland + Sway these days, but I had it with X11 + DWM.

It might take a few HomeDepot tabs to do it, but it's not totally consistent. I've tried to reproduce it before and have had no luck.

vm.pfault_oom_attempts: 3
vm.pfault_oom_wait: 10

Here's some current stats:

244 processes: 2 running, 240 sleeping, 2 waiting
CPU:  1.9% user,  0.0% nice,  1.2% system,  0.4% interrupt, 96.6% idle
Mem: 919M Active, 1015M Inact, 134M Laundry, 1606M Wired, 56K Buf, 351M Free
ARC: 573M Total, 456M MFU, 75M MRU, 12K Anon, 6000K Header, 35M Other
     459M Compressed, 567M Uncompressed, 1.23:1 Ratio

dT: 1.003s  w: 1.000s
 L(q)  ops/s    r/s     kB   kBps   ms/r    w/s     kB   kBps   ms/w    d/s     kB   kBps   ms/d    o/s   ms/o   %busy Name
    0      0      0      0      0    0.0      0      0      0    0.0      0      0      0    0.0      0    0.0    0.0| ada0
    0      0      0      0      0    0.0      0      0      0    0.0      0      0      0    0.0      0    0.0    0.0| ada1


Please let me know if I missed anything pertinent. I'll hopefully remember to do a verbose boot.
Comment 5 Henrich Hartzer 2024-08-30 00:22:27 UTC
It was getting close to locking up again and I was able to grab some stats.

Now when I first saw it, 'state' for Firefox was pfault. When I pasted, it had turned to select.

last pid: 42814;  load averages:  2.18,  1.62,  1.21; battery: 99%                                                                    up 10+14:35:24  00:12:59
258 processes: 2 running, 254 sleeping, 2 waiting
CPU: 12.1% user,  0.0% nice, 11.9% system,  0.3% interrupt, 75.8% idle
Mem: 68M Active, 664K Inact, 13M Laundry, 1567M Wired, 56K Buf, 122M Free
ARC: 487M Total, 325M MFU, 99M MRU, 14M Anon, 5578K Header, 43M Other
     352M Compressed, 427M Uncompressed, 1.21:1 Ratio

  PID USERNAME    THR PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
39922 hhartzer    119  26    0  5188M  1352M select   3 142:10   6.48% firefox

dT: 1.001s  w: 1.000s
 L(q)  ops/s    r/s     kB   kBps   ms/r    w/s     kB   kBps   ms/w    d/s     kB   kBps   ms/d    o/s   ms/o   %busy Name
    0    372    372     91  33777    0.3      0      0      0    0.0      0      0      0    0.0      0    0.0    9.4| ada0
    0      0      0      0      0    0.0      0      0      0    0.0      0      0      0    0.0      0    0.0    0.0| ada1
Comment 6 Mark Millard 2024-08-30 01:49:51 UTC
Mem: 919M Active, 1015M Inact, 134M Laundry, 1606M Wired, 56K Buf, 351M Free

Note: ARC is in Wired.
Note: Buf recounts pages already in other categories, and is
      small here anyway, so I leave it out.

So:
 919M
1015M
 134M
1606M
 351M
-----
4025M, or somewhat under 4 GiBytes.

This is nowhere near your reported 15+ GiByte figures:

real memory  = 17179869184 (16384 MB)
avail memory = 16441286656 (15679 MB)

It looks to me like a big problem started long before this
point: most of the RAM had already been lost to a leak.

Mem: 68M Active, 664K Inact, 13M Laundry, 1567M Wired, 56K Buf, 122M Free

So:
  68M
 664M
  13M
1567M
 122M
-----
2434M or somewhat under 2.5 GiBytes. A loss of roughly 1.5 GiBytes
compared to the above.

Looks like some sort of memory leak that is causing memory to not
be classified as one of Active, Inact, Laundry, Wired, or Free.

What do the figures look like right after booting, possibly both
before and after starting X11 or Wayland (but doing little else
after booting)?
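The Buf-exclusion arithmetic above can be automated. This is a small sh/awk sketch (not part of the original thread) that sums the categories on a top "Mem:" line while skipping Buf:

```shell
# Sum the categories on a top(1) "Mem:" line, skipping Buf (which
# recounts pages already in the other categories). Prints a MiB total.
sum_mem() {
    awk -F'[:,] *' '/^Mem:/ {
        total = 0
        for (i = 2; i <= NF; i++) {
            if ($i ~ /Buf/) continue      # Buf double-counts; skip it
            n = $i + 0                    # leading number of the field
            if ($i ~ /G /) n *= 1024      # GiB -> MiB
            if ($i ~ /K /) n /= 1024      # KiB -> MiB
            total += n
        }
        printf "%.0f MiB\n", total
    }'
}

echo 'Mem: 919M Active, 1015M Inact, 134M Laundry, 1606M Wired, 56K Buf, 351M Free' \
    | sum_mem    # -> 4025 MiB, matching the manual sum
```

Feeding it successive top samples makes a shrinking total (the "vanishing" memory) easy to spot.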
Comment 7 Henrich Hartzer 2024-08-30 03:00:08 UTC
Wow, you're right. I thought the numbers looked odd, but wasn't sure if it was some top behavior I didn't understand.

Here are the figures after booting and starting sway. Much more reasonable!

last pid:  1642;  load averages:  0.20,  0.07,  0.02; battery: 99%                                                                     up 0+00:01:43  02:55:59
71 processes:  2 running, 67 sleeping, 2 waiting
CPU:  0.5% user,  0.0% nice,  0.9% system,  0.1% interrupt, 98.5% idle
Mem: 100M Active, 118M Inact, 571M Wired, 56K Buf, 15G Free
ARC: 191M Total, 32M MFU, 155M MRU, 814K Header, 2628K Other
     145M Compressed, 202M Uncompressed, 1.39:1 Ratio


This is after starting my usual set of apps. I'm pretty sure it just takes Firefox to cause this, though.

last pid:  2111;  load averages:  0.35,  0.28,  0.12; battery: 99%                                                                     up 0+00:04:39  02:58:55
111 processes: 2 running, 107 sleeping, 2 waiting
CPU:  0.4% user,  0.0% nice,  1.6% system,  0.0% interrupt, 98.0% idle
Mem: 1756M Active, 1181M Inact, 27M Laundry, 1820M Wired, 56K Buf, 11G Free
ARC: 1196M Total, 256M MFU, 918M MRU, 6016K Anon, 3689K Header, 8645K Other
     1096M Compressed, 1300M Uncompressed, 1.19:1 Ratio

Pretty big dent in memory use, but nowhere near what it was before the reboot.
Comment 8 Henrich Hartzer 2024-09-06 18:33:24 UTC
Been running for a week with no OOMs so far. I wanted to post the figures after things have settled for a while, in case OOMs come again soon.

171 processes: 2 running, 167 sleeping, 2 waiting
CPU:  2.3% user,  0.0% nice,  2.0% system,  0.5% interrupt, 95.3% idle
Mem: 2669M Active, 3577M Inact, 2452M Laundry, 4728M Wired, 104K Buf, 366M Free
ARC: 3198M Total, 508M MFU, 2385M MRU, 34M Anon, 19M Header, 250M Other
     2609M Compressed, 4558M Uncompressed, 1.75:1 Ratio
Comment 9 Henrich Hartzer 2024-09-30 19:09:05 UTC
I recently "reinstalled" (lots of rsync involved) on UFS on a different but similar machine, with only 8GB of memory.

I haven't had it lock up as hard yet, but it starts heading that way quite quickly.

I seem to be running into that "vanishing memory" bug again.

176 processes: 2 running, 172 sleeping, 2 waiting
CPU:  1.6% user,  0.0% nice,  1.4% system,  0.0% interrupt, 97.1% idle
Mem: 28M Active, 76M Inact, 76M Laundry, 1464M Wired, 642M Buf, 369M Free
ARC: (empty, of course)
Comment 10 Mark Millard 2024-09-30 19:35:21 UTC
(In reply to Henrich Hartzer from comment #9)

The fact that ARC showed up means that your environment
still has zfs.ko loaded and operational, even if the
specific file system is UFS.

If you can, I suggest disabling the boot-sequence load
of zfs.ko .

If ARC shows empty you would be able to do a live:

# kldunload zfs.ko

to unload it.

I suggest testing for the leakage after a reboot that
never loads zfs.ko at all.
Comment 11 Henrich Hartzer 2024-09-30 19:47:57 UTC
Good catch! I had rsync'ed /etc, which included zfs_enable="YES" in /etc/rc.conf. I commented that out.

I'll reboot at some point, but I kldunloaded zfs and look at the change!

171 processes: 2 running, 167 sleeping, 2 waiting
CPU:  2.7% user,  0.0% nice, 51.8% system,  4.7% interrupt, 40.8% idle
Mem: 1536M Active, 1590M Inact, 306M Laundry, 1371M Wired, 603M Buf, 221M Free
Comment 12 Mark Johnston freebsd_committer freebsd_triage 2024-10-01 08:35:39 UTC
I wonder if you could share a dmesg from the problematic system(s).  I'm particularly interested in the CPU model.
Comment 13 Henrich Hartzer 2024-10-01 14:26:23 UTC
Let me know if you need more of the dmesg.

I confirmed with my logs that I've had this on at least 13.2, 14.0, and 14.1.

This is all the same hardware with ZFS.

Apr 10 18:05:54 laptop kernel: FreeBSD 13.2-RELEASE releng/13.2-n254617-525ecfdad597 GENERIC amd64
Apr 10 18:05:54 laptop kernel: FreeBSD clang version 14.0.5 (https://github.com/llvm/llvm-project.git llvmorg-14.0.5-0-gc12386ae247c)
Apr 10 18:05:54 laptop kernel: VT(efifb): resolution 1366x768
Apr 10 18:05:54 laptop kernel: CPU: Intel(R) Core(TM) i5-3320M CPU @ 2.60GHz (2594.26-MHz K8-class CPU)
Apr 10 18:05:54 laptop kernel:   Origin="GenuineIntel"  Id=0x306a9  Family=0x6  Model=0x3a  Stepping=9
Apr 10 18:05:54 laptop kernel:   Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
Apr 10 18:05:54 laptop kernel:   Features2=0x7fbae3ff<SSE3,PCLMULQDQ,DTES64,MON,DS_CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,TSCDLT,AESNI,XSAVE,OSXSAVE,AVX,F16C,RDRAND>
Apr 10 18:05:54 laptop kernel:   AMD Features=0x28100800<SYSCALL,NX,RDTSCP,LM>
Apr 10 18:05:54 laptop kernel:   AMD Features2=0x1<LAHF>
Apr 10 18:05:54 laptop kernel:   Structured Extended Features=0x281<FSGSBASE,SMEP,ERMS>
Apr 10 18:05:54 laptop kernel:   Structured Extended Features3=0x9c000400<MD_CLEAR,IBPB,STIBP,L1DFL,SSBD>
Apr 10 18:05:54 laptop kernel:   XSAVE Features=0x1<XSAVEOPT>
Apr 10 18:05:54 laptop kernel:   VT-x: PAT,HLT,MTF,PAUSE,EPT,UG,VPID
Apr 10 18:05:54 laptop kernel:   TSC: P-state invariant, performance statistics
Apr 10 18:05:54 laptop kernel: real memory  = 17179869184 (16384 MB)
Apr 10 18:05:54 laptop kernel: avail memory = 16438591488 (15677 MB)
Apr 10 18:05:54 laptop kernel: Event timer "LAPIC" quality 600
Apr 10 18:05:54 laptop kernel: ACPI APIC Table: <LENOVO TP-G2   >
Apr 10 18:05:54 laptop kernel: FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
Apr 10 18:05:54 laptop kernel: FreeBSD/SMP: 1 package(s) x 2 core(s) x 2 hardware threads
...(Same boot)...
Apr 22 08:41:13 laptop kernel: pid 31866 (firefox), jid 0, uid 1002, was killed: a thread waited too long to allocate a page

Now on different hardware and on UFS.

Sep 29 22:21:38 laptop kernel: FreeBSD 14.1-RELEASE releng/14.1-n267679-10e31f0946d8 GENERIC amd64
Sep 29 22:21:38 laptop kernel: FreeBSD clang version 18.1.5 (https://github.com/llvm/llvm-project.git llvmorg-18.1.5-0-g617a15a9eac9)
Sep 29 22:21:38 laptop kernel: VT(vga): resolution 640x480
Sep 29 22:21:38 laptop kernel: CPU: Intel(R) Core(TM)2 Duo CPU     T9400  @ 2.53GHz (2527.07-MHz K8-class CPU)
Sep 29 22:21:38 laptop kernel:   Origin="GenuineIntel"  Id=0x10676  Family=0x6  Model=0x17  Stepping=6
Sep 29 22:21:38 laptop kernel:   Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
Sep 29 22:21:38 laptop kernel:   Features2=0x8e3fd<SSE3,DTES64,MON,DS_CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,SSE4.1>
Sep 29 22:21:38 laptop kernel:   AMD Features=0x20100800<SYSCALL,NX,LM>
Sep 29 22:21:38 laptop kernel:   AMD Features2=0x1<LAHF>
Sep 29 22:21:38 laptop kernel:   VT-x: HLT,PAUSE
Sep 29 22:21:38 laptop kernel:   TSC: P-state invariant, performance statistics
Sep 29 22:21:38 laptop kernel: real memory  = 8589934592 (8192 MB)
Sep 29 22:21:38 laptop kernel: avail memory = 8165998592 (7787 MB)
Sep 29 22:21:38 laptop kernel: Event timer "LAPIC" quality 100
Sep 29 22:21:38 laptop kernel: ACPI APIC Table: <LENOVO TP-7U   >
Sep 29 22:21:38 laptop kernel: FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
Sep 29 22:21:38 laptop kernel: FreeBSD/SMP: 1 package(s) x 2 core(s)
Sep 29 22:21:38 laptop kernel: random: unblocking device.
Sep 29 22:21:38 laptop kernel: Firmware Warning (ACPI): 32/64X length mismatch in FADT/Pm1aControlBlock: 16/32 (20221020/tbfadt-748)
Sep 29 22:21:38 laptop kernel: Firmware Warning (ACPI): Invalid length for FADT/Pm1aControlBlock: 32, using default 16 (20221020/tbfadt-850)
Sep 29 22:21:38 laptop kernel: ioapic0: MADT APIC ID 1 != hw id 2
Sep 29 22:21:38 laptop kernel: ioapic0 <Version 2.0> irqs 0-23
Sep 29 22:21:38 laptop kernel: Launching APs: 1
Sep 29 22:21:38 laptop kernel: random: entropy device external interface

This one is actually a different error...

Sep 30 17:03:53 laptop kernel: pid 6374 (firefox), jid 0, uid 1002, was killed: failed to reclaim memory
Sep 30 18:34:31 laptop kernel: pid 1586 (firefox), jid 0, uid 1002, was killed: failed to reclaim memory
Sep 30 19:09:41 laptop kernel: pid 68263 (firefox), jid 0, uid 1002, was killed: failed to reclaim memory
Sep 30 19:13:00 laptop kernel: pid 68948 (firefox), jid 0, uid 1002, was killed: failed to reclaim memory
Sep 30 19:13:01 laptop kernel: pid 2031 (firefox), jid 0, uid 1003, was killed: failed to reclaim memory
Sep 30 19:13:03 laptop kernel: pid 99186 (firefox), jid 0, uid 1002, was killed: failed to reclaim memory

System didn't lock up nearly as bad, but clearly something was eating the memory.

Since I kldunloaded zfs, the memory hasn't vanished yet.
Comment 14 Mark Millard 2024-10-01 15:40:26 UTC
(In reply to Henrich Hartzer from comment #13)

As for memory not vanishing after the kldunload of zfs.ko . . .

But your earlier reported:

Mem: 1536M Active, 1590M Inact, 306M Laundry, 1371M Wired, 603M Buf, 221M Free

was still missing a lot of memory (skipping Buf):

The sum 1536+1590+306+1371+221 == 5024 was well below the "7787 MB"
of "avail memory".

Rebooting such that zfs.ko is never loaded may well get the total
nearer to the 7787 MB and keep it that way. (Memory could still
change categories away from Free, so Free itself may decrease over
time.)

For "failed to reclaim memory" there is a tunable that can delay
the process kills: runs longer with the low Free RAM figure. There
are a couple of others that could well be appropriate to that
context.

However, for now, Mark J. is likely more interested in the decreasing
total that happened with zfs.ko loaded but was partially undone by the
kldunload. That is more unusual than just having the "failed to reclaim
memory" issue with Firefox in use.
Comment 15 Henrich Hartzer 2024-10-01 21:09:11 UTC
(In reply to Mark Millard from comment #14)

Understandable that the vanishing memory figures may be more interesting.

I had another lockup, but this time it would not come back. I do see a "was killed: failed to reclaim memory" in the logs.

I ended up rebooting, so now have a zfs-free boot to work with.

With my ZFS-free boot, here are the memory figures:

Mem: 2863M Active, 2949M Inact, 527M Laundry, 1206M Wired, 705M Buf, 266M Free

If I did the numbers right, that's 8,516MB which seems a little high...

Uptime is 2.5 hours at this point.
Comment 16 Henrich Hartzer 2024-10-01 22:00:21 UTC
It looks like my memory is continuing to vanish!

Mem: 275M Active, 605M Inact, 9192K Laundry, 1241M Wired, 664M Buf, 426M Free
Comment 17 Mark Millard 2024-10-02 01:59:51 UTC
(In reply to Henrich Hartzer from comment #15)

Do not include "Buf" in totals: That ends up double counting
some of what is already included in the other categories.
Pages that are part of a Buf could be Active or Inact, for
example.
Comment 18 Mark Millard 2024-10-02 02:09:05 UTC
(In reply to Henrich Hartzer from comment #15)

One thing that high memory use can do is to swap
out the kernel stacks of processes that allow you
to control/communicate with the system. At that
point they can not run at all until swapped back
in. This can be avoided via /etc/sysctl.conf
having:

#
# Together this pair avoids swapping out the process kernel stacks.
# This keeps the processes used for interacting with the system
# from being hung up that way.
vm.swap_enabled=0
vm.swap_idle_enabled=0

(main [what will become 15] now always does this and no longer
has those tunables to control.)

This does not prevent OOM kills of such processes:
that is a separate issue.
Comment 19 Henrich Hartzer 2024-10-02 02:13:44 UTC
Ah, good to know about not counting buf. Thank you!

I started getting another lockup. Decided to track memory figures and kill some apps to see if it would make any difference.

Before killing Firefox:
Mem: 68M Active, 133M Inact, 916K Laundry, 1336M Wired, 774M Buf, 1397M Free

Total: 2,934

After killing Firefox:
Mem: 26M Active, 90M Inact, 244K Laundry, 1234M Wired, 772M Buf, 4651M Free

Total: 6,001

After killing everything notable:
Mem: 22M Active, 113M Inact, 56K Laundry, 1190M Wired, 772M Buf, 6159M Free

Total: 7,484


---

As far as swap goes, I've run with and without it and had these problems. I currently don't have any swap, so I imagine setting those sysctls would have no effect?
Comment 20 Mark Millard 2024-10-02 02:21:38 UTC
(In reply to Henrich Hartzer from comment #16)

Now that the vanishing-RAM-totals problem has been reproduced
without zfs.ko ever being loaded, markj may prefer tests of
the UFS context, because it is likely a simpler context to
investigate. (But I'm guessing.)
Comment 21 Mark Millard 2024-10-02 02:26:52 UTC
(In reply to Henrich Hartzer from comment #19)

For the swap related settings that I suggested:

) The settings should not hurt when there is no active swap space.

) If you sometimes re-enable an active swap space, the settings might
help at those times -- without your having to remember to adjust
them.
Comment 22 Mark Millard 2024-10-02 02:34:17 UTC
(In reply to Henrich Hartzer from comment #19)

One thing that might provide useful information for
such tests is to sample the memory 2 or more times
after a kill, with some time between the samples,
the first just after the kill. This might show
whether the system gradually makes more RAM visible
or whether things look stable afterwards.
Comment 23 Mark Johnston freebsd_committer freebsd_triage 2024-10-04 15:02:27 UTC
Created attachment 253995 [details]
proposed patch

With the hint that ZFS might be involved, I found a logic bug which introduces a race condition that could be responsible for this.

Are you able to test kernel patches?  If so, please try applying the attached patch.
Comment 24 Henrich Hartzer 2024-10-04 18:22:51 UTC
(In reply to Mark Johnston from comment #23)

Hi Mark,

I am able to test patches. I'm not currently using ZFS, although I might be able to boot up another system that is.

I did find that on UFS, having ZFS loaded and then unloading it helped. But I've had this same issue with just UFS, without ZFS loaded.

Although it seems like when I have the problem I can kill the Firefox processes and the memory comes back and is accounted for.

Let me know if I should still try out that patch.

Thank you!
Comment 25 Mark Johnston freebsd_committer freebsd_triage 2024-10-04 18:30:23 UTC
(In reply to Henrich Hartzer from comment #24)
The patch is not specific to ZFS, I believe the problem can occur if you are using just UFS (though I suspect that ZFS perhaps triggers the problem more readily).  Since this problem is not easy to reproduce (I've seen one or two other reports of similar problems, but have never seen this on my desktop where I use firefox all the time), I'd greatly appreciate any testing you can do.
Comment 26 Henrich Hartzer 2024-10-04 18:41:30 UTC
(In reply to Mark Johnston from comment #25)

I understand. Thank you!

I'll give it a try and report back.
Comment 27 Henrich Hartzer 2024-10-05 15:01:07 UTC
Created attachment 254020 [details]
Proposed patch modified for 14.1-RELEASE

Proposed patch altered to apply on 14.1-RELEASE.
Comment 28 Henrich Hartzer 2024-10-05 15:01:47 UTC
I'm now running with the patch, but I had to make an adjustment for it to apply to 14.1-RELEASE.

Will let you know how it goes. Thank you again!
Comment 29 Mark Millard 2024-10-05 15:54:42 UTC
(In reply to Mark Johnston from comment #23)

When I looked at the main and releng/14.1 sources, I did not see
the atomic_load_int use that is in your patch (on a context line,
without any -/+ marker):

 	old = atomic_load_int(&m->ref_count);

I saw just:

 	old = m->ref_count;

including when I look via:

https://cgit.freebsd.org/src/blame/sys/vm/vm_page.c#n4275
or:
https://cgit.freebsd.org/src/blame/sys/vm/vm_page.c?h=releng/14.1#n4090

So you may have other prerequisite changes that your
patch does not include.
Comment 30 Mark Johnston freebsd_committer freebsd_triage 2024-10-05 15:58:03 UTC
(In reply to Mark Millard from comment #29)
Ah, sorry, I thought I had verified that the patch applies cleanly, but I might have reordered them after that check.  Indeed, I have another patch which adds some uses of atomic_load_int(), but I don't believe they are necessary for testing the patch, so I didn't include them.  The patch against 14.1 that Henrich posted looks ok to me in any case.
Comment 31 Henrich Hartzer 2024-10-07 14:31:54 UTC
Thought I would post an update with close to two days of uptime.

So far no OOMs and things have been stable, so I can't find any issues with the patch.

However, it does look like something is still "eating" memory according to top.

Mem: 1026M Active, 1085M Inact, 472M Laundry, 1290M Wired, 768M Buf, 373M Free

Total: 4,246MB~, should be close to 8GB.
Comment 32 Mark Johnston freebsd_committer freebsd_triage 2024-10-07 14:34:58 UTC
(In reply to Henrich Hartzer from comment #31)
Ok, so my patch didn't help.  I was hoping it would fix that leak.

I wonder if you could share output from "vmstat -o" while the system is in that state, i.e., around half of RAM has "disappeared".  Note that the output will contain path names, so might be a bit sensitive.  If you prefer, please feel free to mail me the file directly.
Comment 33 Henrich Hartzer 2024-10-08 17:27:58 UTC
Thank you for your help! I emailed it to you.

Let me know if there's anything else I can provide.
Comment 34 Mark Johnston freebsd_committer freebsd_triage 2024-10-13 15:33:47 UTC
I missed that you don't have swap configured.  I think I know why the pages are disappearing, then: there is some missing accounting.  However, the missing pages are not leaked per se - they are dirty, swap-backed pages that cannot be reclaimed because there is no swap device.  I will fix the accounting, but I suspect that having a couple GB or more of swap space available will improve the stability of your desktop.
Comment 35 Mark Millard 2024-10-13 15:50:09 UTC
(In reply to Mark Johnston from comment #34)

I think that you missed the following text:

QUOTE
As far as swap goes, I've run with and without it and had these problems. I currently don't have any swap, so I imagine setting those sysctls would have no effect?
END QUOTE

You might have to ask whether decreasing memory totals are part of "had these problems" when swap is in use, or whether that refers only to the eventual OOMs/hang-ups.
Comment 36 Mark Millard 2024-10-13 16:32:03 UTC
(In reply to Mark Millard from comment #18)

By the way, if you add having swap space:

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=206048#c7

reports:

QUOTE
On 2017-Feb-13, at 7:20 PM, Konstantin Belousov <kostikbel at gmail.com> wrote
on the freebsd-arm list:

. . .

swapfile write requires the write request to come through the filesystem
write path, which might require the filesystem to allocate more memory
and read some data. E.g. it is known that any ZFS write request
allocates memory, and that write request on large UFS file might require
allocating and reading an indirect block buffer to find the block number
of the written block, if the indirect block was not yet read.

As result, swapfile swapping is more prone to the trivial and unavoidable
deadlocks where the pagedaemon thread, which produces free memory, needs
more free memory to make a progress.  Swap write on the raw partition over
simple partitioning scheme directly over HBA are usually safe, while e.g.
zfs over geli over umass is the worst construction.
END QUOTE

Implication: Avoid file-based swap space in order to avoid deadlocks. Use partitions/slices instead.
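As a concrete illustration of that advice, swap on a dedicated partition can be listed in /etc/fstab; appending ".eli" to the device name (per geli(8)) attaches one-time-key encryption to the swap device at boot. The GPT label below is a hypothetical example, not taken from this report:

```
# /etc/fstab sketch -- partition-backed, geli-encrypted swap.
# "/dev/gpt/swap0" is a hypothetical label; the ".eli" suffix makes
# the boot scripts attach one-time-key GELI encryption to the device.
# Device               Mountpoint  FStype  Options  Dump  Pass
/dev/gpt/swap0.eli     none        swap    sw       0     0
```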
Comment 37 Henrich Hartzer 2024-10-13 21:26:36 UTC
Thank you! I'm running with the new patch to vm_pageout's vm_pageout_scan_inactive(). Will that have any impact on my system without swap?

I've tested with ZFS with swap and without swap, as configured by the installer. All with geli, though. My setup with UFS has geli as well. Unfortunately, I was never tracking memory use to see if this accounting problem was present then.

But, it's interesting that while running on UFS I had significant memory "locked" merely by having ZFS loaded, that was unaccounted for until I removed the module.

Might it be part of the problem that I'm using geli, whether with UFS, ZFS, swap, or no swap? I haven't tried geli with UFS and swap, or just UFS, or just UFS+swap (no geli.)

I don't have a way to add swap (other than file based) at the moment. I'd probably want to setup a separate system for testing with at that point.

I wonder if Firefox has a genuine memory leak or if it's something more FreeBSD specific.
Comment 38 Mark Johnston freebsd_committer freebsd_triage 2024-10-14 13:44:13 UTC
Created attachment 254221 [details]
accounting patch

The main symptom which made me suspect a kernel bug was the accounting problem in top.  Do you recall whether this problem occurred when swap was enabled?  If so, then we probably do indeed have a kernel bug.

In any case, I attached a patch which fixes the accounting problem that occurs with swap disabled.  It would be useful to test that on top of the existing patches.
Comment 39 Henrich Hartzer 2024-10-14 17:53:36 UTC
(In reply to Mark Johnston from comment #38)

I'm not certain about the accounting bug with swap, but the "a thread waited too long to allocate a page" behavior was happening with swap. The swap was setup in the default manner by the installer for a ZFS + geli + encrypted swap configuration.

I'll give that patch a try. I consolidated all patches into one that applies on 14.1. Hopefully it looks correct.
Comment 40 Henrich Hartzer 2024-10-14 17:54:21 UTC
Created attachment 254227 [details]
Three patches in one for 14.1-RELEASE
Comment 41 Mark Millard 2024-10-17 00:47:21 UTC
(In reply to Mark Johnston from comment #38)

Once this patch is in main, it may be worth trying to reproduce
some of the behavior in:

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=277389

to see if the accounting comes out any different.

There were contexts where ARC shrank hugely but Wired stayed
large -- which is the aspect that I'm thinking of.
Comment 42 Henrich Hartzer 2024-10-18 20:23:06 UTC
I think the accounting bug may be fixed with these patches!

I opened a Google Street View tab in Firefox and really noticed it chew through the memory. But at least it appears to all be accounted for.

Mem: 10M Active, 27M Inact, 5964M Laundry, 1572M Wired, 774M Buf, 241M Free

And if you're curious, here's after closing the Google Street View tab in Firefox:

Mem: 27M Active, 95M Inact, 5185M Laundry, 1554M Wired, 774M Buf, 945M Free

That's a lot of laundry. I don't understand why there's so much.
Comment 43 Mark Millard 2024-10-18 20:55:26 UTC
(In reply to Henrich Hartzer from comment #42)

Mem: 10M Active, 27M Inact, 5964M Laundry, 1572M Wired, 774M Buf, 241M Free

10+27+5964+1572+241 == 7814 (ignoring Buf's double counting)

Mem: 27M Active, 95M Inact, 5185M Laundry, 1554M Wired, 774M Buf, 945M Free

27+95+5185+1554+945 == 7806 (ignoring Buf's double counting)

7814-7806 == 8 So: only a small relative delta on the "M" scale.

The displayed figures are rounded approximations of values
that likely involve fractions on the "M" scale, so expecting
the delta to be exactly zero is not reasonable; the details
depend on top's implementation.
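For what it's worth, that by-hand cross-check can be scripted. A minimal sketch (the Mem: lines are pasted in as sample strings taken from the comments above; Buf is skipped because of the double counting already noted):

```shell
#!/bin/sh
# Sum the categories of a top(1) "Mem:" line, skipping Buf.
# Assumes every value is on the "M" scale, as in the lines above.
sum_mem() {
  printf '%s\n' "$1" | awk '{
    total = 0
    for (i = 2; i < NF; i += 2) {
      label = $(i + 1)
      if (label ~ /^Buf/) continue     # Buf is double-counted; skip it
      v = $i; sub(/M$/, "", v)         # strip the trailing "M"
      total += v
    }
    print total
  }'
}

sum_mem 'Mem: 10M Active, 27M Inact, 5964M Laundry, 1572M Wired, 774M Buf, 241M Free'
sum_mem 'Mem: 27M Active, 95M Inact, 5185M Laundry, 1554M Wired, 774M Buf, 945M Free'
```

This reproduces the 7814 and 7806 totals from the comment.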
Comment 44 Mark Millard 2024-10-21 16:31:11 UTC
(In reply to Mark Johnston from comment #38)

Is releng/14.2 likely to get the accounting improvements?
Comment 45 Mark Johnston freebsd_committer freebsd_triage 2024-10-21 21:01:11 UTC
(In reply to Mark Millard from comment #44)
Yes, I will commit the changes to main this week and MFC with a short timeout.

(In reply to Henrich Hartzer from comment #42)
Laundry pages are dirty memory that can only be reclaimed by saving a copy to some swap space.  firefox will generate quite a lot of it.  I'd hope that 8GB of RAM is enough to run firefox, but it's been a long time since I ran a desktop with less than 16/32GB.

On my desktop, firefox is consuming a "healthy" amount of memory.  It's not easy to see how memory is shared among different firefox processes, but its total memory usage is definitely less than the sum of the RES column:

  251 markj        37  23    0  9331M  2663M select   3 483:17  13.66% firefox
  274 markj        30  23    0  4743M   871M select  10 165:13   9.64% firefox
  245 markj        95  20    0    17G  3617M select   0  41.9H   1.94% firefox
66817 markj        34  20    0  8444M   784M select   1 113:04   1.76% firefox
86228 markj        29  20    0  2928M   542M select   0   0:09   1.35% firefox
86836 markj        30  20    0  3463M   722M select   9   1:31   1.00% firefox
  257 markj        32  20    0  3873M   864M select   7 150:33   0.64% firefox
88325 markj        31  20    0  2502M   178M select  11   0:00   0.52% firefox
  249 markj        29  20    0    23G  1525M select   2 173:56   0.36% firefox
  268 markj        32  21    0  5655M  1197M select   5 377:49   0.29% firefox
  273 markj        30  20    0  4656M   966M select   7 151:09   0.18% firefox
83201 markj        28  20    0  2634M   297M select   8   0:24   0.17% firefox
  256 markj        29  20    0  3444M   555M select   0  47:48   0.16% firefox
  262 markj        29  20    0  3599M   558M select   4  58:56   0.14% firefox
76703 markj        29  20    0  2651M   301M select   4   0:54   0.12% firefox
  252 markj        29  20    0  5898M  1023M select   8  40:35   0.12% firefox
87306 markj        29  20    0  2624M   274M select   9   0:18   0.11% firefox
10355 markj        27  20    0  2946M   384M select   8  12:39   0.09% firefox
  258 markj        29  20    0  2671M   247M select   9  39:37   0.08% firefox
  291 markj        38  20    0  4034M   766M select   9  97:15   0.04% firefox
  279 markj        31  20    0  4559M   659M select   7 109:57   0.03% firefox
  255 markj        29  20    0  3729M   658M select   0  44:00   0.03% firefox
59901 markj        28  20    0  2882M   469M select   2   6:20   0.01% firefox
  253 markj        32  20    0  4738M   931M select   5  44:53   0.01% firefox
98743 markj        27  20    0  3154M   509M select   4   3:28   0.00% firefox
86990 markj        29  20    0  2760M   388M select   8   0:08   0.00% firefox
  266 markj         5  20    0   869M   241M select   5  98:24   0.00% firefox
  261 markj        29  20    0  3021M   375M select   6  64:42   0.00% firefox
  259 markj         5  20    0   383M   137M select   1  32:56   0.00% firefox
  250 markj        28  20    0  4044M   783M select   1   9:10   0.00% firefox
  248 markj         6  20    0   267M   121M select   7   5:35   0.00% firefox
79393 markj        33  20    0  7544M  1803M select   4   5:23   0.00% firefox
10361 markj        28  20    0  2532M   181M select  10   3:11   0.00% firefox
71850 markj        27  20    0  2556M   212M select   5   0:06   0.00% firefox
71892 markj        28  20    0  2521M   192M select   6   0:05   0.00% firefox
88241 markj        31  20    0  2502M   180M select   8   0:00   0.00% firefox
88409 markj        19  21    0  2447M   172M select   8   0:00   0.00% firefox
88352 markj        19  20    0  2447M   172M select   9   0:00   0.00% firefox
88410 markj        19  21    0  2447M   172M select   2   0:00   0.00% firefox


That said, the memory usage is certainly substantial.

Userspace memory leaks are also certainly a possibility, though I haven't seen one in firefox in quite a while.
Comment 46 Mark Millard 2024-10-21 23:18:33 UTC
(In reply to Mark Johnston from comment #45)

Your plan for the to-main and then MFC to stable/14 in
time for 14.2: Thanks.

"Laundry pages are dirty memory that can only be reclaimed
by saving a copy to some swap space." I think that wording
is more specific than I've noticed elsewhere (limited
reclaim options).

[Process kill/exit releasing memory would likely not be
subject to needing to be saved to swap space.]

So is Inact basically a (potential) mix of clean and dirty
memory that can be reclaimed without saving a copy to some
swap space? (Clean Inact can likely just be freed to be
reclaimed but dirty Inact might be able to be put back in
Active, if that is considered a "reclaim".)
Comment 47 Mark Johnston freebsd_committer freebsd_triage 2024-10-22 12:47:31 UTC
(In reply to Mark Millard from comment #46)
My wording wasn't very good.  I was speculating about the contents of the laundry queue on Henrich's system specifically.  You're right that process exit can also reclaim pages (from the laundry queue), but here I'm assuming that many of the laundry queue pages there are owned by firefox.

The inactive queue contains a mix of clean and dirty pages.  The queue is scanned only when there is memory pressure.  "Recently referenced" pages are moved back to the active queue or requeued to the tail of the inactive queue.  Unreferenced pages found to be dirty during a scan are moved to the laundry queue, while clean pages are reclaimed on the spot.
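A toy model of that scan policy may help make the rules concrete. This is an editor's sketch in plain sh, not kernel code; the referenced/dirty flags and outcome names are illustrative only:

```shell
#!/bin/sh
# Toy model of the inactive-queue scan described above.
# Each sample "page" is "referenced,dirty". Per the comment:
# referenced pages are requeued (active queue or tail of inactive),
# unreferenced dirty pages go to the laundry queue, and
# unreferenced clean pages are reclaimed on the spot.
classify() {
  referenced=${1%,*}
  dirty=${1#*,}
  if [ "$referenced" -eq 1 ]; then
    echo "requeue"
  elif [ "$dirty" -eq 1 ]; then
    echo "laundry"
  else
    echo "reclaim"
  fi
}

for page in 1,0 1,1 0,1 0,0; do
  echo "page($page) -> $(classify "$page")"
done
```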
Comment 48 commit-hook freebsd_committer freebsd_triage 2024-10-22 12:50:50 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=6a07e67fb7a8b5687a492d9d70a10651d5933ff5

commit 6a07e67fb7a8b5687a492d9d70a10651d5933ff5
Author:     Mark Johnston <markj@FreeBSD.org>
AuthorDate: 2024-10-22 12:48:43 +0000
Commit:     Mark Johnston <markj@FreeBSD.org>
CommitDate: 2024-10-22 12:48:43 +0000

    vm_meter: Fix laundry accounting

    Pages in PQ_UNSWAPPABLE should be considered part of the laundry.
    Otherwise, on systems with no swap, the total amount of memory visible
    to tools like top(1) decreases.

    It doesn't seem very useful to have a dedicated counter for unswappable
    pages, and updating applications accordingly would be painful, so just
    lump them in with laundry for now.

    PR:             280846
    Reviewed by:    bnovkov, kib
    MFC after:      1 week
    Differential Revision:  https://reviews.freebsd.org/D47216

 sys/vm/vm_meter.c | 20 +++++++++++++++++---
 1 file changed, 17 insertions(+), 3 deletions(-)
Comment 49 Mark Millard 2024-10-22 17:00:24 UTC
(In reply to commit-hook from comment #48)

Previous(/current) descriptions :

vm.domain.0.stats.laundpdpgs: Laundry pages scanned by the page daemon
vm.domain.0.stats.laundry: laundry pages
vm.domain.0.stats.unswappable: Unswappable pages
vm.domain.0.stats.unswppdpgs: Unswappable pages scanned by the page daemon

More accurate ones for comparison/contrast (if I've understood right)? :

vm.domain.0.stats.laundpdpgs: Swappable Laundry pages scanned by the page daemon
vm.domain.0.stats.laundry: Swappable+Unswappable laundry pages
vm.domain.0.stats.unswappable: Unswappable pages
vm.domain.0.stats.unswppdpgs: Unswappable pages scanned by the page daemon

vm.domain.0.stats.laundpdpgs is actually unchanged, but in
the new context the original description is highly ambiguous
because it no longer matches vm.domain.0.stats.laundry.

Estimating the old vm.domain.0.stats.laundry value
(just Swappable) requires subtraction in the new
context. Also:

vm.domain.0.stats.laundry+vm.domain.0.stats.unswappable

should not be used in the new context (avoiding double
counting of unswappable).

The proposed wording suggests those points.
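In shell, the subtraction looks like the following sketch. The counts are sample page counts (borrowed from figures later in this thread), not live sysctl output:

```shell
#!/bin/sh
# Recovering the old swappable-only laundry figure under the new
# accounting, where vm.domain.N.stats.laundry includes unswappable pages.
# Sample page counts, not live sysctl output.
laundry=201452      # vm.domain.0.stats.laundry (swappable + unswappable)
unswappable=5488    # vm.domain.0.stats.unswappable
swappable_only=$((laundry - unswappable))
echo "swappable-only laundry: $swappable_only pages"
# Note: laundry already contains unswappable, so adding
# laundry + unswappable would double count.
```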
Comment 50 Henrich Hartzer 2024-10-22 17:16:58 UTC
Thank you! That is interesting.

So the memory can be reclaimed if pushed out to swap, but I wonder if swap would fill up fairly quickly.

Running with all of the patches I had some more OOM behavior.

I had a fairly light Firefox (a dozen tabs, nothing very heavy), some other processes, and ran Gimp. I couldn't get very far with Gimp.

pid 1614 (firefox), jid 0, uid 1003, was killed: a thread waited too long to allocate a page
iwn0: RF switch: radio disabled
wlan0: link state changed to DOWN
pid 11298 (firefox), jid 0, uid 1002, was killed: a thread waited too long to allocate a page
pid 1532 (firefox), jid 0, uid 1002, was killed: failed to reclaim memory
pid 68014 (gimp-2.10), jid 0, uid 0, was killed: failed to reclaim memory

When the system starts to lock up, I kill the RF switch which usually helps it become more responsive. You can see the "a thread waited too long to allocate a page" error again.

With my main Firefox process dead I tried Gimp again. I could barely use it before it died again.

pid 69150 (gimp-2.10), jid 0, uid 0, was killed: failed to reclaim memory
pid 98331 (keepassxc), jid 0, uid 0, was killed: failed to reclaim memory
pid 1729 (telegram-desktop), jid 0, uid 1002, was killed: failed to reclaim memory

Is "failed to reclaim memory" from not being able to swap out the dirty pages? I'm surprised by how much memory Firefox and Gimp use.
Comment 51 Mark Millard 2024-10-22 17:19:33 UTC
(In reply to Mark Millard from comment #49)

Hmm. Making explicit that unswappable is laundry would
be more like (adding "laundry" to each of the pair):

vm.domain.0.stats.unswappable: Unswappable laundry pages
vm.domain.0.stats.unswppdpgs: Unswappable laundry pages scanned by the page daemon

And, avoiding capitalizing "laundry" after "Swappable":

vm.domain.0.stats.laundpdpgs: Swappable laundry pages scanned by the page daemon
Comment 52 Mark Millard 2024-10-22 17:34:14 UTC
(In reply to Henrich Hartzer from comment #50)

Were Active, Inact, Laundry, Wired, and Free such that
Laundry was huge in each case (and the others were not)?
Did the lead up to the failures have Laundry increasing
at a notable sustained rate vs. did it more suddenly
jump?

If you had spare USB media and a USB port, for example,
adding a swap space of, say, something like 3.6*RAM
(so RAM+SWAP == 4.6*RAM) and seeing if the RAM+SWAP use
stabilized before running out of RAM+SWAP could be
interesting. If it stabilized, you might be able to see
how much RAM+SWAP your example-being-tested needs.
That can help for future planning.

(The 3.6 factor should avoid warnings when the swap is
added about potentially being mistuned.)

(This USB or analogous test need not provide the kind of
performance you would want in normal operation.)

Now that you get reasonable/useful Active, Inact, Laundry,
Wired, and Free figures while monitoring in top, such
explorations should now be possible via use of top.
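As a sizing sketch for the experiment suggested above, with the 3.6x factor the totals work out as follows (16384 MiB of RAM is assumed for the reporter's 16 GB machine):

```shell
#!/bin/sh
# Swap sizing for the RAM+SWAP experiment above, using the suggested
# 3.6x factor (so RAM+SWAP == 4.6*RAM). 16384 MiB of RAM assumed.
ram_mib=16384
swap_mib=$((ram_mib * 36 / 10))
total_mib=$((ram_mib + swap_mib))
echo "swap: ${swap_mib} MiB"
echo "RAM+SWAP: ${total_mib} MiB"
```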
Comment 53 Mark Millard 2024-10-22 18:01:03 UTC
(In reply to Henrich Hartzer from comment #50)

"failed to reclaim memory" is from multiple attempts
to increase the free RAM to not be below a target
threshold. There is a parameter for controlling the
number of attempts before the related OOM kills start.
For example in /boot/loader.conf I have:

#
# Delay when persistent low free RAM leads to
# Out Of Memory killing of processes:
vm.pageout_oom_seq=120

(You might well want bigger than 120. The default
is 12 --last I knew anyway. You might want 1200 or
12000 for experimenting. I expect too large a
number might end up with overflow problems but
have not checked the details.)

The setting does not ever disable the "failed to
reclaim memory" OOM activity. It just makes it
take longer to happen. It is more useful for
spanning temporary heavy RAM use, such as can
happen during buildworld, for example. It is not a
fix for permanent heavy RAM use. But it can help
allow exploring a context by giving more time
before the OOM activity happens.
Comment 54 Mark Millard 2024-10-23 00:32:50 UTC
I've finally tested the RPi5 with video. I've installed official
builds of:

xorg-minimal-7.5.2_3           X.Org minimal distribution metaport
xf86-video-scfb-0.0.7_2        X.Org syscons display driver
lxqt-2.0.0_1                   Meta-port for the LXQt Desktop

firefox-131.0_1,2              Web browser based on the browser portion of Mozilla
gimp-2.10.38,2                 Meta-port for the Gimp

(Of course, lots more was also installed.)

Because there is a known crash-issue for firefox vs. aslr
on aarch64, I'm using:

# proccontrol -m aslr -s disable firefox

to run firefox.

The RPi5 has 8 GiBytes of RAM. I'm not activating swap.
My context does not have the accounting patches, so I'm
ignoring laundry but monitoring free via top. (My top
build is a personally patched variation.) The context
is UFS based, zfs.ko not loaded. main [so: 15].

Firefox with some https://www.homedepot.com tabs and a
gimp are running. A simple ssh session from another
computer is in use for monitoring from another room.

After everything was set up I saw the likes of 3571Mi
Free. But I'm leaving it idle, not using it.

In your context, is there a known way to get the OOM
problem with some programs running but with no
interactive use of the system once set up? Is there a
known time frame in which notable Free decreases
show up in your context?

After about 60 min: 3452Mi Free, so about 119 MiByte
drop in an hour but I've no clue if such a rate will
be sustained.
Comment 55 Mark Millard 2024-10-23 09:11:45 UTC
(In reply to Mark Millard from comment #54)

One of the 3 more active firefox processes has
989 MiByte resident and the other 2 are over
500 MiBytes each at this point:

 1526     0 root         29  29    0   3105Mi  733776Ki select   3  72:16   7.07% /usr/local/lib/firefox/firefox -contentproc {b91e352c-11e4-473a-bb58-eb4826e54ecf} 1512 8 tab
 1525     0 root         28  20    0   3373Mi     989Mi select   2   4:25   0.28% /usr/local/lib/firefox/firefox -contentproc {744240e1-aa23-48b1-bded-f62a1fa4696a} 1512 7 tab
 1524     0 root         29  20    0   3034Mi  668656Ki select   1   6:25   0.64% /usr/local/lib/firefox/firefox -contentproc {075167db-4164-41b8-b7df-4f99b4610ef9} 1512 6 tab

As for Free, it is down to: 2894Mi Free. Not yet small enough
to be much memory pressure, nothing (much?) yet unswappable:

991Mi Active, 3106Mi Inact, 194740Ki Laundry, 776320Ki Wired, . . ., 2891Mi Free,
1347Mi MaxObsActive, 776848Ki MaxObsWired, 2230Mi MaxObs(Act+Wir+Lndry)

[MaxObs(erved) figures are my additions to top.]

real memory  = 8569733120 (8172 MB)
avail memory = 8325320704 (7939 MB)

# sysctl vm.domain | grep stat | sort # 4K page counts
vm.domain.0.stats.active: 253516
vm.domain.0.stats.actpdpgs: 14202998
vm.domain.0.stats.free_count: 740954
vm.domain.0.stats.free_min: 12880
vm.domain.0.stats.free_reserved: 2713
vm.domain.0.stats.free_severe: 7796
vm.domain.0.stats.free_target: 43381
vm.domain.0.stats.inactive: 795139
vm.domain.0.stats.inactive_pps: 0
vm.domain.0.stats.inactive_target: 65071
vm.domain.0.stats.inactpdpgs: 0
vm.domain.0.stats.laundpdpgs: 0
vm.domain.0.stats.laundry: 48632
vm.domain.0.stats.unswappable: 0
vm.domain.0.stats.unswppdpgs: 0

So: Nothing yet unswappable.

(Remember: context predates the patching and its commit.)
Comment 56 Mark Millard 2024-10-23 12:49:12 UTC
(In reply to Mark Millard from comment #55)

Now there is unswappable:

# sysctl vm.domain | grep stat | sort
vm.domain.0.stats.active: 463905
vm.domain.0.stats.actpdpgs: 25127955
vm.domain.0.stats.free_count: 71099
vm.domain.0.stats.free_min: 12880
vm.domain.0.stats.free_reserved: 2713
vm.domain.0.stats.free_severe: 7796
vm.domain.0.stats.free_target: 43381
vm.domain.0.stats.inactive: 973704
vm.domain.0.stats.inactive_pps: 106004072
vm.domain.0.stats.inactive_target: 65071
vm.domain.0.stats.inactpdpgs: 6715012
vm.domain.0.stats.laundpdpgs: 25988
vm.domain.0.stats.laundry: 201452
vm.domain.0.stats.unswappable: 5488
vm.domain.0.stats.unswppdpgs: 0

so Laundry by itself is now inaccurate in this old context.

1796Mi Active, 3829Mi Inact, ??? Laundry, 1243Mi Wired, . . ., 273724Ki Free,
3761Mi MaxObsActive, 1281Mi MaxObsWired, ??? MaxObs(Act+Wir+Lndry)

7939 - 1796 - 3829 - 1243 - 267.3 == 803.7 estimate of Laundry (MiBytes)
vs.
(201452+5488)*4096/(1024*1024)    == 808.4 estimate of Laundry (MiBytes)

Those 3 more active firefox processes look like:

 1526     0 root         29  28    0   3103Mi  655960Ki select   2 100:17   7.30% /usr/local/lib/firefox/firefox -contentproc {b91e352c-11e4-473a-bb58-eb4826e54ecf} 1512 8 tab
 1525     0 root         28  20    0   3730Mi    1228Mi select   3   5:20   0.42% /usr/local/lib/firefox/firefox -contentproc {744240e1-aa23-48b1-bded-f62a1fa4696a} 1512 7 tab
 1524     0 root         29  20    0   3034Mi  530976Ki select   0   8:37   5.64% /usr/local/lib/firefox/firefox -contentproc {075167db-4164-41b8-b7df-4f99b4610ef9} 1512 6 tab
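The two Laundry estimates above can be recomputed mechanically; a small sketch (4096-byte pages, MiB units, figures copied from this comment):

```shell
#!/bin/sh
# Recomputing the two laundry estimates above.
awk 'BEGIN {
  top_est    = 7939 - 1796 - 3829 - 1243 - 267.3        # from top figures (MiB)
  sysctl_est = (201452 + 5488) * 4096 / (1024 * 1024)   # from page counts (MiB)
  printf "top-based estimate:    %.1f MiB\n", top_est
  printf "sysctl-based estimate: %.1f MiB\n", sysctl_est
}'
```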
Comment 57 Mark Millard 2024-10-23 17:01:34 UTC
(In reply to Mark Millard from comment #49)
(In reply to Mark Millard from comment #51)

I missed the other place where a laundry figure is available:

vm.stats.vm.v_laundry_count: Pages eligible for laundering

The patch seems to change only one of the laundry counts.
If so, that may or may not be deliberate.

So it is possible that I previously misidentified the
association and proposed wordings that name the wrong
places, or that do not cover all the places.

The wording of the descriptions should make clear whether
Unswappable is included in each case. ("Eligible" might be
trying to imply the handling of Unswappable --but if so, I
expect it does not communicate that effectively.)

vm.stats.vm.* does not have anything analogous to
vm.domain.*.stats.unswappable (or a total of such)
as far as I can see.

(My test contexts have just one domain (0) so do not
show multi-domain distinctions well.)
Comment 58 Mark Johnston freebsd_committer freebsd_triage 2024-10-23 17:07:34 UTC
(In reply to Mark Millard from comment #57)
This is deliberate, more or less.  The vm.domain.* sysctls provide a detailed breakdown of memory usage and are relatively new (added when NUMA support was added to FreeBSD).  The vm.stats.* sysctls are much older and much more commonly used as a data source for memory usage summaries.  For the latter, I think a hack to hide the implementation detail of PQ_UNSWAPPABLE is ok.  In the longer term I'd like to come up with a better solution for the problem that PQ_UNSWAPPABLE solves, as it is itself a hack IMO.
Comment 59 Mark Millard 2024-10-23 17:49:36 UTC
(In reply to Mark Johnston from comment #58)

Okay.

The below is basically just to leave notes correcting
what I got wrong, which could otherwise mislead folks
reading this bugzilla submittal's material. Little or
none of it might end up with a related commit.

Original descriptions
(using domain 0 as an example when involved) . . .

Domains have:

vm.domain.0.stats.laundpdpgs: Laundry pages scanned by the page daemon
vm.domain.0.stats.laundry: laundry pages
vm.domain.0.stats.unswappable: Unswappable pages
vm.domain.0.stats.unswppdpgs: Unswappable pages scanned by the page daemon

There is also the overall:

vm.stats.vm.v_laundry_count: Pages eligible for laundering


Potential updated descriptions conceptually related to the
patch that are more explicit about the context spanned by
each . . .

vm.domain.0.stats.laundpdpgs: Domain's swappable laundry pages scanned by the page daemon
vm.domain.0.stats.laundry: Domain's swappable laundry pages count
vm.domain.0.stats.unswappable: Domain's unswappable laundry pages count
vm.domain.0.stats.unswppdpgs: Domain's unswappable laundry pages scanned by the page daemon

vm.stats.vm.v_laundry_count: All swappable+unswappable Pages eligible for laundering
Comment 60 Mark Millard 2024-10-24 05:01:23 UTC
(In reply to Mark Millard from comment #59)

# sysctl vm.domain | grep stat | sort
vm.domain.0.stats.active: 401978
vm.domain.0.stats.actpdpgs: 62005476
vm.domain.0.stats.free_count: 43575
vm.domain.0.stats.free_min: 12880
vm.domain.0.stats.free_reserved: 2713
vm.domain.0.stats.free_severe: 7796
vm.domain.0.stats.free_target: 43381
vm.domain.0.stats.inactive: 1022754
vm.domain.0.stats.inactive_pps: 136138740
vm.domain.0.stats.inactive_target: 65071
vm.domain.0.stats.inactpdpgs: 9985067
vm.domain.0.stats.laundpdpgs: 32733
vm.domain.0.stats.laundry: 240061
vm.domain.0.stats.unswappable: 6536
vm.domain.0.stats.unswppdpgs: 0

So vm.domain.0.stats.free_count and vm.domain.0.stats.free_target
are now approximately equal. vm.domain.0.stats.unswappable also
seems to be increasing faster now.

Also note vm.domain.0.stats.inactive is about 999 KiPages.
Presuming that clean pages are being reclaimed, much of this
(and all of vm.domain.0.stats.laundry and
vm.domain.0.stats.unswappable) consists of dirty pages.

Whatever Home Depot's home page does, it seems to
cause Firefox to dirty pages but not free them.
Sorted by decreasing RES (starting with 2093Mi),
the largest firefox processes are:

 1525     0 root         28  20    0   4620Mi    2093Mi select   2  10:00   0.26% /usr/local/lib/firefox/firefox -contentproc {744240e1-aa23-48b1-bded-f62a1fa4696a} 1512 7 tab
 1512     0 root         78  20    0   4385Mi    1373Mi select   3  71:51   1.30% firefox
 1526     0 root         28  28    0   3175Mi  627328Ki CPU1     1 218:28   7.10% /usr/local/lib/firefox/firefox -contentproc {b91e352c-11e4-473a-bb58-eb4826e54ecf} 1512 8 tab
 1524     0 root         29  20    0   3039Mi  504836Ki select   0  18:09   0.28% /usr/local/lib/firefox/firefox -contentproc {075167db-4164-41b8-b7df-4f99b4610ef9} 1512 6 tab
 1520     0 root         21  20    0   2688Mi  337388Ki select   1   0:32   0.00% /usr/local/lib/firefox/firefox -contentproc {d3bc151f-c6c2-4e85-8d13-1f3eda86142f} 1512 2 tab

They also tend to be the bigger CPU users.

For reference:

1579Mi Active, 3983Mi Inact, ??? Laundry, 1241Mi Wired, . . ., 174768Ki Free,
3761Mi MaxObsActive, 1281Mi MaxObsWired, ??? MaxObs(Act+Wir+Lndry)

As far as Inact + (swappable laundry+unswappable laundry)
MiByte estimates go:

3983 + (240061+6536)*4096/1024/1024 is somewhat over 4946.2
at this point

(Based on expecting Inact to be almost all dirty pages in
this context.)
Comment 61 Mark Millard 2024-10-26 01:17:37 UTC
(In reply to Mark Millard from comment #60)

It got to the failure point:

vm_fault_allocate_oom: proc 1526 (firefox) failed to alloc page on fault, starting OOM
vm_fault_allocate_oom: proc 1526 (firefox) failed to alloc page on fault, starting OOM
vm_fault_allocate_oom: proc 1526 (firefox) failed to alloc page on fault, starting OOM
vm_fault_allocate_oom: proc 1512 (firefox) failed to alloc page on fault, starting OOM
Oct 25 13:43:07 aarch64-main-pkgs kernel: pid 1525 (firefox), jid 0, uid 0, was killed: a thread waited too long to allocate a page

That was about 4 hrs ago as I type this. Now:

Mem: 122832Ki Active, 438552Ki Inact, . . ., 1240Mi Wired, . . ., 4218Mi Free, 4842Mi MaxObsActive, 1304Mi MaxObsWired, 6815Mi MaxObs(Act+Wir+Lndry)

The largest RES processes are:

(Note: 1512 was not one of the processes killed. 1524 and 1526
are still running as well. 1524, 1525, and 1526 look to have been
for the 3 Home Depot tabs, 1525 having been killed.)

 1512     0 root         77  20    0   4397Mi    1062Mi select   0 170:55   2.19% firefox
 1526     0 root         29  28    0   3218Mi  399560Ki select   0 543:37   8.57% /usr/local/lib/firefox/firefox -contentproc {b91e352c-11e4-473a-bb58-eb4826e54ecf} 1512 8 tab
 1520     0 root         21  20    0   2836Mi  375868Ki select   0   1:23   0.00% /usr/local/lib/firefox/firefox -contentproc {d3bc151f-c6c2-4e85-8d13-1f3eda86142f} 1512 2 tab
 1524     0 root         29  20    0   3057Mi  320676Ki select   0  44:52   1.18% /usr/local/lib/firefox/firefox -contentproc {075167db-4164-41b8-b7df-4f99b4610ef9} 1512 6 tab

# sysctl vm.domain | grep stat | sort
vm.domain.0.stats.active: 31862
vm.domain.0.stats.actpdpgs: 1904670576
vm.domain.0.stats.free_count: 1078502
vm.domain.0.stats.free_min: 12880
vm.domain.0.stats.free_reserved: 2713
vm.domain.0.stats.free_severe: 7796
vm.domain.0.stats.free_target: 43381
vm.domain.0.stats.inactive: 109856
vm.domain.0.stats.inactive_pps: 426537
vm.domain.0.stats.inactive_target: 65071
vm.domain.0.stats.inactpdpgs: 38305445
vm.domain.0.stats.laundpdpgs: 6964714
vm.domain.0.stats.laundry: 1082
vm.domain.0.stats.unswappable: 493803
vm.domain.0.stats.unswppdpgs: 0

Note the size of: vm.domain.0.stats.actpdpgs

# sysctl vm.stats.vm.v_laundry_count
vm.stats.vm.v_laundry_count: 1082

One of the 3 Home Depot tabs reports that the page crashed.
The other 2 are still operable in the firefox session. The
OOM did not leave behind a *.core file (as expected).

It sure looks to me like a leak of modified memory during
Home Depot's web page activity handling (while the user is
idle on the system).

I've no clue what type of data was/is in the accumulated
leaked pages, which limits what conclusions I can draw.
Comment 62 commit-hook freebsd_committer freebsd_triage 2024-10-29 13:35:07 UTC
A commit in branch stable/14 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=1d271ba05fceb1ef6246e8079bc19e3b6416a833

commit 1d271ba05fceb1ef6246e8079bc19e3b6416a833
Author:     Mark Johnston <markj@FreeBSD.org>
AuthorDate: 2024-10-22 12:48:43 +0000
Commit:     Mark Johnston <markj@FreeBSD.org>
CommitDate: 2024-10-29 13:04:25 +0000

    vm_meter: Fix laundry accounting

    Pages in PQ_UNSWAPPABLE should be considered part of the laundry.
    Otherwise, on systems with no swap, the total amount of memory visible
    to tools like top(1) decreases.

    It doesn't seem very useful to have a dedicated counter for unswappable
    pages, and updating applications accordingly would be painful, so just
    lump them in with laundry for now.

    PR:             280846
    Reviewed by:    bnovkov, kib
    MFC after:      1 week
    Differential Revision:  https://reviews.freebsd.org/D47216

    (cherry picked from commit 6a07e67fb7a8b5687a492d9d70a10651d5933ff5)

 sys/vm/vm_meter.c | 20 +++++++++++++++++---
 1 file changed, 17 insertions(+), 3 deletions(-)
Comment 63 commit-hook freebsd_committer freebsd_triage 2024-10-29 14:21:11 UTC
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=df515b2e22c79d857189f3ad7389b546c3428868

commit df515b2e22c79d857189f3ad7389b546c3428868
Author:     Mark Johnston <markj@FreeBSD.org>
AuthorDate: 2024-10-22 12:48:43 +0000
Commit:     Mark Johnston <markj@FreeBSD.org>
CommitDate: 2024-10-29 13:34:45 +0000

    vm_meter: Fix laundry accounting

    Pages in PQ_UNSWAPPABLE should be considered part of the laundry.
    Otherwise, on systems with no swap, the total amount of memory visible
    to tools like top(1) decreases.

    It doesn't seem very useful to have a dedicated counter for unswappable
    pages, and updating applications accordingly would be painful, so just
    lump them in with laundry for now.

    PR:             280846
    Reviewed by:    bnovkov, kib
    MFC after:      1 week
    Differential Revision:  https://reviews.freebsd.org/D47216

    (cherry picked from commit 6a07e67fb7a8b5687a492d9d70a10651d5933ff5)

 sys/vm/vm_meter.c | 20 +++++++++++++++++---
 1 file changed, 17 insertions(+), 3 deletions(-)
Comment 64 Mark Millard 2024-10-29 14:30:00 UTC
(In reply to commit-hook from comment #48)

An example from my main [so: 15] context on an aarch64
(idle time after a busy time doing buildworld and the
like):

"vmstat -o | sort -nr -k4,4" output has 214555 lines showing 1000+ :

16088     0 16088  6421   0   0 WB  vn /usr/local/llvm19/lib/libMLIR.so.19.1
. . .
    0     0     0  1013   0   0 WB  vn /usr/local/include/boost/mpl/vector/aux_/preprocessed/typeof_based/vector20_c.hpp
. . .

top shows (I've tailored the output format from the defaults):

25684Ki Laundry

Checking instead for 6[0-9][0-9][0-9] in the LAUND column: 111341 lines

It seems that the LAUND column may be counting the same pages
multiple times across some of the lines, totaling far more
than the 25684Ki.

It might be worth a note in the man page about how to interpret
the column.
Comment 65 Mark Millard 2024-10-29 17:26:25 UTC
(In reply to Mark Millard from comment #64)

Just FYI for what the LAUND column totals to for that context:

# vmstat -o | tail +1 | awk '{ sum += $4 } END { print sum }'
1168091355

And also:

# vmstat -o | tail +1 | wc -l
  215023

1168091355/215023 is slightly over 5432.4
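The two pipelines above can also be collapsed into one awk pass that produces the sum, the line count, and the average together. A sketch fed with two sample rows (since vmstat -o output is FreeBSD-specific; the first path is real, from the earlier comment):

```shell
#!/bin/sh
# One-pass version of the two vmstat pipelines above: sum column 4 and
# count lines together. Sample rows stand in for real vmstat -o output.
printf '%s\n' \
  '16088 0 16088 6421 0 0 WB vn /usr/local/llvm19/lib/libMLIR.so.19.1' \
  '0 0 0 1013 0 0 WB vn /usr/local/include/boost/mpl/vector/aux_/preprocessed/typeof_based/vector20_c.hpp' |
awk '{ sum += $4; n++ } END { printf "sum=%d lines=%d avg=%.1f\n", sum, n, sum / n }'
```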
Comment 66 Henrich Hartzer 2024-10-29 19:27:12 UTC
I'm glad that you've been able to reproduce this. It's nice to be able to see the usage in top as well.

I'd like to share a few thoughts/anecdotes on this.

When I killed Telegram Desktop my laundry dropped about 400MB.

In Firefox I cleared cache and it cleared a couple hundred MB from laundry. I need to test that some more to get better figures.

I'm wondering if this is Firefox specific, or if there's some shared component that is causing this. This may be far-fetched, but I am wondering if there is a Linux, maybe glibc, behavior where you can mark a page as no longer needed but useful to keep if it fits, and that isn't translating to FreeBSD. I could see some page/cache data that would be nice to hold on to if possible, but that could readily be jettisoned under any memory contention. Perhaps those are our laundry pages? Just speculation from someone who doesn't know a lot about how allocation works.

After running on UFS for a while, I feel like the OOM behavior is much more predictable and faster than with ZFS. Under ZFS I was getting hard locks for a while. With UFS, it seems like it'll usually resolve itself (killing something) in around 10 seconds, but with ZFS it could lock for minutes in this OOM condition. I'm not sure why that is, but it's nice that when I run into the bug it resolves pretty quickly if I'm not able to "catch" it in time.
Comment 67 Henrich Hartzer 2024-10-29 19:30:44 UTC
I just launched Gimp. I made a new image with the "US Letter" size.

I opened as layers four pictures in the 2-3MB range.

This seemed to create some laundry. I didn't let it idle or anything. I killed it and the laundry difference was 291MB.

Not sure why laundry would be created so quickly without any sort of "delete" or idle activity.
Comment 68 Mark Millard 2024-10-29 21:13:32 UTC
(In reply to Henrich Hartzer from comment #67)

Laundry pages are previously active pages that were
modified (so they are not a copy of what is stored on
media) and have been prepared for potentially being
written out to swap space. (Swap space can be added at
any time, so being prepared does not require that any
swap space be in use yet.) If they were deleted without
being written to swap space, data/information would
be lost. Unless the program requests such a deletion,
this loss would be a major error.

Note that there can be modified pages that are Inactive
and have not yet been turned into laundry pages.

Memory pressure tends to free pages that were not modified
but are Inactive. This is because there is a place to go
back to in order to make a copy of the original content:
the free activity does not destroy the information but
makes memory available for other uses.

I'll note that memory "pressure" created by one process
can cause another process(s) to end up with more laundry
or even to have some of its laundry pages paged out to
swap space. Having the laundry figure change at the system
level does not directly indicate which process(s) had some
pages reclassified. Avoid assuming that the process you
are using is the one that has its laundry status change
when the system figure changes.

For all I know, when memory "pressure" decreases, various
processes might have their laundry contribution decrease
as well (some going back to Inactive?).

One thing that does move things out of the laundry
category is that page being put to active use again:
Back to Active.

One process that always stays runnable and keeps enough
RAM in active use to prevent meeting the threshold for
free RAM is enough to lead to OOM activity. For such a
context, that always runnable process might not be one
of the processes eventually killed and its pages will
not become laundry (or even Inactive).
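As an editor's summary of the transitions described in this comment, the following toy lookup names the movements between queues. The transition labels are illustrative only, not kernel terminology:

```shell
#!/bin/sh
# Lookup table summarizing the page-state transitions described above.
# Editor's toy model of the queue movements, not kernel code.
transition() {
  case "$1" in
    inactive_clean)  echo "Inactive -> Free (content recoverable from backing media)" ;;
    inactive_dirty)  echo "Inactive -> Laundry (a copy must reach swap before freeing)" ;;
    laundry_reused)  echo "Laundry -> Active (page put to active use again)" ;;
    laundry_swapped) echo "Laundry -> Free (copy written out to swap space)" ;;
    *)               echo "unknown transition" ;;
  esac
}

for t in inactive_clean inactive_dirty laundry_reused laundry_swapped; do
  printf '%-16s %s\n' "$t:" "$(transition "$t")"
done
```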