Bug 261181 - 13-STABLE hang: swap_pager: indefinite wait buffer
Status: Open
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 13.0-STABLE
Hardware: amd64 Any
Importance: --- Affects Some People
Assignee: freebsd-bugs (Nobody)
URL:
Keywords: needs-qa
Depends on:
Blocks:
 
Reported: 2022-01-13 15:39 UTC by ldoujin
Modified: 2023-02-23 23:12 UTC
CC List: 8 users

See Also:


Attachments

Description ldoujin 2022-01-13 15:39:44 UTC
I have a 13.0-STABLE amd64 system that hangs when doing a "git pull" after several minutes of disk activity.

On a fresh boot with the latest kernel and userland, it gets as far as:

remote: Enumerating objects: 1314, done.
[...]
Updating 6f8a947161..562a8eaddf

This is where it freezes and the disk churning stops.

The console partially freezes up too. I'm able to switch ttys with Alt+F2/F3 but can't actually type a username at the login prompt.

After some time, the console then (repeatedly, but slowly) prints:

swap_pager: indefinite wait buffer: bufobj: 0, blkno xxx, size: 4096

(The xxx is a random number every time, with some repetitions.)

I have to hard reset it. I tried with swap off but that didn't help. Of note is that wired memory in top at the time of the hang goes way up to 9 GB (out of 10 GB total) but the ARC is only around 250 MB total. This is all done on a fresh boot.

Kernel stable/13-889517034 (from September) doesn't have the bug.
Comment 1 ldoujin 2022-01-13 18:35:44 UTC
$ git log --oneline --since "SEP 16 2021" --until "JAN 13 2022" sys | grep swap

1791debf4 swapoff: add one more variant of the syscall
45786883b swapoff(2): add a SWAPOFF_FORCE flag
6ceede7d3 swapoff(2): replace special device name argument with a structure
dea036bd1 swap_pager.c: Remove MPSAFE and ARGSUSED annotations
aebdfa951 Expand comment explaining reasons for automatic swapoff on shutdown
c1abd6bd3 shutdown: unmount filesystems after swapoff
08d995ca8 swapoff_one(): only check free pages count manually turning swap off
3a98b98be swap_pager: lock vnode in swapdev_strategy()
4b2caeec4 swapon: extend the region where the swap vnode is locked
81c9a051e swap pager: lock vnode around VOP_CLOSE()
5ac0e08ef vm.objects_swap: disable reporting some information
c54be5cfc Add vm.swap_objects sysctl
ca85fb7e0 swap_pager: Handle large swap_pager_reserve() requests

Any of these worth investigating?
Comment 2 tech-lists 2022-01-13 21:43:15 UTC
hi,

I'm running stable/13-n248899 and am not seeing the errors you're describing.

The error:

> swap_pager: indefinite wait buffer: bufobj: 0, blkno xxx, size: 4096

you're seeing is one I'd normally associate with failing hardware, for example a disk that can't read or write a sector. The next steps I'd take would be to install smartmontools and run a long self-test on the disk. Also examine the output of smartctl -x /dev/disk and look for errors or remapped sectors.
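
A minimal sketch of those checks, assuming an ATA disk at the hypothetical device node /dev/ada0 and smartmontools already installed. The helper name smart_sector_counts is made up for illustration; it only filters the attribute table printed by smartctl -A for the three counters that usually indicate failing media.

```shell
# Filter `smartctl -A` output (read from stdin) down to the attributes
# that typically point at failing media, printing "name raw_value".
# Assumes the usual ATA attribute table, where RAW_VALUE is column 10.
smart_sector_counts() {
    awk '$2 == "Reallocated_Sector_Ct" ||
         $2 == "Current_Pending_Sector" ||
         $2 == "Offline_Uncorrectable" { print $2, $10 }'
}
```

Typical use, with /dev/ada0 standing in for the real device: start the long self-test with "smartctl -t long /dev/ada0", read the result once it finishes with "smartctl -l selftest /dev/ada0", then run "smartctl -A /dev/ada0 | smart_sector_counts". Non-zero raw values would support the hardware theory; all zeros would not by themselves rule it out.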
Comment 3 ldoujin 2022-01-13 21:53:08 UTC
(In reply to tech-lists from comment #2)
Thanks for your reply. The issue does not exist with a kernel from a few months back as I wrote, so I don't think it's a hardware problem. The wired memory filling up is also indicative of a kernel bug to me.
Comment 4 tech-lists 2022-01-14 00:20:29 UTC
(In reply to ldoujin from comment #3)

What's the actual version of your kernel/userland?

Also, is the swap on a partition or is it a swapfile, and is it on ssd or spinning rust?

Is the swap encrypted?
Comment 5 ldoujin 2022-01-14 00:25:59 UTC
(In reply to tech-lists from comment #4)

The affected kernel is 13.0-STABLE as of when this bug was filed. Userland is kept in sync. "git log" shows 972796d007c2bda481f50cf99d5531d5754ef2fa as the most recent commit.

Swap is mirrored on HDD partitions and encrypted: /dev/mirror/swap.eli

But as I said before, disabling swap does not fix it.
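
Since a known-good commit (889517034) and a known-bad one (972796d007c2bda481f50cf99d5531d5754ef2fa) have both been named, one way to narrow this down would be a git bisect restricted to sys/. A rough sketch, assuming both hashes are on the same branch of a checkout at /usr/src, and that each step means building and booting a kernel and repeating the git pull. The function names are hypothetical.

```shell
# Upper bound on the number of kernels to build and boot when
# bisecting a range of n candidate commits (ceiling of log2(n)).
bisect_steps() {
    n=$1
    steps=0
    while [ "$n" -gt 1 ]; do
        n=$(( (n + 1) / 2 ))
        steps=$(( steps + 1 ))
    done
    echo "$steps"
}

# Start a bisect limited to commits that touch sys/.
# $1 = known-good commit, $2 = known-bad commit.
start_kernel_bisect() {
    good=$1
    bad=$2
    count=$(git rev-list --count "$good..$bad" -- sys)
    echo "$count candidate commits, at most about $(bisect_steps "$count") builds"
    git bisect start "$bad" "$good" -- sys
}
```

From /usr/src this would be started with "start_kernel_bisect 889517034 972796d007c2bda481f50cf99d5531d5754ef2fa". After testing each kernel, mark it with "git bisect good" or "git bisect bad", and finish with "git bisect reset".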
Comment 6 tech-lists 2022-01-14 00:34:03 UTC
(In reply to ldoujin from comment #5)

You said:

> But as I said before, disabling swap does not fix it.

Presumably you're not getting:

> swap_pager: indefinite wait buffer: bufobj: 0, blkno xxx, size: 4096

after you've turned off or disabled swap?

If you're no longer getting those errors, are you seeing anything in any of the other logs, such as /var/log/messages or /var/log/console.log (if that's enabled)?
Comment 7 ldoujin 2022-01-14 23:18:20 UTC
(In reply to tech-lists from comment #6)

I did a fresh boot into the affected kernel version. The initial amount of wired memory in top is 213MB. When doing the disk-intensive portion of updating the ports tree, wired memory (but not the ARC) shoots up at a rate of about 250MB per second, finally hitting the limit and causing the hang.

With swap disabled, the swap_pager line is not printed, but the system is still frozen. I think the swap line is a red herring: it's only being printed because the system is out of usable memory at the time due to this leak.

I have console.log enabled and can't see anything relevant in either it or /var/log/messages.
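
The growth rate described above can be recorded rather than read off top. A small sketch, assuming the FreeBSD sysctls vm.stats.vm.v_wire_count (wired pages) and hw.pagesize; the function names are made up for illustration. Running it on a serial console or over an ssh session that logs locally would keep the last samples visible after the hang.

```shell
# Convert a page count to whole mebibytes.
# $1 = number of pages, $2 = page size in bytes.
pages_to_mib() {
    echo $(( $1 * $2 / 1048576 ))
}

# Print a timestamped wired-memory sample once per second until interrupted.
watch_wired() {
    pagesize=$(sysctl -n hw.pagesize)
    while :; do
        pages=$(sysctl -n vm.stats.vm.v_wire_count)
        printf '%s wired=%s MiB\n' "$(date +%T)" "$(pages_to_mib "$pages" "$pagesize")"
        sleep 1
    done
}
```

Start watch_wired in a second terminal just before running git pull, and stop it with Ctrl+C if the system survives.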
Comment 8 tech-lists 2022-01-15 02:40:48 UTC
(In reply to ldoujin from comment #7)

(I thought I'd mention in passing that the initial issue should have been raised on either the mailing lists or the forums, then if a bug is discovered or no resolution there then it goes here in bugzilla, but we're here now, so)

I think the wired memory issue is a symptom with many potential causes and you need to find the underlying problem. I don't think it's an OS problem (yet) because firstly I'm running recent stable/13 on a number of servers and am not seeing the issue and secondly the mailing lists would be very busy indeed if it was a kernel problem concerning memory exhaustion.

When you're running 'git pull', from where is it pulling? 
Have you tried cloning to a different location?

I'd expect that if git had trouble writing to disk, there'd be an error message somewhere, but maybe it never gets to that point before the whole system freezes. If you can clone to a different location, then you've found the problem.

The git that you're running, is it built from ports or is it a binary install from FreeBSD's pkg servers?

What kernel are you using/is it modified? What does uname -a show?
If the kernel is modified, is the kernel config file (the one with the working kernel) the same as the one that built the latest kernel?

What sysctls do you have set? Have you tried running systat immediately after booting but just before running git? Have a look at the systat(1) man page for all the options. Have you tried running git with truss(1)? Are you using ZFS with NFS? Where is git meant to write to and how is it mounted?
Comment 9 ldoujin 2022-01-15 03:02:49 UTC
(In reply to tech-lists from comment #8)

git pull is pulling from ssh://git.freebsd.org, as I have used since its introduction. I have not tried pulling from any other source. It does not appear to have any problem writing to the disk; it just stops (along with everything else) because the RAM is all eaten up in a matter of about 129 seconds.

Issuing the same git command on the kernel from September results in a normally updated tree with no large increase in memory use. There are no write error messages on the console, in dmesg, or in "zpool status." Subsequent pulls on the old kernel are also stable, as is the whole system, for at least a month of normal use. With the older kernel, the disks are also used frequently with large transfers and there are no hangs.

git is built from ports. The kernel is custom with some things taken out, but I have not changed the kernel config since the day of 13.0's release, so it's the same in both cases.

sysctl.conf is mostly net.inet.icp.* things, along with these:

kern.elf64.allow_wx=0
kern.elf64.aslr.enable=1
kern.elf64.aslr.honor_sbrk=0
kern.elf64.aslr.pie_enable=1
kern.ipc.maxsockbuf=16777216
kern.ipc.shm_use_phys=1
security.bsd.stack_guard_page=1

I have not used systat or truss.

I am using ZFS, but this directory is not served over NFS. git is putting the files in /usr/local/poudriere/ports/default, which is part of the same geli-encrypted RAIDZ as the rest of the OS.
Comment 10 tech-lists 2022-01-15 03:18:14 UTC
(In reply to ldoujin from comment #9)

What happens if you do this:

git clone https://git.freebsd.org/ports.git /tmp/testgit

Does it complete?
Comment 11 ldoujin 2022-01-15 04:43:50 UTC
(In reply to tech-lists from comment #10)

It completes and wired memory stays at a much lower level.
Comment 12 tech-lists 2022-01-15 08:26:44 UTC
(In reply to ldoujin from comment #11)

What I'd do then is delete and re-create the ports tree with "poudriere ports -d -p default" and then "poudriere ports -c -p default -m git+https".

You've proved it's some git interaction with the state of the ports tree or the filesystem, maybe both. 

Is /tmp also on geli-encrypted RAIDZ?
Comment 13 Andriy Gapon freebsd_committer freebsd_triage 2022-01-15 11:38:49 UTC
(In reply to ldoujin from comment #7)
You can use vmstat -z to try to see where the wired memory is used.
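
As an illustration of how that output can be read, the sketch below ranks UMA zones by approximate footprint. It assumes the FreeBSD 13 layout of vmstat -z, one zone per line in the form "ITEM: SIZE, LIMIT, USED, FREE, ...", and estimates each zone as SIZE * (USED + FREE), which ignores slab overhead. The function name zone_footprint is hypothetical.

```shell
# Rank UMA zones by approximate memory footprint.
# Reads `vmstat -z` output on stdin and prints "bytes<TAB>zone name",
# largest first.
zone_footprint() {
    awk -F'[:,]' '
        # Keep only data lines: the SIZE field must be numeric.
        NF >= 5 && $2 ~ /^ *[0-9]+ *$/ {
            name = $1
            sub(/^ +/, "", name)
            sub(/ +$/, "", name)
            bytes = $2 * ($4 + $5)      # SIZE * (USED + FREE)
            printf "%.0f\t%s\n", bytes, name
        }
    ' | sort -rn
}
```

For example, "vmstat -z | zone_footprint | head" lists the ten largest zones. A zone whose size approaches the amount of wired memory is the first place to look.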
Comment 14 ldoujin 2022-01-15 16:56:10 UTC
(In reply to tech-lists from comment #12)

/tmp is on the same geli-encrypted array as the rest of the OS.

A few more data points:
- Cloning new tree from https = no bug
- Cloning new tree from ssh = no bug
- Updating tree from ssh = bug

The ports directory is mounted with zstd compression, while /tmp is mounted with no compression. I tried switching the ports directory to lz4 and no compression, but the bug still happened with git pull in both cases.

As I understand it, ZFS may still actually be using the previous compression for reading existing files, so "disabling" it might not have fully disabled it. Either way, the compression is probably worth mentioning.

I don't want to delete the problematic tree yet because then the bug may not get found and fixed. Something is still broken that's provably not broken in a kernel from September.


(In reply to Andriy Gapon from comment #13)

I'm not familiar with vmstat, but here is its -z output on a fresh boot:

https://pastebin.com/raw/6xRwdmbw

And during the rapid wired memory inflation:

https://pastebin.com/raw/x0KiiP2A
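
Two snapshots like these can be compared mechanically. A minimal sketch, assuming both were saved to local files and use the vmstat -z layout "ITEM: SIZE, LIMIT, USED, FREE, ..."; the function name zone_growth is made up for illustration, and footprint is again estimated as SIZE * (USED + FREE).

```shell
# Show how much each UMA zone grew between two saved `vmstat -z` snapshots.
# $1 = snapshot from a fresh boot, $2 = snapshot taken during the inflation.
# Prints "growth_in_bytes<TAB>zone name", largest growth first.
zone_growth() {
    awk -F'[:,]' '
        NF >= 5 && $2 ~ /^ *[0-9]+ *$/ {
            name = $1
            sub(/^ +/, "", name)
            sub(/ +$/, "", name)
            bytes = $2 * ($4 + $5)
            if (FNR == NR) before[name] = bytes   # first file
            else           after[name] = bytes    # second file
        }
        END {
            for (name in after)
                printf "%.0f\t%s\n", after[name] - before[name], name
        }
    ' "$1" "$2" | sort -rn
}
```

With the snapshots saved under hypothetical names, "zone_growth vmstat.boot vmstat.leak | head" would show which zones account for the increase in wired memory.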
Comment 15 ldoujin 2022-03-19 18:23:15 UTC
Bug is still present in 13-STABLE as of today.
Comment 16 Jim Pirzyk freebsd_committer freebsd_triage 2023-02-23 16:37:10 UTC
So I am seeing this also on a virtual machine since I upgraded to 13.1 (from 12.3, I believe). It happens mostly when the machine is under load for some reason (backups are a common culprit, but not always). I will note the machine is starved of memory (2GB) and is swapping most of the time, to an encrypted partition using GBE. ZFS is not involved in swap (but is for the filesystem).

A hard reset brings the machine back to life for several hours to multiple days.  Very random.  Have not yet found steps to force it to happen.
Comment 17 Graham Perrin freebsd_committer freebsd_triage 2023-02-23 22:44:41 UTC
(In reply to Jim Pirzyk from comment #16)

> … a Virtual Machine …

VirtualBox, or something else?
Comment 18 Jim Pirzyk freebsd_committer freebsd_triage 2023-02-23 23:12:03 UTC
(In reply to Graham Perrin from comment #17)

QEMU run by rootbsd/NetActuate