Bug 246886 - [sendfile] Nginx + NFS or FUSE causes VM stall
Summary: [sendfile] Nginx + NFS or FUSE causes VM stall
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 12.1-RELEASE
Hardware: amd64 Any
Importance: --- Affects Only Me
Assignee: freebsd-bugs (Nobody)
Depends on:
Reported: 2020-05-31 02:32 UTC by Hiroshi Nishida
Modified: 2022-06-17 16:52 UTC
12 users

See Also:

Output of procstat -kka (24.98 KB, text/plain)
2020-06-05 14:52 UTC, Hiroshi Nishida
fuse_vnops.c with 12.0R's fuse_vnop_getpages (61.94 KB, text/plain)
2020-07-24 17:51 UTC, Hiroshi Nishida

Description Hiroshi Nishida 2020-05-31 02:32:21 UTC
I'm developing a distributed file system using FUSE on FreeBSD 12.1-RELEASE and STABLE. However, when Nginx accesses the FUSE-mounted filesystem, Nginx stalls in the 'grbmaw' state.
grbmaw seems to be used by vm_page_busy_sleep(), and for some reason the thread seems never to wake up again.
My FUSE program is just waiting for the next command at fuse_session_loop_mt().

I have tested with 4 different kinds of hardware and hit this problem with 3 of them. The biggest difference between those 3 and the remaining 1 is the NIC: the 3 are wired and the other is wireless.
I guess this happens when Nginx is rushing to access or send the data.
Once it happens, I need to reboot the device, but it does not always shut down.
It is easily reproducible.

I think the problem is fuse.ko related and would appreciate it if anybody could fix it.

Thank you.
Comment 1 Mark Linimon freebsd_committer freebsd_triage 2020-05-31 02:59:26 UTC
Since it involves fusefs, over to fs@.
Comment 2 Mark Johnston freebsd_committer 2020-06-01 16:16:10 UTC
Could you provide the stack of the thread stuck in grbmaw?  procstat -kk <pid of stuck process> will show it.

A reproducer would also be useful.
Comment 3 Hiroshi Nishida 2020-06-01 16:44:55 UTC
Thank you for your response.
Here is the stack:

  PID    TID COMM                TDNAME              KSTACK                       
 1190 100092 nginx               -                   mi_switch+0xd4 sleepq_wait+0x2c _sleep+0x253 vm_page_busy_sleep+0xaf vm_page_grab_pages+0x3f2 allocbuf+0x371 getblkx+0x5be breadn_flags+0x3d vfs_bio_getpages+0x33f fuse_vnop_getpages+0x46 VOP_GETPAGES_APV+0x7b vop_stdgetpages_async+0x49 VOP_GETPAGES_ASYNC_APV+0x7b vnode_pager_getpages_async+0x7d vn_sendfile+0xdc2 sendfile+0x12b amd64_syscall+0x387 fast_syscall_common+0x101 

Well, I wish I could open my source code but it's unfortunately commercial and closed.
But I may be able to write a similar and simpler program and test with it.
Comment 4 Hiroshi Nishida 2020-06-01 20:47:59 UTC
I added 

sendfile off;

to nginx.conf and tested again.
Interestingly, nginx never stops on all devices and seems to be running with no problem.

It looks like sendfile triggers the problem.
Comment 5 Alan Somers freebsd_committer 2020-06-01 20:50:38 UTC
(In reply to Hiroshi Nishida from comment #4)
Yep, and perhaps that's an adequate workaround for you.  But we should still fix the bug.  Have you had any success with reducing your test case?
Comment 6 Hiroshi Nishida 2020-06-01 21:51:07 UTC
It's going to take a while, and I can't guarantee it will reproduce the same problem.
However, I'll try it.
Comment 7 Julien Cigar 2020-06-02 09:39:36 UTC
Maybe the same issue as bug #244713.
Comment 8 Mark Johnston freebsd_committer 2020-06-02 13:28:01 UTC
(In reply to Julien Cigar from comment #7)
Yes, I suspect that it would be a better use of time to try testing on the latest stable/12.  A number of bugs in sendfile have been fixed in the past few months, most of which were only reproducible with !UFS.
Comment 9 Hiroshi Nishida 2020-06-02 14:05:16 UTC
(In reply to Mark Johnston from comment #8)

The problem is reproducible with 12-STABLE downloaded last week.
I'll start programming the filesystem today and hope it reproduces the same problem.

Intuitively, it seems to happen when the filesystem is slow.
Comment 10 Mark Johnston freebsd_committer 2020-06-02 14:45:28 UTC
(In reply to Hiroshi Nishida from comment #9)
If it is convenient to test, I would also suggest trying the latest -CURRENT snapshots.  sendfile internals have diverged a fair bit between HEAD and stable/12.  If you are able to reproduce there, the output of "procstat -kka" would probably be sufficient to start investigating.  Otherwise, the same output from the stable/12 system would be helpful.
Comment 11 Hiroshi Nishida 2020-06-03 03:41:40 UTC
I have uploaded my test program to:

I could reproduce the problem very easily in my office, but it took pretty long at my house.
It seems the network bandwidth is somehow related but I'm not sure.
If it is too hard to reproduce, I may write another one.

Please take a look at README for the installation, usage, etc.
You will need to keep seeking video until it freezes.
With my original program, it takes only seconds.

Unfortunately, I don't have any devices available for installing CURRENT right now.
I would appreciate it if somebody could test with CURRENT.
Comment 12 Alan Somers freebsd_committer 2020-06-04 01:14:43 UTC
The test case does not work for me.  When I try, I get the following error.  Maybe you need to add the video file to the git repo?

> sudo ./fusetest -d
DEBUG: FuseGetattr: path: /BigBuckBunny-Full-web.mp4
DEBUG: FuseOpen: path: /BigBuckBunny-Full-web.mp4, flags: 0
DEBUG: FuseRead: path: /BigBuckBunny-Full-web.mp4, fi->fh: 0x0, size: 65536, offset: 0
DEBUG: FuseRelease: path: /BigBuckBunny-Full-web.mp4, fi->fh: 0x0
DEBUG: FuseGetattr: path: /BigBuckBunny-Full-web.mp4
DEBUG: FuseOpen: path: /BigBuckBunny-Full-web.mp4, flags: 0
DEBUG: FuseRead: path: /BigBuckBunny-Full-web.mp4, fi->fh: 0x0, size: 65536, offset: 0
Error: FuseRead: curl_easy_perform: Timeout was reached
^C^C^C^CDEBUG: FuseRelease: path: /BigBuckBunny-Full-web.mp4, fi->fh: 0x0
Comment 13 Hiroshi Nishida 2020-06-04 02:23:09 UTC
(In reply to Alan Somers from comment #12)

That may be because rnci002.ddns.net is an IPv6-only server and its IPv4 address is a dummy.

Could you check if your device running fusetest can access
http://rnci002.ddns.net/raw-videos/BigBuckBunny-Full-web.mp4 ?

BigBuckBunny-Full-web.mp4 can be downloaded from there.
It's too large to put on GitHub.
Comment 14 Hiroshi Nishida 2020-06-04 02:29:21 UTC
Well I can put BigBuckBunny-Full-web.mp4 on a different server with an accessible IPv4 address.
Let me know if you need it.
Comment 15 Alan Somers freebsd_committer 2020-06-04 02:39:29 UTC
(In reply to Hiroshi Nishida from comment #14)
Please do.  I'm on an IPv4-only ISP.
Comment 16 Hiroshi Nishida 2020-06-04 03:28:04 UTC
Here you go.
Please update from https://github.com/scopedog/FreeBSD-FUSE-sendfile
Comment 17 Alan Somers freebsd_committer 2020-06-04 03:46:30 UTC
I can't reproduce it so far, on either 12.1-RELEASE or on 13-CURRENT.  What I'm doing is loading the video, pressing seek, and as soon as the next frame renders seeking again.  So far I've done that all the way through about five times with no hangs.
Comment 18 Hiroshi Nishida 2020-06-04 03:55:58 UTC
It took about 10 minutes at my house, so don't give up too soon.
That said, please try it on your LAN instead.
Put BigBuckBunny-Full-web.mp4 on a machine in your LAN and change

#define URL     "http://rnc02.asusa.net/raw/BigBuckBunny-Full-web.mp4"

in fusetest.h to the URL of the new one.
I used LAN in my office and the video froze very easily.

If you still cannot reproduce the problem, I'll think about using my original program, which freezes the video super easily.
But I need permission from my boss, and that will be hard.
Comment 19 Hiroshi Nishida 2020-06-04 18:38:18 UTC
I have created another test program that reproduces the error more easily than the first one, at least in my environment.


If you had no luck with reproducing the error, please try it.
Just seek and seek, click and click even while the video is loading.
Now the video freezes within 10 seconds at my house.

By the way, I run nginx and fusetest on mini PCs like Intel NUC.
If possible, run them on a slow PC.
Comment 20 Alan Somers freebsd_committer 2020-06-05 04:14:08 UTC
Still can't reproduce.  I don't have any slow FreeBSD computers, but I tried running a CPU-intensive benchmark in the background and it didn't help.
Comment 21 Hiroshi Nishida 2020-06-05 05:01:43 UTC
Okay, let me think about what to do.
Can you debug remotely?
Or I can send you one of my servers (maybe an Intel NUC), but either way I need permission from my boss.
Comment 22 Mark Johnston freebsd_committer 2020-06-05 14:47:04 UTC
(In reply to Hiroshi Nishida from comment #21)
I don't mean to interrupt, but in the meantime it would help to see "procstat -kka" output from a system where the deadlock is occurring.  Presumably some other thread is holding the page busy, which is causing nginx to block forever.  procstat output would help identify that thread.
Comment 23 Hiroshi Nishida 2020-06-05 14:52:39 UTC
Created attachment 215250 [details]
Output of procstat -kka

Here you go.
Comment 24 Mark Johnston freebsd_committer 2020-06-05 15:16:47 UTC
(In reply to Hiroshi Nishida from comment #23)
I don't see any other blocked threads, which suggests that the busy lock is being leaked somewhere.

Do any of your sendfile calls result in read errors from fuse?  In other words, do you ever see sendfile_iodone() being called with error != 0?  It can be verified by running:

# dtrace -n 'fbt::sendfile_iodone:entry /args[3] != 0/{stack();}'

while running your test.
Comment 25 Hiroshi Nishida 2020-06-05 15:37:03 UTC
(In reply to Mark Johnston from comment #24)

My FUSE program does not use sendfile but my other programs like rncddsd, rncmond use it, so I stopped them.
However, I still get the same error and dtrace outputs nothing after

dtrace: description 'fbt::sendfile_iodone:entry ' matched 1 probe

The procstat -kka output looks almost the same.
Comment 26 Hiroshi Nishida 2020-06-05 22:52:34 UTC
(In reply to Mark Johnston from comment #24)

I repeatedly ran one of my programs that uses sendfile, but it didn't return an error and the data were all sent correctly, even after nginx froze.

dtrace didn't output anything, either.
Comment 27 Hiroshi Nishida 2020-06-10 15:05:19 UTC
I tested with CURRENT.
It seems OK with CURRENT; nginx has not frozen so far.
Comment 28 Alan Somers freebsd_committer 2020-06-10 15:24:37 UTC
(In reply to Hiroshi Nishida from comment #27)
Terrific news!  Would you be able to test on the latest stable/12 as well?
Comment 29 Hiroshi Nishida 2020-06-10 15:31:47 UTC
(In reply to Alan Somers from comment #28)
A new SSD will arrive on Saturday.
I will try then.
Comment 30 Hiroshi Nishida 2020-06-15 22:58:38 UTC
(In reply to Alan Somers from comment #28)

Unfortunately, the problem still persists with 12.1-STABLE r362026 dated 20200611.

By the way, I got permission from my boss to let other people log in to my device or lend it for development.
If you need it, let me know.
I've been using FreeBSD for 23 years and am willing to cooperate for the bug fix.
Comment 31 Hiroshi Nishida 2020-07-21 17:57:35 UTC
I finally had a chance to test with STABLE (20200611) + 12.0-R's fuse.
Everything looks OK and I never got the vm deadlock problem.

Considering also https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=247276, there seems to be something wrong with fuse in 12.1 or later.
Comment 32 Alan Somers freebsd_committer 2020-07-21 18:17:02 UTC
So it freezes on a recent stable/12 but works fine on head?  In that case, I'm suspecting a problem in the general sendfile code.  fusefs has very few differences between those branches, but sendfile has many.  Can you tell me the exact revision of head you tried?  There was a fix for a sendfile hang bug that went into head at r361852.
Comment 33 Hiroshi Nishida 2020-07-21 18:27:00 UTC
(In reply to Alan Somers from comment #32)
I need to swap my SSD to check the revision of head and cannot do that right now.

However as reported in https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=247276, the same problem occurs with fusefs only.
It says the problem happens when mmap is used.
More interestingly, the behavior of fusefs is quite different between 12.0 and 12.1.
It may be worth checking fusefs first, though I'm not 100% sure.

I'll let you know the revision of head as soon as possible.
Comment 34 Hiroshi Nishida 2020-07-22 14:21:50 UTC
(In reply to Alan Somers from comment #32)

The revision of HEAD I tested is r361779.
Comment 35 Hiroshi Nishida 2020-07-23 21:48:54 UTC
The following is the most noticeable difference between 12.0R and 12-STABLE fusefs. Could you tell me what part of the source code causes it?

With nginx sendfile on,
12.0R: The size of data read is pretty random like 16384, 36864, 53248, 49152...
12-STABLE: The data are read every 64k.

With nginx sendfile off,
Both 12.0R and 12-STABLE read data every 32k.

Thank you.
Comment 36 Alan Somers freebsd_committer 2020-07-24 15:33:01 UTC
The FUSE code is very different in 12.0 vs 12.1.  I rewrote about half of the driver.  The changed fuse read sizes could be due to changes in fuse_vnop_getpages, or changes anywhere in fuse_io.c.  But the changes were very comprehensive.

I'm still interested in the fact that you can't reproduce on a recent head.  That opens up a possibility for bisecting the problem.
Comment 37 Hiroshi Nishida 2020-07-24 17:51:30 UTC
Created attachment 216748 [details]
fuse_vnops.c with 12.0R's fuse_vnop_getpages
Comment 38 Hiroshi Nishida 2020-07-24 17:54:22 UTC
(In reply to Alan Somers from comment #36)

> changes in fuse_vnop_getpages

This seems to be it. Bingo.
I replaced 12-STABLE's fuse_vnop_getpages with 12.0R's and haven't hit a deadlock so far.
It might just change the timing needed to trigger sendfile's deadlock, but it is working well in my environment.
Comment 39 jSML4ThWwBID69YC 2020-07-28 17:10:21 UTC

Just adding that I also have issues with Nginx on FUSE, specifically using MooseFS. I opened a bug on the MooseFS side yesterday, but I'm adding this here to show that at least a second person is having Nginx + FUSE specific issues. I'm running 12.1p7.

MooseFS report: https://github.com/moosefs/moosefs/issues/381
Comment 40 Hiroshi Nishida 2020-07-30 19:30:10 UTC
I tested 12.0R's sys/kern/kern_sendfile.c on 12-STABLE.
Unfortunately, I still get the same problem.
Comment 41 Alan Somers freebsd_committer 2020-09-20 20:34:15 UTC
(In reply to jSML4ThWwBID69YC from comment #39)
Can you reproduce the problem reliably using MooseFS?  If so, post full steps to reproduce and I'll try it.  Also, have you tried FreeBSD current yet?
Comment 42 Hiroshi Nishida 2020-09-21 14:27:32 UTC
By the way, the same problem seems to happen with NFS and a regular FS.
Please see
Comment 43 Christos Chatzaras 2020-09-21 14:40:14 UTC
Yes, it was me who reported it on the forum. With sendfile disabled in Nginx there are no panics. I use UFS2.
Comment 44 Alan Somers freebsd_committer 2020-09-21 15:46:07 UTC
Deassigning this bug from myself since it isn't FUSE-specific.  Gleb or Chuck, could you take a look?  This still might be the same problem that was fixed by r361852.  There have been no reports of reproducing it on head past that revision.
Comment 45 Gleb Smirnoff freebsd_committer 2020-09-21 16:02:14 UTC
I had faced this problem with 12.1-RELEASE and NFS. I contacted rmacklem@, but he wasn't able to help.

I am absolutely sure this isn't a general sendfile problem, but specific to NFS. It is possible that FUSE also has it.

FFS definitely doesn't have this problem, given the amount of use it gets.

This looks like a missing wakeup on a page after it is filled.
Comment 46 Alan Somers freebsd_committer 2020-09-21 16:29:02 UTC
(In reply to Gleb Smirnoff from comment #45)
Based on your understanding of the problem, could you suggest a better way to reproduce it?  The reason there's been no progress so far is because nobody has come up with an easy reproduction case.
Comment 47 Gleb Smirnoff freebsd_committer 2020-09-21 16:32:13 UTC
No idea, sorry. I faced it on a production server that serves mostly from UFS, but there are a few NFS mounts also available via HTTP, with small traffic. The problem happens once in a few months on this server. Sorry again for not being helpful.
Comment 48 Gleb Smirnoff freebsd_committer 2020-09-21 16:34:31 UTC
One more thought. The problem could live outside of NFS or FUSE, in the general VM page code, and be exposed only by filesystems that run sendfile synchronously (UFS doesn't). However, ZFS also runs sendfile synchronously, and I'm pretty sure its use is far wider than NFS and FUSE combined, yet we don't have such reports for ZFS.
Comment 49 Alan Somers freebsd_committer 2020-09-21 16:43:12 UTC
(In reply to Gleb Smirnoff from comment #48)
But ZFS also doesn't use the page cache.  Maybe the problem is specific to file systems that use the page cache and also run sendfile synchronously?

Do you use any torture test program with sendfile?  It looks like neither fsx nor fio support it.
Comment 50 Gleb Smirnoff freebsd_committer 2020-09-21 16:45:42 UTC
There are a number of torture tests for sendfile in pho@'s suite. Another way to test is just to run wrk against nginx.
Comment 51 jSML4ThWwBID69YC 2020-09-21 17:06:37 UTC
(In reply to Alan Somers from comment #41)

Sendfile seems to work fine, but using AIO/Nginx + MooseFS/FuseFS causes Nginx to hang. 

Error message: kernel: pid 60903 (nginx) is attempting to use unsafe AIO requests - not logging anymore. 

Disabling AIO in Nginx works around the hang. 

The second Nginx issue is related to the fuse lookup cache. An error happens when Nginx tries to write to a log file on a directory with an exceeded disk quota. The disk quotas are being set by MooseFS.

Error message: kernel: fuse_vnop_lookup: WB cache incoherent on /path/to/mfsmount point!

I'm not sure either of these is related to this bug anymore. I added the reference originally because it appeared to be an Nginx + FUSE specific issue before.
I have not tried current yet.
Comment 52 Alan Somers freebsd_committer 2020-09-21 17:10:09 UTC
(In reply to jSML4ThWwBID69YC from comment #51)
Those are definitely unrelated.  Open separate PRs for them if you want help.
Comment 53 jSML4ThWwBID69YC 2020-09-21 17:10:54 UTC
(In reply to Alan Somers from comment #52)

Will do. Thanks.
Comment 54 Gleb Smirnoff freebsd_committer 2020-09-21 17:16:14 UTC
Then that's a crazy mix of subsystems involved!

Here is a description of how nginx combines aio and sendfile: when sendfile(2) returns EBUSY, nginx does an aio_read(2) of 1 byte from the file, relying on the side effect of the page being cached, and then retries sendfile.

Note that EBUSY from sendfile on FreeBSD 11 and before means a totally different thing than EBUSY from sendfile on FreeBSD 12 and after. On older versions, EBUSY means that SF_NODISKIO was set on the request and the file is not in memory. On newer versions, EBUSY is a soft error meaning that a page is busy (again, with SF_NODISKIO set). The busy condition goes away in a few milliseconds. However, nginx doesn't differentiate between versions of FreeBSD and always does this 1-byte aio_read(). It is harmless on newer versions of FreeBSD and just adds a small delay before the retry. At Netflix we have removed this code from nginx, and in the EBUSY case we simply retry after a timer, so we don't use aio.

It is entirely possible that the problem shows up only when aio_read() and sendfile are combined on the same region of a file on NFS or FUSE :(
Comment 55 Alan Somers freebsd_committer 2020-09-21 17:25:26 UTC
(In reply to Gleb Smirnoff from comment #54)
It's not likely that aio_read makes a difference, since he is seeing the warning about "use unsafe AIO requests".  That warning means that aio_read would've returned EOPNOTSUPP without doing anything.  UNLESS there's a bug in aio(4) very high in the stack, before the safety check, that is leaking a resource.

Background: "unsafe" AIO means operations where there is no guarantee that the operation will ever complete, due to network unreliability (in your case), or disk unreliability, if you're accessing a disk directly rather than a file system.  I've never liked that seat belt, because it blocks so many of AIO's best use cases.  You can disable it by setting vfs.aio.enable_unsafe=1 in /etc/sysctl.conf.

I'm puzzled that disabling aio in NGinx makes a difference.  Could you please repeat that experiment?  It shouldn't matter, if unsafe AIO is disabled and you're serving from NFS or FUSE.
Comment 56 Gleb Smirnoff freebsd_committer 2020-09-21 17:55:02 UTC
If aio is disabled in nginx, it would not set SF_NODISKIO and thus sendfile would wait on busy pages.
Comment 57 jSML4ThWwBID69YC 2020-09-21 18:40:01 UTC
(In reply to Alan Somers from comment #55)

I've tested three options in the nginx.conf. 

sendfile on;
This works as expected. 

aio on; 
This causes content not to load. Nginx keeps running, but does not seem to read files from disk. Here's an error from /var/log/nginx/error.log. 

"2020/09/21 18:16:18 [crit] 83947#100590: *592 aio_read("/path/to/public_html/test.html") failed (45: Operation not supported) while sending response to client, client:<snip>"

aio threads;
This also seems to work as expected. I'm not sure why "aio threads" works but plain aio does not.

Setting vfs.aio.enable_unsafe=1 made no difference in the results. 

For this test data is stored on MooseFS. Duplicating the test on ZFS does not show the same issue. 

FreeBSD 12.1-RELEASE-p10 GENERIC  amd64
nginx version: nginx/1.18.0
built with OpenSSL 1.1.1g  21 Apr 2020
TLS SNI support enabled

Should I open this as a separate bug still?
Comment 58 Alan Somers freebsd_committer 2020-09-21 18:47:03 UTC
(In reply to jSML4ThWwBID69YC from comment #57)
Keep using this bug to discuss NGinx hangs.  Open separate bugs for anything else.  Can you easily reproduce the hang with MooseFS?  If so, please describe full steps to reproduce, assuming that the reader has no knowledge of how to configure MooseFS.
Comment 59 Hiroshi Nishida 2020-09-21 18:58:44 UTC
Let me also try with vfs.aio.enable_unsafe=1
I'm doing a make world right now, and it's going to take over an hour because of slow hardware (Core i3-8109U).
Comment 60 Hiroshi Nishida 2020-09-21 21:47:26 UTC
Nginx still stalls with vfs.aio.enable_unsafe=1.
Comment 61 jSML4ThWwBID69YC 2020-10-06 02:50:43 UTC
(In reply to Alan Somers from comment #58)

I've upgraded to MooseFS 3.0.114 and confirmed the issue is still there. I'll try to follow up with instructions on building/installing MooseFS tomorrow.

Any chance https://reviews.freebsd.org/D26485 is related? I can spin up a local vm and test that first, assuming it's worth the time.
Comment 62 Alan Somers freebsd_committer 2020-10-06 03:41:32 UTC
(In reply to jSML4ThWwBID69YC from comment #61)
No, that review is relevant for writes only.
Comment 63 Piotr Robert Konopelko (MooseFS) 2020-10-10 00:56:36 UTC
(In reply to Alan Somers from comment #58)

@jSML4ThWwBID69YC: you can find an attachment with a list of steps for setting up a minimal MooseFS cluster on FreeBSD, which Agata from our team wrote some time ago while describing steps to reproduce another bug, #245689. I believe it will be useful in this case too, since Alan wrote:

> (...) please describe full steps to reproduce, assuming that the reader has no knowledge of how to configure MooseFS.

And here goes the link:


Hope it can save your time a bit.
Comment 64 Marcin Gryszkalis 2021-05-06 12:07:55 UTC
I have the same problem on 12.2-RELEASE-p4 with nginx + NFS; it happens about once a month in production. After the latest hang (today) I turned sendfile off.

 1025 18521 18512  0   20  0   31724   18592 grbmaw   D     -      0:11.40 nginx: worker process (nginx)

Comment 65 Marcin Gryszkalis 2021-05-06 12:55:14 UTC
It looks like #244713 has a similar issue referenced. There are 2 patches by @kib; the vfs_io.c one is applied in 12.2, but the kern_sendfile.c one is not.
Comment 66 Gleb Smirnoff freebsd_committer 2021-05-06 18:08:28 UTC
(In reply to Marcin Gryszkalis from comment #65)
I'd suggest upgrading to 13.0-RELEASE.

I also hit this "grbmaw" problem when serving off NFS and am going to try switching to 13.0 soon. I will report whether the problem goes away.

However, I'm not sure that this problem is exactly the same one that FUSE and MooseFS have.
Comment 67 Christos Chatzaras 2021-12-28 16:28:56 UTC
Maybe this is related?

Comment 68 Gleb Smirnoff freebsd_committer 2021-12-28 16:48:09 UTC
(In reply to Christos Chatzaras from comment #67)

It can't be related to the problem in this report, as we see processes stuck in the "grbmaw" state, which is clearly a kernel bug. But it could be related to the problem described in comment 51: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=246886#c51
Comment 69 firk 2022-06-17 06:46:42 UTC
Got this on 12.3-RELEASE with fusefs, nginx, and sendfile. Easily reproducible (just start downloading the same big file and cancel after 1 second; repeat 10-20 times), but my quickly written test program, which sendfile()s the same big file into a socketpair() socket, never hangs. Will try to investigate more.
Comment 70 firk 2022-06-17 16:52:41 UTC
So, the source of the problem seems to be base r337165. I didn't test the revisions around it, but I did test a rollback of this specific commit, setting f_iosize back to PAGE_SIZE=4096, and the problem is gone. But this commit is not the bug itself; it is just a trigger for other problem(s).

As already noted, backtrace for the deadlock is:

 allocbuf+0x371 (vfs_vmio_extend inlined inside)
 vn_sendfile+0xdf2 (sendfile_swapin inlined inside)

What happens:

1) sendfile_swapin() grabs and exclusively busies a bunch of pages via vm_page_grab_pages();

2) it then scans them sequentially, unbusies the already-loaded ones, and calls vm_pager_get_pages_async() for the not-yet-loaded ones, which should load them and call the sendfile_iodone() callback; that is how it worked in 11.x;

3) vm_pager_get_pages_async() calls some other nested functions, and now we are in vfs_bio_getpages(). Note: despite the "async" name, all of this is done synchronously;

4) vfs_bio_getpages() still has the vm_page[] array and its size as arguments, passed straight through unchanged from sendfile_swapin(); it downgrades the exclusive-busy state to shared-busy for the given page range;

5) the next step (bread_gb -> breadn_flags) is done using the block index and size obtained from the fusefs driver via the get_lblkno() and get_blksize() callbacks, and the new block size is 65536 by default. Going through getblkx() -> allocbuf() -> vfs_vmio_extend(), the last one calls vm_page_grab_pages() again, but on a range that is not the requested one: it is the range matching the fusefs block size, effectively aligned to a 16-page boundary (65536 = 16*4096). This leads to a deadlock, because the pages after the currently requested ones are still exclusively busy (see step 2).

What could be fixed:

1) easiest: roll back f_iosize to PAGE_SIZE, but this will reduce I/O speed again;

2) rework sendfile_swapin() to first scan the entire range for loaded/not-loaded pages and only then issue the queued vm_pager_get_pages_async() calls; I don't think this is a good idea, because everything already works when fusefs/nfs is not used;

3) make the "async" functions really async (see step 3) for fusefs; I don't know whether that is easy or not. This would also resolve the deadlock, because vfs_bio_getpages() would no longer block the sequential scan of the requested pages by sendfile_swapin();

4) prevent partially loaded filesystem f_iosize blocks from happening; again, I don't know whether that is easy, or even desirable.

PS: I don't know whether any of this still applies to 13.x.