Bug 247276

Summary: [fusefs]: Lockup when using mmap w/ direct_io
Product: Base System
Reporter: trapexit
Component: kern
Assignee: Alan Somers <asomers>
Status: Closed FIXED
Severity: Affects Some People
CC: asomers, cem, nishida
Priority: ---
Keywords: needs-qa
Version: 12.1-RELEASE
Flags: koobs: maintainer-feedback? (asomers)
Hardware: Any
OS: Any
URL: https://reviews.freebsd.org/D26485
Attachments:
Description: example mmap usage
Flags: none

Description trapexit 2020-06-15 12:33:53 UTC
Created attachment 215581
example mmap usage

I'm the author of the FUSE based filesystem mergerfs. A user recently reported that after updating to 12.1, some client software running on mergerfs was locking up, rtorrent in particular.

rtorrent uses mmap. Linux's FUSE implementation is unable to handle mmap when direct_io is enabled, so I recommend that users who need any software relying on mmap not enable direct_io. The user hadn't read the docs and was using direct_io on FreeBSD. It appears that mmap does work on FreeBSD's FUSE implementation when direct_io is used, so he had no issues.

Once he updated, his setup started blocking and the apps became unkillable (waiting on I/O). I wrote a simple mmap example that reads from and writes to a shared mapping of a file, and with it I could reproduce the issue.

When direct_io is off everything works fine. If I enable direct_io I see read requests come into the mergerfs server, and so long as I only read from the mapped memory it works fine. When writing, however, it seems to lock up as soon as it needs to flush. I don't see any write requests come in, the client app blocks and is unkillable, and other calls into the filesystem seem to work briefly and then block. A stack trace of mergerfs seems to indicate that it is working as normal and could take requests. If mergerfs is killed, none of the clients receive an error from the syscalls they are blocked on.

I've been able to reproduce this with sshfs by simply adding `direct_io` to the mount options.

Attached is an example that triggers it.
Comment 1 Kubilay Kocak freebsd_committer freebsd_triage 2020-06-15 12:51:26 UTC
Thank you for the report and reproducer. If you are able to obtain a kernel syscall trace exhibiting the lockup, that might also prove handy.
Comment 2 Alan Somers freebsd_committer freebsd_triage 2020-06-15 13:17:14 UTC
Thanks for the bug report!  Just to be clear, how should we use the attached program?  If it's meant to be used with sshfs, could you please provide the exact sshfs command line you used?
Comment 3 trapexit 2020-06-15 13:33:19 UTC
(In reply to Alan Somers from comment #2)

Yes, it will work with sshfs.

$ sshfs -o direct_io <src> <dst>
$ gcc -o /tmp/mmap-write mmap-write.c
$ cd <dst>
$ /tmp/mmap-write

It'll print the address offset and a value written into the location. It just blocks at the end.
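
For readers without the attachment, here is a minimal sketch of the kind of program described above. It is not the attached mmap-write.c. The function name `mmap_write_check`, the one-byte-per-page write pattern, and the choice to skip an explicit msync are all assumptions made for illustration.

```c
/* Hypothetical reproducer sketch; NOT the attached mmap-write.c. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Create `path`, size it to `len` bytes, map it MAP_SHARED, write one
 * byte at the start of every page (printing offset and value), then
 * unmap and close. No msync is issued, so flushing the dirty pages is
 * left to the kernel. Returns 0 on success, -1 on error.
 */
int
mmap_write_check(const char *path, size_t len)
{
	long pagesz = sysconf(_SC_PAGESIZE);
	int rc = 0;

	if (pagesz <= 0 || len == 0)
		return (-1);

	int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
	if (fd == -1) {
		perror("open");
		return (-1);
	}
	/* The file must be at least `len` bytes long before it is touched. */
	if (ftruncate(fd, (off_t)len) == -1) {
		perror("ftruncate");
		close(fd);
		return (-1);
	}

	unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
	    MAP_SHARED, fd, 0);
	if (p == MAP_FAILED) {
		perror("mmap");
		close(fd);
		return (-1);
	}

	/* Dirty every page through the shared mapping. */
	for (size_t off = 0; off < len; off += (size_t)pagesz) {
		p[off] = (unsigned char)(off / (size_t)pagesz);
		printf("%zu: %u\n", off, (unsigned)p[off]);
	}

	if (munmap(p, len) == -1) {
		perror("munmap");
		rc = -1;
	}
	if (close(fd) == -1) {
		perror("close");
		rc = -1;
	}
	return (rc);
}
```

Calling `mmap_write_check("testfile", 1 << 20)` from a small `main` while the current directory is inside the direct_io mount should exercise the same path: reads through the mapping, then dirty pages that the kernel has to write back. On a local filesystem the call simply returns 0.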
Comment 4 Conrad Meyer freebsd_committer freebsd_triage 2020-06-15 16:29:01 UTC
FYI, `procstat -kk <pid>` can be used to show the kernel stack of a stuck process.
Comment 5 trapexit 2020-06-15 18:06:25 UTC
(In reply to Conrad Meyer from comment #4)

Thanks.

PID    TID COMM                TDNAME              KSTACK                       
783 100068 mmap-write          -                   mi_switch+0xe2 sleepq_wait+0x2c _sleep+0x247 vm_page_busy_sleep+0x8f vm_object_page_remove+0x203 vn_pages_remove+0x52 fuse_io_dispatch+0xebd VOP_WRITE_APV+0xec vnode_pager_generic_putpages+0x6ba VOP_PUTPAGES_APV+0x7c vnode_pager_putpages+0x84 vm_pageout_flush+0xed vm_object_page_collect_flush+0x1f2 vm_object_page_clean+0x146 vinactive+0xae vputx+0x2c3 vn_close1+0x181 vn_closefile+0x4c
Comment 6 Hiroshi Nishida 2020-06-30 16:21:33 UTC
Isn't this somehow related to https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=246886 ?

I thought there was something wrong with 12's sendfile but if fusefs also has a problem, the above problem gets more complicated.
In my case, the problem occurs even if direct_io is disabled, and sendfile deadlocks with vm_page_busy_sleep, not a fusefs program.

However, everything seems OK with CURRENT.
Comment 7 Hiroshi Nishida 2020-07-21 18:06:49 UTC
Although I posted to https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=246886, I'd like to also add a comment here.

I back-ported 12.0-R's fusefs to 12-STABLE and tested my program. No error occurred, so I can't help suspecting there is something wrong with 12.1's fusefs.

I can test a new fusefs if it is modified.
Comment 8 Alan Somers freebsd_committer freebsd_triage 2020-07-21 19:02:17 UTC
No, Hiroshi, I don't think this is related to 246886.  The stacks are completely different, and this bug is definitely particular to the use of direct_io.
Comment 9 Hiroshi Nishida 2020-07-21 21:30:03 UTC
(In reply to Alan Somers from comment #8)

OK.
However, I'll check the differences between the 12.0 and 12.1 fusefs code.
Comment 10 Alan Somers freebsd_committer freebsd_triage 2020-09-20 02:09:53 UTC
Reproduced on head with a minimal test case.
Comment 11 Alan Somers freebsd_committer freebsd_triage 2020-09-20 03:16:37 UTC
Code review in progress
Comment 12 commit-hook freebsd_committer freebsd_triage 2020-09-24 16:28:36 UTC
A commit references this bug:

Author: asomers
Date: Thu Sep 24 16:27:53 UTC 2020
New revision: 366121
URL: https://svnweb.freebsd.org/changeset/base/366121

Log:
  fusefs: fix mmap'd writes in direct_io mode

  If a FUSE server returns FOPEN_DIRECT_IO in response to FUSE_OPEN, that
  instructs the kernel to bypass the page cache for that file. This feature
  is also known by libfuse's name: "direct_io".

  However, when accessing a file via mmap, there is no possible way to bypass
  the cache completely. This change fixes a deadlock that would happen when
  an mmap'd write tried to invalidate a portion of the cache, wrongly assuming
  that a write couldn't possibly come from cache if direct_io were set.

  Arguably, we could instead disable mmap for files with FOPEN_DIRECT_IO set.
  But allowing it is less likely to cause user complaints, and is more in
  keeping with the spirit of open(2), where O_DIRECT instructs the kernel to
  "reduce", not "eliminate" cache effects.

  PR:		247276
  Reported by:	trapexit@spawn.link
  Reviewed by:	cem
  MFC after:	3 days
  Differential Revision:	https://reviews.freebsd.org/D26485

Changes:
  head/sys/fs/fuse/fuse_io.c
  head/tests/sys/fs/fusefs/write.cc
Comment 13 commit-hook freebsd_committer freebsd_triage 2020-09-27 03:00:30 UTC
A commit references this bug:

Author: asomers
Date: Sun Sep 27 02:59:29 UTC 2020
New revision: 366190
URL: https://svnweb.freebsd.org/changeset/base/366190

Log:
  MFC r366121:

  fusefs: fix mmap'd writes in direct_io mode

  If a FUSE server returns FOPEN_DIRECT_IO in response to FUSE_OPEN, that
  instructs the kernel to bypass the page cache for that file. This feature
  is also known by libfuse's name: "direct_io".

  However, when accessing a file via mmap, there is no possible way to bypass
  the cache completely. This change fixes a deadlock that would happen when
  an mmap'd write tried to invalidate a portion of the cache, wrongly assuming
  that a write couldn't possibly come from cache if direct_io were set.

  Arguably, we could instead disable mmap for files with FOPEN_DIRECT_IO set.
  But allowing it is less likely to cause user complaints, and is more in
  keeping with the spirit of open(2), where O_DIRECT instructs the kernel to
  "reduce", not "eliminate" cache effects.

  PR:		247276
  Reported by:	trapexit@spawn.link
  Reviewed by:	cem
  Differential Revision:	https://reviews.freebsd.org/D26485

Changes:
_U  stable/12/
  stable/12/sys/fs/fuse/fuse_io.c
  stable/12/tests/sys/fs/fusefs/write.cc
Comment 14 commit-hook freebsd_committer freebsd_triage 2020-09-28 00:24:48 UTC
A commit references this bug:

Author: asomers
Date: Mon Sep 28 00:24:00 UTC 2020
New revision: 366211
URL: https://svnweb.freebsd.org/changeset/base/366211

Log:
  MF stable/12 r366190:

  fusefs: fix mmap'd writes in direct_io mode

  If a FUSE server returns FOPEN_DIRECT_IO in response to FUSE_OPEN, that
  instructs the kernel to bypass the page cache for that file. This feature
  is also known by libfuse's name: "direct_io".

  However, when accessing a file via mmap, there is no possible way to bypass
  the cache completely. This change fixes a deadlock that would happen when
  an mmap'd write tried to invalidate a portion of the cache, wrongly assuming
  that a write couldn't possibly come from cache if direct_io were set.

  Arguably, we could instead disable mmap for files with FOPEN_DIRECT_IO set.
  But allowing it is less likely to cause user complaints, and is more in
  keeping with the spirit of open(2), where O_DIRECT instructs the kernel to
  "reduce", not "eliminate" cache effects.

  PR:		247276
  Approved by:	re (gjb)
  Reported by:	trapexit@spawn.link
  Reviewed by:	cem
  Differential Revision:	https://reviews.freebsd.org/D26485

Changes:
_U  releng/12.2/
  releng/12.2/sys/fs/fuse/fuse_io.c
  releng/12.2/tests/sys/fs/fusefs/write.cc