Bug 256937

Summary: [fusefs] Kernel panic in FUSE - lockmgr_xlock_hard: recursing on non recursive lockmgr
Product: Base System Reporter: Agata <chogata>
Component: kernAssignee: Alan Somers <asomers>
Status: Closed FIXED    
Severity: Affects Some People CC: asomers, grahamperrin, jSML4ThWwBID69YC, piotr.konopelko
Priority: --- Keywords: crash
Version: 13.0-RELEASEFlags: asomers: mfc-stable13+
asomers: mfc-stable12+
asomers: mfc-stable11-
Hardware: Any   
OS: Any   
URL: https://reviews.freebsd.org/D32332
Attachments:
Description Flags
Kernel panic backtrace message
none
30 day uptime
none
Program to help panic the kernel
none
Script to help panic the kernel
none
Fix two panics and quiet some warnings
none
Fix two panics and quiet some warnings
none
Patch formatted for releng/13.0 none

Description Agata 2021-07-02 12:01:47 UTC
Created attachment 226169 [details]
Kernel panic backtrace message

The problem is repeatedly spotted on a machine with FreeBSD 13.0 running MooseFS client that uses FUSE. The machine crashes daily, I'm attaching a screenshot of panic backtrace. It was reported to MooseFS team by a user and we promised to help investigate the problem.

From the fact, that this FreeBSD machine logs repeatedly:
"fuse_internal_do_getattr: cache incoherent on [PATH]! Buggy FUSE server detected" and the backtrace that shows an attempt at write and then invokes "fuse_internal_do_getattr", we suspect that the kernel panics when it gets a different length than it has in its internal structures and tries to invalidate the data cache buffer, which triggers a second attempt to lock an already locked buffer lock and that this might be somehow related to the kernel behaviour described in bug #256936

I will be pointing out this thread to the user that experiences the crashes, he will be able to add any details about the exact configuration of the affected system.
Comment 1 Alan Somers freebsd_committer freebsd_triage 2021-07-02 19:40:45 UTC
Shoot, that's no good.  I don't suppose you've found a reliable way to reproduce the crash?
Comment 2 jSML4ThWwBID69YC 2021-07-02 21:31:47 UTC
Created attachment 226183 [details]
30 day uptime
Comment 3 jSML4ThWwBID69YC 2021-07-02 21:34:01 UTC
Hi Alan, 

I'm not able to trigger it by hand and I've been trying since April when the problem started. That said, it will reliably happen once a day, though it normally happens more often. See the attached graph showing system uptime over thirty days. Anytime it goes down, it's due to the kernel panic.  

Heaver file access usage seems to increase the frequency of the panic. The most frequent panics happen on servers running Nginx with inbound traffic. Less frequent are servers that mostly run backups. It seems that randomizing access patterns, especially across multiple mount points  will trigger it faster. 

I started seeing the panics during April on FreeBSD 12.2. I don't remember the exact version, but it was around the update for https://www.freebsd.org/security/advisories/FreeBSD-SA-21:09.accept_filter.asc. All of the systems have upgraded to 13 at this point, so I can't confirm 12.2 anymore. 

For reference, the servers are mostly running Nginx, uWSGi, running PHP and Python based applications that are stored on MooseFS. Nothing fancy really, and it's worked great since 12.1. I do have monitoring on the system using /usr/sbin/prometheus_sysctl_exporter and Prometheus, so if there is any data of use, please let me know.
Comment 4 Graham Perrin freebsd_committer freebsd_triage 2021-07-03 08:19:18 UTC
Pictured: 

fuse_internal_do_getattr: cache incoherent on /storage/chunk!  Buggy FUSE server detected.  To prevent data corruption, disable the data cache by mounting with -o direct_io, or as directed otherwise by your FUSE server's documentation

– is also mentioned in: 

Buggy fuse server detected · Issue #390 · moosefs/moosefs
<https://github.com/moosefs/moosefs/issues/390>

– also (without reference to MooseFS) in bug 246577, 

> sysutils/fuse-ntfs triggers a cache inconsistency kernel error message 

----

That's to bring some things together, although (sorry) I don't imagine that can help with the kernel panic.
Comment 5 jSML4ThWwBID69YC 2021-07-06 16:46:27 UTC
While testing MooseFS mounted with -o mfscachemode=DIRECT for bug #256936, I noticed the kernel panics no longer happen. 

The servers would reboot at least once, but normally multiple times a day with the cache enabled. Disabling it has made it work without crashing for the last several days. I wish this was solution, but with the cache disabled, performance is significantly degraded. 

This seems to confirm it's something in the caching that's causing the panic.
Comment 6 Agata 2021-10-01 12:51:47 UTC
Created attachment 228319 [details]
Program to help panic the kernel
Comment 7 Agata 2021-10-01 12:52:26 UTC
Created attachment 228320 [details]
Script to help panic the kernel
Comment 8 Agata 2021-10-01 12:59:40 UTC
Good news :)

During our work on our filesystem-stress-test tool we managed to trigger this particular kernel panic again. From there, with some more tests, we managed to separate a simple set of operations that will trigger the panic.

How to replicate:

1) Install FreeBSD 13. Install either MooseFS or SSHfs. Mount the selected fusefs filesystem somewhere (let's say /mnt/fusefs).

2) Download the attached files: program (.c file) and script (.sh) to /mnt/fusefs

3) Name the .c file "simple.c" and compile it (simple make will do)

4) Run the script inside /mnt/fusefs

5) Wait a few minutes.
Comment 9 Alan Somers freebsd_committer freebsd_triage 2021-10-02 00:12:39 UTC
Thanks for the test case!  I was able to reproduce the panic on 14.0-CURRENT in < 1 minute.
Comment 10 Alan Somers freebsd_committer freebsd_triage 2021-10-03 18:07:12 UTC
Created attachment 228412 [details]
Fix two panics and quiet some warnings

Agata, could you please test this patch?  It would be good to test it with actual MooseFS, because sshfs+blow_up_kernel.sh is not capable of reaching all paths that might lead to the crash.

The patch does three things:
* Fixes the reported crash during VOP_BMAP
* Fixes the reported crash during VOP_STRATEGY
* Consolidates some error checking code, and eliminates the "cache incoherent" warnings in some cases where they were inappropriate.
Comment 11 Agata 2021-10-04 15:05:57 UTC
Hi Alan,

I'd love to test it, but which version should I patch? The latest sources from github (git clone --branch main https://git.FreeBSD.org/src.git /usr/src) throw some errors when I dry run the patch:

Hmm...  Looks like a unified diff to me...
The text leading up to this was:
--------------------------
|diff --git a/sys/fs/fuse/fuse_internal.c b/sys/fs/fuse/fuse_internal.c
|index c52f12d3a71a..ffa4a40095b2 100644
|--- a/sys/fs/fuse/fuse_internal.c
|+++ b/sys/fs/fuse/fuse_internal.c
--------------------------
Patching file sys/fs/fuse/fuse_internal.c using Plan A...
Hunk #1 succeeded at 255.
Hunk #2 succeeded at 272.
Hunk #3 succeeded at 852.
Hunk #4 failed at 958.
Hunk #5 succeeded at 1258 (offset -1 lines).
1 out of 5 hunks failed while patching sys/fs/fuse/fuse_internal.c
Hmm...  The next patch looks like a unified diff to me...
The text leading up to this was:
--------------------------
|diff --git a/sys/fs/fuse/fuse_internal.h b/sys/fs/fuse/fuse_internal.h
|index 20a10d7dfda0..e9fa3857227a 100644
|--- a/sys/fs/fuse/fuse_internal.h
|+++ b/sys/fs/fuse/fuse_internal.h
--------------------------
Patching file sys/fs/fuse/fuse_internal.h using Plan A...
Hunk #1 succeeded at 223.
Hmm...  The next patch looks like a unified diff to me...
The text leading up to this was:
--------------------------
|diff --git a/sys/fs/fuse/fuse_io.c b/sys/fs/fuse/fuse_io.c
|index 47c7fb97e556..f818fbd52869 100644
|--- a/sys/fs/fuse/fuse_io.c
|+++ b/sys/fs/fuse/fuse_io.c
--------------------------
Patching file sys/fs/fuse/fuse_io.c using Plan A...
Hunk #1 succeeded at 571.
Hunk #2 succeeded at 715.
Hunk #3 succeeded at 1007.
Hmm...  The next patch looks like a unified diff to me...
The text leading up to this was:
--------------------------
|diff --git a/sys/fs/fuse/fuse_node.c b/sys/fs/fuse/fuse_node.c
|index 6acd1df29361..c296a3b3c330 100644
|--- a/sys/fs/fuse/fuse_node.c
|+++ b/sys/fs/fuse/fuse_node.c
--------------------------
Patching file sys/fs/fuse/fuse_node.c using Plan A...
Hunk #1 succeeded at 389.
Hunk #2 succeeded at 437.
Hmm...  The next patch looks like a unified diff to me...
The text leading up to this was:
--------------------------
|diff --git a/sys/fs/fuse/fuse_node.h b/sys/fs/fuse/fuse_node.h
|index e5fb13d43bec..dcf5e4047778 100644
|--- a/sys/fs/fuse/fuse_node.h
|+++ b/sys/fs/fuse/fuse_node.h
--------------------------
Patching file sys/fs/fuse/fuse_node.h using Plan A...
Hunk #1 succeeded at 201.
Hmm...  The next patch looks like a unified diff to me...
The text leading up to this was:
--------------------------
|diff --git a/sys/fs/fuse/fuse_vfsops.c b/sys/fs/fuse/fuse_vfsops.c
|index b188996066d5..3d5e168bcde2 100644
|--- a/sys/fs/fuse/fuse_vfsops.c
|+++ b/sys/fs/fuse/fuse_vfsops.c
--------------------------
Patching file sys/fs/fuse/fuse_vfsops.c using Plan A...
Hunk #1 succeeded at 539.
Hunk #2 failed at 575.
1 out of 2 hunks failed while patching sys/fs/fuse/fuse_vfsops.c
Hmm...  The next patch looks like a unified diff to me...
The text leading up to this was:
--------------------------
|diff --git a/sys/fs/fuse/fuse_vnops.c b/sys/fs/fuse/fuse_vnops.c
|index 4c8c8c1d4461..9aafbad990c5 100644
|--- a/sys/fs/fuse/fuse_vnops.c
|+++ b/sys/fs/fuse/fuse_vnops.c
--------------------------
Patching file sys/fs/fuse/fuse_vnops.c using Plan A...
Hunk #1 succeeded at 516.
Hunk #2 succeeded at 551.
Hunk #3 succeeded at 907.
Hunk #4 succeeded at 1163.
Hunk #5 failed at 1372.
1 out of 5 hunks failed while patching sys/fs/fuse/fuse_vnops.c
Hmm...  The next patch looks like a unified diff to me...
The text leading up to this was:
--------------------------
|diff --git a/tests/sys/fs/fusefs/bmap.cc b/tests/sys/fs/fusefs/bmap.cc
|index 3f392a4447f2..c635f4d7e46f 100644
|--- a/tests/sys/fs/fusefs/bmap.cc
|+++ b/tests/sys/fs/fusefs/bmap.cc
--------------------------
Patching file tests/sys/fs/fusefs/bmap.cc using Plan A...
Hunk #1 succeeded at 75.
Hunk #2 succeeded at 167.
Hmm...  The next patch looks like a unified diff to me...
The text leading up to this was:
--------------------------
|diff --git a/tests/sys/fs/fusefs/write.cc b/tests/sys/fs/fusefs/write.cc
|index fadf4648d31c..db5bb3fe4441 100644
|--- a/tests/sys/fs/fusefs/write.cc
|+++ b/tests/sys/fs/fusefs/write.cc
--------------------------
Patching file tests/sys/fs/fusefs/write.cc using Plan A...
Hunk #1 succeeded at 208.
Hunk #2 succeeded at 529.
done
Comment 12 jSML4ThWwBID69YC 2021-10-04 15:08:28 UTC
Same issue here trying to patch FreeBSD 13.0-p4.
Comment 13 Alan Somers freebsd_committer freebsd_triage 2021-10-04 15:43:59 UTC
Created attachment 228438 [details]
Fix two panics and quiet some warnings

Rebased the patch onto main
Comment 14 jSML4ThWwBID69YC 2021-10-04 16:08:05 UTC
(In reply to Alan Somers from comment #13)

It's not patching cleanly on 13.0 releng. Is it supposed to, or do I need current?
Comment 15 Alan Somers freebsd_committer freebsd_triage 2021-10-04 16:14:08 UTC
You need current.  I haven't tried making a version for 13.0-RELEASE yet, but I can.
Comment 16 jSML4ThWwBID69YC 2021-10-04 16:18:35 UTC
(In reply to Alan Somers from comment #15)

A patch for 13 release would be great. I have multiple systems to test it across on the release branch.
Comment 17 Alan Somers freebsd_committer freebsd_triage 2021-10-04 20:13:06 UTC
Created attachment 228446 [details]
Patch formatted for releng/13.0

Here's a patch that can apply to releng/13.0.  It compiles, but I haven't runtime tested it.
Comment 18 jSML4ThWwBID69YC 2021-10-05 01:34:21 UTC
(In reply to Alan Somers from comment #17)

I've compiled and runtime tested it on 13.0 release. So far as I can tell, it solves the issue. I've run the MooseFS provided test for over two hours without any crash. This was run on an already loaded system. 

Side note, it also solves an issue with large 'git clone' commands failing.
Comment 19 Agata 2021-10-05 12:24:22 UTC
The kernel panic doesn't happen anymore on FreeBSD 14 with patched kernel when using MooseFS and the test script.
Comment 20 commit-hook freebsd_committer freebsd_triage 2021-10-06 20:09:01 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=032a5bd55b3a003d3560435422a95f27f91685fe

commit 032a5bd55b3a003d3560435422a95f27f91685fe
Author:     Alan Somers <asomers@FreeBSD.org>
AuthorDate: 2021-10-03 17:51:14 +0000
Commit:     Alan Somers <asomers@FreeBSD.org>
CommitDate: 2021-10-06 20:07:33 +0000

    fusefs: Fix a bug during VOP_STRATEGY when the server changes file size

    If the FUSE server tells the kernel that a file's size has changed, then
    the kernel must invalidate any portion of that file in cache.  But the
    kernel can't do that during VOP_STRATEGY, because the file's buffers are
    already locked.  Instead, proceed with the write.

    PR:             256937
    Reported by:    Agata <chogata@moosefs.pro>
    Tested by:      Agata <chogata@moosefs.pro>
    MFC after:      2 weeks
    Reviewed by:    pfg
    Differential Revision: https://reviews.freebsd.org/D32332

 sys/fs/fuse/fuse_io.c        | 19 +++++++----
 tests/sys/fs/fusefs/write.cc | 81 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 93 insertions(+), 7 deletions(-)
Comment 21 commit-hook freebsd_committer freebsd_triage 2021-10-06 20:09:03 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=7430017b9978cae054ed99e5160f739e5ca021d5

commit 7430017b9978cae054ed99e5160f739e5ca021d5
Author:     Alan Somers <asomers@FreeBSD.org>
AuthorDate: 2021-10-02 18:17:36 +0000
Commit:     Alan Somers <asomers@FreeBSD.org>
CommitDate: 2021-10-06 20:07:33 +0000

    fusefs: fix a recurse-on-non-recursive lockmgr panic

    fuse_vnop_bmap needs to know the file's size in order to calculate the
    optimum amount of readahead.  If the file's size is unknown, it must ask
    the FUSE server.  But if the file's data was previously cached and the
    server reports that its size has shrunk, fusefs must invalidate the
    cached data.  That's not possible during VOP_BMAP because the buffer
    object is already locked.

    Fix the panic by not querying the FUSE server for the file's size during
    VOP_BMAP if we don't need it.  That's also a a slight performance
    optimization.

    PR:             256937
    Reported by:    Agata <chogata@moosefs.pro>
    Tested by:      Agata <chogata@moosefs.pro>
    MFC after:      2 weeks

 sys/fs/fuse/fuse_vnops.c    | 20 ++++++++---
 tests/sys/fs/fusefs/bmap.cc | 86 +++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 102 insertions(+), 4 deletions(-)
Comment 22 commit-hook freebsd_committer freebsd_triage 2021-10-12 05:38:28 UTC
A commit in branch stable/12 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=2ccb87689fdf15b62e1a7d078ccd21db53a07e9f

commit 2ccb87689fdf15b62e1a7d078ccd21db53a07e9f
Author:     Alan Somers <asomers@FreeBSD.org>
AuthorDate: 2021-10-02 18:17:36 +0000
Commit:     Alan Somers <asomers@FreeBSD.org>
CommitDate: 2021-10-12 05:13:58 +0000

    fusefs: fix a recurse-on-non-recursive lockmgr panic

    fuse_vnop_bmap needs to know the file's size in order to calculate the
    optimum amount of readahead.  If the file's size is unknown, it must ask
    the FUSE server.  But if the file's data was previously cached and the
    server reports that its size has shrunk, fusefs must invalidate the
    cached data.  That's not possible during VOP_BMAP because the buffer
    object is already locked.

    Fix the panic by not querying the FUSE server for the file's size during
    VOP_BMAP if we don't need it.  That's also a a slight performance
    optimization.

    PR:             256937
    Reported by:    Agata <chogata@moosefs.pro>
    Tested by:      Agata <chogata@moosefs.pro>

    (cherry picked from commit 7430017b9978cae054ed99e5160f739e5ca021d5)

 sys/fs/fuse/fuse_vnops.c    | 20 ++++++++---
 tests/sys/fs/fusefs/bmap.cc | 86 +++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 102 insertions(+), 4 deletions(-)
Comment 23 commit-hook freebsd_committer freebsd_triage 2021-10-12 05:57:32 UTC
A commit in branch stable/12 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=955a418bd3036dc055d97cf98faec0be137b3b11

commit 955a418bd3036dc055d97cf98faec0be137b3b11
Author:     Alan Somers <asomers@FreeBSD.org>
AuthorDate: 2021-10-03 17:51:14 +0000
Commit:     Alan Somers <asomers@FreeBSD.org>
CommitDate: 2021-10-12 05:41:25 +0000

    fusefs: Fix a bug during VOP_STRATEGY when the server changes file size

    If the FUSE server tells the kernel that a file's size has changed, then
    the kernel must invalidate any portion of that file in cache.  But the
    kernel can't do that during VOP_STRATEGY, because the file's buffers are
    already locked.  Instead, proceed with the write.

    PR:             256937
    Reported by:    Agata <chogata@moosefs.pro>
    Tested by:      Agata <chogata@moosefs.pro>
    Reviewed by:    pfg
    Differential Revision: https://reviews.freebsd.org/D32332

    (cherry picked from commit 032a5bd55b3a003d3560435422a95f27f91685fe)

 sys/fs/fuse/fuse_io.c        | 19 ++++++----
 tests/sys/fs/fusefs/write.cc | 82 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 94 insertions(+), 7 deletions(-)
Comment 24 commit-hook freebsd_committer freebsd_triage 2021-12-07 05:13:02 UTC
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=bdd7a3f2a17c4ed6361f1789a72fb07403e42172

commit bdd7a3f2a17c4ed6361f1789a72fb07403e42172
Author:     Alan Somers <asomers@FreeBSD.org>
AuthorDate: 2021-10-02 18:17:36 +0000
Commit:     Alan Somers <asomers@FreeBSD.org>
CommitDate: 2021-12-07 05:08:32 +0000

    fusefs: fix a recurse-on-non-recursive lockmgr panic

    fuse_vnop_bmap needs to know the file's size in order to calculate the
    optimum amount of readahead.  If the file's size is unknown, it must ask
    the FUSE server.  But if the file's data was previously cached and the
    server reports that its size has shrunk, fusefs must invalidate the
    cached data.  That's not possible during VOP_BMAP because the buffer
    object is already locked.

    Fix the panic by not querying the FUSE server for the file's size during
    VOP_BMAP if we don't need it.  That's also a a slight performance
    optimization.

    PR:             256937
    Reported by:    Agata <chogata@moosefs.pro>
    Tested by:      Agata <chogata@moosefs.pro>

    (cherry picked from commit 7430017b9978cae054ed99e5160f739e5ca021d5)

 sys/fs/fuse/fuse_vnops.c    | 20 ++++++++---
 tests/sys/fs/fusefs/bmap.cc | 86 +++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 102 insertions(+), 4 deletions(-)
Comment 25 commit-hook freebsd_committer freebsd_triage 2021-12-07 05:18:04 UTC
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=7adb46c70f2e050386422ffdda38ada3b61cd998

commit 7adb46c70f2e050386422ffdda38ada3b61cd998
Author:     Alan Somers <asomers@FreeBSD.org>
AuthorDate: 2021-10-03 17:51:14 +0000
Commit:     Alan Somers <asomers@FreeBSD.org>
CommitDate: 2021-12-07 05:14:10 +0000

    fusefs: Fix a bug during VOP_STRATEGY when the server changes file size

    If the FUSE server tells the kernel that a file's size has changed, then
    the kernel must invalidate any portion of that file in cache.  But the
    kernel can't do that during VOP_STRATEGY, because the file's buffers are
    already locked.  Instead, proceed with the write.

    PR:             256937
    Reported by:    Agata <chogata@moosefs.pro>
    Tested by:      Agata <chogata@moosefs.pro>
    Reviewed by:    pfg
    Differential Revision: https://reviews.freebsd.org/D32332

    (cherry picked from commit 032a5bd55b3a003d3560435422a95f27f91685fe)

 sys/fs/fuse/fuse_io.c        | 19 +++++++----
 tests/sys/fs/fusefs/write.cc | 81 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 93 insertions(+), 7 deletions(-)