Created attachment 226169 [details] Kernel panic backtrace message The problem is repeatedly spotted on a machine with FreeBSD 13.0 running MooseFS client that uses FUSE. The machine crashes daily, I'm attaching a screenshot of panic backtrace. It was reported to MooseFS team by a user and we promised to help investigate the problem. From the fact, that this FreeBSD machine logs repeatedly: "fuse_internal_do_getattr: cache incoherent on [PATH]! Buggy FUSE server detected" and the backtrace that shows an attempt at write and then invokes "fuse_internal_do_getattr", we suspect that the kernel panics when it gets a different length than it has in its internal structures and tries to invalidate the data cache buffer, which triggers a second attempt to lock an already locked buffer lock and that this might be somehow related to the kernel behaviour described in bug #256936 I will be pointing out this thread to the user that experiences the crashes, he will be able to add any details about the exact configuration of the affected system.
Shoot, that's no good. I don't suppose you've found a reliable way to reproduce the crash?
Created attachment 226183 [details] 30 day uptime
Hi Alan, I'm not able to trigger it by hand and I've been trying since April when the problem started. That said, it will reliably happen once a day, though it normally happens more often. See the attached graph showing system uptime over thirty days. Anytime it goes down, it's due to the kernel panic. Heaver file access usage seems to increase the frequency of the panic. The most frequent panics happen on servers running Nginx with inbound traffic. Less frequent are servers that mostly run backups. It seems that randomizing access patterns, especially across multiple mount points will trigger it faster. I started seeing the panics during April on FreeBSD 12.2. I don't remember the exact version, but it was around the update for https://www.freebsd.org/security/advisories/FreeBSD-SA-21:09.accept_filter.asc. All of the systems have upgraded to 13 at this point, so I can't confirm 12.2 anymore. For reference, the servers are mostly running Nginx, uWSGi, running PHP and Python based applications that are stored on MooseFS. Nothing fancy really, and it's worked great since 12.1. I do have monitoring on the system using /usr/sbin/prometheus_sysctl_exporter and Prometheus, so if there is any data of use, please let me know.
Pictured: fuse_internal_do_getattr: cache incoherent on /storage/chunk! Buggy FUSE server detected. To prevent data corruption, disable the data cache by mounting with -o direct_io, or as directed otherwise by your FUSE server's documentation – is also mentioned in: Buggy fuse server detected · Issue #390 · moosefs/moosefs <https://github.com/moosefs/moosefs/issues/390> – also (without reference to MooseFS) in bug 246577, > sysutils/fuse-ntfs triggers a cache inconsistency kernel error message ---- That's to bring some things together, although (sorry) I don't imagine that can help with the kernel panic.
While testing MooseFS mounted with -o mfscachemode=DIRECT for bug #256936, I noticed the kernel panics no longer happen. The servers would reboot at least once, but normally multiple times a day with the cache enabled. Disabling it has made it work without crashing for the last several days. I wish this was solution, but with the cache disabled, performance is significantly degraded. This seems to confirm it's something in the caching that's causing the panic.
Created attachment 228319 [details] Program to help panic the kernel
Created attachment 228320 [details] Script to help panic the kernel
Good news :) During our work on our filesystem-stress-test tool we managed to trigger this particular kernel panic again. From there, with some more tests, we managed to separate a simple set of operations that will trigger the panic. How to replicate: 1) Install FreeBSD 13. Install either MooseFS or SSHfs. Mount the selected fusefs filesystem somewhere (let's say /mnt/fusefs). 2) Download the attached files: program (.c file) and script (.sh) to /mnt/fusefs 3) Name the .c file "simple.c" and compile it (simple make will do) 4) Run the script inside /mnt/fusefs 5) Wait a few minutes.
Thanks for the test case! I was able to reproduce the panic on 14.0-CURRENT in < 1 minute.
Created attachment 228412 [details] Fix two panics and quiet some warnings Agata, could you please test this patch? It would be good to test it with actual MooseFS, because sshfs+blow_up_kernel.sh is not capable of reaching all paths that might lead to the crash. The patch does three things: * Fixes the reported crash during VOP_BMAP * Fixes the reported crash during VOP_STRATEGY * Consolidates some error checking code, and eliminates the "cache incoherent" warnings in some cases where they were inappropriate.
Hi Alan, I'd love to test it, but which version should I patch? The latest sources from github (git clone --branch main https://git.FreeBSD.org/src.git /usr/src) throw some errors when I dry run the patch: Hmm... Looks like a unified diff to me... The text leading up to this was: -------------------------- |diff --git a/sys/fs/fuse/fuse_internal.c b/sys/fs/fuse/fuse_internal.c |index c52f12d3a71a..ffa4a40095b2 100644 |--- a/sys/fs/fuse/fuse_internal.c |+++ b/sys/fs/fuse/fuse_internal.c -------------------------- Patching file sys/fs/fuse/fuse_internal.c using Plan A... Hunk #1 succeeded at 255. Hunk #2 succeeded at 272. Hunk #3 succeeded at 852. Hunk #4 failed at 958. Hunk #5 succeeded at 1258 (offset -1 lines). 1 out of 5 hunks failed while patching sys/fs/fuse/fuse_internal.c Hmm... The next patch looks like a unified diff to me... The text leading up to this was: -------------------------- |diff --git a/sys/fs/fuse/fuse_internal.h b/sys/fs/fuse/fuse_internal.h |index 20a10d7dfda0..e9fa3857227a 100644 |--- a/sys/fs/fuse/fuse_internal.h |+++ b/sys/fs/fuse/fuse_internal.h -------------------------- Patching file sys/fs/fuse/fuse_internal.h using Plan A... Hunk #1 succeeded at 223. Hmm... The next patch looks like a unified diff to me... The text leading up to this was: -------------------------- |diff --git a/sys/fs/fuse/fuse_io.c b/sys/fs/fuse/fuse_io.c |index 47c7fb97e556..f818fbd52869 100644 |--- a/sys/fs/fuse/fuse_io.c |+++ b/sys/fs/fuse/fuse_io.c -------------------------- Patching file sys/fs/fuse/fuse_io.c using Plan A... Hunk #1 succeeded at 571. Hunk #2 succeeded at 715. Hunk #3 succeeded at 1007. Hmm... The next patch looks like a unified diff to me... The text leading up to this was: -------------------------- |diff --git a/sys/fs/fuse/fuse_node.c b/sys/fs/fuse/fuse_node.c |index 6acd1df29361..c296a3b3c330 100644 |--- a/sys/fs/fuse/fuse_node.c |+++ b/sys/fs/fuse/fuse_node.c -------------------------- Patching file sys/fs/fuse/fuse_node.c using Plan A... Hunk #1 succeeded at 389. Hunk #2 succeeded at 437. Hmm... The next patch looks like a unified diff to me... The text leading up to this was: -------------------------- |diff --git a/sys/fs/fuse/fuse_node.h b/sys/fs/fuse/fuse_node.h |index e5fb13d43bec..dcf5e4047778 100644 |--- a/sys/fs/fuse/fuse_node.h |+++ b/sys/fs/fuse/fuse_node.h -------------------------- Patching file sys/fs/fuse/fuse_node.h using Plan A... Hunk #1 succeeded at 201. Hmm... The next patch looks like a unified diff to me... The text leading up to this was: -------------------------- |diff --git a/sys/fs/fuse/fuse_vfsops.c b/sys/fs/fuse/fuse_vfsops.c |index b188996066d5..3d5e168bcde2 100644 |--- a/sys/fs/fuse/fuse_vfsops.c |+++ b/sys/fs/fuse/fuse_vfsops.c -------------------------- Patching file sys/fs/fuse/fuse_vfsops.c using Plan A... Hunk #1 succeeded at 539. Hunk #2 failed at 575. 1 out of 2 hunks failed while patching sys/fs/fuse/fuse_vfsops.c Hmm... The next patch looks like a unified diff to me... The text leading up to this was: -------------------------- |diff --git a/sys/fs/fuse/fuse_vnops.c b/sys/fs/fuse/fuse_vnops.c |index 4c8c8c1d4461..9aafbad990c5 100644 |--- a/sys/fs/fuse/fuse_vnops.c |+++ b/sys/fs/fuse/fuse_vnops.c -------------------------- Patching file sys/fs/fuse/fuse_vnops.c using Plan A... Hunk #1 succeeded at 516. Hunk #2 succeeded at 551. Hunk #3 succeeded at 907. Hunk #4 succeeded at 1163. Hunk #5 failed at 1372. 1 out of 5 hunks failed while patching sys/fs/fuse/fuse_vnops.c Hmm... The next patch looks like a unified diff to me... The text leading up to this was: -------------------------- |diff --git a/tests/sys/fs/fusefs/bmap.cc b/tests/sys/fs/fusefs/bmap.cc |index 3f392a4447f2..c635f4d7e46f 100644 |--- a/tests/sys/fs/fusefs/bmap.cc |+++ b/tests/sys/fs/fusefs/bmap.cc -------------------------- Patching file tests/sys/fs/fusefs/bmap.cc using Plan A... Hunk #1 succeeded at 75. Hunk #2 succeeded at 167. Hmm... The next patch looks like a unified diff to me... The text leading up to this was: -------------------------- |diff --git a/tests/sys/fs/fusefs/write.cc b/tests/sys/fs/fusefs/write.cc |index fadf4648d31c..db5bb3fe4441 100644 |--- a/tests/sys/fs/fusefs/write.cc |+++ b/tests/sys/fs/fusefs/write.cc -------------------------- Patching file tests/sys/fs/fusefs/write.cc using Plan A... Hunk #1 succeeded at 208. Hunk #2 succeeded at 529. done
Same issue here trying to patch FreeBSD 13.0-p4.
Created attachment 228438 [details] Fix two panics and quiet some warnings Rebased the patch onto main
(In reply to Alan Somers from comment #13) It's not patching cleanly on 13.0 releng. Is it supposed to, or do I need current?
You need current. I haven't tried making a version for 13.0-RELEASE yet, but I can.
(In reply to Alan Somers from comment #15) A patch for 13 release would be great. I have multiple systems to test it across on the release branch.
Created attachment 228446 [details] Patch formatted for releng/13.0 Here's a patch that can apply to releng/13.0. It compiles, but I haven't runtime tested it.
(In reply to Alan Somers from comment #17) I've compiled and runtime tested it on 13.0 release. So far as I can tell, it solves the issue. I've run the MooseFS provided test for over two hours without any crash. This was run on an already loaded system. Side note, it also solves an issue with large 'git clone' commands failing.
The kernel panic doesn't happen anymore on FreeBSD 14 with patched kernel when using MooseFS and the test script.
A commit in branch main references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=032a5bd55b3a003d3560435422a95f27f91685fe commit 032a5bd55b3a003d3560435422a95f27f91685fe Author: Alan Somers <asomers@FreeBSD.org> AuthorDate: 2021-10-03 17:51:14 +0000 Commit: Alan Somers <asomers@FreeBSD.org> CommitDate: 2021-10-06 20:07:33 +0000 fusefs: Fix a bug during VOP_STRATEGY when the server changes file size If the FUSE server tells the kernel that a file's size has changed, then the kernel must invalidate any portion of that file in cache. But the kernel can't do that during VOP_STRATEGY, because the file's buffers are already locked. Instead, proceed with the write. PR: 256937 Reported by: Agata <chogata@moosefs.pro> Tested by: Agata <chogata@moosefs.pro> MFC after: 2 weeks Reviewed by: pfg Differential Revision: https://reviews.freebsd.org/D32332 sys/fs/fuse/fuse_io.c | 19 +++++++---- tests/sys/fs/fusefs/write.cc | 81 ++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 93 insertions(+), 7 deletions(-)
A commit in branch main references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=7430017b9978cae054ed99e5160f739e5ca021d5 commit 7430017b9978cae054ed99e5160f739e5ca021d5 Author: Alan Somers <asomers@FreeBSD.org> AuthorDate: 2021-10-02 18:17:36 +0000 Commit: Alan Somers <asomers@FreeBSD.org> CommitDate: 2021-10-06 20:07:33 +0000 fusefs: fix a recurse-on-non-recursive lockmgr panic fuse_vnop_bmap needs to know the file's size in order to calculate the optimum amount of readahead. If the file's size is unknown, it must ask the FUSE server. But if the file's data was previously cached and the server reports that its size has shrunk, fusefs must invalidate the cached data. That's not possible during VOP_BMAP because the buffer object is already locked. Fix the panic by not querying the FUSE server for the file's size during VOP_BMAP if we don't need it. That's also a a slight performance optimization. PR: 256937 Reported by: Agata <chogata@moosefs.pro> Tested by: Agata <chogata@moosefs.pro> MFC after: 2 weeks sys/fs/fuse/fuse_vnops.c | 20 ++++++++--- tests/sys/fs/fusefs/bmap.cc | 86 +++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 102 insertions(+), 4 deletions(-)
A commit in branch stable/12 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=2ccb87689fdf15b62e1a7d078ccd21db53a07e9f commit 2ccb87689fdf15b62e1a7d078ccd21db53a07e9f Author: Alan Somers <asomers@FreeBSD.org> AuthorDate: 2021-10-02 18:17:36 +0000 Commit: Alan Somers <asomers@FreeBSD.org> CommitDate: 2021-10-12 05:13:58 +0000 fusefs: fix a recurse-on-non-recursive lockmgr panic fuse_vnop_bmap needs to know the file's size in order to calculate the optimum amount of readahead. If the file's size is unknown, it must ask the FUSE server. But if the file's data was previously cached and the server reports that its size has shrunk, fusefs must invalidate the cached data. That's not possible during VOP_BMAP because the buffer object is already locked. Fix the panic by not querying the FUSE server for the file's size during VOP_BMAP if we don't need it. That's also a a slight performance optimization. PR: 256937 Reported by: Agata <chogata@moosefs.pro> Tested by: Agata <chogata@moosefs.pro> (cherry picked from commit 7430017b9978cae054ed99e5160f739e5ca021d5) sys/fs/fuse/fuse_vnops.c | 20 ++++++++--- tests/sys/fs/fusefs/bmap.cc | 86 +++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 102 insertions(+), 4 deletions(-)
A commit in branch stable/12 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=955a418bd3036dc055d97cf98faec0be137b3b11 commit 955a418bd3036dc055d97cf98faec0be137b3b11 Author: Alan Somers <asomers@FreeBSD.org> AuthorDate: 2021-10-03 17:51:14 +0000 Commit: Alan Somers <asomers@FreeBSD.org> CommitDate: 2021-10-12 05:41:25 +0000 fusefs: Fix a bug during VOP_STRATEGY when the server changes file size If the FUSE server tells the kernel that a file's size has changed, then the kernel must invalidate any portion of that file in cache. But the kernel can't do that during VOP_STRATEGY, because the file's buffers are already locked. Instead, proceed with the write. PR: 256937 Reported by: Agata <chogata@moosefs.pro> Tested by: Agata <chogata@moosefs.pro> Reviewed by: pfg Differential Revision: https://reviews.freebsd.org/D32332 (cherry picked from commit 032a5bd55b3a003d3560435422a95f27f91685fe) sys/fs/fuse/fuse_io.c | 19 ++++++---- tests/sys/fs/fusefs/write.cc | 82 ++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 94 insertions(+), 7 deletions(-)
A commit in branch stable/13 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=bdd7a3f2a17c4ed6361f1789a72fb07403e42172 commit bdd7a3f2a17c4ed6361f1789a72fb07403e42172 Author: Alan Somers <asomers@FreeBSD.org> AuthorDate: 2021-10-02 18:17:36 +0000 Commit: Alan Somers <asomers@FreeBSD.org> CommitDate: 2021-12-07 05:08:32 +0000 fusefs: fix a recurse-on-non-recursive lockmgr panic fuse_vnop_bmap needs to know the file's size in order to calculate the optimum amount of readahead. If the file's size is unknown, it must ask the FUSE server. But if the file's data was previously cached and the server reports that its size has shrunk, fusefs must invalidate the cached data. That's not possible during VOP_BMAP because the buffer object is already locked. Fix the panic by not querying the FUSE server for the file's size during VOP_BMAP if we don't need it. That's also a a slight performance optimization. PR: 256937 Reported by: Agata <chogata@moosefs.pro> Tested by: Agata <chogata@moosefs.pro> (cherry picked from commit 7430017b9978cae054ed99e5160f739e5ca021d5) sys/fs/fuse/fuse_vnops.c | 20 ++++++++--- tests/sys/fs/fusefs/bmap.cc | 86 +++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 102 insertions(+), 4 deletions(-)
A commit in branch stable/13 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=7adb46c70f2e050386422ffdda38ada3b61cd998 commit 7adb46c70f2e050386422ffdda38ada3b61cd998 Author: Alan Somers <asomers@FreeBSD.org> AuthorDate: 2021-10-03 17:51:14 +0000 Commit: Alan Somers <asomers@FreeBSD.org> CommitDate: 2021-12-07 05:14:10 +0000 fusefs: Fix a bug during VOP_STRATEGY when the server changes file size If the FUSE server tells the kernel that a file's size has changed, then the kernel must invalidate any portion of that file in cache. But the kernel can't do that during VOP_STRATEGY, because the file's buffers are already locked. Instead, proceed with the write. PR: 256937 Reported by: Agata <chogata@moosefs.pro> Tested by: Agata <chogata@moosefs.pro> Reviewed by: pfg Differential Revision: https://reviews.freebsd.org/D32332 (cherry picked from commit 032a5bd55b3a003d3560435422a95f27f91685fe) sys/fs/fuse/fuse_io.c | 19 +++++++---- tests/sys/fs/fusefs/write.cc | 81 ++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 93 insertions(+), 7 deletions(-)