Bug 230962 - Kernel panic when writing extended attributes with soft updates enabled
Summary: Kernel panic when writing extended attributes with soft updates enabled
Status: In Progress
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 11.2-RELEASE
Hardware: amd64 Any
Importance: --- Affects Some People
Assignee: freebsd-fs mailing list
URL:
Keywords: panic
Depends on:
Blocks:
 
Reported: 2018-08-27 20:54 UTC by 2t8mr7kx9f
Modified: 2019-02-04 21:36 UTC

See Also:


Attachments
A screenshot of the panic message. (369.35 KB, image/jpeg)
2018-08-27 20:54 UTC, 2t8mr7kx9f
Proposed patch to fix bug. (446 bytes, patch)
2019-01-26 21:46 UTC, Kirk McKusick

Description 2t8mr7kx9f 2018-08-27 20:54:57 UTC
Created attachment 196617
A screenshot of the panic message.

This is a continuation of the following bug report:
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=230732

I did some digging to try to get to the root cause. It appears to have nothing to do with GELI; I can also reproduce this on a plain UFS volume. The culprit seems to be extended attributes.

FreeBSD panics reproducibly with

"panic: softdep_deallocate_dependencies: dangling deps" 

when using rsync with the -X option ("preserve extended attributes") to a volume with soft updates enabled.

This appears to be some kind of race condition: transferring the files one by one works, and the panic occurs only when transferring the whole directory.

Turning off soft updates (tunefs -n disable) or using rsync without -X prevents the panic from occurring.

The exact same command worked with 11.1-RELEASE.

The attached screenshot shows the panic message. Please let me know if I can be of further assistance!
Comment 1 intellisun 2018-09-19 10:23:32 UTC
I have the same problem with soft updates and ACLs enabled on a UFS volume.
The FreeBSD version is 11.2-STABLE, r335822. If I disable either soft updates or ACLs, the system behaves normally.
Comment 2 Kirk McKusick freebsd_committer 2018-09-19 14:43:54 UTC
Do either of you have a test case / example that will trigger the panic?
We have not been able to reproduce the problem and so have no way to understand what is causing it.
Comment 3 koro 2018-09-20 07:49:28 UTC
I too am affected by this. It was so bad that I had to roll back to 11.1.

I've been trying, to no avail, to reproduce it in a VM.

It definitely happened on a real machine with a real 8TB drive though.

I don't think transferring a whole disk image of this size just for the sake of debugging would be feasible. However, if there were a way to make a sparse image that contained only filesystem metadata (no file data or unused blocks), I'd be willing to share such a disk image privately. I could do this either with an existing tool or by coding it myself, given some guidance on how to obtain the list of metadata blocks.
Comment 4 Kirk McKusick freebsd_committer 2018-09-20 15:02:19 UTC
(In reply to koro from comment #3)
We think we have a way to reproduce it, but are not yet sure. So hold on to that disk image for now, though hopefully we will not need it. Stay tuned.
Comment 5 koro 2018-12-05 04:14:14 UTC
Is there any progress on this?

With 11.1 no longer supported, it's becoming harder and harder to stay on it.
Comment 6 Kirk McKusick freebsd_committer 2018-12-05 06:49:29 UTC
(In reply to koro from comment #5)
I do have a way to reproduce it, but have gotten side-tracked on other issues so have not had time to dig into it. I'll try to move it up on my priority list.
Comment 7 commit-hook freebsd_committer 2018-12-13 06:38:29 UTC
A commit references this bug:

Author: pho
Date: Thu Dec 13 06:37:36 UTC 2018
New revision: 342027
URL: https://svnweb.freebsd.org/changeset/base/342027

Log:
  Added a new test scenario for FFS extended attributes.

  PR:		230962

Changes:
  user/pho/stress2/misc/extattr2.sh
Comment 8 2t8mr7kx9f 2019-01-20 17:10:23 UTC
I can now confirm that this bug still exists in 12.0-RELEASE-p2.

Is there anything we can do to help? The panic currently prevents me from using any of my systems with soft updates and/or journaling enabled, which leads to quite long fsck times in the event of a crash.
Comment 9 Kirk McKusick freebsd_committer 2019-01-21 07:43:34 UTC
Work still in progress on a fix for this bug. Hope to have a test patch soon.
Comment 10 Kirk McKusick freebsd_committer 2019-01-26 21:46:52 UTC
Created attachment 201424
Proposed patch to fix bug.

Here is my proposed patch to fix this bug. Please let me know if it helps.
Comment 11 commit-hook freebsd_committer 2019-01-28 21:37:01 UTC
A commit references this bug:

Author: mckusick
Date: Mon Jan 28 21:36:46 UTC 2019
New revision: 343536
URL: https://svnweb.freebsd.org/changeset/base/343536

Log:
  This bug was introduced with the change to use softdep_bp_to_mp() in
  January 2018 changes -r327723 and -r327821. The softdep_bp_to_mp()
  function failed to include VFIFO as one of the valid cases.

  Although fifo's do not allocate blocks in the filesystem, they will
  allocate blocks if they use extended attributes (such as ACLs). Thus,
  softdep_bp_to_mp() needs to return a non-NULL mount pointer when
  presented with a fifo vnode so that the soft updates write complete
  will properly process the soft updates structures associated with the
  extended attribute blocks. It was the failure to process these soft
  updates structures, thus leaving them hanging off the buffer, which
  led to the "panic: softdep_deallocate_dependencies: dangling deps"
  when trying to clean up the buffer after it was written.

  PR:           230962
  Reported by:  2t8mr7kx9f@protonmail.com
  Reviewed by:  kib
  Tested by:    Peter Holm
  MFC after:    1 week
  Sponsored by: Netflix

Changes:
  head/sys/ufs/ffs/ffs_softdep.c
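
The failure mode described in the log above can be illustrated with a small user-space model. This is only a sketch under stated assumptions: every name in it (model_vtype, model_mount, model_vnode, model_bp_to_mp, model_write_complete, and the handle_fifo switch) is hypothetical and invented for illustration. It is not the code in ffs_softdep.c and not the attached patch. It only shows the idea that a lookup which returns NULL for a FIFO vnode causes write completion to skip the pending dependencies, leaving them dangling.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for kernel types (assumptions). */
enum model_vtype {
    MODEL_VREG,
    MODEL_VDIR,
    MODEL_VLNK,
    MODEL_VFIFO,
    MODEL_VOTHER
};

struct model_mount {
    const char *name;
};

struct model_vnode {
    enum model_vtype type;
    struct model_mount *mount;
};

/*
 * Model of a buffer-to-mount lookup keyed on vnode type.
 * With handle_fifo == 0 a FIFO yields NULL, which is the behaviour
 * the commit log describes as the bug. With handle_fifo != 0 a FIFO
 * resolves to its mount just like a regular file.
 */
static struct model_mount *
model_bp_to_mp(const struct model_vnode *vp, int handle_fifo)
{
    switch (vp->type) {
    case MODEL_VFIFO:
        if (!handle_fifo)
            return NULL;
        /* FALLTHROUGH */
    case MODEL_VREG:
    case MODEL_VDIR:
    case MODEL_VLNK:
        return vp->mount;
    default:
        return NULL;
    }
}

/*
 * Model of write completion. Returns the number of dependencies
 * still attached to the buffer afterwards. If no mount is found,
 * nothing is processed and every pending dependency is left behind.
 */
static int
model_write_complete(const struct model_vnode *vp, int pending_deps,
    int handle_fifo)
{
    if (model_bp_to_mp(vp, handle_fifo) == NULL)
        return pending_deps;
    return 0;
}
```

In this model, a FIFO vnode with two pending dependencies (for example, from an extended attribute block) keeps both after model_write_complete() when handle_fifo is 0, and keeps none when handle_fifo is 1. A regular file is unaffected by the switch.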
Comment 12 koro 2019-02-02 03:12:55 UTC
I built an 11.2 kernel with your fix in a VM, transferred it to the affected machine by replacing /boot/kernel, and booted it. Userspace was still on 11.1 though.

The system booted fine and all the services started, but as soon as I/O started to pick up, I immediately got the panic again. Your patch concerns FIFOs, and I don't have any FIFOs on my filesystems (which do make use of POSIX ACLs), so I think it might be a different bug.
Comment 13 Conrad Meyer freebsd_committer 2019-02-02 04:56:43 UTC
(In reply to commit-hook from comment #7)
What is the kern.features.ufs_extattr sysctl?  It doesn't seem to exist on CURRENT.
Comment 14 Conrad Meyer freebsd_committer 2019-02-02 05:08:11 UTC
With extattr2.sh, after disabling the ufs_extattr feature check, I run into an ffs_truncate3 panic.
Comment 15 Conrad Meyer freebsd_committer 2019-02-02 05:47:28 UTC
(In reply to Conrad Meyer from comment #14)
(On current, not 11.x.)
Comment 16 Peter Holm freebsd_committer 2019-02-02 13:31:50 UTC
(In reply to Conrad Meyer from comment #14)
I cannot reproduce the original problem.
But I see the ffs_truncate3 panic:
https://people.freebsd.org/~pho/stress/log/extattr2.txt

The check for "ufs extended attribute" does not belong in the extattr2.sh test.
Comment 17 Kirk McKusick freebsd_committer 2019-02-04 21:36:45 UTC
(In reply to koro from comment #12)
Peter Holm has managed to trigger the panic even with the fifo fix, so I am continuing to look at this problem.