Bug 230962 - Kernel panic when writing extended attributes with soft updates enabled
Summary: Kernel panic when writing extended attributes with soft updates enabled
Status: In Progress
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 11.2-RELEASE
Hardware: amd64 Any
Importance: --- Affects Some People
Assignee: freebsd-fs mailing list
URL:
Keywords: panic
Depends on:
Blocks:
 
Reported: 2018-08-27 20:54 UTC by 2t8mr7kx9f
Modified: 2018-12-05 06:49 UTC
CC List: 7 users

See Also:


Attachments
A screenshot of the panic message. (369.35 KB, image/jpeg)
2018-08-27 20:54 UTC, 2t8mr7kx9f

Description 2t8mr7kx9f 2018-08-27 20:54:57 UTC
Created attachment 196617
A screenshot of the panic message.

This is a continuation of the following bugreport:
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=230732

I did some digging to try to get to the root cause. It appears that it has nothing to do with GELI; I can also reproduce this on a plain UFS volume. The culprit seems to be extended attributes.

FreeBSD reproducibly panics with

"panic: softdep_deallocate_dependencies: dangling deps" 

when using rsync with the -X option ("preserve extended attributes") to copy to a volume with soft updates enabled.

This appears to be some kind of race condition: transferring the files one by one works, and the panic occurs only when the whole directory is transferred.

Turning off soft updates (tunefs -n disable) or running rsync without -X prevents the panic.

The exact same command worked with 11.1-RELEASE.

The attached screenshot shows the panic message. Please let me know if I can be of further assistance!
Comment 1 intellisun 2018-09-19 10:23:32 UTC
I have the same problem with soft updates and ACLs enabled on a UFS volume.
FreeBSD version: 11.2-STABLE, r335822. If I disable either soft updates or ACLs, the system behaves normally.
Comment 2 Kirk McKusick freebsd_committer 2018-09-19 14:43:54 UTC
Do either of you have a test case / example that will trigger the panic?
We have not been able to reproduce the problem and so have no way to understand what is causing it.
Comment 3 koro 2018-09-20 07:49:28 UTC
I am affected by this too; it was so bad that I had to roll back to 11.1.

I've been trying, to no avail, to reproduce it in a VM.

It definitely happened on a real machine with a real 8TB drive though.

I don't think transferring a whole disk image of this size just for debugging would be feasible. However, if there were a way to make a sparse image containing only filesystem metadata (no file data or unused blocks), either with an existing tool or by coding it myself (with some guidance on how to obtain that list of blocks), I'd be willing to share such a disk image privately.
Comment 4 Kirk McKusick freebsd_committer 2018-09-20 15:02:19 UTC
(In reply to koro from comment #3)
We think we have a way to reproduce it, but we are not yet sure, so hold on to that disk image for now. Hopefully we will not need it. Stay tuned.
Comment 5 koro 2018-12-05 04:14:14 UTC
Is there any progress on this?

With 11.1 no longer supported, it's becoming harder and harder to stay on it.
Comment 6 Kirk McKusick freebsd_committer 2018-12-05 06:49:29 UTC
(In reply to koro from comment #5)
I do have a way to reproduce it, but I have been sidetracked by other issues, so I have not had time to dig into it. I'll try to move it up my priority list.