Bug 238883 - Fatal double fault using unionfs to build www/node port (repeatable)
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 12.0-RELEASE
Hardware: amd64 Any
Importance: --- Affects Only Me
Assignee: Jason A. Harmening
URL:
Keywords: crash
Depends on:
Blocks:
 
Reported: 2019-06-29 20:03 UTC by chadf
Modified: 2024-12-12 18:41 UTC
CC List: 2 users

See Also:
linimon: mfc-stable13?


Attachments
/var/crash/core.txt.4 file (95.27 KB, text/plain)
2019-06-29 20:03 UTC, chadf
/var/crash/core.txt.5 file (98.52 KB, text/plain)
2019-06-29 23:35 UTC, chadf

Description chadf 2019-06-29 20:03:09 UTC
Created attachment 205420
/var/crash/core.txt.4 file

I can repeatably panic the kernel by building the www/node port while using layered nullfs and unionfs mounts for the /usr/ports directory.

To trigger, set up the mounts as:

/export/ports on /usr/ports (nullfs, local, read-only)
<above>:/usr/src/local.ports on /usr/ports (unionfs, local)

The /export/ports directory contains a master copy of the ports tree (UFS or ZFS, it doesn't matter; I haven't tried NFS).

Then build the www/node port:

root@krash:~ # cd /usr/ports/www/node
root@krash:/usr/ports/www/node # make
===>  Building for node-12.4.0
   .
   .
   .

Wait for the panic (it may take a while into the build). After the panic and reboot, restarting the build panics again almost immediately.

I was able to recreate the crash in a fresh VirtualBox install. The .vdi file for that VM (in a quick-to-re-panic state) is available upon request (7.3GB uncompressed).

The /var/crash/core.txt file is attached. The vmcore file is available (12MB compressed, not attached due to size).
Comment 1 chadf 2019-06-29 23:32:58 UTC
Replaced nullfs with symlink and still panics. Removing nullfs
Comment 2 chadf 2019-06-29 23:34:11 UTC
Replaced nullfs with a symlink and it still panics. Removed the nullfs reference from the summary line and am adding the core.txt.5 attachment.
Comment 3 chadf 2019-06-29 23:35:09 UTC
Created attachment 205426
/var/crash/core.txt.5 file
Comment 4 Mark Linimon freebsd_committer freebsd_triage 2019-06-30 01:09:01 UTC
The panic and any associated backtrace will be of more use than the core file.
Comment 5 commit-hook freebsd_committer freebsd_triage 2021-06-29 13:02:04 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=372691a7ae1878ecdf707195b0854750f07bf44e

commit 372691a7ae1878ecdf707195b0854750f07bf44e
Author:     Jason A. Harmening <jah@FreeBSD.org>
AuthorDate: 2021-06-12 19:45:18 +0000
Commit:     Jason A. Harmening <jah@FreeBSD.org>
CommitDate: 2021-06-29 13:02:01 +0000

    unionfs: release parent vnodes in deferred context

    Each unionfs node holds a reference to its parent directory vnode.
    A single open file reference can therefore end up keeping an
    arbitrarily deep vnode hierarchy in place.  When that reference is
    released, the resulting VOP_RECLAIM call chain can then exhaust the
    kernel stack.

    This is easily reproducible by running the unionfs.sh stress2 test.
    Fix it by deferring recursive unionfs vnode release to taskqueue
    context.

    PR: 238883
    Reviewed By:    kib (earlier version), markj
    Differential Revision: https://reviews.freebsd.org/D30748

 sys/fs/unionfs/union.h      |  6 ++++-
 sys/fs/unionfs/union_subr.c | 55 ++++++++++++++++++++++++++++++++++++++++-----
 2 files changed, 55 insertions(+), 6 deletions(-)
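
To illustrate the idea in the commit message above, here is a minimal user-space sketch, not the actual patch. It assumes a hypothetical struct node in which every node holds one reference on its parent, and it replaces the kernel taskqueue with a simple pending list drained by a loop. All names (node_new, node_release, drain_deferred, release_chain_demo) are invented for this example. The point is that dropping the last reference only queues the node, so freeing a deep chain never nests one call per level the way a recursive release would.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a unionfs node: holds one ref on its parent. */
struct node {
	struct node *parent;
	int refs;
	struct node *next_deferred;	/* link in the pending-release list */
};

static struct node *deferred_head;	/* nodes waiting to be torn down */
static int freed_count;

/* Create a node with one reference; take a reference on the parent. */
static struct node *
node_new(struct node *parent)
{
	struct node *n = calloc(1, sizeof(*n));

	if (n == NULL)
		abort();
	n->parent = parent;
	n->refs = 1;
	if (parent != NULL)
		parent->refs++;
	return (n);
}

/*
 * Drop one reference.  On the last reference, queue the node for
 * teardown instead of releasing its parent from this call frame.
 */
static void
node_release(struct node *n)
{
	if (--n->refs > 0)
		return;
	n->next_deferred = deferred_head;
	deferred_head = n;
}

/* Stand-in for the deferred (taskqueue) context: a flat loop. */
static void
drain_deferred(void)
{
	while (deferred_head != NULL) {
		struct node *n = deferred_head;
		struct node *parent = n->parent;

		deferred_head = n->next_deferred;
		free(n);
		freed_count++;
		/* May queue the parent; it is handled by the next iteration. */
		if (parent != NULL)
			node_release(parent);
	}
}

/* Build a chain 'depth' levels deep, drop the leaf, return nodes freed. */
static int
release_chain_demo(int depth)
{
	struct node *n = NULL;
	int i;

	if (depth <= 0)
		return (0);
	for (i = 0; i < depth; i++) {
		struct node *child = node_new(n);

		/* Drop the creation ref so only the child keeps the parent alive. */
		if (n != NULL)
			node_release(n);
		n = child;
	}
	freed_count = 0;
	node_release(n);	/* like closing the single open file */
	drain_deferred();
	return (freed_count);
}
```

Calling release_chain_demo(1000000) tears down a million-level chain using a constant amount of stack. A version in which node_release() freed the node and called itself on the parent would need one stack frame per level, which is the pattern the commit message describes as exhausting the kernel stack.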
Comment 6 Ed Maste freebsd_committer freebsd_triage 2021-12-14 21:26:54 UTC
Is this issue now resolved (modulo any MFCs)?
Comment 7 Mark Linimon freebsd_committer freebsd_triage 2024-01-10 04:24:24 UTC
^Triage: assign to committer that resolved.

Set flag for possible MFC to 13.
Comment 8 Jason A. Harmening freebsd_committer freebsd_triage 2024-12-12 18:41:43 UTC
The change in question is non-trivial to backport and I no longer have a 13-stable machine or time to test it out at any rate.  If someone else wants to backport the change above then I'd be happy to review the PR.