Bug 256511

Summary: UFS assertion failure when shutting down from single-user mode
Product: Base System Reporter: Mark Johnston <markj>
Component: kernAssignee: freebsd-fs (Nobody) <fs>
Status: Closed FIXED    
Severity: Affects Only Me CC: emaste, kib, rew
Priority: ---    
Version: CURRENT   
Hardware: Any   
OS: Any   
Attachments:
Description Flags
fix panic on shutdown none

Description Mark Johnston freebsd_committer freebsd_triage 2021-06-09 19:48:46 UTC
I built a VM image without an fstab.  At the mountroot prompt, I entered the root partition, and the system dropped into single-user mode (because there's no fstab).  While shutting down, I get:

panic: Assertion ump->um_softdep == NULL failed at /root/freebsd/sys/ufs/ffs/ffs_vfsops.c:1541                                                                
cpuid = 0                                                                                                                                                     
time = 1623267977                                                                                                                                             
KDB: stack backtrace:                                                                                                                                         
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0008585820                                                                                
vpanic() at vpanic+0x181/frame 0xfffffe0008585870                                                                                                             
panic() at panic+0x43/frame 0xfffffe00085858d0                                                                                                                
ffs_unmount() at ffs_unmount+0x4b5/frame 0xfffffe0008585940                                                                                                   
dounmount() at dounmount+0x42b/frame 0xfffffe00085859b0                                                                                                       
vfs_unmountall() at vfs_unmountall+0x6a/frame 0xfffffe00085859e0                                                                                              
bufshutdown() at bufshutdown+0x2ce/frame 0xfffffe0008585a30                                                                                                   
kern_reboot() at kern_reboot+0x213/frame 0xfffffe0008585a70                                                                                                   
sys_reboot() at sys_reboot+0x3a6/frame 0xfffffe0008585ac0                                                                                                     
amd64_syscall() at amd64_syscall+0x12e/frame 0xfffffe0008585bf0                                                                                               
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe0008585bf0                                                                                    
--- syscall (55, FreeBSD ELF64, sys_reboot), rip = 0x28c92a, rsp = 0x7fffffffe698, rbp = 0x7fffffffe780 ---
Comment 1 Konstantin Belousov freebsd_committer freebsd_triage 2021-06-10 13:42:30 UTC
Does fs have SU, or SU+J enabled?
The root was ro?

Perhaps do 'dumpfs /dev/<vol>' and show everything up to the start of cyl groups
dump.
Comment 2 Mark Johnston freebsd_committer freebsd_triage 2021-06-10 14:03:22 UTC
(In reply to Konstantin Belousov from comment #1)
It has SU enabled, no journaling.  The mount is ro, yes.

A bit surprising to me, the problem only occurs if I specify "ufs:gpt/rootfs" at the mountroot prompt.  If I enter "ufs:/dev/gpt/rootfs" instead, I cannot trigger a panic.  Perhaps the former syntax is not really valid.

magic   19540119 (UFS2)                                                                                                                                       
last mounted time       Thu Jun 10 14:01:04 2021                                                                                                              
last modified time      Thu Jun 10 14:01:04 2021                                                                                                              
superblock location     65536   id      [ 60c21440 6928a463 ]                                                                                                 
ncg     58      size    13106127        blocks  13096358                                                                                                      
bsize   32768   shift   15      mask    0xffff8000                                                                                                            
fsize   4096    shift   12      mask    0xfffff000                                                                                                            
frag    8       shift   3       fsbtodb 3                                                                                                                     
minfree 8%      optim   time    symlinklen 120                                                                                                                
maxbsize 32768  maxbpg  8192    maxcontig 2     contigsumsize 2                                                                                               
nbfree  1531501 ndir    2384    nifree  114306  nffree  67                                                                                                    
bpg     28689   fpg     229512  ipg     2432    unrefs  0                                                                                                     
nindir  4096    inopb   128     maxfilesize     2252349704110079                                                                                              
sbsize  4096    cgsize  32768   csaddr  192     cssize  4096                                                                                                  
sblkno  24      cblkno  32      iblkno  40      dblkno  192                                                                                                   
cgrotor 0       fmod    0       ronly   0       clean   0                                                                                                     
metaspace 0     avgfpdir 64     avgfilesize 16384                                                                                                             
flags   unclean soft-updates                                                                                                                                  
                                                                                                                                                              
fsmnt   /                                                                                                                                                     
volname VM      swuid   0       providersize    13106127
Comment 3 Konstantin Belousov freebsd_committer freebsd_triage 2021-06-10 14:17:14 UTC
Are you sure that it was the difference between /dev/gpt/root vs gpt/root and not
e.g. clean/dirty state?

Also I probably want to see the output of mount-v in both cases (i.e. panicing
later vs. not panicing)
Comment 4 Robert Wing freebsd_committer freebsd_triage 2021-06-15 13:13:39 UTC
Created attachment 225827 [details]
fix panic on shutdown

There appears to be a couple issues here:
                                                                                                                                          
One issue is /dev/gpt/rootfs vs gpt/rootfs. The second is that the soft dependency code doesn't properly clean up after a namei lookup failure.

At the mountroot prompt, devfs is mounted on the root so the namei lookup                                              for 'gpt/rootfs' works. devfs is then remounted to /dev and the init process begins.                                                                                                                         Then comes time to mount 'gpt/rootfs', but devfs is no longer at the root, so the                                             namei lookup fails. The system then drops into single-user mode with a readonly filesystem.                                                                                                                                                                     

The panic is caused because the soft dependency code gets set up (via softdep_mount()), but doesn't clean up after itself following the namei lookup failure (in ffs_mount()). The lookup failure causes the filesystem to not get mounted 'rw' (i.e., without soft updates). On shutdown, the system doesn't recognize that the filesystem is mounted with soft updates (because it isn't), and skips the clean up code (being softdep_unmount()). By skipping the clean-up clode, ump->um_softdep doesn't get set to NULL and fails on `MPASS()`.                                                                                                                                        

/dev/gpt/rootfs works at the mountprompt because when devfs is mounted on the root, a symlink is also created for '/dev' that points to '/'.                                                                                        


I've attached a patched that fixes the panic - on its own, it doesn't address the real problem.                                                                 
                                                                     
I think the real problem is that if the namei lookup succeeds at the mountprompt, then namei should also succeed when it tries to mount the root filesystem 'rw'. One solution might be to prefix '/dev/' to the device string if its passed to the mountprompt without it..? 
                                                                                                                                                                                                             
Any thoughts?
Comment 5 Konstantin Belousov freebsd_committer freebsd_triage 2021-06-15 13:48:08 UTC
I do not think that the patch is entirely correct.  Imagine that we have rw/SU
mounted volume, which is updated.  Then, if namei fails, you would remove
SU structures from the ufsmount without a reason.  IMO the cleaning of the
um_softdep should only occur if we set it up in the previous MNT_UPDATE {} block.

The problem with the inconsistent names for devfs before/after move  of devfs
to /dev is inherent. You might add something special-cased for e.g. updating
UFS-type volume mounted on /, where "from" mount option does not start with
'/'.  Anything more generic would probably break some cases.

Please put patch into the phabricator instead of attaching it to the bug report.
Comment 6 Robert Wing freebsd_committer freebsd_triage 2021-06-22 22:57:38 UTC
(In reply to Konstantin Belousov from comment #5)

Hmm, I have a question. Can the namei() call go before the MNT_UPDATE {} block and either return early or cache the result from namei() and skip initialization based on that?

Here's why I ask that:

softdep_mount() is what sets up um_softdep and softdep_mount() is only called in the (MNT_UPDATE && upgrade from ro->rw) {} block. Also in that block of code, the superblock gets updated with new values for fs_mtime and fs_ronly. The geom access counters get incremented for exclusive write access. Seems like those would also need to be cleaned up?

I'll put the above idea (patch) on phabricator for comment: https://reviews.freebsd.org/D30870
Comment 7 commit-hook freebsd_committer freebsd_triage 2022-03-07 19:52:36 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=0455cc7104ec8e8dd54b3f44049112a5a8ca329c

commit 0455cc7104ec8e8dd54b3f44049112a5a8ca329c
Author:     Robert Wing <rew@FreeBSD.org>
AuthorDate: 2022-03-07 19:18:03 +0000
Commit:     Robert Wing <rew@FreeBSD.org>
CommitDate: 2022-03-07 19:48:44 +0000

    ffs_mount(): return early if namei() fails to lookup disk device

    With soft updates enabled, an INVARIANTS panic is hit in ffs_unmount().

    The problem occurs in ffs_mount() when upgrading a mount from ro->rw.
    During a mount update, the soft update code gets set up but doesn't get
    cleaned up if namei() fails when looking up the disk device.

    Avoid this scenario by looking up the disk device first and bail early
    if the namei() lookup fails.

    PR:             256511
    MFC After:      2 weeks
    Reviewed by:    mckusick, kib
    Differential Revision:  https://reviews.freebsd.org/D30870

 sys/ufs/ffs/ffs_vfsops.c | 148 +++++++++++++++++++++++------------------------
 1 file changed, 72 insertions(+), 76 deletions(-)
Comment 8 commit-hook freebsd_committer freebsd_triage 2022-04-13 21:04:12 UTC
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=d7d8cc9891507ec3d2dcd9334be2afade1ed9d0b

commit d7d8cc9891507ec3d2dcd9334be2afade1ed9d0b
Author:     Robert Wing <rew@FreeBSD.org>
AuthorDate: 2022-03-07 19:18:03 +0000
Commit:     Robert Wing <rew@FreeBSD.org>
CommitDate: 2022-04-13 20:59:12 +0000

    ffs_mount(): return early if namei() fails to lookup disk device

    With soft updates enabled, an INVARIANTS panic is hit in ffs_unmount().

    The problem occurs in ffs_mount() when upgrading a mount from ro->rw.
    During a mount update, the soft update code gets set up but doesn't get
    cleaned up if namei() fails when looking up the disk device.

    Avoid this scenario by looking up the disk device first and bail early
    if the namei() lookup fails.

    PR:             256511
    Reviewed by:    mckusick, kib
    Differential Revision:  https://reviews.freebsd.org/D30870

    (cherry picked from commit 0455cc7104ec8e8dd54b3f44049112a5a8ca329c)

 sys/ufs/ffs/ffs_vfsops.c | 148 +++++++++++++++++++++++------------------------
 1 file changed, 72 insertions(+), 76 deletions(-)