Bug 245907 - fsck_ufs segfaults with gjournal (SU+J)
Summary: fsck_ufs segfaults with gjournal (SU+J)
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: bin (show other bugs)
Version: 12.1-RELEASE
Hardware: amd64 Any
: --- Affects Only Me
Assignee: Kirk McKusick
URL:
Keywords:
: 255030 (view as bug list)
Depends on:
Blocks:
 
Reported: 2020-04-25 10:57 UTC by crypt47
Modified: 2021-06-12 17:33 UTC (History)
5 users (show)

See Also:


Attachments
fsck_ffs fix (377 bytes, patch)
2021-05-19 18:21 UTC, Robert Wing
no flags Details | Diff
Proposed bug fix. (766 bytes, patch)
2021-05-19 21:01 UTC, Kirk McKusick
mckusick: maintainer-approval+
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description crypt47 2020-04-25 10:57:24 UTC
I use combination of ufs+gjournal volumes

/dev/ada0p2 on / (ufs, local, noatime, journaled soft-updates)
/dev/label/home on /usr/home (ufs, local, gjournal)


and when I outage the VM with reset, fsck successfully checks the FS with SU, then fails on checking the second FS, init drops to shell. Also after manual examination the FS with gjournal log already has a clean status. Below is bootlog where I run fsck with debugger. 

Trying to mount root from ufs:/dev/ada0p2 [rw,noatime]...
uhub1: 8 ports with 8 removable, self powered
Enter passphrase for ada1p1: uhub0: 8 ports with 8 removable, self powered
GEOM_ELI: Device ada1p1.eli created.
GEOM_ELI: Encryption: AES-XTS 256
GEOM_ELI:     Crypto: software
GEOM_JOURNAL: Journal 1212489781: ada1p1.eli contains data.
GEOM_JOURNAL: Journal 1212489781: ada1p1.eli contains journal.
WARNING: / was not properly dismounted
WARNING: /: mount pending error: blocks 32 files 0
GEOM_JOURNAL: Journal ada1p1.eli consistent.
Setting hostuuid: f145b4bd-15db-4390-8ee5-f8675e40513f.
Setting hostid: 0x72fcbc5f.
Starting file system checks:
background with debugger
GNU gdb (GDB) 9.1 [GDB v9.1 for FreeBSD]
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-portbld-freebsd12.1".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /sbin/fsck...
(No debugging symbols found in /sbin/fsck)
Starting program: /sbin/fsck -F -p
[Detaching after vfork from child process 53]
[Detaching after vfork from child process 54]
** SU+J Recovering /dev/ada0p2
** Reading 33554432 byte journal from inode 4.
** Building recovery table.
** Resolving unreferenced inode list.
** Processing journal entries.
** 6 journal records in 1024 bytes for 18.75% utilization
** Freed 1 inodes (0 dirs) 0 blocks, and 0 frags.

***** FILE SYSTEM MARKED CLEAN *****
[Detaching after vfork from child process 55]
[Detaching after vfork from child process 56]
pid 56 (fsck_ufs), jid 0, uid 0: exited on signal 11
fsck: /dev/label/home: Segmentation fault
[Inferior 1 (process 52) exited with code 01]
(gdb) bt
No stack.
(gdb) quit
Mounting local filesystems:.
ELF ldconfig path: /lib /usr/lib /usr/lib/compat /usr/local/lib /usr/local/lib/compat/pkg /usr/local/lib/compat/pkg
32-bit compatibility ldconfig path: /usr/lib32
Setting hostname: bewitched.

Please feel free to request for more information.
Comment 1 Conrad Meyer freebsd_committer 2020-04-25 14:12:02 UTC
Thanks for the report.

If possible, can you first save a copy of the offending volumes (/dev/label/home and corresponding gjournal), then try running 'gdb91 fsck_ufs -d -p /dev/label/home'?  If the filesystem is now "clean" and the problem can't be readily reproduced, I'm not sure how much we can do.
Comment 2 crypt47 2020-04-26 08:04:41 UTC
(In reply to Conrad Meyer from comment #1)

Hello Conrad. The problem is 100% reproducible, but could you please what do you mean by 'save a copy'. Did you want me to make a copy of the device file???

Also gdb doesn't run exactly this way, but I hope I got you right. So here is the execution of fsck_ufs with parameters you asked:

WARNING: / was not properly dismounted
WARNING: /: mount pending error: blocks 0 files 1
GEOM_JOURNAL: Journal ada1p1.eli consistent.
Setting hostuuid: f145b4bd-15db-4390-8ee5-f8675e40513f.
Setting hostid: 0x72fcbc5f.
Starting file system checks:
background with debugger
GNU gdb (GDB) 9.1 [GDB v9.1 for FreeBSD]
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-portbld-freebsd12.1".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /sbin/fsck_ufs...
(No debugging symbols found in /sbin/fsck_ufs)
Starting program: /sbin/fsck_ufs -d -p /dev/label/home
/dev/label/home: starting

Program received signal SIGSEGV, Segmentation fault.
0x00000008003917da in free () from /lib/libc.so.7
(gdb) bt
#0  0x00000008003917da in free () from /lib/libc.so.7
#1  0x000000000021b6f2 in ?? ()
#2  0x000000000021b651 in ?? ()
#3  0x000000000020f665 in ?? ()
#4  0x000000000020810f in ?? ()
#5  0x0000000800246000 in ?? ()
#6  0x0000000000000000 in ?? ()
(gdb) quit
A debugging session is active.

	Inferior 1 [process 52] will be killed.

Quit anyway? (y or n) y
mount: /dev/ada0p2: R/W mount of / denied. Filesystem is not clean - run fsck. Forced mount will invalidate journal contents: Operation not permitted
Mounting root filesystem rw failed, startup aborted
ERROR: ABORTING BOOT (sending SIGTERM to parent)!
2020-04-26T21:58:59.255639+07:00  init 1 - - /bin/sh on /etc/rc terminated abnormally, going to single user mode
Enter full pathname of shell or RETURN for /bin/sh: # fsck -p
** SU+J Recovering /dev/ada0p2
** Reading 33554432 byte journal from inode 4.
** Building recovery table.
** Resolving unreferenced inode list.
** Processing journal entries.
** 6 journal records in 1024 bytes for 18.75% utilization
** Freed 2 inodes (0 dirs) 0 blocks, and 0 frags.

***** FILE SYSTEM MARKED CLEAN *****
/dev/label/home: FILE SYSTEM CLEAN; SKIPPING CHECKS
# ^DSetting hostuuid: f145b4bd-15db-4390-8ee5-f8675e40513f.
Setting hostid: 0x72fcbc5f.
Fast boot: skipping disk checks.
Mounting local filesystems:.
Comment 3 crypt47 2020-04-26 08:42:44 UTC
Here I've installed debugging symbols and change command to fsck_ffs.

Starting program: /sbin/fsck_ffs -d -p /dev/label/home
warning: the debug information found in "/usr/lib/debug//lib/libc.so.7.debug" does not match "/lib/libc.so.7" (CRC mismatch).

/dev/label/home: starting

Program received signal SIGSEGV, Segmentation fault.
0x00000008003917da in free () from /lib/libc.so.7
#0  0x00000008003917da in free () from /lib/libc.so.7
#1  0x000000000021b6f2 in closedisk ()
    at /usr/src/sbin/fsck_ffs/gjournal.c:250
--Type <RET> for more, q to quit, c to continue without paging-- \^H \^H
#2  0x000000000021b651 in gjournal_check (filesys=<optimized out>)
    at /usr/src/sbin/fsck_ffs/gjournal.c:504
#3  0x000000000020f665 in checkfilesys (
    filesys=0x7fffffffef15 "/dev/label/home")
    at /usr/src/sbin/fsck_ffs/main.c:307
#4  main (argc=1, argv=0x7fffffffed08) at /usr/src/sbin/fsck_ffs/main.c:205
A debugging session is active.

	Inferior 1 [process 52] will be killed.

Quit anyway? (y or n) y
Comment 4 crypt47 2020-04-30 06:09:40 UTC
Hello Conrad,

Is it alright with this information?

Thanks.
Comment 5 nvass 2021-04-19 08:58:10 UTC
With the following script you can trigger the problem:

#!/bin/sh
diskfile=`mktemp`
truncate -s 10G $diskfile
mdconfig -u 100 -f $diskfile
disk=/dev/md100

gjournal load
gjournal label ${disk}

newfs -J ${disk}.journal

yes q | fsdb ${disk}.journal

fsck -p /dev/md100.journal
Comment 6 Robert Wing freebsd_committer 2021-05-19 18:21:32 UTC
Created attachment 225092 [details]
fsck_ffs fix

Proposed patch, satisfies the previously provided reproducer.

The segfault was being hit in ckfini() (sbin/fsck_ffs/fsutil.c) while attempting to traverse the buffer cache. 

The tail queue used for the buffer cache isn't initialized before dropping into gjournal_check().

The buffer cache appears to be unused when doing an `fsck_ffs -p` on a gjournaled filesystem.
Comment 7 Kirk McKusick freebsd_committer 2021-05-19 21:01:38 UTC
Created attachment 225097 [details]
Proposed bug fix.

(In reply to Robert Wing from comment #6)
You have correctly identified the problem. My change simply moves the bufinit() even earlier to ensure that other early calls to ckfini() do not fail in the same way.
Comment 8 Robert Wing freebsd_committer 2021-05-20 19:55:40 UTC
*** Bug 255030 has been marked as a duplicate of this bug. ***
Comment 9 Robert Wing freebsd_committer 2021-05-20 20:44:13 UTC
Thanks for the feedback, guess I should have noticed that.

Should I open a review/commit or you got it?
Comment 10 Kirk McKusick freebsd_committer 2021-05-21 05:52:08 UTC
(In reply to Robert Wing from comment #9)
If you just note here that you have tried the fix and that it works then I will cite that in the commit. No need to go through a review cycle.
Comment 11 Robert Wing freebsd_committer 2021-05-21 20:04:44 UTC
I tested your patch and it fixes the provided test program.
Comment 12 Kirk McKusick freebsd_committer 2021-05-21 20:48:36 UTC
Committed patch and correctly cited PR 255030 but mis-cited PR 255979 instead of this one. Will at least get this one cited correctly when I do the MFC in 3 days time.
Comment 13 commit-hook freebsd_committer 2021-06-03 02:33:04 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=441e69e419effac0225a45f4cdb948280b8ce5ab

commit 441e69e419effac0225a45f4cdb948280b8ce5ab
Author:     Robert Wing <rew@FreeBSD.org>
AuthorDate: 2021-06-03 01:41:31 +0000
Commit:     Robert Wing <rew@FreeBSD.org>
CommitDate: 2021-06-03 02:30:20 +0000

    fsck_ufs: fix segfault with gjournal

    The segfault was being hit in ckfini() (sbin/fsck_ffs/fsutil.c) while
    attempting to traverse the buffer cache. The tail queue used for the
    buffer cache was not initialized before dropping into gjournal_check().

    Initialize the buffer cache before calling gjournal_check().

    PR:             245907
    Reviewed by:    jhb, mckusick
    MFC after:      1 week
    Differential Revision:  https://reviews.freebsd.org/D30537

 sbin/fsck_ffs/main.c | 1 +
 1 file changed, 1 insertion(+)
Comment 14 crypt47 2021-06-06 11:52:16 UTC
Thanks for fixing. I usually get updates with freebsd-update. In what update this fix will come? In 12.2-RELEASE-pY or in 12.X-RELEASE? (Same question for F13).
Comment 15 commit-hook freebsd_committer 2021-06-11 17:00:23 UTC
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=47ec64b3e801cbb793ccff49d21bc8eeb219ad9f

commit 47ec64b3e801cbb793ccff49d21bc8eeb219ad9f
Author:     Robert Wing <rew@FreeBSD.org>
AuthorDate: 2021-06-03 01:41:31 +0000
Commit:     Robert Wing <rew@FreeBSD.org>
CommitDate: 2021-06-11 16:56:00 +0000

    fsck_ufs: fix segfault with gjournal

    The segfault was being hit in ckfini() (sbin/fsck_ffs/fsutil.c) while
    attempting to traverse the buffer cache. The tail queue used for the
    buffer cache was not initialized before dropping into gjournal_check().

    Initialize the buffer cache before calling gjournal_check().

    PR:             245907
    Reviewed by:    jhb, mckusick
    Differential Revision:  https://reviews.freebsd.org/D30537

    (cherry picked from commit 441e69e419effac0225a45f4cdb948280b8ce5ab)

 sbin/fsck_ffs/main.c | 1 +
 1 file changed, 1 insertion(+)