Bug 245907 - fsck_ufs segfaults with gjournal (SU+J)
Summary: fsck_ufs segfaults with gjournal (SU+J)
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: bin (show other bugs)
Version: 12.1-RELEASE
Hardware: amd64 Any
: --- Affects Only Me
Assignee: Kirk McKusick
URL:
Keywords:
: 255030 (view as bug list)
Depends on:
Blocks:
 
Reported: 2020-04-25 10:57 UTC by crypt47
Modified: 2022-01-25 06:06 UTC (History)
6 users (show)

See Also:


Attachments
fsck_ffs fix (377 bytes, patch)
2021-05-19 18:21 UTC, Robert Wing
no flags Details | Diff
Proposed bug fix. (766 bytes, patch)
2021-05-19 21:01 UTC, Kirk McKusick
mckusick: maintainer-approval+
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description crypt47 2020-04-25 10:57:24 UTC
I use combination of ufs+gjournal volumes

/dev/ada0p2 on / (ufs, local, noatime, journaled soft-updates)
/dev/label/home on /usr/home (ufs, local, gjournal)


and when I outage the VM with reset, fsck successfully checks the FS with SU, then fails on checking the second FS, init drops to shell. Also after manual examination the FS with gjournal log already has a clean status. Below is bootlog where I run fsck with debugger. 

Trying to mount root from ufs:/dev/ada0p2 [rw,noatime]...
uhub1: 8 ports with 8 removable, self powered
Enter passphrase for ada1p1: uhub0: 8 ports with 8 removable, self powered
GEOM_ELI: Device ada1p1.eli created.
GEOM_ELI: Encryption: AES-XTS 256
GEOM_ELI:     Crypto: software
GEOM_JOURNAL: Journal 1212489781: ada1p1.eli contains data.
GEOM_JOURNAL: Journal 1212489781: ada1p1.eli contains journal.
WARNING: / was not properly dismounted
WARNING: /: mount pending error: blocks 32 files 0
GEOM_JOURNAL: Journal ada1p1.eli consistent.
Setting hostuuid: f145b4bd-15db-4390-8ee5-f8675e40513f.
Setting hostid: 0x72fcbc5f.
Starting file system checks:
background with debugger
GNU gdb (GDB) 9.1 [GDB v9.1 for FreeBSD]
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-portbld-freebsd12.1".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /sbin/fsck...
(No debugging symbols found in /sbin/fsck)
Starting program: /sbin/fsck -F -p
[Detaching after vfork from child process 53]
[Detaching after vfork from child process 54]
** SU+J Recovering /dev/ada0p2
** Reading 33554432 byte journal from inode 4.
** Building recovery table.
** Resolving unreferenced inode list.
** Processing journal entries.
** 6 journal records in 1024 bytes for 18.75% utilization
** Freed 1 inodes (0 dirs) 0 blocks, and 0 frags.

***** FILE SYSTEM MARKED CLEAN *****
[Detaching after vfork from child process 55]
[Detaching after vfork from child process 56]
pid 56 (fsck_ufs), jid 0, uid 0: exited on signal 11
fsck: /dev/label/home: Segmentation fault
[Inferior 1 (process 52) exited with code 01]
(gdb) bt
No stack.
(gdb) quit
Mounting local filesystems:.
ELF ldconfig path: /lib /usr/lib /usr/lib/compat /usr/local/lib /usr/local/lib/compat/pkg /usr/local/lib/compat/pkg
32-bit compatibility ldconfig path: /usr/lib32
Setting hostname: bewitched.

Please feel free to request for more information.
Comment 1 Conrad Meyer freebsd_committer freebsd_triage 2020-04-25 14:12:02 UTC
Thanks for the report.

If possible, can you first save a copy of the offending volumes (/dev/label/home and corresponding gjournal), then try running 'gdb91 fsck_ufs -d -p /dev/label/home'?  If the filesystem is now "clean" and the problem can't be readily reproduced, I'm not sure how much we can do.
Comment 2 crypt47 2020-04-26 08:04:41 UTC
(In reply to Conrad Meyer from comment #1)

Hello Conrad. The problem is 100% reproducible, but could you please what do you mean by 'save a copy'. Did you want me to make a copy of the device file???

Also gdb doesn't run exactly this way, but I hope I got you right. So here is the execution of fsck_ufs with parameters you asked:

WARNING: / was not properly dismounted
WARNING: /: mount pending error: blocks 0 files 1
GEOM_JOURNAL: Journal ada1p1.eli consistent.
Setting hostuuid: f145b4bd-15db-4390-8ee5-f8675e40513f.
Setting hostid: 0x72fcbc5f.
Starting file system checks:
background with debugger
GNU gdb (GDB) 9.1 [GDB v9.1 for FreeBSD]
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-portbld-freebsd12.1".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /sbin/fsck_ufs...
(No debugging symbols found in /sbin/fsck_ufs)
Starting program: /sbin/fsck_ufs -d -p /dev/label/home
/dev/label/home: starting

Program received signal SIGSEGV, Segmentation fault.
0x00000008003917da in free () from /lib/libc.so.7
(gdb) bt
#0  0x00000008003917da in free () from /lib/libc.so.7
#1  0x000000000021b6f2 in ?? ()
#2  0x000000000021b651 in ?? ()
#3  0x000000000020f665 in ?? ()
#4  0x000000000020810f in ?? ()
#5  0x0000000800246000 in ?? ()
#6  0x0000000000000000 in ?? ()
(gdb) quit
A debugging session is active.

	Inferior 1 [process 52] will be killed.

Quit anyway? (y or n) y
mount: /dev/ada0p2: R/W mount of / denied. Filesystem is not clean - run fsck. Forced mount will invalidate journal contents: Operation not permitted
Mounting root filesystem rw failed, startup aborted
ERROR: ABORTING BOOT (sending SIGTERM to parent)!
2020-04-26T21:58:59.255639+07:00  init 1 - - /bin/sh on /etc/rc terminated abnormally, going to single user mode
Enter full pathname of shell or RETURN for /bin/sh: # fsck -p
** SU+J Recovering /dev/ada0p2
** Reading 33554432 byte journal from inode 4.
** Building recovery table.
** Resolving unreferenced inode list.
** Processing journal entries.
** 6 journal records in 1024 bytes for 18.75% utilization
** Freed 2 inodes (0 dirs) 0 blocks, and 0 frags.

***** FILE SYSTEM MARKED CLEAN *****
/dev/label/home: FILE SYSTEM CLEAN; SKIPPING CHECKS
# ^DSetting hostuuid: f145b4bd-15db-4390-8ee5-f8675e40513f.
Setting hostid: 0x72fcbc5f.
Fast boot: skipping disk checks.
Mounting local filesystems:.
Comment 3 crypt47 2020-04-26 08:42:44 UTC
Here I've installed debugging symbols and change command to fsck_ffs.

Starting program: /sbin/fsck_ffs -d -p /dev/label/home
warning: the debug information found in "/usr/lib/debug//lib/libc.so.7.debug" does not match "/lib/libc.so.7" (CRC mismatch).

/dev/label/home: starting

Program received signal SIGSEGV, Segmentation fault.
0x00000008003917da in free () from /lib/libc.so.7
#0  0x00000008003917da in free () from /lib/libc.so.7
#1  0x000000000021b6f2 in closedisk ()
    at /usr/src/sbin/fsck_ffs/gjournal.c:250
--Type <RET> for more, q to quit, c to continue without paging-- \^H \^H
#2  0x000000000021b651 in gjournal_check (filesys=<optimized out>)
    at /usr/src/sbin/fsck_ffs/gjournal.c:504
#3  0x000000000020f665 in checkfilesys (
    filesys=0x7fffffffef15 "/dev/label/home")
    at /usr/src/sbin/fsck_ffs/main.c:307
#4  main (argc=1, argv=0x7fffffffed08) at /usr/src/sbin/fsck_ffs/main.c:205
A debugging session is active.

	Inferior 1 [process 52] will be killed.

Quit anyway? (y or n) y
Comment 4 crypt47 2020-04-30 06:09:40 UTC
Hello Conrad,

Is it alright with this information?

Thanks.
Comment 5 nvass 2021-04-19 08:58:10 UTC
With the following script you can trigger the problem:

#!/bin/sh
diskfile=`mktemp`
truncate -s 10G $diskfile
mdconfig -u 100 -f $diskfile
disk=/dev/md100

gjournal load
gjournal label ${disk}

newfs -J ${disk}.journal

yes q | fsdb ${disk}.journal

fsck -p /dev/md100.journal
Comment 6 Robert Wing freebsd_committer freebsd_triage 2021-05-19 18:21:32 UTC
Created attachment 225092 [details]
fsck_ffs fix

Proposed patch, satisfies the previously provided reproducer.

The segfault was being hit in ckfini() (sbin/fsck_ffs/fsutil.c) while attempting to traverse the buffer cache. 

The tail queue used for the buffer cache isn't initialized before dropping into gjournal_check().

The buffer cache appears to be unused when doing an `fsck_ffs -p` on a gjournaled filesystem.
Comment 7 Kirk McKusick freebsd_committer freebsd_triage 2021-05-19 21:01:38 UTC
Created attachment 225097 [details]
Proposed bug fix.

(In reply to Robert Wing from comment #6)
You have correctly identified the problem. My change simply moves the bufinit() even earlier to ensure that other early calls to ckfini() do not fail in the same way.
Comment 8 Robert Wing freebsd_committer freebsd_triage 2021-05-20 19:55:40 UTC
*** Bug 255030 has been marked as a duplicate of this bug. ***
Comment 9 Robert Wing freebsd_committer freebsd_triage 2021-05-20 20:44:13 UTC
Thanks for the feedback, guess I should have noticed that.

Should I open a review/commit or you got it?
Comment 10 Kirk McKusick freebsd_committer freebsd_triage 2021-05-21 05:52:08 UTC
(In reply to Robert Wing from comment #9)
If you just note here that you have tried the fix and that it works then I will cite that in the commit. No need to go through a review cycle.
Comment 11 Robert Wing freebsd_committer freebsd_triage 2021-05-21 20:04:44 UTC
I tested your patch and it fixes the provided test program.
Comment 12 Kirk McKusick freebsd_committer freebsd_triage 2021-05-21 20:48:36 UTC
Committed patch and correctly cited PR 255030 but mis-cited PR 255979 instead of this one. Will at least get this one cited correctly when I do the MFC in 3 days time.
Comment 13 commit-hook freebsd_committer freebsd_triage 2021-06-03 02:33:04 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=441e69e419effac0225a45f4cdb948280b8ce5ab

commit 441e69e419effac0225a45f4cdb948280b8ce5ab
Author:     Robert Wing <rew@FreeBSD.org>
AuthorDate: 2021-06-03 01:41:31 +0000
Commit:     Robert Wing <rew@FreeBSD.org>
CommitDate: 2021-06-03 02:30:20 +0000

    fsck_ufs: fix segfault with gjournal

    The segfault was being hit in ckfini() (sbin/fsck_ffs/fsutil.c) while
    attempting to traverse the buffer cache. The tail queue used for the
    buffer cache was not initialized before dropping into gjournal_check().

    Initialize the buffer cache before calling gjournal_check().

    PR:             245907
    Reviewed by:    jhb, mckusick
    MFC after:      1 week
    Differential Revision:  https://reviews.freebsd.org/D30537

 sbin/fsck_ffs/main.c | 1 +
 1 file changed, 1 insertion(+)
Comment 14 crypt47 2021-06-06 11:52:16 UTC
Thanks for fixing. I usually get updates with freebsd-update. In what update this fix will come? In 12.2-RELEASE-pY or in 12.X-RELEASE? (Same question for F13).
Comment 15 commit-hook freebsd_committer freebsd_triage 2021-06-11 17:00:23 UTC
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=47ec64b3e801cbb793ccff49d21bc8eeb219ad9f

commit 47ec64b3e801cbb793ccff49d21bc8eeb219ad9f
Author:     Robert Wing <rew@FreeBSD.org>
AuthorDate: 2021-06-03 01:41:31 +0000
Commit:     Robert Wing <rew@FreeBSD.org>
CommitDate: 2021-06-11 16:56:00 +0000

    fsck_ufs: fix segfault with gjournal

    The segfault was being hit in ckfini() (sbin/fsck_ffs/fsutil.c) while
    attempting to traverse the buffer cache. The tail queue used for the
    buffer cache was not initialized before dropping into gjournal_check().

    Initialize the buffer cache before calling gjournal_check().

    PR:             245907
    Reviewed by:    jhb, mckusick
    Differential Revision:  https://reviews.freebsd.org/D30537

    (cherry picked from commit 441e69e419effac0225a45f4cdb948280b8ce5ab)

 sbin/fsck_ffs/main.c | 1 +
 1 file changed, 1 insertion(+)
Comment 16 longwitz 2021-11-09 11:43:57 UTC
The problem for bug 245907 was reported for V12 but the commit solving the problem was done for V13. Because the reason for the segfault of fsck_ffs is different in V12 and V13 we should add a commit for V12 with the patch

--- gjournal.c.1st      2020-04-09 22:35:35.000000000 +0200
+++ gjournal.c  2021-11-09 12:28:09.171617000 +0100
@@ -246,7 +246,6 @@
                err(1, "sbwrite(%s)", devnam);
        if (ufs_disk_close(diskp) == -1)
                err(1, "ufs_disk_close(%s)", devnam);
-       free(diskp);
        diskp = NULL;
        fs = NULL;
 }

The free() is harmful in V12 because there was not a corresponding malloc().

Andreas
Comment 17 Robert Wing freebsd_committer freebsd_triage 2021-11-18 20:51:51 UTC
Andreas,

Thanks for the report. I dropped the ball fixing the original bug.

Originally, I reproduced this bug on a 13 machine and fixed it. I didn't merge the changes back to 12 because I didn't recognize that the problem existed in 12, despite the bug report targeting 12.1 - my oversight.

The reproducer script triggers a similar bug for 12 and 13, but the fixes are different.

I believe your patch is the proper fix (for 12) as diskp points to a global variable (struct uufsd disk) that is not a pointer.

In short, closedisk() was trying to free a non-pointer.
Comment 18 commit-hook freebsd_committer freebsd_triage 2021-11-20 18:34:05 UTC
A commit in branch stable/12 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=f8145bd4bcc025fbb1750d9daf75e92916db4192

commit f8145bd4bcc025fbb1750d9daf75e92916db4192
Author:     Robert Wing <rew@FreeBSD.org>
AuthorDate: 2021-06-03 01:41:31 +0000
Commit:     Robert Wing <rew@FreeBSD.org>
CommitDate: 2021-11-20 18:24:06 +0000

    fsck_ufs: fix segfault with gjournal

    The segfault was being hit in closedisk() while trying to free a
    non-pointer.

    The fix for this bug differs between stable/12 and stable/13.

    PR:             245907
    Submitted by:   longwitz@incore.de
    Differential Revision:  https://reviews.freebsd.org/D30537

    (cherry picked from commit 441e69e419effac0225a45f4cdb948280b8ce5ab)

 sbin/fsck_ffs/gjournal.c | 1 -
 1 file changed, 1 deletion(-)
Comment 19 crypt47 2022-01-22 08:52:31 UTC
Hello, for some reason I experience this bug in F12.3. Is this the way it should be?
Comment 20 Robert Wing freebsd_committer freebsd_triage 2022-01-24 14:07:45 UTC
(In reply to crypt47 from comment #19)

Yes, that is expected - the fix wasn't MFC'ed in time for 12.3.
Comment 21 Kirk McKusick freebsd_committer freebsd_triage 2022-01-25 02:35:20 UTC
(In reply to crypt47 from comment #19)
This bug has been fixed in 12-STABLE but has not been issued as an errata. Unless an errata is issued for it, it will not appear in a FreeBSD12.3-RELEASE. Rather you will have to wait for the FreeBSD12.4 release. In the meantime you can recompile your fsck_ffs program with the one-line fix described in comment #16 or grab a copy of 12-STABLE and copy out the /sbin/fsck_ffs binary to your system (note that /sbin/fsck_ffs is also linked to /sbin/fsck_ufs and /sbin/fsck_4.2bsd).
Comment 22 crypt47 2022-01-25 06:06:32 UTC
(In reply to Kirk McKusick from comment #21)

I see,  I'm still trying to figure out the route of the fix, that's why asking. Ok, so patch levels are errata in course of STABLE release. Thanks.