Bug 278828 - panic when writing to geli device after running attach twice
Summary: panic when writing to geli device after running attach twice
Status: In Progress
Alias: None
Product: Base System
Classification: Unclassified
Component: kern (show other bugs)
Version: CURRENT
Hardware: Any Any
: --- Affects Many People
Assignee: Mariusz Zaborski
URL:
Keywords: crash
Depends on:
Blocks:
 
Reported: 2024-05-07 08:44 UTC by Andre Albsmeier
Modified: 2024-05-23 08:07 UTC (History)
4 users (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Andre Albsmeier 2024-05-07 08:44:52 UTC
kernel panics if we have no geli devices attached, attach a device, try to attach it again (which fails) and then write to (the attached) .eli.

Steps to reproduce:

kldload geom_eli.ko
dd if=/dev/zero of=/tmp/testcrash bs=64k count=160
mdconfig -a -t vnode -f /tmp/testcrash
geli init -P -K /bin/ls /dev/md0
geli attach -k /bin/ls -p /dev/md0
geli attach -k /bin/ls -p /dev/md0      (this fails)
newfs /dev/md0.eli                      !!! Crash !!!


The second "geli attach" makes g_eli_create() die at this point:

        ...
        dcw = (sc->sc_flags & G_ELI_FLAG_RO) ? 0 : 1;
        error = g_access(cp, 1, dcw, 1);
        if (error != 0) {
                ...
                goto failed;
        }
        ...

        ...
        g_eli_init_uma();
        ...

failed:
        ...
        g_eli_fini_uma();
        ...

g_eli_fini_uma() will find geli_devs being set to 1 (because we already
have a device attached but have not called g_eli_init_uma() in this
context), decrements it and calls g_eli_destroy_uma().

So we will be left with g_eli_uma being NULL despite the fact that we
still have a device attached.

I don't know the correct fix. I changed g_eli_create() so it calls
g_eli_fini_uma() only if it has called g_eli_init_uma() before. This
seems to work...
Comment 1 Mariusz Zaborski freebsd_committer freebsd_triage 2024-05-16 20:19:34 UTC
Hello Andre,

Thank you for the report. I have created a patch for it: https://reviews.freebsd.org/D45225
Comment 2 commit-hook freebsd_committer freebsd_triage 2024-05-19 12:55:24 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=4b3141f5d5373989598f9447ab5a9f87e2d1c9fb

commit 4b3141f5d5373989598f9447ab5a9f87e2d1c9fb
Author:     Mariusz Zaborski <oshogbo@FreeBSD.org>
AuthorDate: 2024-05-19 12:53:17 +0000
Commit:     Mariusz Zaborski <oshogbo@FreeBSD.org>
CommitDate: 2024-05-19 12:53:17 +0000

    geli: allocate a UMA pool earlier

    The functions g_eli_init_uma and g_eli_fini_uma are used to trace
    the number of devices in GELI. There is an issue where the g_eli_create
    function may fail before g_eli_init_uma is called, however
    g_eli_fini_uma is still executed in the fail path. This can
    incorrectly decrease the device count to zero, potentially leading to
    the UMA pool being freed. Accessing the device after the pool has been
    freed causes a system panic.

    This commit resolves the issue by ensuring devices count is increassed
    eariler.

    PR:             278828
    Reported by:    Andre Albsmeier <mail@fbsd2.e4m.org>
    Reviewed by:    asomers
    MFC after:      3 days
    Differential Revision:  https://reviews.freebsd.org/D45225

 sys/geom/eli/g_eli.c                    |  4 +++-
 tests/sys/geom/class/eli/attach_test.sh | 29 +++++++++++++++++++++++++++++
 2 files changed, 32 insertions(+), 1 deletion(-)
Comment 3 Andriy Gapon freebsd_committer freebsd_triage 2024-05-19 19:44:43 UTC
Just want to add that I recently had a similar crash when I tried to geli attach, by accident, a device that was already attached (and its filesystem mounted).

--- trap 0xc, rip = 0xffffffff80baab46, rsp = 0xfffffe01bfdf1da0, rbp = 0xfffffe01bfdf1dd0 ---
uma_zalloc_arg() at 0xffffffff80baab46 = uma_zalloc_arg+0x16/frame 0xfffffe01bfdf1dd0
g_eli_alloc_data() at 0xffffffff87efb09d = g_eli_alloc_data+0x3d/frame 0xfffffe01bfdf1df0
g_eli_crypto_run() at 0xffffffff87f04067 = g_eli_crypto_run+0x97/frame 0xfffffe01bfdf1e90
g_eli_worker() at 0xffffffff87efcf75 = g_eli_worker+0x375/frame 0xfffffe01bfdf1ef0
fork_exit() at 0xffffffff808b7d87 = fork_exit+0xc7/frame 0xfffffe01bfdf1f30
fork_trampoline() at 0xffffffff80c2ecfe = fork_trampoline+0xe/frame 0xfffffe01bfdf1f30

#9  <signal handler called>
No locals.
#10 uma_zalloc_arg (zone=0x0, udata=udata@entry=0x0, flags=1)
    at /usr/home/avg/devel/freebsd-src-new/machines/trant/sys/vm/uma_core.c:3712
        __pc = 0x0
        bucket = <optimized out>
        cache = <optimized out>
        item = <optimized out>
#11 0xffffffff87efb09d in uma_zalloc (zone=0x0, zone@entry=0xfffff8038a13a8d0,
    flags=1)
    at /usr/home/avg/devel/freebsd-src-new/machines/trant/sys/vm/uma.h:367
No locals.
#12 g_eli_alloc_data (bp=bp@entry=0xfffff8038a13a8d0, sz=32768)
    at /usr/home/avg/devel/freebsd-src-new/machines/trant/sys/geom/eli/g_eli.c:958
No locals.
Comment 4 Andre Albsmeier 2024-05-22 16:31:34 UTC
Just tried your patch on 14-STABLE and latest 14.1 and it worked...
Comment 5 Mariusz Zaborski freebsd_committer freebsd_triage 2024-05-23 06:00:20 UTC
Thanks for testing!
Comment 6 commit-hook freebsd_committer freebsd_triage 2024-05-23 06:00:55 UTC
A commit in branch stable/14 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=ea5a708625e5b5c89333e3e3e9a48fe588f05361

commit ea5a708625e5b5c89333e3e3e9a48fe588f05361
Author:     Mariusz Zaborski <oshogbo@FreeBSD.org>
AuthorDate: 2024-05-19 12:53:17 +0000
Commit:     Mariusz Zaborski <oshogbo@FreeBSD.org>
CommitDate: 2024-05-23 06:00:57 +0000

    geli: allocate a UMA pool earlier

    The functions g_eli_init_uma and g_eli_fini_uma are used to trace
    the number of devices in GELI. There is an issue where the g_eli_create
    function may fail before g_eli_init_uma is called, however
    g_eli_fini_uma is still executed in the fail path. This can
    incorrectly decrease the device count to zero, potentially leading to
    the UMA pool being freed. Accessing the device after the pool has been
    freed causes a system panic.

    This commit resolves the issue by ensuring devices count is increassed
    eariler.

    PR:             278828
    Reported by:    Andre Albsmeier <mail@fbsd2.e4m.org>
    Reviewed by:    asomers
    MFC after:      3 days
    Differential Revision:  https://reviews.freebsd.org/D45225

    (cherry picked from commit 4b3141f5d5373989598f9447ab5a9f87e2d1c9fb)

 sys/geom/eli/g_eli.c                    |  4 +++-
 tests/sys/geom/class/eli/attach_test.sh | 29 +++++++++++++++++++++++++++++
 2 files changed, 32 insertions(+), 1 deletion(-)
Comment 7 commit-hook freebsd_committer freebsd_triage 2024-05-23 08:07:09 UTC
A commit in branch releng/14.1 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=309946854fd5cb24283cf7a36c9d20c2f2820df7

commit 309946854fd5cb24283cf7a36c9d20c2f2820df7
Author:     Mariusz Zaborski <oshogbo@FreeBSD.org>
AuthorDate: 2024-05-19 12:53:17 +0000
Commit:     Mariusz Zaborski <oshogbo@FreeBSD.org>
CommitDate: 2024-05-23 08:07:19 +0000

    geli: allocate a UMA pool earlier

    The functions g_eli_init_uma and g_eli_fini_uma are used to trace
    the number of devices in GELI. There is an issue where the g_eli_create
    function may fail before g_eli_init_uma is called, however
    g_eli_fini_uma is still executed in the fail path. This can
    incorrectly decrease the device count to zero, potentially leading to
    the UMA pool being freed. Accessing the device after the pool has been
    freed causes a system panic.

    This commit resolves the issue by ensuring devices count is increassed
    eariler.

    PR:             278828
    Reported by:    Andre Albsmeier <mail@fbsd2.e4m.org>
    Reviewed by:    asomers
    MFC after:      3 days
    Approved by:    re (cperciva)
    Differential Revision:  https://reviews.freebsd.org/D45225

    (cherry picked from commit 4b3141f5d5373989598f9447ab5a9f87e2d1c9fb)
    (cherry picked from commit ea5a708625e5b5c89333e3e3e9a48fe588f05361)

 sys/geom/eli/g_eli.c                    |  4 +++-
 tests/sys/geom/class/eli/attach_test.sh | 29 +++++++++++++++++++++++++++++
 2 files changed, 32 insertions(+), 1 deletion(-)