Bug 286321 - [FBSD 15.0 Current] kernel panic "panic: incoming crp already done" while running kyua tests with qat driver
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 15.0-CURRENT
Hardware: Any Any
Importance: --- Affects Some People
Assignee: freebsd-bugs (Nobody)
URL:
Keywords: crash, vendor
Depends on:
Blocks:
 
Reported: 2025-04-24 13:20 UTC by Vishnu Das R
Modified: 2025-05-14 01:39 UTC (History)

See Also:


Description Vishnu Das R 2025-04-24 13:20:08 UTC
When running the kyua tests on FBSD 15.0-CURRENT with qat accelerator hardware (registered with OCF to run crypto payloads), we see a kernel panic in g_eli_crypto_rerun(), which in turn calls crypto_dispatch_one().
Panic log: "panic: incoming crp already done"
 
g_eli_crypto_rerun() is called when the qat acceleration software returns EAGAIN (file: sys/dev/qat/qat/qat_ocf.c, function: qat_ocf_process()). Since the request submission to the qat driver has failed, crp->crp_etype is set to EAGAIN and crypto_done(crp) is called.
 
On seeing EAGAIN, g_eli_crypto_read_done()/g_eli_crypto_write_done() call g_eli_crypto_rerun().
In g_eli_crypto_rerun(), the CRYPTO_F_DONE bit is not cleared from crp->crp_flags before crypto_dispatch_one() is called, so we hit the kernel panic in
 
    KASSERT(!(crp->crp_flags & CRYPTO_F_DONE),
        ("incoming crp already done"));
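
The sequence described above can be modelled in userland. This is only a sketch: struct mock_crp, MOCK_F_DONE (including its numeric value) and every mock_* function are hypothetical stand-ins, not the kernel definitions, and the model returns false at the point where an INVARIANTS kernel would panic.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical flag value; the real CRYPTO_F_DONE is defined by OCF. */
#define MOCK_F_DONE 0x0020

/* Stand-in for struct cryptop: only the two fields discussed above. */
struct mock_crp {
	int crp_flags;
	int crp_etype;
};

/* Models completion before the fix: the flag is set even on failure. */
static void
mock_crypto_done(struct mock_crp *crp, int error)
{
	crp->crp_etype = error;
	crp->crp_flags |= MOCK_F_DONE;
}

/*
 * Models the INVARIANTS check on dispatch. Returns false where the
 * kernel would panic with "incoming crp already done".
 */
static bool
mock_dispatch_ok(const struct mock_crp *crp)
{
	return ((crp->crp_flags & MOCK_F_DONE) == 0);
}

/* First submission: the driver is busy, so the request ends with EAGAIN. */
static bool
mock_submit_busy(struct mock_crp *crp)
{
	if (!mock_dispatch_ok(crp))
		return (false);
	mock_crypto_done(crp, EAGAIN);
	return (true);
}
```

In this model, calling mock_submit_busy() on a fresh request succeeds and leaves crp_etype at EAGAIN; a second mock_dispatch_ok() on the same, unmodified request then fails, which corresponds to the retry path tripping the assertion.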
 
Command executed:
cd /usr/tests/ && kyua -v test_suites.FreeBSD.disks='/dev/md0 /dev/md1' test sys/geom/class/eli
 
 
Please note: this issue is not seen on STABLE branches and occurs only on FBSD 15.0-CURRENT, where 'include "std.debug"' is added to the GENERIC conf file by default. std.debug enables the INVARIANTS option, which compiles in the above-mentioned KASSERT check.
 
To fix this kernel panic, we feel that the CRYPTO_F_DONE bit should be cleared from crp->crp_flags in g_eli_crypto_rerun() before invoking crypto_dispatch().
Please suggest.
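
A minimal userland sketch of the consumer-side change proposed here, assuming a mock request type: struct mock_crp, MOCK_F_DONE, mock_dispatch() and mock_rerun() are hypothetical names, and resetting crp_etype before the retry is an assumption of the sketch, not something taken from g_eli_crypto_rerun().

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical flag value standing in for CRYPTO_F_DONE. */
#define MOCK_F_DONE 0x0020

/* Stand-in for struct cryptop with only the relevant fields. */
struct mock_crp {
	int crp_flags;
	int crp_etype;
};

/* Models dispatch: false marks the spot where the KASSERT would fire. */
static bool
mock_dispatch(const struct mock_crp *crp)
{
	return ((crp->crp_flags & MOCK_F_DONE) == 0);
}

/* Consumer-side retry: reset the completion state, then resubmit. */
static bool
mock_rerun(struct mock_crp *crp)
{
	crp->crp_flags &= ~MOCK_F_DONE;	/* the step the report proposes */
	crp->crp_etype = 0;		/* assumption: start the retry clean */
	return (mock_dispatch(crp));
}
```

The drawback of this approach is that every consumer that retries a request has to remember to clear the flag itself.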
Comment 1 Mark Johnston freebsd_committer freebsd_triage 2025-05-01 14:27:19 UTC
I believe this patch will fix it: https://reviews.freebsd.org/D50104
Comment 2 commit-hook freebsd_committer freebsd_triage 2025-05-09 00:35:06 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=2fa185f9bf5948ead9c3920d452ddd6bcad8f569

commit 2fa185f9bf5948ead9c3920d452ddd6bcad8f569
Author:     Mark Johnston <markj@FreeBSD.org>
AuthorDate: 2025-05-09 00:23:40 +0000
Commit:     Mark Johnston <markj@FreeBSD.org>
CommitDate: 2025-05-09 00:29:23 +0000

    crypto: Remove uses of CRYPTO_F_DONE

    Previously OCF set CRYPTO_F_DONE prior to invoking the completion
    callback, even if the request failed. This isn't particularly useful
    and leads to bugs when consumers retry a failed request, since OCF also
    asserts that CRYPTO_F_DONE is clear in crypto_dispatch(). (Really, OCF
    should retry requests that fail with EAGAIN, but that's a larger
    change.)

    For now, just stop setting CRYPTO_F_DONE to simplify consumers (and fix
    those which fail to clear the flag before retrying a request).

    PR:             286321
    Reviewed by:    jhb
    Differential Revision:  https://reviews.freebsd.org/D50104

 share/man/man9/crypto_request.9                       | 13 +------------
 sys/contrib/openzfs/module/os/freebsd/zfs/crypto_os.c |  1 -
 sys/kgssapi/krb5/kcrypto_aes.c                        |  1 -
 sys/opencrypto/crypto.c                               |  6 ------
 sys/opencrypto/cryptodev.c                            |  2 --
 sys/opencrypto/cryptodev.h                            |  2 +-
 sys/opencrypto/ktls_ocf.c                             |  2 --
 sys/sys/param.h                                       |  2 +-
 8 files changed, 3 insertions(+), 26 deletions(-)
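
The approach described in the commit message can be sketched in userland as follows. Everything here is a hypothetical model (struct mock_crp, MOCK_F_DONE, the mock_* functions, and the busy_count driver behaviour), not code from the commit. The dispatch check is kept in the model only to show that it can no longer trip once the completion path stops setting the flag.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical flag value standing in for CRYPTO_F_DONE. */
#define MOCK_F_DONE 0x0020

/* Stand-in for struct cryptop with only the relevant fields. */
struct mock_crp {
	int crp_flags;
	int crp_etype;
};

/* Completion path after the change: record the error, leave crp_flags alone. */
static void
mock_crypto_done(struct mock_crp *crp, int error)
{
	crp->crp_etype = error;
}

/* Dispatch to a mock driver that reports EAGAIN while *busy_count > 0. */
static bool
mock_dispatch(struct mock_crp *crp, int *busy_count)
{
	if ((crp->crp_flags & MOCK_F_DONE) != 0)
		return (false);	/* the old assertion would fire here */
	if (*busy_count > 0) {
		(*busy_count)--;
		mock_crypto_done(crp, EAGAIN);
	} else {
		mock_crypto_done(crp, 0);
	}
	return (true);
}

/*
 * Consumer that retries on EAGAIN without touching crp_flags.
 * Returns the number of attempts used, or -1 if dispatch was refused
 * or max_attempts was exhausted.
 */
static int
mock_submit_with_retry(struct mock_crp *crp, int busy_count, int max_attempts)
{
	for (int attempt = 1; attempt <= max_attempts; attempt++) {
		if (!mock_dispatch(crp, &busy_count))
			return (-1);
		if (crp->crp_etype != EAGAIN)
			return (attempt);
	}
	return (-1);
}
```

With a driver that is busy for the first two submissions, the model completes on the third attempt with the consumer never clearing any flag.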
Comment 3 Mark Johnston freebsd_committer freebsd_triage 2025-05-09 12:35:35 UTC
Vishnu, can you confirm that the problem is gone after this commit?
Comment 4 Vishnu Das R 2025-05-13 05:13:17 UTC
(In reply to Mark Johnston from comment #3)
Thank you Mark. The issue is fixed with the mentioned commit.