When running the kyua tests on FreeBSD 15.0-CURRENT with qat accelerator hardware (registered with OCF to run crypto payloads), we see a kernel panic in g_eli_crypto_rerun(), which in turn calls crypto_dispatch_one().

Panic log: "panic: incoming crp already done"

g_eli_crypto_rerun() gets called when the qat acceleration software returns EAGAIN (file: sys/dev/qat/qat/qat_ocf.c, function: qat_ocf_process()). Since the request submission to the qat driver has failed, crypto_done(crp) is called and crp->crp_etype is set to EAGAIN. On seeing EAGAIN, g_eli_crypto_read_done()/g_eli_crypto_write_done() calls g_eli_crypto_rerun(). In g_eli_crypto_rerun(), since the CRYPTO_F_DONE bit is not cleared from crp->crp_flags before calling crypto_dispatch_one(), we hit the following assertion:

KASSERT(!(crp->crp_flags & CRYPTO_F_DONE), ("incoming crp already done"));

Command executed:
cd /usr/tests/ && kyua -v test_suites.FreeBSD.disks='/dev/md0 /dev/md1' test sys/geom/class/eli

Please note: this issue is not seen on STABLE branches. It occurs only on FreeBSD 15.0-CURRENT, where include "std.debug" is added to the GENERIC conf file by default. std.debug enables the INVARIANTS option, which compiles in the KASSERT check mentioned above.

To fix this kernel panic, we feel that the CRYPTO_F_DONE bit should be cleared from crp->crp_flags in g_eli_crypto_rerun() before invoking crypto_dispatch(). Please suggest.
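For illustration only, the sequence described in the report can be modeled in user space. This is a rough sketch, not kernel code: every name below (MODEL_F_DONE, struct model_crp, model_dispatch(), model_rerun_buggy(), model_rerun_fixed()) is hypothetical, the flag value is arbitrary, and the place where the kernel would panic is modeled as a false return. It only shows why a retry that leaves the done flag set trips the check, and why clearing the flag first, as proposed in the report, avoids it.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for CRYPTO_F_DONE; the value is arbitrary. */
#define MODEL_F_DONE 0x1u

/* Hypothetical stand-in for the two struct cryptop fields discussed. */
struct model_crp {
	unsigned int flags;	/* plays the role of crp->crp_flags */
	int etype;		/* plays the role of crp->crp_etype */
};

/*
 * Models dispatch followed by completion. Returns false where an
 * INVARIANTS kernel would panic with "incoming crp already done".
 * driver_error is what the (modeled) driver reports, e.g. EAGAIN.
 */
static bool
model_dispatch(struct model_crp *crp, int driver_error)
{
	assert(crp != NULL);
	if (crp->flags & MODEL_F_DONE)
		return (false);
	crp->etype = driver_error;
	/* The done flag is set on completion even if the request failed. */
	crp->flags |= MODEL_F_DONE;
	return (true);
}

/* Retry as described in the report: the done flag is left set. */
static bool
model_rerun_buggy(struct model_crp *crp)
{
	return (model_dispatch(crp, 0));
}

/* Retry as proposed in the report: clear the done flag first. */
static bool
model_rerun_fixed(struct model_crp *crp)
{
	crp->flags &= ~MODEL_F_DONE;
	crp->etype = 0;
	return (model_dispatch(crp, 0));
}
```

In this model, dispatching a request that fails with EAGAIN and then calling model_rerun_buggy() returns false, while model_rerun_fixed() on the same request succeeds.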
I believe this patch will fix it: https://reviews.freebsd.org/D50104
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=2fa185f9bf5948ead9c3920d452ddd6bcad8f569

commit 2fa185f9bf5948ead9c3920d452ddd6bcad8f569
Author:     Mark Johnston <markj@FreeBSD.org>
AuthorDate: 2025-05-09 00:23:40 +0000
Commit:     Mark Johnston <markj@FreeBSD.org>
CommitDate: 2025-05-09 00:29:23 +0000

    crypto: Remove uses of CRYPTO_F_DONE

    Previously OCF set CRYPTO_F_DONE prior to invoking the completion
    callback, even if the request failed. This isn't particularly useful
    and leads to bugs when consumers retry a failed request, since OCF
    also asserts that CRYPTO_F_DONE is clear in crypto_dispatch().
    (Really, OCF should retry requests that fail with EAGAIN, but that's
    a larger change.)

    For now, just stop setting CRYPTO_F_DONE to simplify consumers (and
    fix those which fail to clear the flag before retrying a request).

    PR:             286321
    Reviewed by:    jhb
    Differential Revision: https://reviews.freebsd.org/D50104

 share/man/man9/crypto_request.9                       | 13 +------------
 sys/contrib/openzfs/module/os/freebsd/zfs/crypto_os.c |  1 -
 sys/kgssapi/krb5/kcrypto_aes.c                        |  1 -
 sys/opencrypto/crypto.c                               |  6 ------
 sys/opencrypto/cryptodev.c                            |  2 --
 sys/opencrypto/cryptodev.h                            |  2 +-
 sys/opencrypto/ktls_ocf.c                             |  2 --
 sys/sys/param.h                                       |  2 +-
 8 files changed, 3 insertions(+), 26 deletions(-)
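The effect of the approach in the commit message above can be sketched with a small user-space model. This is an assumption-laden illustration, not the OCF implementation: SKETCH_F_DONE, struct sketch_crp, struct sketch_ocf, sketch_dispatch() and sketch_consumer() are all hypothetical names, and the assertion failure is modeled as a -1 return. The sets_done field switches between the old behavior (flag set on completion, even on failure) and the new one (flag never set), with a consumer that retries on EAGAIN without touching the flags.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for CRYPTO_F_DONE; the value is arbitrary. */
#define SKETCH_F_DONE 0x1u

/* Hypothetical request: only the two fields relevant to this bug. */
struct sketch_crp {
	unsigned int flags;
	int etype;
};

/* Hypothetical framework state for the model. */
struct sketch_ocf {
	bool sets_done;		/* true: old behavior; false: after the commit */
	int eagain_left;	/* how many submissions the driver rejects */
};

/* Returns false where the dispatch-time assertion would fire. */
static bool
sketch_dispatch(struct sketch_ocf *ocf, struct sketch_crp *crp)
{
	assert(ocf != NULL && crp != NULL);
	if (crp->flags & SKETCH_F_DONE)
		return (false);
	if (ocf->eagain_left > 0) {
		ocf->eagain_left--;
		crp->etype = EAGAIN;
	} else {
		crp->etype = 0;
	}
	if (ocf->sets_done)
		crp->flags |= SKETCH_F_DONE;
	return (true);
}

/*
 * A consumer that retries on EAGAIN and never clears the done flag.
 * Returns the number of dispatches made, or -1 if the assertion
 * would have fired.
 */
static int
sketch_consumer(struct sketch_ocf *ocf, struct sketch_crp *crp)
{
	int n = 0;

	do {
		if (!sketch_dispatch(ocf, crp))
			return (-1);
		n++;
	} while (crp->etype == EAGAIN);
	return (n);
}
```

With sets_done true and one EAGAIN pending, the consumer's retry is rejected; with sets_done false, the same consumer completes on its second dispatch without any change on the consumer side.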
Vishnu, can you confirm that the problem is gone after this commit?
(In reply to Mark Johnston from comment #3) Thank you Mark. The issue is fixed with the mentioned commit.