Unread portion of the kernel message buffer: [173352] GEOM_ELI: Device md10.eli created. [173352] GEOM_ELI: Encryption: AES-XTS 128 [173352] GEOM_ELI: Integrity: HMAC/SHA1 [173352] GEOM_ELI: Crypto: hardware [173352] GEOM_ELI: Crypto READ request failed (error=22). md10.eli[READ(offset=818688, length=512 )] [173352] panic: crypto_dispatch() failed (error=22) [173352] cpuid = 20 [173352] time = 1523431733 [173352] KDB: stack backtrace: [173352] db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00ae86d800 [173352] vpanic() at vpanic+0x18d/frame 0xfffffe00ae86d860 [173352] doadump() at doadump/frame 0xfffffe00ae86d8e0 [173352] g_eli_auth_run() at g_eli_auth_run+0x22b/frame 0xfffffe00ae86da00 [173352] g_eli_worker() at g_eli_worker+0x14c/frame 0xfffffe00ae86da70 [173352] fork_exit() at fork_exit+0x84/frame 0xfffffe00ae86dab0 [173352] fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00ae86dab0 [173352] --- trap 0, rip = 0, rsp = 0, rbp = 0 --- [173352] KDB: enter: panic #0 __curthread () at ./machine/pcpu.h:230 #1 doadump (textdump=0x1) at /usr/src/sys/kern/kern_shutdown.c:361 #2 0xffffffff80434f4c in db_fncall_generic (addr=<optimized out>, rv=<optimized out>, nargs=<optimized out>, args=<optimized out>) at /usr/src/sys/ddb/db_command.c:609 #3 db_fncall (dummy1=<optimized out>, dummy2=<optimized out>, dummy3=<optimized out>, dummy4=<optimized out>) at /usr/src/sys/ddb/db_command.c:657 #4 0xffffffff80434a99 in db_command (last_cmdp=<optimized out>, cmd_table=<optimized out>, dopager=<optimized out>) at /usr/src/sys/ddb/db_command.c:481 #5 0xffffffff80434814 in db_command_loop () at /usr/src/sys/ddb/db_command.c:534 #6 0xffffffff80437a3f in db_trap (type=<optimized out>, code=<optimized out>) at /usr/src/sys/ddb/db_main.c:250 #7 0xffffffff80babf53 in kdb_trap (type=0x3, code=0xffff0ff0, tf=<optimized out>) at /usr/src/sys/kern/subr_kdb.c:697 #8 0xffffffff81024aa8 in trap (frame=0xfffffe00ae86d730) at /usr/src/sys/amd64/amd64/trap.c:548 #9 <signal handler called> #10 kdb_enter (why=0xffffffff8129f663 "panic", msg=<optimized out>) at /usr/src/sys/kern/subr_kdb.c:479 #11 0xffffffff80b66b5a in vpanic (fmt=<optimized out>, ap=0xfffffe00ae86d8a0) at /usr/src/sys/kern/kern_shutdown.c:826 #12 0xffffffff80b66920 in kassert_panic (fmt=0xffffffff825f243b "crypto_dispatch() failed (error=%d)") at /usr/src/sys/kern/kern_shutdown.c:723 #13 0xffffffff825ee10b in g_eli_auth_run (wr=0xfffff8003a580bc0, bp=<optimized out>) at /usr/src/sys/geom/eli/g_eli_integrity.c:537 #14 0xffffffff825e9b7c in g_eli_worker (arg=<optimized out>) at /usr/src/sys/geom/eli/g_eli.c:542 #15 0xffffffff80b26e34 in fork_exit (callout=0xffffffff825e9a30 <g_eli_worker>, arg=0xfffff8003a580bc0, frame=0xfffffe00ae86dac0) at /usr/src/sys/kern/kern_fork.c:1039 #16 <signal handler called> (kgdb) info registers rax 0x12 0x12 rbx 0xfffffe00ae86d968 0xfffffe00ae86d968 rcx 0x80 0x80 rdx 0xfffffe00ae86d6f0 0xfffffe00ae86d6f0 rsi 0x80 0x80 rdi 0xffffffff81deab08 0xffffffff81deab08 rbp 0xfffffe00ae86da00 0xfffffe00ae86da00 rsp 0xfffffe00ae86d8f0 0xfffffe00ae86d8f0 r8 0x1 0x1 r9 0x0 0x0 r10 0xffffffff81cdc698 0xffffffff81cdc698 r11 0x0 0x0 r12 0x1e0 0x1e0 r13 0xfffff8004362f428 0xfffff8004362f428 r14 0xfffff8004362f598 0xfffff8004362f598 r15 0x18f800 0x18f800 rip 0xffffffff825ee10b 0xffffffff825ee10b <g_eli_auth_run+555> eflags 0x82 [ SF ] cs 0x20 0x20 ss 0x28 0x28 ds <unavailable> es <unavailable> fs <unavailable> gs <unavailable> fs_base <unavailable> gs_base <unavailable> (kgdb) info locals sc = 0xfffff80136461c00 encr_secsize = 0x200 nsec = <optimized out> lsec = 0x2 plaindata = 0xfffff800051b1a00 "\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336"... data = <optimized out> auth = 0xfffff8004362f414 "\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\067" p = <optimized out> i = <error reading variable i (Cannot access memory at address 0x1)> decr_secsize = 0x200 dstoff = 0x18f800 error = 0x80 crp = 0xfffff8004362f428 data_secsize = <optimized out> crde = <optimized out> crda = <optimized out> authkey = <optimized out> (kgdb) info args wr = 0xfffff8003a580bc0 bp = <optimized out> (kgdb) p *crp $1 = { crp_next = { tqe_next = 0xdeadc0dedeadc0de, tqe_prev = 0xdeadc0dedeadc0de }, crp_task = { ta_link = { stqe_next = 0xdeadc0dedeadc0de }, ta_pending = 0xc0de, ta_priority = 0xdead, ta_func = 0xdeadc0dedeadc0de, ta_context = 0xdeadc0dedeadc0de }, crp_sid = 0x500000100000037, crp_ilen = 0x1f4, crp_olen = 0x1e0, crp_etype = 0x16, crp_flags = 0x60, { crp_buf = 0xfffff8004362f000 "", crp_mbuf = 0xfffff8004362f000, crp_uio = 0xfffff8004362f000 }, crp_opaque = 0xfffff803726ffc00, crp_desc = 0xfffff8004362f520, crp_callback = 0xffffffff825ee8f0 <g_eli_auth_read_done>, crp_tstamp = { sec = 0xdeadc0dedeadc0de, frac = 0xdeadc0dedeadc0de }, crp_seq = 0xdeadc0de, crp_retw_id = 0x17 } (kgdb) p *crp->crp_desc $3 = { crd_skip = 0x14, crd_len = 0x1e0, crd_inject = 0x0, crd_flags = 0x10, CRD_INI = { cri_alg = 0x7, cri_klen = 0x100, cri_mlen = 0xdeadc0de, cri_key = 0xfffff8004362f598 "=\031eR\340\275D\366\205\020\307,\304]\bP\207\202\207\061\347\070\344\264\325\267\336\371~x\366\267\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336"..., cri_iv = "\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336", cri_next = 0xdeadc0dedeadc0de }, crd_next = 0xfffff8004362f4a8 } (kgdb) p *crp->crp_desc->crd_next $4 = { crd_skip = 0x14, crd_len = 0x1e0, crd_inject = 0xdeadc0de, crd_flags = 0x16, CRD_INI = { cri_alg = 0x16, cri_klen = 0x100, cri_mlen = 0xdeadc0de, cri_key = 0xfffff8003a28ae00 "\214\352\255\370\"\214|\275\367=\005\331X\325\251\371\220LN\207\037sY\305\274\033\346\200\217\343\272J\032H\021u[\024:\303\301\275\210\200\321\"\251\071\251\022", cri_iv = "", cri_next = 0xdeadc0dedeadc0de }, crd_next = 0x0 } ---- -- Eitan Adler
You hit this with aesni(4) loaded, right? I almost certainly broke this.
yes. aesni was loaded.
(In reply to Eitan Adler from comment #2) Thanks for confirming!
g_eli_integrity.c:g_eli_auth_run() creates an encrypt-then-MAC cryptop for writes and a MAC-then-decrypt cryptop for reads. Why does aesni(4) reject the cryptop? It might be informative if CRYPTDEB() was enabled. Unfortunately, this requires a recompile... (change the #if 0 to #if 1 in sys/opencrypto/cryptodev.h:352). One suspect is this condition: 468 if (enccrd != NULL && (enccrd->crd_alg == CRYPTO_AES_CBC || 469 enccrd->crd_alg == CRYPTO_AES_XTS) && 470 (enccrd->crd_len % AES_BLOCK_LEN) != 0) { 471 error = EINVAL; 472 goto out; 473 } Can g_eli_integrity generate crde->crd_len's that are not block-multiple? 420 /* Sectorsize of decrypted provider eg. 4096. */ 421 decr_secsize = bp->bio_to->sectorsize; ... 424 /* Number of data bytes in one encrypted sector, eg. 480. */ 425 data_secsize = sc->sc_data_per_sector; ... 467 data_secsize = sc->sc_data_per_sector; 468 if ((i % lsec) == 0) { 469 data_secsize = decr_secsize % data_secsize; ... 512 crde->crd_len = data_secsize; Both 480 and (4096 % 480) are even multiples of AES block size (16 bytes). And in fact, sc_data_per_sector is supposed to be chosen so the size is a multiple of 16: 636 sc->sc_alen = g_eli_hashlen(sc->sc_aalgo); 637 638 sc->sc_data_per_sector = sectorsize - sc->sc_alen; 639 /* 640 * Some hash functions (like SHA1 and RIPEMD160) generates hash 641 * which length is not multiple of 128 bits, but we want data 642 * length to be multiple of 128, so we can encrypt without 643 * padding. The line below rounds down data length to multiple 644 * of 128 bits. 645 */ 646 sc->sc_data_per_sector -= sc->sc_data_per_sector % 16; So, I'm not really sure why aesni is rejecting the cryptop. It would be good to instrument it and find out. It would also be good to fix g_eli_integrity not to panic (INVARIANTS) or !!! silently ignore crypto failures !!! (non-INVARIANTS).
(In reply to Eitan Adler from comment #0) crp_ilen = 0x1f4, // 500 bytes crp_olen = 0x1e0, // 480 bytes ... (kgdb) p *crp->crp_desc $3 = { crd_skip = 0x14, crd_len = 0x1e0, crd_inject = 0x0, crd_flags = 0x10, CRD_INI = { cri_alg = 0x7, cri_klen = 0x100, ... (kgdb) p *crp->crp_desc->crd_next $4 = { crd_skip = 0x14, crd_len = 0x1e0, crd_inject = 0xdeadc0de, crd_flags = 0x16, CRD_INI = { cri_alg = 0x16, cri_klen = 0x100, So these crd_lens all look reasonable, but the crd_skip is 20 bytes (unaligend). I wonder if that's related.
aesni_process -> aesni_cipher_process -> aesni_cipher_mac, aesni_cipher_crypt. aesni_cipher_mac has this check: 904 if (crd->crd_flags != 0) 905 return (EINVAL); And indeed our MAC cryptodesc has a flag set: (kgdb) p *crp->crp_desc $3 = { crd_flags = 0x10, << CRD_F_KEY_EXPLICIT So that probably explains it.
Please try this patch: https://people.freebsd.org/~cem/aesni_pr_227788.patch
after rebuilding with this patch and installing the world+kernel and rebooting I was unable to even 'geli attach' anymore. this was likely not your fault, but it made testing this impossible. I am currently reinstalling and will try again later.
(In reply to Eitan Adler from comment #8) > I was unable to even 'geli attach' anymore. Can you go into a little more detail on what "unable to" means? Did you encounter an error of some kind, and if so, can you quote it? Thanks.
- the disk initially refused to boot at all, as in BIOS didn't recognize it as a boot disk - when trying to recover from that using a 12-current USB live disk, 'geli attach /dev/ada0p4' rejected the password (some of this may have been changed I made prior to applying this patch that I forgot about, hence not believing it is your fault)
Hi, Are you able to give it another go with a clean src tree?
crashes again, but now in a different place. ∴sudo kyua test sys/geom/class/eli sys/geom/class/eli/attach_test:attach_d -> passed [3.534s] sys/geom/class/eli/attach_test:attach_r -> passed [0.324s] sys/geom/class/eli/attach_test:nokey -> passed [0.085s] sys/geom/class/eli/configure_test:configure_b_B -> passed [1.310s] sys/geom/class/eli/delkey_test:delkey -> passed [5.683s] sys/geom/class/eli/delkey_test:delkey_readonly -> passed [0.173s] sys/geom/class/eli/detach_test:detach_l -> passed [2.254s] sys/geom/class/eli/init_test:init -> failed: Miscompare for ealgo=aes-xts keylen=128 sec=512 [1.349s] sys/geom/class/eli/init_test:init_B -> passed [3.353s] sys/geom/class/eli/init_test:init_J -> passed [97.073s] sys/geom/class/eli/init_test:init_a -> failed: atf-check failed; see the output of the test for details [29.164s] sys/geom/class/eli/init_test:init_alias -> passed [42.930s] sys/geom/class/eli/init_test:init_i_P -> passed [0.120s] sys/geom/class/eli/init_test:nokey -> passed [0.069s] sys/geom/class/eli/integrity_test:copy -> failed: atf-check failed; see the output of the test for details [21.901s] ... so now it seems it is not crashing, but also not passing.
> crashes again, but now in a different place. ignore this line. I meant that for a different ticket
Ok, thanks, I'll have to spend a little more time looking into this. I appreciate the report and testing.
A commit references this bug: Author: cem Date: Sat Jun 23 17:24:19 UTC 2018 New revision: 335583 URL: https://svnweb.freebsd.org/changeset/base/335583 Log: aesni(4): Support CRD_F_KEY_EXPLICIT OCF mode PR: 227788 Reported by: eadler@ Changes: head/sys/crypto/aesni/aesni.c
With the integrity_test:copy testcase, I (now) get tons of messages like "GEOM_ELI: md1.eli: Failed to authenticate 2048 bytes of data at offset 22528"; it seems this is desirable. The test case intentionally trashes integrity-protected virtual disks to verify integrity protection. However, it succeeds. Will explore further.
Doh. I had not loaded aesni... sigh.
Ok, with aesni loaded it still seems to pass. Some runs print "Crypto: hardware" to the logs so it seems aesni is getting used. The other integrity test cases pass as well. Is it possible in comment #12 you were testing with an old version of the geli(8) binary which did not set keyno? You would have observed something like "Missing keyno" in stderr if you checked that. I'm going to go ahead and mark this as resolved for now.