Bug 227788 - Crypto READ request failed (error=22). md10.eli[READ(offset=818688, length=512
Summary: Crypto READ request failed (error=22). md10.eli[READ(offset=818688, length=512
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern (show other bugs)
Version: CURRENT
Hardware: Any Any
: --- Affects Only Me
Assignee: Conrad Meyer
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-04-26 09:29 UTC by Eitan Adler
Modified: 2018-06-23 17:51 UTC (History)
0 users

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Eitan Adler freebsd_committer freebsd_triage 2018-04-26 09:29:57 UTC
Unread portion of the kernel message buffer:
[173352] GEOM_ELI: Device md10.eli created.
[173352] GEOM_ELI: Encryption: AES-XTS 128
[173352] GEOM_ELI:  Integrity: HMAC/SHA1
[173352] GEOM_ELI:     Crypto: hardware
[173352] GEOM_ELI: Crypto READ request failed (error=22). md10.eli[READ(offset=818688, length=512
)]
[173352] panic: crypto_dispatch() failed (error=22)
[173352] cpuid = 20
[173352] time = 1523431733
[173352] KDB: stack backtrace:
[173352] db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00ae86d800
[173352] vpanic() at vpanic+0x18d/frame 0xfffffe00ae86d860
[173352] doadump() at doadump/frame 0xfffffe00ae86d8e0
[173352] g_eli_auth_run() at g_eli_auth_run+0x22b/frame 0xfffffe00ae86da00
[173352] g_eli_worker() at g_eli_worker+0x14c/frame 0xfffffe00ae86da70
[173352] fork_exit() at fork_exit+0x84/frame 0xfffffe00ae86dab0
[173352] fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00ae86dab0
[173352] --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
[173352] KDB: enter: panic


#0  __curthread () at ./machine/pcpu.h:230
#1  doadump (textdump=0x1) at /usr/src/sys/kern/kern_shutdown.c:361
#2  0xffffffff80434f4c in db_fncall_generic (addr=<optimized out>, rv=<optimized out>, nargs=<optimized out>, args=<optimized out>) at /usr/src/sys/ddb/db_command.c:609
#3  db_fncall (dummy1=<optimized out>, dummy2=<optimized out>, dummy3=<optimized out>, dummy4=<optimized out>) at /usr/src/sys/ddb/db_command.c:657
#4  0xffffffff80434a99 in db_command (last_cmdp=<optimized out>, cmd_table=<optimized out>, dopager=<optimized out>) at /usr/src/sys/ddb/db_command.c:481
#5  0xffffffff80434814 in db_command_loop () at /usr/src/sys/ddb/db_command.c:534
#6  0xffffffff80437a3f in db_trap (type=<optimized out>, code=<optimized out>) at /usr/src/sys/ddb/db_main.c:250
#7  0xffffffff80babf53 in kdb_trap (type=0x3, code=0xffff0ff0, tf=<optimized out>) at /usr/src/sys/kern/subr_kdb.c:697
#8  0xffffffff81024aa8 in trap (frame=0xfffffe00ae86d730) at /usr/src/sys/amd64/amd64/trap.c:548
#9  <signal handler called>
#10 kdb_enter (why=0xffffffff8129f663 "panic", msg=<optimized out>) at /usr/src/sys/kern/subr_kdb.c:479
#11 0xffffffff80b66b5a in vpanic (fmt=<optimized out>, ap=0xfffffe00ae86d8a0) at /usr/src/sys/kern/kern_shutdown.c:826
#12 0xffffffff80b66920 in kassert_panic (fmt=0xffffffff825f243b "crypto_dispatch() failed (error=%d)") at /usr/src/sys/kern/kern_shutdown.c:723
#13 0xffffffff825ee10b in g_eli_auth_run (wr=0xfffff8003a580bc0, bp=<optimized out>) at /usr/src/sys/geom/eli/g_eli_integrity.c:537
#14 0xffffffff825e9b7c in g_eli_worker (arg=<optimized out>) at /usr/src/sys/geom/eli/g_eli.c:542
#15 0xffffffff80b26e34 in fork_exit (callout=0xffffffff825e9a30 <g_eli_worker>, arg=0xfffff8003a580bc0, frame=0xfffffe00ae86dac0) at /usr/src/sys/kern/kern_fork.c:1039
#16 <signal handler called>

(kgdb) info registers
rax            0x12     0x12
rbx            0xfffffe00ae86d968       0xfffffe00ae86d968
rcx            0x80     0x80
rdx            0xfffffe00ae86d6f0       0xfffffe00ae86d6f0
rsi            0x80     0x80
rdi            0xffffffff81deab08       0xffffffff81deab08
rbp            0xfffffe00ae86da00       0xfffffe00ae86da00
rsp            0xfffffe00ae86d8f0       0xfffffe00ae86d8f0
r8             0x1      0x1
r9             0x0      0x0
r10            0xffffffff81cdc698       0xffffffff81cdc698
r11            0x0      0x0
r12            0x1e0    0x1e0
r13            0xfffff8004362f428       0xfffff8004362f428
r14            0xfffff8004362f598       0xfffff8004362f598
r15            0x18f800 0x18f800
rip            0xffffffff825ee10b       0xffffffff825ee10b <g_eli_auth_run+555>
eflags         0x82     [ SF ]
cs             0x20     0x20
ss             0x28     0x28
ds             <unavailable>
es             <unavailable>
fs             <unavailable>
gs             <unavailable>
fs_base        <unavailable>
gs_base        <unavailable>

(kgdb) info locals
sc = 0xfffff80136461c00
encr_secsize = 0x200
nsec = <optimized out>
lsec = 0x2
plaindata = 0xfffff800051b1a00 "\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336"...
data = <optimized out>
auth = 0xfffff8004362f414 "\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\067"
p = <optimized out>
i = <error reading variable i (Cannot access memory at address 0x1)>
decr_secsize = 0x200
dstoff = 0x18f800
error = 0x80
crp = 0xfffff8004362f428
data_secsize = <optimized out>
crde = <optimized out>
crda = <optimized out>
authkey = <optimized out>

(kgdb) info args
wr = 0xfffff8003a580bc0
bp = <optimized out>

(kgdb) p *crp
$1 = {
  crp_next = {
    tqe_next = 0xdeadc0dedeadc0de,
    tqe_prev = 0xdeadc0dedeadc0de
  },
  crp_task = {
    ta_link = {
      stqe_next = 0xdeadc0dedeadc0de
    },
    ta_pending = 0xc0de,
    ta_priority = 0xdead,
    ta_func = 0xdeadc0dedeadc0de,
    ta_context = 0xdeadc0dedeadc0de
  },
  crp_sid = 0x500000100000037,
  crp_ilen = 0x1f4,
  crp_olen = 0x1e0,
  crp_etype = 0x16,
  crp_flags = 0x60,
  {
    crp_buf = 0xfffff8004362f000 "",
    crp_mbuf = 0xfffff8004362f000,
    crp_uio = 0xfffff8004362f000
  },
  crp_opaque = 0xfffff803726ffc00,
  crp_desc = 0xfffff8004362f520,
  crp_callback = 0xffffffff825ee8f0 <g_eli_auth_read_done>,
  crp_tstamp = {
    sec = 0xdeadc0dedeadc0de,
    frac = 0xdeadc0dedeadc0de
  },
  crp_seq = 0xdeadc0de,
  crp_retw_id = 0x17
}

(kgdb) p *crp->crp_desc
$3 = {
  crd_skip = 0x14,
  crd_len = 0x1e0,
  crd_inject = 0x0,
  crd_flags = 0x10,
  CRD_INI = {
    cri_alg = 0x7,
    cri_klen = 0x100,
    cri_mlen = 0xdeadc0de,
    cri_key = 0xfffff8004362f598 "=\031eR\340\275D\366\205\020\307,\304]\bP\207\202\207\061\347\070\344\264\325\267\336\371~x\366\267\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336"...,
    cri_iv =       "\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336",
    cri_next = 0xdeadc0dedeadc0de
  },
  crd_next = 0xfffff8004362f4a8
}

(kgdb) p *crp->crp_desc->crd_next
$4 = {
  crd_skip = 0x14,
  crd_len = 0x1e0,
  crd_inject = 0xdeadc0de,
  crd_flags = 0x16,
  CRD_INI = {
    cri_alg = 0x16,
    cri_klen = 0x100,
    cri_mlen = 0xdeadc0de,
    cri_key = 0xfffff8003a28ae00 "\214\352\255\370\"\214|\275\367=\005\331X\325\251\371\220LN\207\037sY\305\274\033\346\200\217\343\272J\032H\021u[\024:\303\301\275\210\200\321\"\251\071\251\022",
    cri_iv =       "",
    cri_next = 0xdeadc0dedeadc0de
  },
  crd_next = 0x0
}

----


-- 
Eitan Adler
Comment 1 Conrad Meyer freebsd_committer freebsd_triage 2018-04-26 16:12:03 UTC
You hit this with aesni(4) loaded, right?  I almost certainly broke this.
Comment 2 Eitan Adler freebsd_committer freebsd_triage 2018-04-26 16:17:20 UTC
yes. aesni was loaded.
Comment 3 Conrad Meyer freebsd_committer freebsd_triage 2018-04-26 17:12:03 UTC
(In reply to Eitan Adler from comment #2)
Thanks for confirming!
Comment 4 Conrad Meyer freebsd_committer freebsd_triage 2018-04-27 02:47:01 UTC
g_eli_integrity.c:g_eli_auth_run() creates an encrypt-then-MAC cryptop for writes and a MAC-then-decrypt cryptop for reads.

Why does aesni(4) reject the cryptop?  It might be informative if CRYPTDEB() was enabled.  Unfortunately, this requires a recompile... (change the #if 0 to #if 1 in sys/opencrypto/cryptodev.h:352).

One suspect is this condition:

  468         if (enccrd != NULL && (enccrd->crd_alg == CRYPTO_AES_CBC ||
  469             enccrd->crd_alg == CRYPTO_AES_XTS) &&
  470             (enccrd->crd_len % AES_BLOCK_LEN) != 0) {
  471                 error = EINVAL;
  472                 goto out;
  473         }

Can g_eli_integrity generate crde->crd_len's that are not block-multiple?

420         /* Sectorsize of decrypted provider eg. 4096. */
421         decr_secsize = bp->bio_to->sectorsize;
...
424         /* Number of data bytes in one encrypted sector, eg. 480. */
425         data_secsize = sc->sc_data_per_sector;
...
467                 data_secsize = sc->sc_data_per_sector;
468                 if ((i % lsec) == 0) {
469                         data_secsize = decr_secsize % data_secsize;
...
512                 crde->crd_len = data_secsize;

Both 480 and (4096 % 480) are even multiples of AES block size (16 bytes).

And in fact, sc_data_per_sector is supposed to be chosen so the size is a multiple of 16:

  636                 sc->sc_alen = g_eli_hashlen(sc->sc_aalgo);
  637
  638                 sc->sc_data_per_sector = sectorsize - sc->sc_alen;
  639                 /*
  640                  * Some hash functions (like SHA1 and RIPEMD160) generates hash
  641                  * which length is not multiple of 128 bits, but we want data
  642                  * length to be multiple of 128, so we can encrypt without
  643                  * padding. The line below rounds down data length to multiple
  644                  * of 128 bits.
  645                  */
  646                 sc->sc_data_per_sector -= sc->sc_data_per_sector % 16;

So, I'm not really sure why aesni is rejecting the cryptop.  It would be good to instrument it and find out.  It would also be good to fix g_eli_integrity not to panic (INVARIANTS) or !!! silently ignore crypto failures !!! (non-INVARIANTS).
Comment 5 Conrad Meyer freebsd_committer freebsd_triage 2018-04-27 02:50:19 UTC
(In reply to Eitan Adler from comment #0)
  crp_ilen = 0x1f4,  // 500 bytes
  crp_olen = 0x1e0,  // 480 bytes
...
(kgdb) p *crp->crp_desc
$3 = {
  crd_skip = 0x14,
  crd_len = 0x1e0,
  crd_inject = 0x0,
  crd_flags = 0x10,
  CRD_INI = {
    cri_alg = 0x7,
    cri_klen = 0x100,
...

(kgdb) p *crp->crp_desc->crd_next
$4 = {
  crd_skip = 0x14,
  crd_len = 0x1e0,
  crd_inject = 0xdeadc0de,
  crd_flags = 0x16,
  CRD_INI = {
    cri_alg = 0x16,
    cri_klen = 0x100,

So these crd_lens all look reasonable, but the crd_skip is 20 bytes (unaligend).  I wonder if that's related.
Comment 6 Conrad Meyer freebsd_committer freebsd_triage 2018-04-27 02:54:31 UTC
aesni_process -> aesni_cipher_process -> aesni_cipher_mac, aesni_cipher_crypt.

aesni_cipher_mac has this check:

  904         if (crd->crd_flags != 0)
  905                 return (EINVAL);

And indeed our MAC cryptodesc has a flag set:

(kgdb) p *crp->crp_desc
$3 = {
  crd_flags = 0x10,  << CRD_F_KEY_EXPLICIT

So that probably explains it.
Comment 7 Conrad Meyer freebsd_committer freebsd_triage 2018-04-28 17:41:23 UTC
Please try this patch: https://people.freebsd.org/~cem/aesni_pr_227788.patch
Comment 8 Eitan Adler freebsd_committer freebsd_triage 2018-04-28 19:05:03 UTC
after rebuilding with this patch and installing the world+kernel and rebooting I was unable to even 'geli attach' anymore. this was likely not your fault, but it made testing this impossible.

I am currently reinstalling and will try again later.
Comment 9 Conrad Meyer freebsd_committer freebsd_triage 2018-04-28 19:28:05 UTC
(In reply to Eitan Adler from comment #8)
> I was unable to even 'geli attach' anymore.

Can you go into a little more detail on what "unable to" means?  Did you encounter an error of some kind, and if so, can you quote it?  Thanks.
Comment 10 Eitan Adler freebsd_committer freebsd_triage 2018-04-28 19:34:15 UTC
- the disk initially refused to boot at all, as in BIOS didn't recognize it as a boot disk
- when trying to recover from that using a 12-current USB live disk, 'geli attach /dev/ada0p4' rejected the password

(some of this may have been changed I made prior to applying this patch that I forgot about, hence not believing it is your fault)
Comment 11 Conrad Meyer freebsd_committer freebsd_triage 2018-05-04 20:44:57 UTC
Hi,

Are you able to give it another go with a clean src tree?
Comment 12 Eitan Adler freebsd_committer freebsd_triage 2018-05-05 00:54:27 UTC
crashes again, but now in a different place.

∴sudo kyua test sys/geom/class/eli
sys/geom/class/eli/attach_test:attach_d  ->  passed  [3.534s]
sys/geom/class/eli/attach_test:attach_r  ->  passed  [0.324s]
sys/geom/class/eli/attach_test:nokey  ->  passed  [0.085s]
sys/geom/class/eli/configure_test:configure_b_B  ->  passed  [1.310s]
sys/geom/class/eli/delkey_test:delkey  ->  passed  [5.683s]
sys/geom/class/eli/delkey_test:delkey_readonly  ->  passed  [0.173s]
sys/geom/class/eli/detach_test:detach_l  ->  passed  [2.254s]
sys/geom/class/eli/init_test:init  ->  failed: Miscompare for ealgo=aes-xts keylen=128 sec=512  [1.349s]
sys/geom/class/eli/init_test:init_B  ->  passed  [3.353s]
sys/geom/class/eli/init_test:init_J  -> passed [97.073s]
sys/geom/class/eli/init_test:init_a  ->  failed: atf-check failed; see the output of the test for details  [29.164s]
sys/geom/class/eli/init_test:init_alias  ->  passed  [42.930s]
sys/geom/class/eli/init_test:init_i_P  ->  passed  [0.120s]
sys/geom/class/eli/init_test:nokey  ->  passed  [0.069s]
sys/geom/class/eli/integrity_test:copy  ->  failed: atf-check failed; see the output of the test for details  [21.901s]
...

so now it seems it is not crashing, but also not passing.
Comment 13 Eitan Adler freebsd_committer freebsd_triage 2018-05-05 00:55:18 UTC
> crashes again, but now in a different place.

ignore this line. I meant that for a different ticket
Comment 14 Conrad Meyer freebsd_committer freebsd_triage 2018-05-05 01:47:26 UTC
Ok, thanks, I'll have to spend a little more time looking into this.  I appreciate the report and testing.
Comment 15 commit-hook freebsd_committer freebsd_triage 2018-06-23 17:25:15 UTC
A commit references this bug:

Author: cem
Date: Sat Jun 23 17:24:19 UTC 2018
New revision: 335583
URL: https://svnweb.freebsd.org/changeset/base/335583

Log:
  aesni(4): Support CRD_F_KEY_EXPLICIT OCF mode

  PR:		227788
  Reported by:	eadler@

Changes:
  head/sys/crypto/aesni/aesni.c
Comment 16 Conrad Meyer freebsd_committer freebsd_triage 2018-06-23 17:40:04 UTC
With the integrity_test:copy testcase, I (now) get tons of messages like "GEOM_ELI: md1.eli: Failed to authenticate 2048 bytes of data at offset 22528"; it seems this is desirable.  The test case intentionally trashes integrity-protected virtual disks to verify integrity protection.

However, it succeeds.  Will explore further.
Comment 17 Conrad Meyer freebsd_committer freebsd_triage 2018-06-23 17:43:49 UTC
Doh.  I had not loaded aesni... sigh.
Comment 18 Conrad Meyer freebsd_committer freebsd_triage 2018-06-23 17:51:55 UTC
Ok, with aesni loaded it still seems to pass.  Some runs print "Crypto: hardware" to the logs so it seems aesni is getting used.

The other integrity test cases pass as well.  Is it possible in comment #12 you were testing with an old version of the geli(8) binary which did not set keyno?  You would have observed something like "Missing keyno" in stderr if you checked that.  I'm going to go ahead and mark this as resolved for now.