Bug 227982 - panic in ccp_process via geom_eli
Summary: panic in ccp_process via geom_eli
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: CURRENT
Hardware: Any Any
Importance: --- Affects Only Me
Assignee: freebsd-geom (Nobody)
Keywords: panic
Depends on:
Reported: 2018-05-05 01:09 UTC by Eitan Adler
Modified: 2020-10-28 03:28 UTC (History)

Description Eitan Adler freebsd_committer freebsd_triage 2018-05-05 01:09:46 UTC
local crash id: integrity_test:data

how to repro:

kyua test sys/geom/class/eli/integrity_test:data

Unread portion of the kernel message buffer:
[511] GEOM_ELI: md15.eli: Failed to authenticate 512 bytes of data at offset 15360.
[511] GEOM_ELI: md15.eli: Failed to authenticate 512 bytes of data at offset 15872.
[511] GEOM_ELI: md15.eli: Failed to authenticate 512 bytes of data at offset 15872.
[511] panic: vm_fault_hold: fault on nofault entry, addr: 0xfffffe00ed517000
[511] cpuid = 16
[511] time = 1525477726
[511] KDB: stack backtrace:
[511] db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00ef939c90
[511] vpanic() at vpanic+0x1a3/frame 0xfffffe00ef939cf0
[511] panic() at panic+0x43/frame 0xfffffe00ef939d50
[511] vm_fault_hold() at vm_fault_hold+0x237d/frame 0xfffffe00ef939e80
[511] vm_fault() at vm_fault+0x75/frame 0xfffffe00ef939ec0
[511] trap_pfault() at trap_pfault+0x171/frame 0xfffffe00ef939f10
[511] trap() at trap+0x2ff/frame 0xfffffe00ef93a020
[511] calltrap() at calltrap+0x8/frame 0xfffffe00ef93a020
[511] --- trap 0xc, rip = 0xffffffff8451b169, rsp = 0xfffffe00ef93a0f0, rbp = 0xfffffe00ef93a120 ---
[511] ccp_collect_iv() at ccp_collect_iv+0x169/frame 0xfffffe00ef93a120
[511] ccp_do_blkcipher() at ccp_do_blkcipher+0xf3/frame 0xfffffe00ef93a1e0
[511] ccp_authenc() at ccp_authenc+0x63/frame 0xfffffe00ef93a230
[511] ccp_process() at ccp_process+0xa1c/frame 0xfffffe00ef93a8b0
[511] crypto_dispatch() at crypto_dispatch+0x1d0/frame 0xfffffe00ef93a8e0
[511] g_eli_auth_run() at g_eli_auth_run+0x531/frame 0xfffffe00ef93aa00
[511] g_eli_worker() at g_eli_worker+0x14c/frame 0xfffffe00ef93aa70
[511] fork_exit() at fork_exit+0x84/frame 0xfffffe00ef93aab0
[511] fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00ef93aab0
[511] --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
[511] KDB: enter: panic

(kgdb) bt
#0  __curthread () at ./machine/pcpu.h:231
#1  doadump (textdump=0x1) at /usr/src/sys/kern/kern_shutdown.c:365
#2  0xffffffff8043821c in db_fncall_generic (addr=<optimized out>, rv=<optimized out>, nargs=<optimized out>, args=<optimized out>) at /usr/src/sys/ddb/db_command.c:609
#3  db_fncall (dummy1=<optimized out>, dummy2=<optimized out>, dummy3=<optimized out>, dummy4=<optimized out>) at /usr/src/sys/ddb/db_command.c:657
#4  0xffffffff80437d59 in db_command (last_cmdp=<optimized out>, cmd_table=<optimized out>, dopager=<optimized out>) at /usr/src/sys/ddb/db_command.c:481
#5  0xffffffff80437ad4 in db_command_loop () at /usr/src/sys/ddb/db_command.c:534
#6  0xffffffff8043ad0f in db_trap (type=<optimized out>, code=<optimized out>) at /usr/src/sys/ddb/db_main.c:250
#7  0xffffffff80bb3913 in kdb_trap (type=0x3, code=0xffff0ff0, tf=<optimized out>) at /usr/src/sys/kern/subr_kdb.c:697
#8  0xffffffff81031c05 in trap (frame=0xfffffe00ef939bc0) at /usr/src/sys/amd64/amd64/trap.c:550
#9  <signal handler called>
#10 kdb_enter (why=0xffffffff812c2f0c "panic", msg=<optimized out>) at /usr/src/sys/kern/subr_kdb.c:479
#11 0xffffffff80b6dee0 in vpanic (fmt=<optimized out>, ap=0xfffffe00ef939d30) at /usr/src/sys/kern/kern_shutdown.c:851
#12 0xffffffff80b6df73 in panic (fmt=0xffffffff81df03b8 <cnputs_mtx> "\213\214(\201\377\377\377\377") at /usr/src/sys/kern/kern_shutdown.c:789
#13 0xffffffff80e9882d in vm_fault_hold (map=0xfffff80003006000, vaddr=<optimized out>, fault_type=<optimized out>, fault_flags=<optimized out>, m_hold=0x0) at /usr/src/sys/vm/vm_fault.c:563
#14 0xffffffff80e96465 in vm_fault (map=0xfffff80003006000, vaddr=<optimized out>, fault_type=0x1, fault_flags=0x0) at /usr/src/sys/vm/vm_fault.c:514
#15 0xffffffff81032501 in trap_pfault (frame=0xfffffe00ef93a030, usermode=<optimized out>) at /usr/src/sys/amd64/amd64/trap.c:730
#16 0xffffffff81031bcf in trap (frame=0xfffffe00ef93a030) at /usr/src/sys/amd64/amd64/trap.c:413
#17 <signal handler called>
#18 ccp_byteswap (data=<optimized out>, len=<optimized out>) at /usr/src/sys/crypto/ccp/ccp_hardware.c:1320
#19 ccp_collect_iv (s=0xfffffe00ed5425c0, crp=0xfffff801e2f0cc40, crd=0xfffff801e2f0ccc0) at /usr/src/sys/crypto/ccp/ccp_hardware.c:1395
#20 0xffffffff84518e73 in ccp_do_blkcipher (qp=0xfffff800040e49c8, s=0xfffffe00ed5425c0, crp=<optimized out>, crd=0xfffff801e2f0ccc0, cctx=0xfffffe00ef93a1f0) at /usr/src/sys/crypto/ccp/ccp_hardware.c:1521
#21 0xffffffff8451a0e3 in ccp_authenc (qp=0xfffff800040e49c8, s=0xfffffe00ed5425c0, crp=0xfffff801e2f0cc40, crda=0xfffff801e2f0cd38, crde=0xfffff801e2f0ccc0) at /usr/src/sys/crypto/ccp/ccp_hardware.c:1738
#22 0xffffffff8451773c in ccp_process (dev=<optimized out>, crp=0xfffff801e2f0cc40, hint=<optimized out>) at /usr/src/sys/crypto/ccp/ccp.c:706
#23 0xffffffff80e046a0 in crypto_dispatch (crp=0xfffff801e2f0cc40) at /usr/src/sys/opencrypto/crypto.c:929
#24 0xffffffff845795e1 in g_eli_auth_run (wr=0xfffff80030a69f40, bp=<optimized out>) at /usr/src/sys/geom/eli/g_eli_integrity.c:536
#25 0xffffffff84574c8c in g_eli_worker (arg=<optimized out>) at /usr/src/sys/geom/eli/g_eli.c:542
#26 0xffffffff80b2dab4 in fork_exit (callout=0xffffffff84574b40 <g_eli_worker>, arg=0xfffff80030a69f40, frame=0xfffffe00ef93aac0) at /usr/src/sys/kern/kern_fork.c:1039
#27 <signal handler called>

(kgdb) frame
Stack level 18, frame at 0xfffffe00ef93a130:
 rip = 0xffffffff8451b169 in ccp_byteswap (/usr/src/sys/crypto/ccp/ccp_hardware.c:1320); saved rip = 0xffffffff84518e73
 inlined into frame 19, caller of frame at 0xfffffe00ef93a0f0
 source language c.
 Arglist at unknown address.
 Locals at unknown address, Previous frame's sp at 0xfffffe00ef93a0e0
 Saved registers:
  rax at 0xfffffe00ef93a060, rbx at 0xfffffe00ef93a068, rcx at 0xfffffe00ef93a048, rdx at 0xfffffe00ef93a040, rsi at 0xfffffe00ef93a038, rdi at 0xfffffe00ef93a030, rbp at 0xfffffe00ef93a070, r8 at 0xfffffe00ef93a050, r9 at 0xfffffe00ef93a058, r10 at 0xfffffe00ef93a078, r11 at 0xfffffe00ef93a080, r12 at 0xfffffe00ef93a088, r13 at 0xfffffe00ef93a090, r14 at 0xfffffe00ef93a098, r15 at 0xfffffe00ef93a0a0, rip at 0xfffffe00ef93a0c8, eflags at 0xfffffe00ef93a0d8, cs at 0xfffffe00ef93a0d0, ss at 0xfffffe00ef93a0e8
data = <optimized out>
len = <optimized out>
i = 0x2a738
t = 0xde

(kgdb) up
#19 ccp_collect_iv (s=0xfffffe00ed5425c0, crp=0xfffff801e2f0cc40, crd=0xfffff801e2f0ccc0) at /usr/src/sys/crypto/ccp/ccp_hardware.c:1395
1395                    ccp_byteswap(s->blkcipher.iv, s->blkcipher.iv_len);
(kgdb) p *s
$1 = {
  active = 0x0,
  cipher_first = 0x1,
  pending = 0xb594c446,
  mode = 99254362,
  queue = 0x8f1585f2,
    hmac = {
      auth_hash = 0xb9d3aa07d61f1878,
      hash_len = 0x440efbde,
      partial_digest_len = 0x1e4e1d87,
      auth_mode = 0x10000000,
      mk_size = 0x10000000,
      ipad =         "",
      opad =         "\320<\342o\270\272Z\344", '\066' <repeats 64 times>, "V)|\a\035\034@\364\272\070\274h\254\070\346\066?xU\366\315\002\350\v\304\215\356\016\035p\317\253\314\233\004D\335\212\326\210\263\316\262\220\216K=u}\276\267c\340\307\253\234"
    gmac = {
      hash_len = 0xd61f1878,
      final_block =         "\a\252\323\271\336\373\016D\207\035N\036"
  blkcipher = {
    cipher_mode = 0x58856ba,
    cipher_type = 0x8e30d0d2,
    key_len = 0x0,
    iv_len = 0x1000000,
    enckey =       "",
    iv =       ""

(kgdb) p *crp
$3 = {
  crp_next = {
    tqe_next = 0xdeadc0dedeadc0de,
    tqe_prev = 0xdeadc0dedeadc0de
  crp_task = {
    ta_link = {
      stqe_next = 0xdeadc0dedeadc0de
    ta_pending = 0xc0de,
    ta_priority = 0xdead,
    ta_func = 0xdeadc0dedeadc0de,
    ta_context = 0xdeadc0dedeadc0de
  crp_sid = 0x100000100000068,
  crp_ilen = 0x200,
  crp_olen = 0x1e0,
  crp_etype = 0x0,
  crp_flags = 0x40,
    crp_buf = 0xfffff801e2f0c800 "",
    crp_mbuf = 0xfffff801e2f0c800,
    crp_uio = 0xfffff801e2f0c800
  crp_opaque = 0xfffff80007b31000,
  crp_desc = 0xfffff801e2f0cd38,
  crp_callback = 0xffffffff84579ac0 <g_eli_auth_read_done>,
  crp_tstamp = {
    sec = 0xdeadc0dedeadc0de,
    frac = 0xdeadc0dedeadc0de
  crp_seq = 0xdeadc0de,
  crp_retw_id = 0x8

(kgdb) p *crd
$4 = {
  crd_skip = 0x20,
  crd_len = 0x1e0,
  crd_inject = 0xdeadc0de,
  crd_flags = 0x16,
  CRD_INI = {
    cri_alg = 0xb,
    cri_klen = 0x80,
    cri_mlen = 0xdeadc0de,
    cri_key = 0xfffff801b3c90d00 "D\374\360:8\031M\272\320V-\372\326 \377\230\a:\366\342\341",
    cri_iv =       "\263fi\277\022T\n",
    cri_next = 0xdeadc0dedeadc0de
  crd_next = 0x0

(kgdb) fram
Stack level 19, frame at 0xfffffe00ef93a130:
 rip = 0xffffffff8451b169 in ccp_collect_iv (/usr/src/sys/crypto/ccp/ccp_hardware.c:1395); saved rip = 0xffffffff84518e73
 called by frame at 0xfffffe00ef93a1f0, caller of frame at 0xfffffe00ef93a130
 source language c.
 Arglist at 0xfffffe00ef93a120, args: s=0xfffffe00ed5425c0, crp=0xfffff801e2f0cc40, crd=0xfffff801e2f0ccc0
 Locals at 0xfffffe00ef93a120, Previous frame's sp is 0xfffffe00ef93a130
 Saved registers:
  rbx at 0xfffffe00ef93a0f8, rbp at 0xfffffe00ef93a120, r12 at 0xfffffe00ef93a100, r13 at 0xfffffe00ef93a108, r14 at 0xfffffe00ef93a110, r15 at 0xfffffe00ef93a118, rip at 0xfffffe00ef93a128
s = 0xfffffe00ed5425c0
crp = 0xfffff801e2f0cc40
crd = 0xfffff801e2f0ccc0
No locals.
Comment 1 Conrad Meyer freebsd_committer 2018-05-05 01:51:07 UTC
I'm going to go ahead and take this since I probably have the most familiarity with the driver.  If anyone else wants to take a look before I get around to it (maybe this weekend, maybe not), you are welcome to.
Comment 2 Conrad Meyer freebsd_committer 2018-06-24 18:55:24 UTC
I think the problem is related to g_eli_integrity using the same session for all operations it performs.  There are N g_eli_worker threads for N CPU cores, but they should have independent sessions.

ccp_process does not take a global lock during work submission today (it does lock over the available queues), yet it uses shared state (the ccp_session buffers) that must survive for the duration of the request, not just submission.  This is violated even by serial submission in the same context, although that alone shouldn't lead to this panic.  ccp needs to allocate per-operation memory for operation-associated data rather than abusing the session to store it.
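
As a rough illustration of the per-operation approach, here is a userland sketch. It is not the ccp(4) code: every sketch_* name and the 16-byte IV cap are hypothetical, and the byteswap is only loosely modeled on ccp_byteswap. The IV lives in a request-owned structure, and its length is checked before the swap, so a clobbered length such as the iv_len = 0x1000000 seen in the dump would be rejected instead of walking off the end of the buffer.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define	SKETCH_MAX_IV_LEN	16	/* hypothetical cap (AES block size) */

/* Long-lived and read-only after setup, so safe to share across requests. */
struct sketch_session {
	size_t	iv_len;
};

/* Per-request scratch space, owned by exactly one in-flight operation. */
struct sketch_op {
	uint8_t	iv[SKETCH_MAX_IV_LEN];
	size_t	iv_len;
};

/* Reverse a buffer in place. */
static void
sketch_byteswap(uint8_t *data, size_t len)
{
	size_t i;
	uint8_t t;

	for (i = 0; i < len / 2; i++) {
		t = data[i];
		data[i] = data[len - 1 - i];
		data[len - 1 - i] = t;
	}
}

/*
 * Copy the IV into the request's own buffer and byteswap it there.
 * The session is never written, so concurrent requests cannot clobber
 * each other's IV.  Returns 0 on success or EINVAL for a bad length.
 */
static int
sketch_collect_iv(const struct sketch_session *s, struct sketch_op *op,
    const uint8_t *src)
{
	assert(s != NULL && op != NULL && src != NULL);

	if (s->iv_len == 0 || s->iv_len > sizeof(op->iv))
		return (EINVAL);
	op->iv_len = s->iv_len;
	memcpy(op->iv, src, op->iv_len);
	sketch_byteswap(op->iv, op->iv_len);
	return (0);
}
```

In a real driver the sketch_op would presumably be allocated when the request is submitted and freed from the completion path, so that it outlives submission.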

One other braindead / bogus thing ccp(4) does today is session allocation -- concurrent newsession + use of existing sid could race and a process thread could end up pointing at a freed session object.  I'm not sure that's what happened here but it could be.
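
One possible way to close that kind of race, again only as a userland sketch with hypothetical sketch_* names and a pthread mutex standing in for a kernel mutex: look sessions up and take a reference under the same lock that allocation and teardown use, and refuse to free a session that still has requests in flight.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define	SKETCH_NSESSIONS	4	/* hypothetical fixed table size */

struct sketch_sess {
	bool		active;
	unsigned	refs;	/* in-flight requests using this slot */
};

struct sketch_softc {
	pthread_mutex_t		lock;	/* protects the whole table */
	struct sketch_sess	sessions[SKETCH_NSESSIONS];
};

static void
sketch_init(struct sketch_softc *sc)
{
	memset(sc->sessions, 0, sizeof(sc->sessions));
	pthread_mutex_init(&sc->lock, NULL);
}

/* Allocate a free slot; returns the session id, or -1 if the table is full. */
static int
sketch_newsession(struct sketch_softc *sc)
{
	int i, sid = -1;

	pthread_mutex_lock(&sc->lock);
	for (i = 0; i < SKETCH_NSESSIONS; i++) {
		if (!sc->sessions[i].active) {
			sc->sessions[i].active = true;
			sid = i;
			break;
		}
	}
	pthread_mutex_unlock(&sc->lock);
	return (sid);
}

/* Look up a session and pin it for one request; NULL if the id is stale. */
static struct sketch_sess *
sketch_hold(struct sketch_softc *sc, int sid)
{
	struct sketch_sess *s = NULL;

	pthread_mutex_lock(&sc->lock);
	if (sid >= 0 && sid < SKETCH_NSESSIONS && sc->sessions[sid].active) {
		s = &sc->sessions[sid];
		s->refs++;
	}
	pthread_mutex_unlock(&sc->lock);
	return (s);
}

/* Drop the reference taken by sketch_hold() once the request completes. */
static void
sketch_rele(struct sketch_softc *sc, struct sketch_sess *s)
{
	pthread_mutex_lock(&sc->lock);
	assert(s->refs > 0);
	s->refs--;
	pthread_mutex_unlock(&sc->lock);
}

/* Free a session; fails with EBUSY while requests still reference it. */
static int
sketch_freesession(struct sketch_softc *sc, int sid)
{
	int error = 0;

	pthread_mutex_lock(&sc->lock);
	if (sid < 0 || sid >= SKETCH_NSESSIONS || !sc->sessions[sid].active)
		error = EINVAL;
	else if (sc->sessions[sid].refs > 0)
		error = EBUSY;
	else
		sc->sessions[sid].active = false;
	pthread_mutex_unlock(&sc->lock);
	return (error);
}
```

With this shape a request path would call sketch_hold() before touching the session and sketch_rele() from its completion callback; whether EBUSY or deferred teardown is the right answer for a real driver is left open here.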
Comment 3 Conrad Meyer freebsd_committer 2020-10-27 16:47:48 UTC
I'm not working on this, releasing to pool.

TL;DR: ccp(4) does not handle concurrent operations on a session correctly and is therefore generally unsafe to use.  Also, the hardware backing it is extremely slow (at least on Zen1).