Bug 253595 - ccp(4) breaks ZFS
Summary: ccp(4) breaks ZFS
Status: Closed DUPLICATE of bug 252981
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 13.0-STABLE
Hardware: amd64 Any
Importance: --- Affects Only Me
Assignee: freebsd-bugs (Nobody)
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-02-17 19:57 UTC by Johnny Sorocil
Modified: 2021-02-25 16:24 UTC

See Also:


Attachments
core.txt (76.17 KB, text/plain)
2021-02-17 19:57 UTC, Johnny Sorocil

Description Johnny Sorocil 2021-02-17 19:57:23 UTC
Created attachment 222535
core.txt

Loading ccp(4) (either via kld_list in rc.conf or by running kldload manually after boot) breaks ZFS encryption: I can't load keys for an existing dataset, and creating a new encrypted dataset results in a kernel panic.

Try to load a ZFS dataset key:
% kldload ccp
% zfs load-key data
Enter passphrase for 'data':
Key load error: Incorrect key provided for 'data'.
Enter passphrase for 'data':
Key load error: Incorrect key provided for 'data'.
Enter passphrase for 'data':
Key load error: Incorrect key provided for 'data'.
zsh: exit 255   zfs load-key data

One way to reproduce the kernel panic:
truncate -s 10G pool
mdconfig -at vnode -f pool
zpool create -m /mnt/test -O compress=lz4 -O atime=off -O devices=off -O setuid=off -O exec=off -O encryption=on -O keyformat=passphrase test /dev/md0
<kernel panic>

Another way to reproduce the kernel panic:
Try to create an encrypted dataset on an existing pool (it doesn't matter whether the root of the pool is encrypted):
zfs create -o encryption=on -o keyformat=passphrase zroot/encrypted
<kernel panic>

% cat /var/crash/info.last
Dump header from device: /dev/gpt/hdd-swap
  Architecture: amd64
  Architecture Version: 2
  Dump Length: 1346650112
  Blocksize: 512
  Compression: none
  Dumptime: 2021-02-17 20:47:17 +0100
  Hostname: zen-pobro
  Magic: FreeBSD Kernel Dump
  Version String: FreeBSD 13.0-BETA2 #2 r13.0-n244512-726e20f45041: Wed Feb 17 20:26:38 CET 2021
    root@zen-pobro:/usr/obj/usr/src/amd64.amd64/sys/GENERIC
  Panic String: VERIFY3(0 == zio_crypt_key_wrap(&dck->dck_wkey->wk_key, key, iv, mac, keydata, hmac_keydata)) failed (0 == 5)

  Dump Parity: 2673242901
  Bounds: 4
  Dump Status: good

% dmesg
...
CPU: AMD Ryzen 7 PRO 4750G with Radeon Graphics (3593.33-MHz K8-class CPU)
  Origin="AuthenticAMD"  Id=0x860f01  Family=0x17  Model=0x60  Stepping=1
  Features=0x178bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT>
  Features2=0x7ed8320b<SSE3,PCLMULQDQ,MON,SSSE3,FMA,CX16,SSE4.1,SSE4.2,MOVBE,POPCNT,AESNI,XSAVE,OSXSAVE,AVX,F16C,RDRAND>
  AMD Features=0x2e500800<SYSCALL,NX,MMX+,FFXSR,Page1GB,RDTSCP,LM>
  AMD Features2=0x75c237ff<LAHF,CMP,SVM,ExtAPIC,CR8,ABM,SSE4A,MAS,Prefetch,OSVW,IBS,SKINIT,WDT,TCE,Topology,PCXC,PNXC,DBE,PL2I,MWAITX,ADMSKX>
  Structured Extended Features=0x219c91a9<FSGSBASE,BMI1,AVX2,SMEP,BMI2,PQM,PQE,RDSEED,ADX,SMAP,CLFLUSHOPT,CLWB,SHA>
  Structured Extended Features2=0x400004<UMIP,RDPID>
  XSAVE Features=0xf<XSAVEOPT,XSAVEC,XINUSE,XSAVES>
  AMD Extended Feature Extensions ID EBX=0x90cf757<CLZERO,IRPerf,XSaveErPtr,RDPRU,MCOMMIT,WBNOINVD,IBPB,IBRS,STIBP,PREFER_IBRS,SSBD>
  SVM: NP,NRIP,VClean,AFlush,DAssist,NAsids=32768
  TSC: P-state invariant, performance statistics
...
ccp0: <AMD CCP-5a> mem 0xfcc00000-0xfccfffff,0xfcd8c000-0xfcd8dfff at device 0.2 on pci9
random: registering fast source AMD CCP TRNG

% pciconf -lv
none2@pci0:9:0:2:       class=0x108000 rev=0x00 hdr=0x00 vendor=0x1022 device=0x15df subvendor=0x1022 subdevice=0x15df
    vendor     = 'Advanced Micro Devices, Inc. [AMD]'
    device     = 'Family 17h (Models 10h-1fh) Platform Security Processor'
    class      = encrypt/decrypt


Reproduced on FreeBSD 13.0-ALPHA3, 13.0-BETA2 and 14.0-CURRENT (commit 4a7d84058d Wed Feb 17 11:45:54 2021 +0100)

If the ccp module is not loaded:
% zfs load-key data
Enter passphrase for 'data':
<ZFS dataset decrypted>

% zfs create -o encryption=on -o keyformat=passphrase zroot/encrypted
<new encrypted ZFS dataset created without panic>
Comment 1 Mark Johnston freebsd_committer 2021-02-17 20:59:08 UTC
ccp(4) appears to have a constraint that the AAD length with AES-GCM must be a multiple of the cipher block size.  ZFS doesn't handle the errors that result when it submits a request satisfying this constraint.  See bug 252981 for a related example of the same problem.  For 13.0 this will be worked around by simply disabling the use of hardware offloads by ZFS.
Comment 2 Mark Johnston freebsd_committer 2021-02-17 21:00:27 UTC
> when it submits a request satisfying this constraint
                           ^ not
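
For illustration only, the constraint described above (AAD length must be a multiple of the cipher block size) can be expressed as a small userland check. This is a sketch, not code from ccp(4) or ZFS: the helper name aad_len_is_block_aligned and the AES_BLOCK_LEN macro are hypothetical, and the only fact assumed is that AES uses a 16-byte block.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define AES_BLOCK_LEN 16	/* AES block size in bytes */

/*
 * Hypothetical helper: returns true if aad_len is a whole number of
 * cipher blocks, i.e. the kind of request the comment above says
 * ccp(4) appears to require for AES-GCM.
 */
static bool
aad_len_is_block_aligned(size_t aad_len, size_t block_len)
{
	assert(block_len != 0);		/* avoid division by zero */
	return (aad_len % block_len == 0);
}
```

A request whose AAD is, say, 20 bytes long would fail this check, which is the class of request that ZFS submits and does not expect to be rejected.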
Comment 3 Conrad Meyer freebsd_committer 2021-02-18 01:25:17 UTC
Also, there's no reason to use ccp(4).  It's broken (bug 227982) and slower than aesni(4).
Comment 4 Mark Johnston freebsd_committer 2021-02-18 15:17:12 UTC
(In reply to Conrad Meyer from comment #3)
What do you think we should do with it?  I think most in-kernel consumers assume that you can have multiple requests in flight in a session, and without that it's hard if not impossible to get decent throughput from a hardware offload device.  I'm not sure whether that's a limitation of the driver or the device though.

Perhaps we could simply change ccp(4) to not register itself with OCF for now.
Comment 5 commit-hook freebsd_committer 2021-02-22 17:44:15 UTC
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=940415f20a784156ec0e247989796385896f32a8

commit 940415f20a784156ec0e247989796385896f32a8
Author:     Martin Matuska <mm@FreeBSD.org>
AuthorDate: 2021-02-22 17:37:47 +0000
Commit:     Martin Matuska <mm@FreeBSD.org>
CommitDate: 2021-02-22 17:42:33 +0000

    zfs: disable use of hardware crypto offload drivers

    From openzfs-master e7adccf7f commit message:
      First, the crypto request completion handler contains a bug in that it
      fails to reset fs_done correctly after the request is completed.  This
      is only a problem for asynchronous drivers.  Second, some hardware
      drivers have input constraints which ZFS does not satisfy.  For
      instance, ccp(4) apparently requires the AAD length for AES-GCM to be a
      multiple of the cipher block size, and with qat(4) the AES-GCM AAD
      length may not be longer than 240 bytes.  FreeBSD's generic crypto
      framework doesn't have a mechanism to automatically fall back to a
      software implementation if a hardware driver cannot process a request,
      and ZFS does not tolerate such errors.

    Patch Author:   Mark Johnston <markj@freebsd.org>

    Obtained from:  openzfs/zfs@e7adccf7f537a4d07281a2b74b360154bae367bc
    PR:             252981, 253595
    MFS after:      3 days

    (direct commit)

 sys/contrib/openzfs/module/os/freebsd/zfs/crypto_os.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)
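
The commit message above notes that the crypto framework has no automatic fallback to software when a hardware driver cannot process a request. As a rough userland sketch of what a caller-side check could look like, the following models the two constraints quoted in the message and picks a backend accordingly. All names here (hw_model, hw_accepts, pick_backend, model_ccp, model_qat) are hypothetical; this is not the change that was committed, which disables hardware offload for ZFS entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define AES_BLOCK_LEN 16	/* AES block size in bytes */

/* Hypothetical description of a hardware driver's AAD constraints. */
struct hw_model {
	size_t aad_multiple;	/* AAD length must be a multiple of this (1 = none) */
	size_t aad_max;		/* largest accepted AAD length (SIZE_MAX = no limit) */
};

/* Values taken from the commit message; the models themselves are assumptions. */
static const struct hw_model model_ccp = { AES_BLOCK_LEN, SIZE_MAX };
static const struct hw_model model_qat = { 1, 240 };

enum backend { BACKEND_HARDWARE, BACKEND_SOFTWARE };

/* Returns true if the modeled driver would accept this AAD length. */
static bool
hw_accepts(const struct hw_model *hw, size_t aad_len)
{
	assert(hw->aad_multiple != 0);
	return (aad_len % hw->aad_multiple == 0 && aad_len <= hw->aad_max);
}

/* Use hardware only when it is present and accepts the request. */
static enum backend
pick_backend(const struct hw_model *hw, size_t aad_len)
{
	if (hw != NULL && hw_accepts(hw, aad_len))
		return (BACKEND_HARDWARE);
	return (BACKEND_SOFTWARE);
}
```

Checking up front like this requires the caller to know each driver's limits, which is one reason a generic fallback belongs in the framework rather than in every consumer.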
Comment 6 crest 2021-02-23 10:38:53 UTC
Does this also disable the use of AES-NI and carry-less multiplication, or does ZFS make use of accelerated software crypto directly?
I'm asking because disabling AES-NI would really hurt the vast majority of potential ZFS encryption users and expose them to the timing side channels present in pure software AES and GCM implementations optimized for speed.
Comment 7 Mark Johnston freebsd_committer 2021-02-23 14:08:55 UTC
(In reply to crest from comment #6)
No, aesni will still be used by the ZFS encryption layer if available (same for armv8crypto).  The change applies only to hardware offload drivers.
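
To make the distinction concrete, here is a small userland sketch of filtering drivers by class. It assumes, per the reply above, that aesni and armv8crypto count as software (CPU-instruction) drivers while ccp and qat are hardware offload drivers. The flag names DRV_F_HARDWARE and DRV_F_SOFTWARE, the table, and driver_allowed are hypothetical and are not the kernel's actual interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical driver classes, for illustration only. */
#define DRV_F_HARDWARE	0x1u	/* separate offload device */
#define DRV_F_SOFTWARE	0x2u	/* runs on the CPU, possibly accelerated */

struct drv {
	const char *name;
	unsigned int flags;
};

/* Classification assumed from the discussion in this bug. */
static const struct drv drivers[] = {
	{ "ccp",         DRV_F_HARDWARE },
	{ "qat",         DRV_F_HARDWARE },
	{ "aesni",       DRV_F_SOFTWARE },
	{ "armv8crypto", DRV_F_SOFTWARE },
};

/* Returns true if the named driver belongs to one of the allowed classes. */
static bool
driver_allowed(const char *name, unsigned int allowed)
{
	assert(name != NULL);
	for (size_t i = 0; i < sizeof(drivers) / sizeof(drivers[0]); i++) {
		if (strcmp(drivers[i].name, name) == 0)
			return ((drivers[i].flags & allowed) != 0);
	}
	return (false);		/* unknown drivers are never selected */
}
```

With only DRV_F_SOFTWARE allowed, aesni remains eligible and ccp does not, which matches the behavior described in the reply.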
Comment 8 commit-hook freebsd_committer 2021-02-25 16:21:13 UTC
A commit in branch releng/13.0 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=442719c0c6de93051d4bf9820420e9863ed3de53

commit 442719c0c6de93051d4bf9820420e9863ed3de53
Author:     Martin Matuska <mm@FreeBSD.org>
AuthorDate: 2021-02-22 17:37:47 +0000
Commit:     Martin Matuska <mm@FreeBSD.org>
CommitDate: 2021-02-25 16:20:20 +0000

    zfs: disable use of hardware crypto offload drivers

    From openzfs-master e7adccf7f commit message:
      First, the crypto request completion handler contains a bug in that it
      fails to reset fs_done correctly after the request is completed.  This
      is only a problem for asynchronous drivers.  Second, some hardware
      drivers have input constraints which ZFS does not satisfy.  For
      instance, ccp(4) apparently requires the AAD length for AES-GCM to be a
      multiple of the cipher block size, and with qat(4) the AES-GCM AAD
      length may not be longer than 240 bytes.  FreeBSD's generic crypto
      framework doesn't have a mechanism to automatically fall back to a
      software implementation if a hardware driver cannot process a request,
      and ZFS does not tolerate such errors.

    Patch Author:   Mark Johnston <markj@freebsd.org>

    Obtained from:  openzfs/zfs@e7adccf7f537a4d07281a2b74b360154bae367bc
    PR:             252981, 253595
    Approved by:    re (gjb)

    (cherry picked from commit 940415f20a784156ec0e247989796385896f32a8)

 sys/contrib/openzfs/module/os/freebsd/zfs/crypto_os.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)
Comment 9 Mark Johnston freebsd_committer 2021-02-25 16:24:18 UTC

*** This bug has been marked as a duplicate of bug 252981 ***