After booting stable/12 r341604, running a custom kernel including cxgbe(4), cxgbev(4), and ccr(4), and running swapon -a in SU mode:
GEOM_ELI: Device gpt/swap0.eli created.
GEOM_ELI: Encryption: AES-XTS 256
GEOM_ELI: Crypto: hardware
Fatal trap 12: page fault while in kernel mode
cpuid = 3; apic id = 06
fault virtual address = 0x0
fault code = supervisor write data, page not present
instruction pointer = 0x20:0xffffffff805be1b2
stack pointer = 0x28:0xfffffe00a6253770
frame pointer = 0x28:0xfffffe00a6253770
code segment = base rx0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 65 (g_eli gpt/swap0)
trap number = 12
panic: page fault
cpuid = 3
time = 1544087308
KDB: stack backtrace:
db_trace_self_wrapper() at 0xffffffff8055d9eb = db_trace_self_wrapper+0x2b/frame 0xfffffe00a6253420
vpanic() at 0xffffffff808754d3 = vpanic+0x1a3/frame 0xfffffe00a6253480
panic() at 0xffffffff80875323 = panic+0x43/frame 0xfffffe00a62534e0
trap_fatal() at 0xffffffff80bd745f = trap_fatal+0x35f/frame 0xfffffe00a6253530
trap_pfault() at 0xffffffff80bd74b9 = trap_pfault+0x49/frame 0xfffffe00a4b7f590
trap() at 0xffffffff80bd6ade = trap+0x29e/frame 0xfffffe00a4b7f6a0
calltrap() at 0xffffffff80bb3935 = calltrap+0x8/frame 0xfffffe00a4b7f6a0
--- trap 0xc, rip = 0xffffffff805be1b2, rsp = 0xfffffe00a4b7f770, rbp = 0xfffffe00a4b7f770
t4_wrq_tx_locked() at 0xffffffff805be1b2 = t4_wrq_tx_locked+0x12/frame 0xfffffe00a6253770
ccr_process() at 0xffffffff805e3fc3 = ccr_process+0x1953/frame 0xfffffe00a4b7f970
crypto_dispatch() at 0xffffffff80ae3fa4 = crypto_dispatch+0x144/frame 0xfffffe00a4b7f9b0
g_eli_crypto_run() at 0xffffffff807b5cb3 = g_eli_crypto_run+0x273/frame 0xfffffe00a4b7fa10
g_eli_worker() at 0xffffffff807aebc8 = g_eli_worker+0x3c8/frame 0xfffffe00a4b7fa70
fork_exit() at 0xffffffff80834d93 = fork_exit+0x83/frame 0xfffffe00a4b7fab0
fork_trampoline() at 0xffffffff80bb491e = fork_trampoline+0xe/frame 0xfffffe00a4b7fab0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
Automatic reboot in 15 seconds - press a key on the console to abort
--> Press a key on the console to reboot,
The Chelsio NIC is a T6225-CR.
Kernel config is https://ximalas.info/~trond/create-zfs/canmount/ENTERPRISE-amd64-stable-12
I tried stable/12 r341623 with ccr(4) removed from the kernel, and there were no problems when I engaged geli encrypted swap in SU mode.
That was yesterday.
Today, while doing the finishing touches after upgrading to stable/12 r341676, and still in SU mode, I ran the following sequence:
killall -TERM moused
killall -TERM devd
zfs umount -a
To my amazement, I saw the crypto accelerator being loaded and recognized. Clearly, the system didn't panic, but maybe it would have later on, which leads me to believe that the crypto or the cxgbe/ccr subsystem isn't properly initialized early after startup. Please prove me wrong.
I rebooted to SU mode to start fresh, and ran this sequence:
I rebooted again to SU mode, and ran this simpler sequence:
In both cases the kernel panicked as soon as I executed the final command.
Adding np@ and jhb@ to the CC list since they are the authors of cxgbe(4) and ccr(4).
Currently ccr shares queues with cc0 and needs 'ifconfig cc0 up' before it will work. We were planning to fix that before now but haven't. I thought I had added a safety belt for that, but clearly it's only a TODO in the source. :-/
I have a patch for review in https://reviews.freebsd.org/D18478. This isn't the long-term fix, but it will avoid panicking. In your case it will fall back to software crypto when doing the swapon -a.
(In reply to John Baldwin from comment #2)
Thanks, John. That might explain why it succeeded once. I usually run https://ximalas.info/~trond/create-zfs/canmount/single-user-mode.sh.txt from / when I enter SU mode, to save some typing. I'll change my script to enable cc0 before activating the swap device(s). Unfortunately, the machine in question is at work. I might be able to try out your suggestion sometime during the weekend.
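A change along those lines to a single-user script could be sketched as follows. The helper names has_up_flag and ensure_iface_up are hypothetical, and treating the UP flag in ifconfig(8) output as a proxy for "the queues ccr(4) borrows from cc0 are initialized" is an assumption based on comment #2, not something the driver guarantees.

```sh
#!/bin/sh
# Sketch only: the helper names below are hypothetical.

# Succeeds if the first line of `ifconfig <iface>` output (read from stdin)
# lists UP among its flags, e.g. "cc0: flags=8843<UP,BROADCAST,...>".
has_up_flag() {
    head -n 1 | grep -Eq 'flags=[0-9a-fA-F]+<([A-Z0-9_]+,)*UP(,[A-Z0-9_]+)*>'
}

# Bring the interface up only if it is not already marked UP.
ensure_iface_up() {
    iface=$1
    /sbin/ifconfig "$iface" | has_up_flag || /sbin/ifconfig "$iface" up
}

# Intended use, before any geli(8)/ccr(4) consumer is started:
#   ensure_iface_up cc0 && swapon -a
```

Repeating "ifconfig cc0 up" is harmless, so the check is not strictly needed; it mainly makes the dependency explicit in the script.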
(In reply to Trond.Endrestol from comment #3)
After some thinking, this startup script, named /etc/rc.d/ifconfig_cc0_up_before_swap, seems appropriate for my case, at least until t4_crypto.c is updated in head, and later in stable/12. Running rcorder manually places this script near the top of the list, well before /etc/rc.d/swap. I'll try to pop in at $WORK on Sunday to give the script a spin. The alternative, I figured, is to run "kldload ccr" in /etc/rc.local or some other script placed in either /etc/rc.d or /usr/local/etc/rc.d. I opted for covering the case where ccr(4) is part of the static kernel.
# PROVIDE: ifconfig_cc0_up_before_swap
# REQUIRE: geli
# BEFORE: swap
# in /etc/rc.conf.
desc="Ensure interface cc0 is up before utilizing any crypto(4) services handled by ccr(4), including geli(8) encrypted swap devices"
start_cmd='/sbin/ifconfig cc0 up || true' # Any errors are usually tolerable.
(In reply to Trond.Endrestol from comment #4)
Shouldn't that run BEFORE geli too?
(In reply to Conrad Meyer from comment #5)
Before I changed anything, the beginning of the output of rcorder /etc/rc.d/* went like this:
On inspection, /etc/rc.d/geli actually provides disks, so I have added disks to BEFORE instead of geli and removed REQUIRE.
Now the output of rcorder /etc/rc.d/* begins like this:
The output remains unchanged, but the handling of my script is probably more robust. Thank you for your input.
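For reference, a complete script along these lines might look like the sketch below. Only the PROVIDE line, desc, and start_cmd are taken from the fragment in comment #4. The BEFORE line assumes the change described here amounts to "BEFORE: disks swap", and the enable knob, name, rcvar, stop_cmd, and the load_rc_config/run_rc_command calls are the usual rc.subr(8) boilerplate filled in as an assumption, not the exact file in use.

```sh
#!/bin/sh

# PROVIDE: ifconfig_cc0_up_before_swap
# BEFORE: disks swap

# Assumed enable knob (standard rc.subr convention), set in /etc/rc.conf:
# ifconfig_cc0_up_before_swap_enable="YES"

. /etc/rc.subr

name="ifconfig_cc0_up_before_swap"
desc="Ensure interface cc0 is up before utilizing any crypto(4) services handled by ccr(4), including geli(8) encrypted swap devices"
rcvar="${name}_enable"

# From the original script: any errors are usually tolerable.
start_cmd='/sbin/ifconfig cc0 up || true'
# Assumption: nothing needs to be undone at shutdown.
stop_cmd=':'

load_rc_config "$name"
run_rc_command "$1"
```

With the script installed, "rcorder /etc/rc.d/* | grep -n -e ifconfig_cc0_up_before_swap -e geli -e swap" should list it ahead of both geli and swap.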
(In reply to Trond.Endrestol from comment #6)
Both scripts are very successful. The system is now at r341794. I incorporated jkim@'s patch for OpenSSL's devcrypto engine, see https://lists.freebsd.org/pipermail/freebsd-stable/2018-December/090203.html
I'm not sure if ccr(4) is an improvement over AES-NI; I'm only curious to see what the T6225-CR NIC can do for me. OpenSSL/OpenSSH should be able to utilize the crypto accelerator:
# openssl version
OpenSSL 1.1.1a-freebsd 20 Nov 2018
# openssl engine -t -v
(devcrypto) /dev/crypto engine
[ available ]
(dynamic) Dynamic engine loading support
[ unavailable ]
SO_PATH, NO_VCHECK, ID, LIST_ADD, DIR_LOAD, DIR_ADD, LOAD
# fuser /dev/crypto
/dev/crypto: 7314w 7313w 7308w 6999w 6998w 5658w 5398w 5331w 5329w 5326w 5325w 5324w 5323w 5248w 2522w 1196w 1193w 1190w 1164w 1147w 1146w 1145w
(In reply to Trond.Endrestol from comment #7)
So the /dev/crypto engine in OpenSSL is a bit limited in what it can do. The engine in OpenSSL 1.0.2 only supported a few ciphers like AES-CBC and older SHA1 and SHA2/256 digests. It did not support AES-GCM, etc. I had patches to 1.0.2 to add GCM support as a proof of concept, though I was focused on HTTPS testing rather than SSH. I haven't tested the 1.1.1 engine yet, but when I looked at the code in 1.1.x a few months ago it was even more limited in terms of what ciphers it supported than 1.0.2. It certainly did not support offloading AES-GCM. So, I currently would not expect OpenSSH in FreeBSD to use /dev/crypto. Web servers like apache and nginx will not make use of /dev/crypto either with the current engine. I would like to eventually upstream support for AES-GCM and TLS to OpenSSL 1.1.x so that they are not local patches in the FreeBSD version, but I haven't circled back to working on that.
The reason OpenSSH is probably opening /dev/crypto currently is to try to offload the RSA operations during connection setup, but the requests are probably failing and it is still doing RSA in software.
Places where I expect ccr(4) to be useful currently are in-kernel crypto acceleration for things like GELI and IPsec.
(In reply to John Baldwin from comment #8)
At least my encrypted swap devices will be served by ccr(4). :-) But after I set vm.pageout_update_period to 0 a couple of weeks ago, the swap devices don't see much action. Thank you to everyone involved.
A commit references this bug:
Date: Tue Jan 15 18:53:45 UTC 2019
New revision: 343056
Reject new sessions if the necessary queues aren't initialized.
ccr reuses the control queue and first rx queue from the first port on
each adapter. The driver cannot send requests until those queues are
initialized. Refuse to create sessions for now if the queues aren't
ready. This is a workaround until cxgbe allocates one or more
dedicated queues for ccr.
MFC after: 1 week
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D18478