Bug 262894

Summary: Kernel Panic (page fault) with 13.1-BETA2 in g_eli & httpd
Product: Base System
Reporter: Krautmaster <mathias.kraut>
Component: kern
Assignee: Alexander Motin <mav>
Status: Closed FIXED
Severity: Affects Only Me
CC: asomers, grahamperrin, markj, mav
Priority: ---
Version: 13.1-RELEASE
Hardware: amd64
OS: Any

Attachments:

Description	Flags
crash dumps	none

Description Krautmaster 2022-03-28 17:15:46 UTC

Created attachment 232788 [details]
crash dumps

Updated from FreeBSD 12 to 13.1.
Running in a Hyper-V virtual machine on a Xeon D-1518 with 32GB ECC memory. An LSI HBA is passed through to the VM.

13.1-BETA2 crashes about once a day, especially under moderate load.
12 ran stable for years.

HW, VM host and tunables unchanged. 

I am a bit puzzled, because many people seem to see panics on FreeBSD 13, and within three days I had one in httpd and two in geli. See the /data/crash dumps attached.

Thanks for your help

Example:

Fatal trap 12: page fault while in kernel mode
cpuid = 2; apic id = 02
fault virtual address	= 0xfffff80e00000004
fault code		= supervisor read data, page not present
instruction pointer	= 0x20:0xffffffff80f1c50d
stack pointer	        = 0x28:0xfffffe0144d3cc00
frame pointer	        = 0x28:0xfffffe0144d3cca0
code segment		= base rx0, limit 0xfffff, type 0x1b
			= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags	= interrupt enabled, resume, IOPL = 0
current process		= 2653 (g_eli[2] gptid/e2d4)
trap number		= 12
panic: page fault
cpuid = 2
time = 1648393340
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0144d3c9c0
vpanic() at vpanic+0x17f/frame 0xfffffe0144d3ca10
panic() at panic+0x43/frame 0xfffffe0144d3ca70
trap_fatal() at trap_fatal+0x385/frame 0xfffffe0144d3cad0
trap_pfault() at trap_pfault+0x4f/frame 0xfffffe0144d3cb30
calltrap() at calltrap+0x8/frame 0xfffffe0144d3cb30
--- trap 0xc, rip = 0xffffffff80f1c50d, rsp = 0xfffffe0144d3cc00, rbp = 0xfffffe0144d3cca0 ---
aesni_crypt_xts() at aesni_crypt_xts+0x17d/frame 0xfffffe0144d3cca0
aesni_decrypt_xts() at aesni_decrypt_xts+0xe/frame 0xfffffe0144d3ccc0
aesni_cipher_crypt() at aesni_cipher_crypt+0x2f1/frame 0xfffffe0144d3cd70
aesni_process() at aesni_process+0x159/frame 0xfffffe0144d3cdc0
crypto_dispatch() at crypto_dispatch+0x118/frame 0xfffffe0144d3cdf0
g_eli_crypto_run() at g_eli_crypto_run+0x178/frame 0xfffffe0144d3ce90
g_eli_worker() at g_eli_worker+0x328/frame 0xfffffe0144d3cef0
fork_exit() at fork_exit+0x7e/frame 0xfffffe0144d3cf30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe0144d3cf30
--- trap 0x80af5f94, rip = 0, rsp = 0, rbp = 0 ---
KDB: enter: panic

Comment 1 Mark Johnston freebsd_committer

2022-03-28 17:29:56 UTC

- Any chance you can get a full vmcore for this panic?  The AES code in question hasn't changed in a long time, so this is more likely a bug in opencrypto or GELI and so will be harder to debug.
- Are you able to confirm whether this happens in 13.0?
- Are you able to test kernel patches?  Based on the fault address I'm wondering a bit if this is related to handling of unmapped I/O in GELI, which was added in 13 and doesn't seem to be present in 12.  We don't currently have a chicken switch to disable it, so a patch would be needed to test that theory.

Comment 2 Krautmaster 2022-03-28 17:42:13 UTC

Hmm, that is not so easy to provide. I'm using TrueNAS 12/13, so if someone builds me a kernel I can swap it in.

A second issue, not booting with the LSI HBA, was present in this bug:
https://jira.ixsystems.com/browse/NAS-115143

which also came with this https://github.com/truenas/os/commit/5bd08f719f778648e4383f5e4d07bd384047a310

That bug could be reproduced with the standard FreeBSD 13/14 images, but the page fault reported here is pretty hard to reproduce.

As it occurs quite often, I could run a test kernel and tunables of your choice for some days to see if it occurs again. If you have a better idea, let me know.

I have also seen some reports of panics involving networking and jails in 13.1; maybe that I/O change is involved there as well. I will ask the people from the other issue whether it is possible to test the same with a 13.0 kernel.

best regards

Comment 3 Krautmaster 2022-03-28 18:01:52 UTC

The contact from that HBA bug offered to provide 13.1 kernels with specific patches if required.

Comment 4 Krautmaster 2022-03-30 15:51:36 UTC

@Mark Johnston 
I can report that the same machine, back on 12.2, runs fine again under any load, with no crash so far (just as before the 13.1 update). I wonder what caused the httpd and geli page faults; maybe they share a root cause.

See the other crash, one of the three in the attachment of this ticket:

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address	= 0x10
fault code		= supervisor read data, page not present
instruction pointer	= 0x20:0xffffffff80ac5384
stack pointer	        = 0x28:0xfffffe015bedbc80
frame pointer	        = 0x28:0xfffffe015bedbcc0
code segment		= base rx0, limit 0xfffff, type 0x1b
			= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags	= interrupt enabled, resume, IOPL = 0
current process		= 37468 (httpd)
trap number		= 12
panic: page fault
cpuid = 0
time = 1648160238
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe015bedba40
vpanic() at vpanic+0x17f/frame 0xfffffe015bedba90
panic() at panic+0x43/frame 0xfffffe015bedbaf0
trap_fatal() at trap_fatal+0x385/frame 0xfffffe015bedbb50
trap_pfault() at trap_pfault+0x4f/frame 0xfffffe015bedbbb0
calltrap() at calltrap+0x8/frame 0xfffffe015bedbbb0
--- trap 0xc, rip = 0xffffffff80ac5384, rsp = 0xfffffe015bedbc80, rbp = 0xfffffe015bedbcc0 ---
kqueue_drain() at kqueue_drain+0x134/frame 0xfffffe015bedbcc0
kqueue_close() at kqueue_close+0x42/frame 0xfffffe015bedbd10
_fdrop() at _fdrop+0x11/frame 0xfffffe015bedbd30
closef() at closef+0x24b/frame 0xfffffe015bedbdc0
closefp() at closefp+0x80/frame 0xfffffe015bedbe00
amd64_syscall() at amd64_syscall+0x10c/frame 0xfffffe015bedbf30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe015bedbf30
--- syscall (6, FreeBSD ELF64, sys_close), rip = 0x8007f591a, rsp = 0x7fffffffe808, rbp = 0x7fffffffe820 ---
KDB: enter: panic

Comment 5 Mark Johnston freebsd_committer

2022-03-30 21:33:15 UTC

(In reply to Krautmaster from comment #4)
Let's focus on the GELI panic first.  The fault address is 0xfffff80e00000004, which is in the direct map range, but with 32GB of RAM the direct map should end roughly at 0xfffff80800000000.  Maybe there is some large discontiguity.
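
The address arithmetic behind this observation can be sketched as follows. This is an illustration only, not kernel code: the direct map base 0xfffff80000000000 is an assumption chosen to match the end address quoted above, RAM is treated as one contiguous range, and the helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed amd64 direct map base; it matches the end address quoted above. */
#define DMAP_BASE 0xfffff80000000000ULL

/* End of a direct map covering ram_bytes of contiguous physical memory. */
static uint64_t
dmap_end(uint64_t ram_bytes)
{
	return (DMAP_BASE + ram_bytes);
}

/* Physical address that a direct map virtual address would refer to. */
static uint64_t
dmap_to_phys(uint64_t va)
{
	return (va - DMAP_BASE);
}

/* True if va lies inside a direct map covering ram_bytes. */
static bool
dmap_contains(uint64_t va, uint64_t ram_bytes)
{
	return (va >= DMAP_BASE && va < dmap_end(ram_bytes));
}
```

Under these assumptions, 32GB of RAM gives a direct map end of 0xfffff80800000000, while the fault address 0xfffff80e00000004 would correspond to a physical address of 56GB + 4, well past the end of memory.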

Some questions:
- Which HBA driver is used?
- Can you please attach output from "sysctl vm.pmap.kernel_maps"?
- Do any of your GELI volumes use authentication?  "geli list | grep AuthenticationAlgorithm" will print some output if so.  If not, please try the patch below.

diff --git a/sys/geom/eli/g_eli.c b/sys/geom/eli/g_eli.c
index 4978523cbebe..9698cb2fdbc8 100644
--- a/sys/geom/eli/g_eli.c
+++ b/sys/geom/eli/g_eli.c
@@ -1137,6 +1137,7 @@ g_eli_create(struct gctl_req *req, struct g_class *mp, struct g_provider *bpp,
         */
        pp = g_new_providerf(gp, "%s%s", bpp->name, G_ELI_SUFFIX);
        pp->flags |= G_PF_DIRECT_SEND | G_PF_DIRECT_RECEIVE;
+#if 0
        if (CRYPTO_HAS_VMPAGE) {
                /*
                 * On DMAP architectures we can use unmapped I/O.  But don't
@@ -1146,6 +1147,7 @@ g_eli_create(struct gctl_req *req, struct g_class *mp, struct g_provider *bpp,
                 if ((sc->sc_flags & G_ELI_FLAG_AUTH) == 0)
                        pp->flags |= G_PF_ACCEPT_UNMAPPED;
        }
+#endif
        pp->mediasize = sc->sc_mediasize;
        pp->sectorsize = sc->sc_sectorsize;
        LIST_FOREACH(gap, &bpp->aliases, ga_next)

Comment 6 Krautmaster 2022-03-31 10:52:15 UTC

(In reply to Mark Johnston from comment #5)

Thanks. I'm currently on FreeBSD 12 since it runs, but these outputs are from 13:

lspci -vvv  https://pastebin.com/cUNj8xBe
devinfo -vr https://pastebin.com/czp6eKhd
dmesg https://pastebin.com/vDdhw4M4

sysctl vm.pmap.kernel_maps
https://controlc.com/269784b5
large output

geli list | grep AuthenticationAlgorithm
-> nothing 


I'm currently waiting for the new daily build, as a patch required for a working HBA was committed yesterday:
https://cgit.freebsd.org/src/commit/?id=5473dee7300507de64c2e6c140b87c9bde8e4462

As soon as it is there I'll update to TrueNAS 13 again, and maybe Alex will be able to provide me with a patched version or kernel, or whatever is needed to test your patch. I guess I can't patch it on my own.

Comment 7 Mark Johnston freebsd_committer

2022-03-31 16:03:07 UTC

(In reply to Krautmaster from comment #6)
The beginning of the sysctl output is truncated.

Comment 8 Krautmaster 2022-03-31 16:12:33 UTC

(In reply to Mark Johnston from comment #7)
Sorry, here we go: https://controlc.com/a2c12826

Comment 9 Mark Johnston freebsd_committer

2022-03-31 17:01:37 UTC

(In reply to Krautmaster from comment #8)
So the direct map indeed ends at 0xfffff80800000000 and the fault address is rather strange.  Is it always the same?  That is, do all aesni panics contain the line:

fault virtual address	= 0xfffff80e00000004

?

Comment 10 Krautmaster 2022-03-31 17:14:34 UTC

(In reply to Mark Johnston from comment #9)
Both geli crashes do; the third one does not contain an address.

The crashes only appear on FreeBSD 13. The output above is from FreeBSD 12, as said, but I guess it is the same.

Comment 11 Krautmaster 2022-03-31 17:17:00 UTC

By the third I mean the httpd crash. Once the new 13 nightly is out I will wait for more crashes and see whether the fault virtual address stays at 0xfffff80e00000004. It already seems strange that both geli crashes have the same fault address.

Comment 12 Krautmaster 2022-04-01 20:29:35 UTC

This afternoon I booted FreeBSD by mistake after a short power issue and started a larger copy again (from a geli drive to a normal unencrypted one).

The system was running without your patch and crashed again, this time in a different way (unlike the httpd crash and the two geli crashes).
This makes me think the fairly common panics on 13 might share a root cause.

See the panic below.

Additionally, I now have a patched kernel up and running and will see whether it is more stable on FreeBSD 13.1.


panic: Solaris(panic): Samsung8TB: blkptr at 0xfffff803dacbb1a8 has invalid CHECKSUM 0
cpuid = 1
time = 1648837251
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe016aa86830
vpanic() at vpanic+0x17f/frame 0xfffffe016aa86880
panic() at panic+0x43/frame 0xfffffe016aa868e0
vcmn_err() at vcmn_err+0xeb/frame 0xfffffe016aa86a10
zfs_panic_recover() at zfs_panic_recover+0x59/frame 0xfffffe016aa86a70
zfs_blkptr_verify_log() at zfs_blkptr_verify_log+0xa3/frame 0xfffffe016aa86c00
zfs_blkptr_verify() at zfs_blkptr_verify+0xaa/frame 0xfffffe016aa86c60
zio_free() at zio_free+0x26/frame 0xfffffe016aa86ca0
dsl_dataset_block_kill() at dsl_dataset_block_kill+0x18d/frame 0xfffffe016aa86d10
dbuf_write_done() at dbuf_write_done+0x4f/frame 0xfffffe016aa86d50
arc_write_done() at arc_write_done+0x314/frame 0xfffffe016aa86d90
zio_done() at zio_done+0x82a/frame 0xfffffe016aa86e00
zio_execute() at zio_execute+0x9f/frame 0xfffffe016aa86e40
taskqueue_run_locked() at taskqueue_run_locked+0x181/frame 0xfffffe016aa86ec0
taskqueue_thread_loop() at taskqueue_thread_loop+0xc2/frame 0xfffffe016aa86ef0
fork_exit() at fork_exit+0x7e/frame 0xfffffe016aa86f30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe016aa86f30
--- trap 0x80af6074, rip = 0, rsp = 0, rbp = 0 ---
KDB: enter: panic

Comment 13 Krautmaster 2022-04-01 22:45:58 UTC

While it previously crashed within minutes of putting load on the encrypted ZFS volume, it has now been copying for ~4h at high I/O without an issue. It may be too early for a final statement, but "unmapped I/O in GELI" could be the cause, so your hunch might be right. I will keep the machine online as long as needed and see if it fails in the next days. I also checked the /data/crash folder on FreeBSD 12: not a single entry, versus four within barely a week on FreeBSD 13.1.

Thanks for the fast support. I will keep reporting on the patched kernel.

Comment 14 Krautmaster 2022-04-05 04:46:04 UTC

Okay, first of all, I don't seem to get any geli panics any more, and the system was stable until last night during a scrub. There seems to be another issue, related to ZFS or memory; I have no idea whether it has the same root cause.

panic: Solaris(panic): Samsung8TB: blkptr at 0xfffff803dacbb1a8 has invalid CHECKSUM 0
cpuid = 1
time = 1648837251
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe016aa86830
vpanic() at vpanic+0x17f/frame 0xfffffe016aa86880
panic() at panic+0x43/frame 0xfffffe016aa868e0
vcmn_err() at vcmn_err+0xeb/frame 0xfffffe016aa86a10
zfs_panic_recover() at zfs_panic_recover+0x59/frame 0xfffffe016aa86a70
zfs_blkptr_verify_log() at zfs_blkptr_verify_log+0xa3/frame 0xfffffe016aa86c00
zfs_blkptr_verify() at zfs_blkptr_verify+0xaa/frame 0xfffffe016aa86c60
zio_free() at zio_free+0x26/frame 0xfffffe016aa86ca0
dsl_dataset_block_kill() at dsl_dataset_block_kill+0x18d/frame 0xfffffe016aa86d10
dbuf_write_done() at dbuf_write_done+0x4f/frame 0xfffffe016aa86d50
arc_write_done() at arc_write_done+0x314/frame 0xfffffe016aa86d90
zio_done() at zio_done+0x82a/frame 0xfffffe016aa86e00
zio_execute() at zio_execute+0x9f/frame 0xfffffe016aa86e40
taskqueue_run_locked() at taskqueue_run_locked+0x181/frame 0xfffffe016aa86ec0
taskqueue_thread_loop() at taskqueue_thread_loop+0xc2/frame 0xfffffe016aa86ef0
fork_exit() at fork_exit+0x7e/frame 0xfffffe016aa86f30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe016aa86f30
--- trap 0x80af6074, rip = 0, rsp = 0, rbp = 0 ---
KDB: enter: panic

Comment 15 Krautmaster 2022-04-05 04:47:49 UTC

Ah sorry, the panic posted above was the old one. This is the correct one:

panic: bad pte va fffff802e3374030 pte 0
cpuid = 7
time = 1649114378
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe01656c2690
vpanic() at vpanic+0x17f/frame 0xfffffe01656c26e0
panic() at panic+0x43/frame 0xfffffe01656c2740
pmap_remove_pages() at pmap_remove_pages+0x92f/frame 0xfffffe01656c2890
exec_new_vmspace() at exec_new_vmspace+0x223/frame 0xfffffe01656c28f0
exec_elf64_imgact() at exec_elf64_imgact+0xb16/frame 0xfffffe01656c29f0
kern_execve() at kern_execve+0x77d/frame 0xfffffe01656c2d70
sys_execve() at sys_execve+0x5a/frame 0xfffffe01656c2e00
amd64_syscall() at amd64_syscall+0x10c/frame 0xfffffe01656c2f30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe01656c2f30
--- syscall (59, FreeBSD ELF64, sys_execve), rip = 0x8007a3faa, rsp = 0x7fffdfbfb738, rbp = 0x7fffdfbfb740 ---
KDB: enter: panic

Comment 16 Krautmaster 2022-04-08 15:49:09 UTC

By mistake I booted the wrong kernel (nightly update) again yesterday, and just now I got the same geli issue with the same fault address.

Fatal trap 12: page fault while in kernel mode
cpuid = 4; apic id = 04
fault virtual address	= 0xfffff80e00000004
fault code		= supervisor read data, page not present
instruction pointer	= 0x20:0xffffffff80f1cd7d
stack pointer	        = 0x28:0xfffffe014e59ec00
frame pointer	        = 0x28:0xfffffe014e59eca0
code segment		= base rx0, limit 0xfffff, type 0x1b
			= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags	= interrupt enabled, resume, IOPL = 0
current process		= 2705 (g_eli[4] gptid/e567)
trap number		= 12
panic: page fault
cpuid = 4
time = 1649430252
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe014e59e9c0
vpanic() at vpanic+0x17f/frame 0xfffffe014e59ea10
panic() at panic+0x43/frame 0xfffffe014e59ea70
trap_fatal() at trap_fatal+0x385/frame 0xfffffe014e59ead0
trap_pfault() at trap_pfault+0x4f/frame 0xfffffe014e59eb30
calltrap() at calltrap+0x8/frame 0xfffffe014e59eb30
--- trap 0xc, rip = 0xffffffff80f1cd7d, rsp = 0xfffffe014e59ec00, rbp = 0xfffffe014e59eca0 ---
aesni_crypt_xts() at aesni_crypt_xts+0x17d/frame 0xfffffe014e59eca0
aesni_decrypt_xts() at aesni_decrypt_xts+0xe/frame 0xfffffe014e59ecc0
aesni_cipher_crypt() at aesni_cipher_crypt+0x2f1/frame 0xfffffe014e59ed70
aesni_process() at aesni_process+0x159/frame 0xfffffe014e59edc0
crypto_dispatch() at crypto_dispatch+0x118/frame 0xfffffe014e59edf0
g_eli_crypto_run() at g_eli_crypto_run+0x178/frame 0xfffffe014e59ee90
g_eli_worker() at g_eli_worker+0x328/frame 0xfffffe014e59eef0
fork_exit() at fork_exit+0x7e/frame 0xfffffe014e59ef30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe014e59ef30
--- trap 0x80af60b4, rip = 0, rsp = 0, rbp = 0 ---
KDB: enter: panic

Comment 17 Mark Johnston freebsd_committer

2022-04-08 16:39:36 UTC

(In reply to Krautmaster from comment #16)
Ok, so this suggests that the GELI panics at least are triggered by the unmapped I/O support.  It'll be hard to debug further without a crash dump; is there any reason you can't set one up (e.g., by adding a new virtual disk and setting it as the dump device ("dumpdev" in /etc/rc.conf))?

Comment 18 Krautmaster 2022-04-08 19:34:40 UTC

I can do so and report.

Comment 19 Krautmaster 2022-04-09 08:04:06 UTC

Sadly I have not been able to configure the full dump so far. I attached a new 50GB disk to the machine, created a single-disk pool on it, and tried to edit rc.conf, but it seems to get overwritten on each start. I also tried to set the dumpdev path via tunables in the TrueNAS GUI, but after a new crash this morning on the stock kernel it was empty.

I need some help configuring this full dumpdev. Maybe I need to mount it in another FS, outside of TrueNAS, or maybe I have to configure the dumpdev somewhere else?

Thanks for the help

A list of the normal dumps TrueNAS writes can be found here:

https://1drv.ms/u/s!Ar_eIBtD4lGqicUg1WusuXtlKnjS_g?e=C1gwBk

Comment 20 Krautmaster 2022-04-10 11:54:54 UTC

(In reply to Mark Johnston from comment #17)

Next crash. So far the dumpdev drive does not seem to work using "rc" tunables in TrueNAS, but I can offer direct device access to configure whatever is needed.

Fatal trap 12: page fault while in kernel mode
cpuid = 7; apic id = 07
fault virtual address	= 0xfffff80b00000004
fault code		= supervisor read data, page not present
instruction pointer	= 0x20:0xffffffff80f1cdcd
stack pointer	        = 0x28:0xfffffe014e5fbc00
frame pointer	        = 0x28:0xfffffe014e5fbca0
code segment		= base rx0, limit 0xfffff, type 0x1b
			= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags	= interrupt enabled, resume, IOPL = 0
current process		= 3648 (g_eli[7] gptid/e539)
trap number		= 12
panic: page fault
cpuid = 7
time = 1649591029
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe014e5fb9c0
vpanic() at vpanic+0x17f/frame 0xfffffe014e5fba10
panic() at panic+0x43/frame 0xfffffe014e5fba70
trap_fatal() at trap_fatal+0x385/frame 0xfffffe014e5fbad0
trap_pfault() at trap_pfault+0x4f/frame 0xfffffe014e5fbb30
calltrap() at calltrap+0x8/frame 0xfffffe014e5fbb30
--- trap 0xc, rip = 0xffffffff80f1cdcd, rsp = 0xfffffe014e5fbc00, rbp = 0xfffffe014e5fbca0 ---
aesni_crypt_xts() at aesni_crypt_xts+0x17d/frame 0xfffffe014e5fbca0
aesni_decrypt_xts() at aesni_decrypt_xts+0xe/frame 0xfffffe014e5fbcc0
aesni_cipher_crypt() at aesni_cipher_crypt+0x2f1/frame 0xfffffe014e5fbd70
aesni_process() at aesni_process+0x159/frame 0xfffffe014e5fbdc0
crypto_dispatch() at crypto_dispatch+0x118/frame 0xfffffe014e5fbdf0
g_eli_crypto_run() at g_eli_crypto_run+0x178/frame 0xfffffe014e5fbe90
g_eli_worker() at g_eli_worker+0x328/frame 0xfffffe014e5fbef0
fork_exit() at fork_exit+0x7e/frame 0xfffffe014e5fbf30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe014e5fbf30
--- trap 0x80af60b4, rip = 0, rsp = 0, rbp = 0 ---
KDB: enter: panic

Comment 21 Mark Johnston freebsd_committer

2022-04-12 22:57:43 UTC

(In reply to Krautmaster from comment #20)
Did this most recent crash happen with the patch from comment 5 applied?

(In reply to Krautmaster from comment #19)
You could simply run "dumpon /dev/<dump device name>" to configure it once.  Use "dumpon -l" to verify that it reports the right device.  Also you shouldn't configure a pool on the disk, just pass the raw disk device to dumpon.  If you're willing to give me remote access, please mail me.

Finally, I think your kernel does not have debugging assertions enabled.  If it is possible to get a stock kernel build with "options INVARIANTS" enabled, please try testing it.

The fault address should correspond to the buffer returned by aesni_cipher_alloc().  I'd guess that it's returned by crypto_contiguous_subsegment() and there's some kind of overflow condition occurring with the page array offset or length, but I can't see where.  There is some fishy code, e.g., in g_disk_advance():

		bp->bio_ma_offset += off;
		bp->bio_ma_offset %= PAGE_SIZE;

but this is only a problem for large (> 2GB) offsets, which shouldn't happen...
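
A minimal model of why those two lines would only misbehave for large offsets, assuming bio_ma_offset is a 32-bit signed int and off is a 64-bit offset (both are assumptions here, and the function names are hypothetical). The narrowing is modelled explicitly as two's-complement wraparound so the sketch does not depend on implementation-defined conversions.

```c
#include <stdint.h>

#define PAGE_SIZE 4096

/* Model storing a 64-bit value into a 32-bit signed field (wraparound). */
static int64_t
wrap_to_int32(int64_t v)
{
	uint64_t low = (uint64_t)v & 0xffffffffULL;

	if (low >= 0x80000000ULL)
		return ((int64_t)low - 0x100000000LL);
	return ((int64_t)low);
}

/* Page offset after advancing, with the sum squeezed through 32 bits. */
static int64_t
advance_narrow(int32_t ma_offset, int64_t off)
{
	return (wrap_to_int32((int64_t)ma_offset + off) % PAGE_SIZE);
}

/* The same computation carried out entirely in 64 bits. */
static int64_t
advance_wide(int32_t ma_offset, int64_t off)
{
	return (((int64_t)ma_offset + off) % PAGE_SIZE);
}
```

For small offsets both variants agree. Once the sum reaches 2GB, the narrow variant goes negative and the remainder is negative as well, which would not be a valid offset into a page.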

Comment 22 Krautmaster 2022-04-13 19:15:28 UTC

(In reply to Mark Johnston from comment #21)
 
The crash was with the stock kernel, as I was trying to check the dump setup, which I think makes most sense with the standard kernel. As far as I know I had no GELI crashes with the patched kernel, but I had other issues and panics which might be ZFS or kernel related as well, so I went back to FreeBSD 12 (TrueNAS 12.0-U7), which ran fine for months without a single crash.

I will check the remote access and update to the latest nightly for further testing. Alexander might provide me with the suggested "debug" kernel, but he is off this week.

Give me some time to go back to 13 again. Luckily it's pretty easy to swap boot environments in TrueNAS.

Comment 23 Alan Somers freebsd_committer

2022-04-18 17:46:04 UTC

Are you using ZFS atop geli?  If so, it shouldn't be possible to have unmapped I/O in geli.  To check you can do something like the following:

sudo dtrace -i 'fbt:geom_eli:g_eli_crypto_run:entry /args[1]->bio_flags & BIO_UNMAPPED/ {stack();}'

if it prints anything at all, that means you're using unmapped I/O with geli.

Comment 24 Mark Johnston freebsd_committer

2022-04-18 18:34:09 UTC

(In reply to Alan Somers from comment #23)
Why wouldn't it be possible?  FWIW, on my laptop using ZFS/GELI I see a mix of mapped and unmapped I/O, both reads and writes.

Comment 25 Krautmaster 2022-04-18 18:36:41 UTC

(In reply to Alan Somers from comment #23)
My TrueNAS system reports this:


root@freenas[~]# dtrace -i 'fbt:geom_eli:g_eli_crypto_run:entry /args[1]->bio_flags & BIO_UNMAPPED/ {stack();}'
dtrace: invalid probe specifier fbt:geom_eli:g_eli_crypto_run:entry /args[1]->bio_flags & BIO_UNMAPPED/ {stack();}: probe description fbt:geom_eli:g_eli_crypto_run:entry does not match any probes
root@freenas[~]#

Comment 26 Alexander Motin freebsd_committer

2022-04-18 18:41:08 UTC

(In reply to Mark Johnston from comment #24)
ZFS on FreeBSD allocates all I/O memory through UMA, so it is always mapped.  But just recently I made it use BIO_UNMAPPED for poor-man's scatter/gather on page boundaries to avoid one memory copy on I/O aggregation.  So there should indeed be a mix of virtual and physical (represented as unmapped) addressed I/Os visible.  I had even forgotten about it myself when looking at this, thinking the unmapped I/O was generated by the swapper. ;)  But indeed it can be ZFS now.

Comment 27 Alexander Motin freebsd_committer

2022-04-18 19:39:25 UTC

I've tested a short procedure to get a minidump on TrueNAS:

Before the crash:

sysctl debug.debugger_on_panic=0
sysctl debug.ddb.textdump.pending=0
dumpon off
dumpon /dev/daX

After the crash:

cd /mnt/tank
savecore . /dev/daX

As a result, a couple of files representing the dump should be stored in the specified directory.  They are for me.

Debug symbols for the specific TrueNAS build can be found at http://download.freenas.org/ , looking for TrueNAS-13.0-MASTER-*.debug.txz for the exact version running (see `cat /etc/version` on the NAS).  Inside the archive, usr/lib/debug/boot holds symbols for the normal and debug kernels (depending on which one is enabled); they can be unpacked into the same path on the TrueNAS root to run kgdb on the core.

Comment 28 Mark Johnston freebsd_committer

2022-04-18 20:57:16 UTC

(In reply to Alexander Motin from comment #26)
BTW, the loop in vdev_geom_fill_unmap_cb() looks wrong to me when (addr & PAGE_MASK) != 0.  Suppose addr & PAGE_MASK is 2048 and len is 4096.  Then we want two pages in the array, but it looks like the loop will exit after the first iteration.  I think we need to set addr &= ~PAGE_MASK before the loop, or I am missing something.

I'm not sure when non-page-aligned ABD buffers can arise in practice though.
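
The case described above can be checked with a small model of the loop. This is a sketch, not the vdev_geom.c code: the loop condition (addr < end) is an assumption consistent with the behaviour described, and pages_filled() is a hypothetical helper that only counts array entries.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096UL
#define PAGE_MASK (PAGE_SIZE - 1)

/*
 * Count the page-array entries produced for a buffer starting at addr.
 * align_first applies the suggested "addr &= ~PAGE_MASK" before the loop.
 */
static unsigned
pages_filled(uintptr_t addr, size_t len, int align_first)
{
	uintptr_t end = addr + len;
	unsigned n = 0;

	if (align_first)
		addr &= ~PAGE_MASK;
	do {
		n++;			/* stands in for bio_ma[bio_ma_n++] = ... */
		addr += PAGE_SIZE;
	} while (addr < end);		/* assumed loop condition */
	return (n);
}
```

With a page offset of 2048 and a length of 4096, the buffer spans two pages, but the unaligned loop records only one; aligning addr first yields two. A page-aligned buffer is unaffected.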

Comment 29 commit-hook freebsd_committer

2022-04-18 21:56:52 UTC

A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=081b4452a758dd81dcdc68ffb6f7bad901d53e3d

commit 081b4452a758dd81dcdc68ffb6f7bad901d53e3d
Author:     Mark Johnston <markj@FreeBSD.org>
AuthorDate: 2022-04-18 21:16:10 +0000
Commit:     Mark Johnston <markj@FreeBSD.org>
CommitDate: 2022-04-18 21:55:24 +0000

    geli: Add a chicken switch for unmapped I/O

    We have a report of a panic in GELI that appears to go away when
    unmapped I/O is disabled.  Add a tunable to make such investigations
    easier in the future.  No functional change intended.

    PR:             262894
    Reviewed by:    asomers
    MFC after:      1 week
    Sponsored by:   The FreeBSD Foundation
    Differential Revision:  https://reviews.freebsd.org/D34944

 lib/geom/eli/geli.8  |  8 +++++++-
 sys/geom/eli/g_eli.c | 12 ++++++++----
 2 files changed, 15 insertions(+), 5 deletions(-)

Comment 30 Alexander Motin freebsd_committer

2022-04-18 21:58:18 UTC

(In reply to Mark Johnston from comment #28)
While it seems like a good catch on a first look, I doubt it is exploitable.  The code uses unmapped I/O only if all boundaries within the ABD except the first and the last are page aligned.  The case of "addr & PAGE_MASK is 2048 and len is 4096" can fit into this only if it is the only chunk in ABD, but then it should be a linear buffer, not requiring unmapped I/O.  Fitting case of addr & PAGE_MASK is 2048 and len is 6144 should work fine, producing two pages.

Plus, TrueNAS has used ashift=12 for many years, which means all offsets in RAIDZ and gang blocks should be multiples of 4K and so page-aligned on x86.

But still, just in case, what would you say about this patch:

diff --git a/module/os/freebsd/zfs/vdev_geom.c b/module/os/freebsd/zfs/vdev_geom.c
index 2ef4811a8..5447eb922 100644
--- a/module/os/freebsd/zfs/vdev_geom.c
+++ b/module/os/freebsd/zfs/vdev_geom.c
@@ -1132,8 +1132,12 @@ vdev_geom_fill_unmap_cb(void *buf, size_t len, void *priv)
        vm_offset_t addr = (vm_offset_t)buf;
        vm_offset_t end = addr + len;
 
-       if (bp->bio_ma_n == 0)
+       if (bp->bio_ma_n == 0) {
                bp->bio_ma_offset = addr & PAGE_MASK;
+               addr &= ~PAGE_MASK;
+       } else {
+               ASSERT0(P2PHASE(addr, PAGE_SIZE));
+       }
        do {
                bp->bio_ma[bp->bio_ma_n++] =
                    PHYS_TO_VM_PAGE(pmap_kextract(addr));

Comment 31 Mark Johnston freebsd_committer

2022-04-18 23:07:47 UTC

(In reply to Alexander Motin from comment #30)
> The case of "addr & PAGE_MASK is 2048 and len is 4096" can fit into this only if it is the only chunk in ABD, but then it should be a linear buffer, not requiring unmapped I/O.

The problem exists whenever len is a multiple of the page size, so I don't see why it should always be a linear buffer.  But indeed, I'd expect I/O to a device with ashift=12 to always be page aligned.

> But still, just in case, what would you say about this patch:

Looks right to me, thanks.

Comment 32 Alexander Motin freebsd_committer

2022-04-19 00:45:53 UTC

OpenZFS PR: https://github.com/openzfs/zfs/pull/13345

Comment 33 Alexander Motin freebsd_committer

2022-04-20 18:47:53 UTC

We've managed to get a full dump:

Fatal trap 12: page fault while in kernel mode
cpuid = 3; apic id = 03
fault virtual address   = 0xfffff80e00000004
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff80f1cdcd
stack pointer           = 0x28:0xfffffe0148413c00
frame pointer           = 0x28:0xfffffe0148413ca0
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 3655 (g_eli[3] gptid/e420)
trap number             = 12
panic: page fault
cpuid = 3
time = 1650468037
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe01484139c0
vpanic() at vpanic+0x17f/frame 0xfffffe0148413a10
panic() at panic+0x43/frame 0xfffffe0148413a70
trap_fatal() at trap_fatal+0x385/frame 0xfffffe0148413ad0
trap_pfault() at trap_pfault+0x4f/frame 0xfffffe0148413b30
calltrap() at calltrap+0x8/frame 0xfffffe0148413b30
--- trap 0xc, rip = 0xffffffff80f1cdcd, rsp = 0xfffffe0148413c00, rbp = 0xfffffe0148413ca0 ---
aesni_crypt_xts() at aesni_crypt_xts+0x17d/frame 0xfffffe0148413ca0
aesni_decrypt_xts() at aesni_decrypt_xts+0xe/frame 0xfffffe0148413cc0
aesni_cipher_crypt() at aesni_cipher_crypt+0x2f1/frame 0xfffffe0148413d70
aesni_process() at aesni_process+0x159/frame 0xfffffe0148413dc0
crypto_dispatch() at crypto_dispatch+0x118/frame 0xfffffe0148413df0
g_eli_crypto_run() at g_eli_crypto_run+0x178/frame 0xfffffe0148413e90
g_eli_worker() at g_eli_worker+0x328/frame 0xfffffe0148413ef0
fork_exit() at fork_exit+0x7e/frame 0xfffffe0148413f30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe0148413f30
--- trap 0x80af60b4, rip = 0, rsp = 0, rbp = 0 ---

(kgdb) bt
#0  __curthread () at /data/workspace/TrueNAS_13.0_Nightlies/freenas/_BE/os/sys/amd64/include/pcpu_aux.h:55
#1  doadump (textdump=textdump@entry=1) at /data/workspace/TrueNAS_13.0_Nightlies/freenas/_BE/os/sys/kern/kern_shutdown.c:399
#2  0xffffffff80b164c1 in kern_reboot (howto=260) at /data/workspace/TrueNAS_13.0_Nightlies/freenas/_BE/os/sys/kern/kern_shutdown.c:487
#3  0xffffffff80b1693e in vpanic (fmt=0xffffffff811b9599 "%s", ap=<optimized out>)
    at /data/workspace/TrueNAS_13.0_Nightlies/freenas/_BE/os/sys/kern/kern_shutdown.c:920
#4  0xffffffff80b16743 in panic (fmt=<unavailable>)
    at /data/workspace/TrueNAS_13.0_Nightlies/freenas/_BE/os/sys/kern/kern_shutdown.c:844
#5  0xffffffff81042855 in trap_fatal (frame=0xfffffe0148413b40, eva=18446735337746071556)
    at /data/workspace/TrueNAS_13.0_Nightlies/freenas/_BE/os/sys/amd64/amd64/trap.c:944
#6  0xffffffff810428af in trap_pfault (frame=0xfffffe0148413b40, usermode=false, signo=<optimized out>, ucode=<optimized out>)
    at /data/workspace/TrueNAS_13.0_Nightlies/freenas/_BE/os/sys/amd64/amd64/trap.c:763
#7  <signal handler called>
#8  aesni_crypt_xts_block8 (key_schedule=<optimized out>, from=<optimized out>, to=<optimized out>, rounds=<optimized out>, 
    tweak=<optimized out>, do_encrypt=<optimized out>)
    at /data/workspace/TrueNAS_13.0_Nightlies/freenas/_BE/os/sys/crypto/aesni/aesni_wrap.c:358
#9  aesni_crypt_xts (rounds=<optimized out>, data_schedule=0xfffff8000ed94140, tweak_schedule=<optimized out>, len=<optimized out>, 
    from=<optimized out>, from@entry=0xfffff80e00000004 <error: Cannot access memory at address 0xfffff80e00000004>, 
    to=<optimized out>, to@entry=0xfffff80e00000004 <error: Cannot access memory at address 0xfffff80e00000004>, 
    iv=0xfffffe0148413d30 "", do_encrypt=0) at /data/workspace/TrueNAS_13.0_Nightlies/freenas/_BE/os/sys/crypto/aesni/aesni_wrap.c:411
#10 0xffffffff80f1d3ee in aesni_decrypt_xts (rounds=12, data_schedule=0xfffff8000ed94140, tweak_schedule=0xfffff8000ed94160, len=32, 
    from=from@entry=0xfffff80e00000004 <error: Cannot access memory at address 0xfffff80e00000004>, 
    to=to@entry=0xfffff80e00000004 <error: Cannot access memory at address 0xfffff80e00000004>, iv=0xfffffe0148413d30 "")
    at /data/workspace/TrueNAS_13.0_Nightlies/freenas/_BE/os/sys/crypto/aesni/aesni_wrap.c:442
#11 0xffffffff80f16ce1 in aesni_cipher_crypt (ses=0xfffff8000ed94048, crp=crp@entry=0xfffff801e7e1fe38, csp=<optimized out>, 
    csp@entry=0xfffff8000ed94008) at /data/workspace/TrueNAS_13.0_Nightlies/freenas/_BE/os/sys/crypto/aesni/aesni.c:788
#12 0xffffffff80f166e9 in aesni_cipher_process (ses=<optimized out>, crp=0xfffff801e7e1fe38)
    at /data/workspace/TrueNAS_13.0_Nightlies/freenas/_BE/os/sys/crypto/aesni/aesni.c:687
#13 aesni_process (dev=<optimized out>, crp=0xfffff801e7e1fe38, hint=<optimized out>)
    at /data/workspace/TrueNAS_13.0_Nightlies/freenas/_BE/os/sys/crypto/aesni/aesni.c:379
#14 0xffffffff80e3b078 in crypto_dispatch (crp=crp@entry=0xfffff801e7e1fe38)
    at /data/workspace/TrueNAS_13.0_Nightlies/freenas/_BE/os/sys/opencrypto/crypto.c:1498
#15 0xffffffff80a3f678 in g_eli_crypto_run (wr=wr@entry=0xfffff803e1bb4440, bp=bp@entry=0xfffff806c34978d0)
    at /data/workspace/TrueNAS_13.0_Nightlies/freenas/_BE/os/sys/geom/eli/g_eli_privacy.c:343
#16 0xffffffff80a38378 in g_eli_worker (arg=arg@entry=0xfffff803e1bb4440)
    at /data/workspace/TrueNAS_13.0_Nightlies/freenas/_BE/os/sys/geom/eli/g_eli.c:708
#17 0xffffffff80ad223e in fork_exit (callout=0xffffffff80a38050 <g_eli_worker>, arg=0xfffff803e1bb4440, frame=0xfffffe0148413f40)
    at /data/workspace/TrueNAS_13.0_Nightlies/freenas/_BE/os/sys/kern/kern_fork.c:1093
#18 <signal handler called>

I haven't touched crypto code for a while, so going to look into what's interesting there slowly.
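
A quick cross-check of the backtrace above: frame #5 prints the faulting address (eva) in decimal, while frames #9 and #10 print the from/to pointers in hex. A minimal sketch, using only values copied from the trace, shows that they are the same address, which also matches the fault virtual address in the original report. The variable names are illustrative only.

```python
# Values copied from the backtrace above; names are illustrative.
eva = 18446735337746071556        # trap_fatal(eva=...) in frame #5, printed in decimal
from_addr = 0xfffff80e00000004    # from@entry / to@entry in frames #9 and #10

# Format as a 64-bit hex address (2 chars for "0x" plus 16 hex digits).
eva_hex = f"{eva:#018x}"
same_address = (eva == from_addr)

print(eva_hex, same_address)
```

So the page fault is taken on the very buffer pointer handed to aesni_decrypt_xts(), not on some address derived inside the AES-NI code.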

Comment 34 Mark Johnston freebsd_committer

2022-04-20 19:11:11 UTC

(In reply to Alexander Motin from comment #33)
> I haven't touched crypto code for a while, so going to look into what's interesting there slowly.

It'd be interesting to see crp->crp_buf and *bp from the g_eli_crypto_run() frame.

Comment 35 Alexander Motin freebsd_committer

2022-04-20 19:14:15 UTC

(In reply to Mark Johnston from comment #34)
(kgdb) p crp->crp_buf
$168 = {{{cb_buf = 0xfffff807057af838 "0`\377\016\001\376\377\377pK\260\v", cb_buf_len = 4096}, cb_mbuf = 0xfffff807057af838, {
      cb_vm_page = 0xfffff807057af838, cb_vm_page_len = 4096, cb_vm_page_offset = 0}, cb_uio = 0xfffff807057af838}, 
  cb_type = CRYPTO_BUF_VMPAGE}
(kgdb) p *bp
$169 = {bio_cmd = 1, bio_flags = 16, bio_cflags = 0, bio_pflags = 3, bio_dev = 0x0, bio_disk = 0x0, bio_offset = 1510289403904, 
  bio_bcount = 0, bio_data = 0xfffffe00ddb2c000 <error: Cannot access memory at address 0xfffffe00ddb2c000>, 
  bio_ma = 0xfffff807057af800, bio_ma_offset = 0, bio_ma_n = 154, bio_error = 0, bio_resid = 0, 
  bio_done = 0xffffffff826c0c50 <vdev_geom_io_intr>, bio_driver1 = 0xfffff8008c8135e0, bio_driver2 = 0x0, 
  bio_caller1 = 0xfffff805a33459a0, bio_caller2 = 0x0, bio_queue = {tqe_next = 0x0, tqe_prev = 0xfffff802190812b8}, 
  bio_attribute = 0x0, bio_zone = {zone_cmd = 0 '\000', zone_params = {disk_params = {zone_mode = 0, flags = 0, optimal_seq_zones = 0, 
        optimal_nonseq_zones = 0, max_seq_zones = 0}, rwp = {id = 0, flags = 0 '\000'}, report = {starting_id = 0, 
        rep_options = 0 '\000', header = {same = 0 '\000', maximum_lba = 0, reserved = '\000' <repeats 63 times>}, 
        entries_allocated = 0, entries_filled = 0, entries_available = 0, entries = 0x0}}}, bio_from = 0xfffff803ce163c00, 
  bio_to = 0xfffff804124ae700, bio_length = 630784, bio_completed = 28672, bio_children = 154, bio_inbed = 7, bio_parent = 0x0, 
  bio_t0 = {sec = 66459, frac = 13157596445544270851}, bio_task = 0x0, bio_task_arg = 0x0, bio_spare1 = 0x0, bio_spare2 = 0x0, 
  bio_pblkno = 0}
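
The numbers in this output can be checked against each other. The sketch below assumes 4096-byte pages (matching cb_vm_page_len above) and 8-byte pointers on amd64; the interpretation of the result is an inference from the arithmetic, not something stated in the dump.

```python
# Values copied from the kgdb output above.
PAGE_SIZE = 4096       # assumption: amd64 base page size, matches cb_vm_page_len
POINTER_SIZE = 8       # assumption: sizeof(vm_page_t) on amd64

bio_length = 630784
bio_completed = 28672
bio_ma = 0xfffff807057af800
bio_ma_n = 154
bio_children = 154
bio_inbed = 7
cb_vm_page = 0xfffff807057af838

pages_total = bio_length // PAGE_SIZE                 # pages covered by the bio
pages_done = bio_completed // PAGE_SIZE               # pages already completed
page_index = (cb_vm_page - bio_ma) // POINTER_SIZE    # slot in bio_ma the request points at

print(pages_total, pages_done, page_index)
```

The bio covers 154 pages, which agrees with bio_ma_n and bio_children. Seven pages are complete, which agrees with bio_inbed, and the crypto request points at slot 7 of the bio_ma array. That is consistent with the request being the eighth per-page child of this bio.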

Comment 36 Alexander Motin freebsd_committer

2022-04-28 02:01:31 UTC

A closer look at the panics with Mark showed that GELI is an innocent victim of what may be memory corruption.  Testing the system with a debug kernel triggered a number of identical panics in CAM, which should be fixed by https://cgit.freebsd.org/src/commit/?id=404f001161b975164d8b52d9f404d07ac7584027 .  With some stretch of imagination, that bug could be the cause of the memory corruption.  So once it is fixed, we'll need more testing.

Comment 37 Alexander Motin freebsd_committer

2022-04-29 17:12:48 UTC

The first tests are going well, so I am closing this bug report as fixed.  In any case, the information in it was not leading anywhere else.

Comment 38 Krautmaster 2022-05-12 19:13:37 UTC

Stable so far, so definitely solved as far as I can tell.

Comment 39 Alexander Motin freebsd_committer

2022-05-12 19:29:10 UTC

The commit has been merged to stable/12 and stable/13.  I will request an errata notice (EN) for 13.1 after the release is out.

Comment 40 commit-hook freebsd_committer

2022-07-19 14:10:17 UTC

A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=b2e2412d150c848e256f31f4e87f640bdcc9c016

commit b2e2412d150c848e256f31f4e87f640bdcc9c016
Author:     Mark Johnston <markj@FreeBSD.org>
AuthorDate: 2022-04-18 21:16:10 +0000
Commit:     Mark Johnston <markj@FreeBSD.org>
CommitDate: 2022-07-19 13:52:00 +0000

    geli: Add a chicken switch for unmapped I/O

    We have a report of a panic in GELI that appears to go away when
    unmapped I/O is disabled.  Add a tunable to make such investigations
    easier in the future.  No functional change intended.

    PR:             262894
    Reviewed by:    asomers
    Sponsored by:   The FreeBSD Foundation

    (cherry picked from commit 081b4452a758dd81dcdc68ffb6f7bad901d53e3d)

 lib/geom/eli/geli.8  | 8 +++++++-
 sys/geom/eli/g_eli.c | 5 ++++-
 2 files changed, 11 insertions(+), 2 deletions(-)