| Summary: | Kernel Panic (page fault) with 13.1-BETA2 in g_eli & httpd | | |
|---|---|---|---|
| Product: | Base System | Reporter: | Krautmaster <mathias.kraut> |
| Component: | kern | Assignee: | Alexander Motin <mav> |
| Status: | Closed FIXED | | |
| Severity: | Affects Only Me | CC: | asomers, grahamperrin, markj, mav |
| Priority: | --- | | |
| Version: | 13.1-RELEASE | | |
| Hardware: | amd64 | | |
| OS: | Any | | |
Description
Krautmaster
2022-03-28 17:15:46 UTC
- Any chance you can get a full vmcore for this panic? The AES code in question hasn't changed in a long time, so this is more likely a bug in opencrypto or GELI and so will be harder to debug.
- Are you able to confirm whether this happens in 13.0?
- Are you able to test kernel patches?

Based on the fault address I'm wondering a bit if this is related to handling of unmapped I/O in GELI, which was added in 13 and doesn't seem to be present in 12. We don't currently have a chicken switch to disable it, so a patch would be needed to test that theory.

Hmm, that is not so easy to provide. I'm using TrueNAS 12/13, so if someone builds me a kernel I can swap it in.

A second issue, not booting with the LSI, was present in this bug: https://jira.ixsystems.com/browse/NAS-115143 which also came with this commit: https://github.com/truenas/os/commit/5bd08f719f778648e4383f5e4d07bd384047a310
That bug could be reproduced with the standard images of FreeBSD 13/14, but the page fault is pretty hard to reproduce.

As it occurs quite often, I could run a test kernel with tunables of your choice for some days to see if it occurs again. If you have a better idea, let me know.

I have also seen some reports of panics with network stuff and jails in 13.1; maybe that I/O thing is involved there as well. I will ask the guys from that other issue whether it's possible to test the same with a 13.0 kernel.

Best regards

The contact from that HBA bug offered to provide me 13.1 kernels with specific patches if required.

@Mark Johnston I can report that the same machine, back on 12.2, runs fine again under any load with no crash so far (like it did before the 13.1 update). I wonder what may have caused the httpd and geli page faults; maybe they have the same root cause.
See the other crash in the row of three in the attachment of this ticket

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address = 0x10
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff80ac5384
stack pointer = 0x28:0xfffffe015bedbc80
frame pointer = 0x28:0xfffffe015bedbcc0
code segment = base rx0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 37468 (httpd)
trap number = 12
panic: page fault
cpuid = 0
time = 1648160238
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe015bedba40
vpanic() at vpanic+0x17f/frame 0xfffffe015bedba90
panic() at panic+0x43/frame 0xfffffe015bedbaf0
trap_fatal() at trap_fatal+0x385/frame 0xfffffe015bedbb50
trap_pfault() at trap_pfault+0x4f/frame 0xfffffe015bedbbb0
calltrap() at calltrap+0x8/frame 0xfffffe015bedbbb0
--- trap 0xc, rip = 0xffffffff80ac5384, rsp = 0xfffffe015bedbc80, rbp = 0xfffffe015bedbcc0 ---
kqueue_drain() at kqueue_drain+0x134/frame 0xfffffe015bedbcc0
kqueue_close() at kqueue_close+0x42/frame 0xfffffe015bedbd10
_fdrop() at _fdrop+0x11/frame 0xfffffe015bedbd30
closef() at closef+0x24b/frame 0xfffffe015bedbdc0
closefp() at closefp+0x80/frame 0xfffffe015bedbe00
amd64_syscall() at amd64_syscall+0x10c/frame 0xfffffe015bedbf30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe015bedbf30
--- syscall (6, FreeBSD ELF64, sys_close), rip = 0x8007f591a, rsp = 0x7fffffffe808, rbp = 0x7fffffffe820 ---
KDB: enter: panic

(In reply to Krautmaster from comment #4)
Let's focus on the GELI panic first. The fault address is 0xfffff80e00000004, which is in the direct map range, but with 32GB of RAM the direct map should end roughly at 0xfffff80800000000. Maybe there is some large discontiguity.

Some questions:
- Which HBA driver is used?
- Can you please attach output from "sysctl vm.pmap.kernel_maps"?
- Do any of your GELI volumes use authentication? "geli list | grep AuthenticationAlgorithm" will print some output if so. If not, please try the patch below.

diff --git a/sys/geom/eli/g_eli.c b/sys/geom/eli/g_eli.c
index 4978523cbebe..9698cb2fdbc8 100644
--- a/sys/geom/eli/g_eli.c
+++ b/sys/geom/eli/g_eli.c
@@ -1137,6 +1137,7 @@ g_eli_create(struct gctl_req *req, struct g_class *mp, struct g_provider *bpp,
 	 */
 	pp = g_new_providerf(gp, "%s%s", bpp->name, G_ELI_SUFFIX);
 	pp->flags |= G_PF_DIRECT_SEND | G_PF_DIRECT_RECEIVE;
+#if 0
 	if (CRYPTO_HAS_VMPAGE) {
 		/*
 		 * On DMAP architectures we can use unmapped I/O. But don't
@@ -1146,6 +1147,7 @@ g_eli_create(struct gctl_req *req, struct g_class *mp, struct g_provider *bpp,
 		if ((sc->sc_flags & G_ELI_FLAG_AUTH) == 0)
 			pp->flags |= G_PF_ACCEPT_UNMAPPED;
 	}
+#endif
 	pp->mediasize = sc->sc_mediasize;
 	pp->sectorsize = sc->sc_sectorsize;
 	LIST_FOREACH(gap, &bpp->aliases, ga_next)

(In reply to Mark Johnston from comment #5)
Thanks. I'm currently on FreeBSD 12 as it runs, but these outputs are from 13:

lspci -vvv: https://pastebin.com/cUNj8xBe
devinfo -vr: https://pastebin.com/czp6eKhd
dmesg: https://pastebin.com/vDdhw4M4
sysctl vm.pmap.kernel_maps (large output): https://controlc.com/269784b5
geli list | grep AuthenticationAlgorithm -> nothing

I'm currently waiting for the new daily build, as a patch required for a working HBA was committed yesterday: https://cgit.freebsd.org/src/commit/?id=5473dee7300507de64c2e6c140b87c9bde8e4462
As soon as it is there I'll update to TrueNAS 13 again, and maybe Alex will be able to provide me that patched version or kernel or whatever is needed to test your patch. I guess I can't patch it on my own.

(In reply to Krautmaster from comment #6)
The beginning of the sysctl output is truncated.

(In reply to Mark Johnston from comment #7)
Sorry, here we go: https://controlc.com/a2c12826

(In reply to Krautmaster from comment #8)
So the direct map indeed ends at 0xfffff80800000000 and the fault address is rather strange.
Is it always the same? That is, do all aesni panics contain the line: fault virtual address = 0xfffff80e00000004 ? (In reply to Mark Johnston from comment #9) both geli crashes do, the third one does not contain an address. The crashes only appear in FreeBSD 13, the output of above is from FreeBSD 12 as said but I guess its the same third I mean the httpd crash. I may see if it stays on fault virtual address = 0xfffff80e00000004 once the new 13 nightly is out then ill wait for more crashes. But it might be already strange that both geli have the same faulty address this afternoon i booted Freebsd by fault after a short power issue and started a larger copy again (geli drive to normal unencrypted one). System was running without your provided patch and crashed again. This time different again (like httpd and 2x geli) Anyhow this makes me think that the quite common panics in 13 might have a same root cause? See the panic below. Additionally, I was able to get a patched kernel up and running now and will see if this is more stable in freeBSD 13.1 panic: Solaris(panic): Samsung8TB: blkptr at 0xfffff803dacbb1a8 has invalid CHECKSUM 0 cpuid = 1 time = 1648837251 KDB: stack backtrace: db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe016aa86830 vpanic() at vpanic+0x17f/frame 0xfffffe016aa86880 panic() at panic+0x43/frame 0xfffffe016aa868e0 vcmn_err() at vcmn_err+0xeb/frame 0xfffffe016aa86a10 zfs_panic_recover() at zfs_panic_recover+0x59/frame 0xfffffe016aa86a70 zfs_blkptr_verify_log() at zfs_blkptr_verify_log+0xa3/frame 0xfffffe016aa86c00 zfs_blkptr_verify() at zfs_blkptr_verify+0xaa/frame 0xfffffe016aa86c60 zio_free() at zio_free+0x26/frame 0xfffffe016aa86ca0 dsl_dataset_block_kill() at dsl_dataset_block_kill+0x18d/frame 0xfffffe016aa86d10 dbuf_write_done() at dbuf_write_done+0x4f/frame 0xfffffe016aa86d50 arc_write_done() at arc_write_done+0x314/frame 0xfffffe016aa86d90 zio_done() at zio_done+0x82a/frame 0xfffffe016aa86e00 zio_execute() at 
zio_execute+0x9f/frame 0xfffffe016aa86e40 taskqueue_run_locked() at taskqueue_run_locked+0x181/frame 0xfffffe016aa86ec0 taskqueue_thread_loop() at taskqueue_thread_loop+0xc2/frame 0xfffffe016aa86ef0 fork_exit() at fork_exit+0x7e/frame 0xfffffe016aa86f30 fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe016aa86f30 --- trap 0x80af6074, rip = 0, rsp = 0, rbp = 0 --- KDB: enter: panic while it crashed before within minutes after some load on the encrypted ZFS volume, it copies now since ~4h at high IO without an issue. Maybe too early for a final statement, but that "unmapped I/O in GELI" could be the case so your feeling might be right. Will keep the machine online for as long as needed and see if it fails the next days. Also checked the data/crash folder on FreeBSD 12, not a single entry and four within barely a week in FreeBSD 13.1. Thanks for the fast support. Will keep updating on that patched Kernel okay first of all, i seem not to get any geli panics any more and the system was stable so far - till this night during scrub. 
There seems to be an other issue, related to ZFS or memory or no idea if this has the same root cause panic: Solaris(panic): Samsung8TB: blkptr at 0xfffff803dacbb1a8 has invalid CHECKSUM 0 cpuid = 1 time = 1648837251 KDB: stack backtrace: db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe016aa86830 vpanic() at vpanic+0x17f/frame 0xfffffe016aa86880 panic() at panic+0x43/frame 0xfffffe016aa868e0 vcmn_err() at vcmn_err+0xeb/frame 0xfffffe016aa86a10 zfs_panic_recover() at zfs_panic_recover+0x59/frame 0xfffffe016aa86a70 zfs_blkptr_verify_log() at zfs_blkptr_verify_log+0xa3/frame 0xfffffe016aa86c00 zfs_blkptr_verify() at zfs_blkptr_verify+0xaa/frame 0xfffffe016aa86c60 zio_free() at zio_free+0x26/frame 0xfffffe016aa86ca0 dsl_dataset_block_kill() at dsl_dataset_block_kill+0x18d/frame 0xfffffe016aa86d10 dbuf_write_done() at dbuf_write_done+0x4f/frame 0xfffffe016aa86d50 arc_write_done() at arc_write_done+0x314/frame 0xfffffe016aa86d90 zio_done() at zio_done+0x82a/frame 0xfffffe016aa86e00 zio_execute() at zio_execute+0x9f/frame 0xfffffe016aa86e40 taskqueue_run_locked() at taskqueue_run_locked+0x181/frame 0xfffffe016aa86ec0 taskqueue_thread_loop() at taskqueue_thread_loop+0xc2/frame 0xfffffe016aa86ef0 fork_exit() at fork_exit+0x7e/frame 0xfffffe016aa86f30 fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe016aa86f30 --- trap 0x80af6074, rip = 0, rsp = 0, rbp = 0 --- KDB: enter: panic ah sorry the posted panic was the old one, thats the correct one panic: bad pte va fffff802e3374030 pte 0 cpuid = 7 time = 1649114378 KDB: stack backtrace: db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe01656c2690 vpanic() at vpanic+0x17f/frame 0xfffffe01656c26e0 panic() at panic+0x43/frame 0xfffffe01656c2740 pmap_remove_pages() at pmap_remove_pages+0x92f/frame 0xfffffe01656c2890 exec_new_vmspace() at exec_new_vmspace+0x223/frame 0xfffffe01656c28f0 exec_elf64_imgact() at exec_elf64_imgact+0xb16/frame 0xfffffe01656c29f0 kern_execve() at 
kern_execve+0x77d/frame 0xfffffe01656c2d70 sys_execve() at sys_execve+0x5a/frame 0xfffffe01656c2e00 amd64_syscall() at amd64_syscall+0x10c/frame 0xfffffe01656c2f30 fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe01656c2f30 --- syscall (59, FreeBSD ELF64, sys_execve), rip = 0x8007a3faa, rsp = 0x7fffdfbfb738, rbp = 0x7fffdfbfb740 --- KDB: enter: panic by fault booted the wrong Kernel (nightly update) yesterday again, and just now O got the same geli issue with the same faulty address. Fatal trap 12: page fault while in kernel mode cpuid = 4; apic id = 04 fault virtual address = 0xfffff80e00000004 fault code = supervisor read data, page not present instruction pointer = 0x20:0xffffffff80f1cd7d stack pointer = 0x28:0xfffffe014e59ec00 frame pointer = 0x28:0xfffffe014e59eca0 code segment = base rx0, limit 0xfffff, type 0x1b = DPL 0, pres 1, long 1, def32 0, gran 1 processor eflags = interrupt enabled, resume, IOPL = 0 current process = 2705 (g_eli[4] gptid/e567) trap number = 12 panic: page fault cpuid = 4 time = 1649430252 KDB: stack backtrace: db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe014e59e9c0 vpanic() at vpanic+0x17f/frame 0xfffffe014e59ea10 panic() at panic+0x43/frame 0xfffffe014e59ea70 trap_fatal() at trap_fatal+0x385/frame 0xfffffe014e59ead0 trap_pfault() at trap_pfault+0x4f/frame 0xfffffe014e59eb30 calltrap() at calltrap+0x8/frame 0xfffffe014e59eb30 --- trap 0xc, rip = 0xffffffff80f1cd7d, rsp = 0xfffffe014e59ec00, rbp = 0xfffffe014e59eca0 --- aesni_crypt_xts() at aesni_crypt_xts+0x17d/frame 0xfffffe014e59eca0 aesni_decrypt_xts() at aesni_decrypt_xts+0xe/frame 0xfffffe014e59ecc0 aesni_cipher_crypt() at aesni_cipher_crypt+0x2f1/frame 0xfffffe014e59ed70 aesni_process() at aesni_process+0x159/frame 0xfffffe014e59edc0 crypto_dispatch() at crypto_dispatch+0x118/frame 0xfffffe014e59edf0 g_eli_crypto_run() at g_eli_crypto_run+0x178/frame 0xfffffe014e59ee90 g_eli_worker() at g_eli_worker+0x328/frame 0xfffffe014e59eef0 fork_exit() 
at fork_exit+0x7e/frame 0xfffffe014e59ef30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe014e59ef30
--- trap 0x80af60b4, rip = 0, rsp = 0, rbp = 0 ---
KDB: enter: panic

(In reply to Krautmaster from comment #16)
Ok, so this suggests that the GELI panics at least are triggered by the unmapped I/O support. It'll be hard to debug further without a crash dump; is there any reason you can't set one up (e.g., by adding a new virtual disk and setting it as the dump device ("dumpdev" in /etc/rc.conf))?

I can do so and report.

Sadly I have not been able to configure the full dump so far. I attached a new 50GB disk to the machine, created a single-disk pool on it, and tried to edit rc.conf, but it seems to get overwritten on each start. I also tried to set the dumpdev path via tunables in the TrueNAS GUI, but after my new crash this morning on the stock kernel it was empty. I need some help to configure this full dumpdev. Maybe I need to mount it in another FS, apart from the TrueNAS one, and maybe I have to configure the dumpdev somewhere else? Thanks for the help.

A list of the normal dumps TrueNAS does can be found here: https://1drv.ms/u/s!Ar_eIBtD4lGqicUg1WusuXtlKnjS_g?e=C1gwBk

(In reply to Mark Johnston from comment #17)
Next crash. So far the dumpdev drive does not seem to work using "rc" tunables in TrueNAS.
But I can offer direct device access to configure whatever needed Fatal trap 12: page fault while in kernel mode cpuid = 7; apic id = 07 fault virtual address = 0xfffff80b00000004 fault code = supervisor read data, page not present instruction pointer = 0x20:0xffffffff80f1cdcd stack pointer = 0x28:0xfffffe014e5fbc00 frame pointer = 0x28:0xfffffe014e5fbca0 code segment = base rx0, limit 0xfffff, type 0x1b = DPL 0, pres 1, long 1, def32 0, gran 1 processor eflags = interrupt enabled, resume, IOPL = 0 current process = 3648 (g_eli[7] gptid/e539) trap number = 12 panic: page fault cpuid = 7 time = 1649591029 KDB: stack backtrace: db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe014e5fb9c0 vpanic() at vpanic+0x17f/frame 0xfffffe014e5fba10 panic() at panic+0x43/frame 0xfffffe014e5fba70 trap_fatal() at trap_fatal+0x385/frame 0xfffffe014e5fbad0 trap_pfault() at trap_pfault+0x4f/frame 0xfffffe014e5fbb30 calltrap() at calltrap+0x8/frame 0xfffffe014e5fbb30 --- trap 0xc, rip = 0xffffffff80f1cdcd, rsp = 0xfffffe014e5fbc00, rbp = 0xfffffe014e5fbca0 --- aesni_crypt_xts() at aesni_crypt_xts+0x17d/frame 0xfffffe014e5fbca0 aesni_decrypt_xts() at aesni_decrypt_xts+0xe/frame 0xfffffe014e5fbcc0 aesni_cipher_crypt() at aesni_cipher_crypt+0x2f1/frame 0xfffffe014e5fbd70 aesni_process() at aesni_process+0x159/frame 0xfffffe014e5fbdc0 crypto_dispatch() at crypto_dispatch+0x118/frame 0xfffffe014e5fbdf0 g_eli_crypto_run() at g_eli_crypto_run+0x178/frame 0xfffffe014e5fbe90 g_eli_worker() at g_eli_worker+0x328/frame 0xfffffe014e5fbef0 fork_exit() at fork_exit+0x7e/frame 0xfffffe014e5fbf30 fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe014e5fbf30 --- trap 0x80af60b4, rip = 0, rsp = 0, rbp = 0 --- KDB: enter: panic (In reply to Krautmaster from comment #20) Did this most recent crash happen with the patch from comment 5 applied? (In reply to Krautmaster from comment #19) You could simply run "dumpon /dev/<dump device name>" to configure it once. 
Use "dumpon -l" to verify that it reports the right device. Also you shouldn't configure a pool on the disk, just pass the raw disk device to dumpon. If you're willing to give me remote access, please mail me.

Finally, I think your kernel does not have debugging assertions enabled. If it is possible to get a stock kernel build with "options INVARIANTS" enabled, please try testing it.

The fault address should correspond to the buffer returned by aesni_cipher_alloc(). I'd guess that it's returned by crypto_contiguous_subsegment() and there's some kind of overflow condition occurring with the page array offset or length, but I can't see where. There is some fishy code, e.g., in g_disk_advance():

	bp->bio_ma_offset += off;
	bp->bio_ma_offset %= PAGE_SIZE;

but this is only a problem for large (> 2GB) offsets, which shouldn't happen...

(In reply to Mark Johnston from comment #21)
The crash was with the stock kernel, as I was trying to check the dump setup, which I think makes most sense with the standard kernel. As far as I know I had no GELI crashes with the patched kernel, but I have other issues and panics which might be ZFS or kernel related as well, so I went back to FreeBSD 12 (TrueNAS 12.0 U7), which ran fine for months without a single crash.

I will check the remote access and update to the latest nightly for further testing. Alexander might provide me that suggested "debug" kernel, but he is off this week. Give me some time to go back to 13 again. Luckily it's pretty easy to swap the boot environments in TrueNAS.

Are you using ZFS atop geli? If so, it shouldn't be possible to have unmapped I/O in geli. To check you can do something like the following:

sudo dtrace -i 'fbt:geom_eli:g_eli_crypto_run:entry /args[1]->bio_flags & BIO_UNMAPPED/ {stack();}'

If it prints anything at all, that means you're using unmapped I/O with geli.

(In reply to Alan Somers from comment #23)
Why wouldn't it be possible?
FWIW, on my laptop using ZFS/GELI I see a mix of mapped and unmapped I/O, both reads and writes.

(In reply to Alan Somers from comment #23)
My TrueNAS system reports this:

root@freenas[~]# dtrace -i 'fbt:geom_eli:g_eli_crypto_run:entry /args[1]->bio_flags & BIO_UNMAPPED/ {stack();}'
dtrace: invalid probe specifier fbt:geom_eli:g_eli_crypto_run:entry /args[1]->bio_flags & BIO_UNMAPPED/ {stack();}: probe description fbt:geom_eli:g_eli_crypto_run:entry does not match any probes
root@freenas[~]#

(In reply to Mark Johnston from comment #24)
ZFS on FreeBSD allocates all I/O memory through UMA, so it is always mapped. But just recently I made it use BIO_UNMAPPED for poor-man's scatter/gather on page boundaries to avoid one memory copy on I/O aggregation. So there should indeed be a mix of virtually and physically (represented as unmapped) addressed I/Os visible. I had even forgotten about it myself when looking at this, thinking the unmapped I/O was generated by the swapper. ;) But indeed it can be ZFS now.

I've tested a short procedure for getting a minidump on TrueNAS:

Before the crash:
sysctl debug.debugger_on_panic=0
sysctl debug.ddb.textdump.pending=0
dumpon off
dumpon /dev/daX

After the crash:
cd /mnt/tank
savecore . /dev/da2

As a result, a couple of files representing the dump should be stored in the specified directory. They are for me. Debug symbols for the specific TrueNAS build can be found at http://download.freenas.org/ , looking for TrueNAS-13.0-MASTER-*.debug.txz for the exact version running (see `cat /etc/version` on the NAS). Inside the archive, in usr/lib/debug/boot, there are symbols for the normal and debug kernels (depending on which one is enabled), which can be unpacked into the same path on the TrueNAS root to run kgdb on the core.

(In reply to Alexander Motin from comment #26)
BTW, the loop in vdev_geom_fill_unmap_cb() looks wrong to me when (addr & PAGE_MASK) != 0. Suppose addr & PAGE_MASK is 2048 and len is 4096.
Then we want two pages in the array, but it looks like the loop will exit after the first iteration. I think we need to set addr &= ~PAGE_MASK before the loop, or I am missing something. I'm not sure when non-page-aligned ABD buffers can arise in practice though.

A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=081b4452a758dd81dcdc68ffb6f7bad901d53e3d

commit 081b4452a758dd81dcdc68ffb6f7bad901d53e3d
Author: Mark Johnston <markj@FreeBSD.org>
AuthorDate: 2022-04-18 21:16:10 +0000
Commit: Mark Johnston <markj@FreeBSD.org>
CommitDate: 2022-04-18 21:55:24 +0000

    geli: Add a chicken switch for unmapped I/O

    We have a report of a panic in GELI that appears to go away when
    unmapped I/O is disabled. Add a tunable to make such investigations
    easier in the future.

    No functional change intended.

    PR: 262894
    Reviewed by: asomers
    MFC after: 1 week
    Sponsored by: The FreeBSD Foundation
    Differential Revision: https://reviews.freebsd.org/D34944

 lib/geom/eli/geli.8 | 8 +++++++-
 sys/geom/eli/g_eli.c | 12 ++++++++----
 2 files changed, 15 insertions(+), 5 deletions(-)

(In reply to Mark Johnston from comment #28)
While it seems like a good catch on a first look, I doubt it is exploitable. The code uses unmapped I/O only if all boundaries within the ABD except the first and the last are page aligned. The case of "addr & PAGE_MASK is 2048 and len is 4096" can fit into this only if it is the only chunk in ABD, but then it should be a linear buffer, not requiring unmapped I/O. Fitting case of addr & PAGE_MASK is 2048 and len is 6144 should work fine, producing two pages. Plus TrueNAS for many years uses ashift=12, which means all offsets in RAIDZ and gang blocks should be multiple of 4K and so page-aligned on x86.
But still, just in case, what would you say about this patch:

diff --git a/module/os/freebsd/zfs/vdev_geom.c b/module/os/freebsd/zfs/vdev_geom.c
index 2ef4811a8..5447eb922 100644
--- a/module/os/freebsd/zfs/vdev_geom.c
+++ b/module/os/freebsd/zfs/vdev_geom.c
@@ -1132,8 +1132,12 @@ vdev_geom_fill_unmap_cb(void *buf, size_t len, void *priv)
 	vm_offset_t addr = (vm_offset_t)buf;
 	vm_offset_t end = addr + len;
 
-	if (bp->bio_ma_n == 0)
+	if (bp->bio_ma_n == 0) {
 		bp->bio_ma_offset = addr & PAGE_MASK;
+		addr &= ~PAGE_MASK;
+	} else {
+		ASSERT0(P2PHASE(addr, PAGE_SIZE));
+	}
 	do {
 		bp->bio_ma[bp->bio_ma_n++] =
 		    PHYS_TO_VM_PAGE(pmap_kextract(addr));

(In reply to Alexander Motin from comment #30)
> The case of "addr & PAGE_MASK is 2048 and len is 4096" can fit into this only if it is the only chunk in ABD, but then it should be a linear buffer, not requiring unmapped I/O.

The problem exists whenever len is a multiple of the page size, so I don't see why it should always be a linear buffer. But indeed, I'd expect I/O to a device with ashift=12 to always be page aligned.

> But still, just in case, what would you say about this patch:

Looks right to me, thanks.
OpenZFS PR: https://github.com/openzfs/zfs/pull/13345 We've managed to get full dump: Fatal trap 12: page fault while in kernel mode cpuid = 3; apic id = 03 fault virtual address = 0xfffff80e00000004 fault code = supervisor read data, page not present instruction pointer = 0x20:0xffffffff80f1cdcd stack pointer = 0x28:0xfffffe0148413c00 frame pointer = 0x28:0xfffffe0148413ca0 code segment = base rx0, limit 0xfffff, type 0x1b = DPL 0, pres 1, long 1, def32 0, gran 1 processor eflags = interrupt enabled, resume, IOPL = 0 current process = 3655 (g_eli[3] gptid/e420) trap number = 12 panic: page fault cpuid = 3 time = 1650468037 KDB: stack backtrace: db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe01484139c0 vpanic() at vpanic+0x17f/frame 0xfffffe0148413a10 panic() at panic+0x43/frame 0xfffffe0148413a70 trap_fatal() at trap_fatal+0x385/frame 0xfffffe0148413ad0 trap_pfault() at trap_pfault+0x4f/frame 0xfffffe0148413b30 calltrap() at calltrap+0x8/frame 0xfffffe0148413b30 --- trap 0xc, rip = 0xffffffff80f1cdcd, rsp = 0xfffffe0148413c00, rbp = 0xfffffe0148413ca0 --- aesni_crypt_xts() at aesni_crypt_xts+0x17d/frame 0xfffffe0148413ca0 aesni_decrypt_xts() at aesni_decrypt_xts+0xe/frame 0xfffffe0148413cc0 aesni_cipher_crypt() at aesni_cipher_crypt+0x2f1/frame 0xfffffe0148413d70 aesni_process() at aesni_process+0x159/frame 0xfffffe0148413dc0 crypto_dispatch() at crypto_dispatch+0x118/frame 0xfffffe0148413df0 g_eli_crypto_run() at g_eli_crypto_run+0x178/frame 0xfffffe0148413e90 g_eli_worker() at g_eli_worker+0x328/frame 0xfffffe0148413ef0 fork_exit() at fork_exit+0x7e/frame 0xfffffe0148413f30 fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe0148413f30 --- trap 0x80af60b4, rip = 0, rsp = 0, rbp = 0 --- (kgdb) bt #0 __curthread () at /data/workspace/TrueNAS_13.0_Nightlies/freenas/_BE/os/sys/amd64/include/pcpu_aux.h:55 #1 doadump (textdump=textdump@entry=1) at /data/workspace/TrueNAS_13.0_Nightlies/freenas/_BE/os/sys/kern/kern_shutdown.c:399 #2 
0xffffffff80b164c1 in kern_reboot (howto=260) at /data/workspace/TrueNAS_13.0_Nightlies/freenas/_BE/os/sys/kern/kern_shutdown.c:487 #3 0xffffffff80b1693e in vpanic (fmt=0xffffffff811b9599 "%s", ap=<optimized out>) at /data/workspace/TrueNAS_13.0_Nightlies/freenas/_BE/os/sys/kern/kern_shutdown.c:920 #4 0xffffffff80b16743 in panic (fmt=<unavailable>) at /data/workspace/TrueNAS_13.0_Nightlies/freenas/_BE/os/sys/kern/kern_shutdown.c:844 #5 0xffffffff81042855 in trap_fatal (frame=0xfffffe0148413b40, eva=18446735337746071556) at /data/workspace/TrueNAS_13.0_Nightlies/freenas/_BE/os/sys/amd64/amd64/trap.c:944 #6 0xffffffff810428af in trap_pfault (frame=0xfffffe0148413b40, usermode=false, signo=<optimized out>, ucode=<optimized out>) at /data/workspace/TrueNAS_13.0_Nightlies/freenas/_BE/os/sys/amd64/amd64/trap.c:763 #7 <signal handler called> #8 aesni_crypt_xts_block8 (key_schedule=<optimized out>, from=<optimized out>, to=<optimized out>, rounds=<optimized out>, tweak=<optimized out>, do_encrypt=<optimized out>) at /data/workspace/TrueNAS_13.0_Nightlies/freenas/_BE/os/sys/crypto/aesni/aesni_wrap.c:358 #9 aesni_crypt_xts (rounds=<optimized out>, data_schedule=0xfffff8000ed94140, tweak_schedule=<optimized out>, len=<optimized out>, from=<optimized out>, from@entry=0xfffff80e00000004 <error: Cannot access memory at address 0xfffff80e00000004>, to=<optimized out>, to@entry=0xfffff80e00000004 <error: Cannot access memory at address 0xfffff80e00000004>, iv=0xfffffe0148413d30 "", do_encrypt=0) at /data/workspace/TrueNAS_13.0_Nightlies/freenas/_BE/os/sys/crypto/aesni/aesni_wrap.c:411 #10 0xffffffff80f1d3ee in aesni_decrypt_xts (rounds=12, data_schedule=0xfffff8000ed94140, tweak_schedule=0xfffff8000ed94160, len=32, from=from@entry=0xfffff80e00000004 <error: Cannot access memory at address 0xfffff80e00000004>, to=to@entry=0xfffff80e00000004 <error: Cannot access memory at address 0xfffff80e00000004>, iv=0xfffffe0148413d30 "") at 
/data/workspace/TrueNAS_13.0_Nightlies/freenas/_BE/os/sys/crypto/aesni/aesni_wrap.c:442 #11 0xffffffff80f16ce1 in aesni_cipher_crypt (ses=0xfffff8000ed94048, crp=crp@entry=0xfffff801e7e1fe38, csp=<optimized out>, csp@entry=0xfffff8000ed94008) at /data/workspace/TrueNAS_13.0_Nightlies/freenas/_BE/os/sys/crypto/aesni/aesni.c:788 #12 0xffffffff80f166e9 in aesni_cipher_process (ses=<optimized out>, crp=0xfffff801e7e1fe38) at /data/workspace/TrueNAS_13.0_Nightlies/freenas/_BE/os/sys/crypto/aesni/aesni.c:687 #13 aesni_process (dev=<optimized out>, crp=0xfffff801e7e1fe38, hint=<optimized out>) at /data/workspace/TrueNAS_13.0_Nightlies/freenas/_BE/os/sys/crypto/aesni/aesni.c:379 #14 0xffffffff80e3b078 in crypto_dispatch (crp=crp@entry=0xfffff801e7e1fe38) at /data/workspace/TrueNAS_13.0_Nightlies/freenas/_BE/os/sys/opencrypto/crypto.c:1498 #15 0xffffffff80a3f678 in g_eli_crypto_run (wr=wr@entry=0xfffff803e1bb4440, bp=bp@entry=0xfffff806c34978d0) at /data/workspace/TrueNAS_13.0_Nightlies/freenas/_BE/os/sys/geom/eli/g_eli_privacy.c:343 #16 0xffffffff80a38378 in g_eli_worker (arg=arg@entry=0xfffff803e1bb4440) at /data/workspace/TrueNAS_13.0_Nightlies/freenas/_BE/os/sys/geom/eli/g_eli.c:708 #17 0xffffffff80ad223e in fork_exit (callout=0xffffffff80a38050 <g_eli_worker>, arg=0xfffff803e1bb4440, frame=0xfffffe0148413f40) at /data/workspace/TrueNAS_13.0_Nightlies/freenas/_BE/os/sys/kern/kern_fork.c:1093 #18 <signal handler called> I haven't touched crypto code for a while, so going to look into what's interesting there slowly. (In reply to Alexander Motin from comment #33) > I haven't touched crypto code for a while, so going to look into what's interesting there slowly. It'd be interesting to see crp->crp_buf and *bp from the g_eli_crypto_run() frame. 
(In reply to Mark Johnston from comment #34)

(kgdb) p crp->crp_buf
$168 = {{{cb_buf = 0xfffff807057af838 "0`\377\016\001\376\377\377pK\260\v", cb_buf_len = 4096}, cb_mbuf = 0xfffff807057af838, {
cb_vm_page = 0xfffff807057af838, cb_vm_page_len = 4096, cb_vm_page_offset = 0}, cb_uio = 0xfffff807057af838}, cb_type = CRYPTO_BUF_VMPAGE}
(kgdb) p *bp
$169 = {bio_cmd = 1, bio_flags = 16, bio_cflags = 0, bio_pflags = 3, bio_dev = 0x0, bio_disk = 0x0, bio_offset = 1510289403904, bio_bcount = 0, bio_data = 0xfffffe00ddb2c000 <error: Cannot access memory at address 0xfffffe00ddb2c000>, bio_ma = 0xfffff807057af800, bio_ma_offset = 0, bio_ma_n = 154, bio_error = 0, bio_resid = 0, bio_done = 0xffffffff826c0c50 <vdev_geom_io_intr>, bio_driver1 = 0xfffff8008c8135e0, bio_driver2 = 0x0, bio_caller1 = 0xfffff805a33459a0, bio_caller2 = 0x0, bio_queue = {tqe_next = 0x0, tqe_prev = 0xfffff802190812b8}, bio_attribute = 0x0, bio_zone = {zone_cmd = 0 '\000', zone_params = {disk_params = {zone_mode = 0, flags = 0, optimal_seq_zones = 0, optimal_nonseq_zones = 0, max_seq_zones = 0}, rwp = {id = 0, flags = 0 '\000'}, report = {starting_id = 0, rep_options = 0 '\000', header = {same = 0 '\000', maximum_lba = 0, reserved = '\000' <repeats 63 times>}, entries_allocated = 0, entries_filled = 0, entries_available = 0, entries = 0x0}}}, bio_from = 0xfffff803ce163c00, bio_to = 0xfffff804124ae700, bio_length = 630784, bio_completed = 28672, bio_children = 154, bio_inbed = 7, bio_parent = 0x0, bio_t0 = {sec = 66459, frac = 13157596445544270851}, bio_task = 0x0, bio_task_arg = 0x0, bio_spare1 = 0x0, bio_spare2 = 0x0, bio_pblkno = 0}

A closer look at the panics with Mark showed that GELI is an innocent victim of what may be memory corruption. Testing the system with a debug kernel triggered a number of identical panics in CAM, which should be fixed by https://cgit.freebsd.org/src/commit/?id=404f001161b975164d8b52d9f404d07ac7584027 . With some stretch of imagination, it could be the cause of the memory corruption.
So after it is fixed we'll need more testing.

The first tests are going well, so I am closing this bug report as fixed. In any case, the information in it was not leading anywhere else.

Stable so far, so definitely solved as far as I can tell.

The commit is merged to 12/13-stable. I will request a 13.1 EN after the release is out.

A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=b2e2412d150c848e256f31f4e87f640bdcc9c016

commit b2e2412d150c848e256f31f4e87f640bdcc9c016
Author: Mark Johnston <markj@FreeBSD.org>
AuthorDate: 2022-04-18 21:16:10 +0000
Commit: Mark Johnston <markj@FreeBSD.org>
CommitDate: 2022-07-19 13:52:00 +0000

    geli: Add a chicken switch for unmapped I/O

    We have a report of a panic in GELI that appears to go away when
    unmapped I/O is disabled. Add a tunable to make such investigations
    easier in the future.

    No functional change intended.

    PR: 262894
    Reviewed by: asomers
    Sponsored by: The FreeBSD Foundation

    (cherry picked from commit 081b4452a758dd81dcdc68ffb6f7bad901d53e3d)

 lib/geom/eli/geli.8 | 8 +++++++-
 sys/geom/eli/g_eli.c | 5 ++++-
 2 files changed, 11 insertions(+), 2 deletions(-)