The recent changes to enable DMAR by default on Intel result in a panic on my system: a NULL pointer dereference in dmar_match_by_path because unit is NULL. Partial stack (I can't get to it easily):
    trap
    dmar_match_by_path()+0x20
    dmar_find()+0x185
    iommu_get_dma_tag()
    acpi_pci_get_dma_tag()
    xhci_init()
    xhci_pci_attach()
    ...
Adding a workaround to return false when unit == NULL in dmar_match_by_path results in a system without all its interrupts, so it doesn't boot. I see lots of:
    CPU0: local APIC error 0x40
There may be other errors in the log, but my keyboard is jammed when it breaks to the debugger, so I can't scroll back or ask for dmesg from the debugger. Only 'hw.dmar.enable=0' in loader.conf offers any relief.
This is an 8th-generation i7 Lenovo YOGA.
If unit is not set, it is either a BIOS bug or something prevented attach from finishing. In either case, there should be some messages in the (verbose) dmesg giving a hint. Do you have AMT on this machine? It might work as a serial console to catch boot-time messages.
You might also disable interrupt remapping. Then the driver should only attach, without affecting either DMA or interrupt operations, and the system should boot. Then we can get the dmesg and see why attach (?) failed.
Created attachment 252773
dmesg log with boot from 15/08 and 30/07 for comparison
Laptop amd64: Lenovo Legion 5 Intel (Legion 5-15IMH05 (Lenovo) - Type 82AU)
Upgrading from around 30/07 -> 15/08, boot is OK, and I've noticed:
---
nda0: Serial Number S4DZNFDMAR0: <unknown dev>:pci0:30:7 sid f7 fault acc 0 adt 0x0 reason 0x25 addr 100000000000000
DMAR0: <unknown dev>:pci0:30:7 sid f7 fault acc 0 adt 0x0 reason 0x25 addr 100000000000000
DMAR0: <unknown dev>:pci0:30:7 sid f7 fault acc 0 adt 0x0 reason 0x25 addr 400000000000000
DMAR0: <unknown dev>:pci0:30:7 sid f7 fault acc 0 adt 0x0 reason 0x25 addr 500000000000000
---
(In reply to Nuno Teixeira from comment #4) (...) Also shows: dmar0: <DMA remap> iomem 0xfed91000-0xfed91fff on acpi0
Created attachment 252774
dmesg log with boot from 15/08 and 30/07 for comparison (cleaned)
Removed the duplicate boots, so the log now has just 2 boots for comparison.
(In reply to Nuno Teixeira from comment #6) I do not understand what you want to say there. Is your machine bootable after the update, or not? Regardless, what is the device at pci0:30:7? Use pciconf -lv to identify it. The culprit is that the device is issuing compat-mode MSI(-X) interrupt messages, which are aborted by DMAR and reported as faults. This is expected and really is DMAR's purpose. The only question is why the device does that.
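For reference, the sid value in the fault lines can be related to the pci0:30:7 triple by splitting it as a standard PCI requester ID (8 bits of bus, 5 bits of device, 3 bits of function). This is a minimal sketch, assuming that layout applies to the sid printed in these messages; struct pci_rid and decode_sid() are hypothetical names used only for illustration, not kernel code.

```c
#include <stdint.h>

/* Hypothetical helper type for illustration; not a FreeBSD kernel structure. */
struct pci_rid {
	unsigned bus;
	unsigned slot;
	unsigned func;
};

/*
 * Split a 16-bit PCI requester ID into its parts:
 * bits 15:8 are the bus, bits 7:3 the device (slot), bits 2:0 the function.
 */
static struct pci_rid
decode_sid(uint16_t sid)
{
	struct pci_rid r;

	r.bus = (sid >> 8) & 0xff;
	r.slot = (sid >> 3) & 0x1f;
	r.func = sid & 0x7;
	return (r);
}
```

Called with 0xf7, this yields bus 0, device 30, function 7, which matches the pci0:30:7 in the fault lines. If I read the formats right, pciconf -lv selectors also carry the PCI domain in front, so such a device would be listed there as pci0:0:30:7 rather than pci0:30:7.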
(In reply to Konstantin Belousov from comment #7) My laptop boots fine, as I mentioned earlier: > Upgrading from around 30/07 -> 15/08 boot is OK ... Maybe these dmesg logs are useful.
(In reply to Konstantin Belousov from comment #7) > Regardless, what is the device at pci0:30:7? I'm not at the laptop right now, but I'm using passthru on the Intel wireless device to use it in Windows 11 under bhyve. Later I will check pciconf and do a test running win11/bhyve.
(In reply to Nuno Teixeira from comment #9) You cannot combine DMAR and bhyve pass-through right now.
(In reply to Konstantin Belousov from comment #10) Ok, good to know. I will disable it.
Created attachment 252790
dmesg main-n271681-82cb2a4158fa
<snip>
nda0 at nvme0 bus 0 scbus1 target 0 lun 1
nda0: <SAMSUNG MZVLB1T0HBLR-000L2 3L1QEXF7 S4DZNF0N126179>
nda0: Serial Number S4DZNFDMAR0: <unknown dev>:pci0:30:7 sid f7 fault acc 0 adt 0x0 reason 0x25 addr 100000000000000
DMAR0: <unknown dev>:pci0:30:7 sid f7 fault acc 0 adt 0x0 reason 0x25 addr 100000000000000
0N126179
nda0: nvme version 1.3
nda0: 976762MB (2000409264 512 byte sectors)
<snip>
Created attachment 252791
pciconf -lv main-n271681-82cb2a4158fa
(In reply to Konstantin Belousov from comment #10) Hello, I just removed passthru from loader.conf. Maybe the warning/error described in #c12 is important. I've uploaded the pciconf output and there is no pci0:30:7 in there... Thanks
(In reply to Nuno Teixeira from comment #14) (...) Something was messing with the nda serial number:
* from 30/07:
nda0: <SAMSUNG MZVLB1T0HBLR-000L2 3L1QEXF7 S4DZNF0N126179>
nda0: Serial Number S4DZNF0N126179
* from today:
nda0: <SAMSUNG MZVLB1T0HBLR-000L2 3L1QEXF7 S4DZNF0N126179>
nda0: Serial Number S4DZNFDMAR0: <unknown dev>:pci0:30:7 sid f7 fault acc 0 adt 0x0 reason 0x25 addr 100000000000000
(In reply to Nuno Teixeira from comment #15) (...) It seems that some dmesg log lines were overlapped.
(In reply to Warner Losh from comment #0) What is the source line where the trap occurs?
Try https://reviews.freebsd.org/D46382. This should stop the panic (I hope), but I still need a verbose dmesg to understand why the attach is failing.
A commit in branch main references this bug:
URL: https://cgit.FreeBSD.org/src/commit/?id=0875f3cd74b2f305e82bff4e640c89f891ca84f8
commit 0875f3cd74b2f305e82bff4e640c89f891ca84f8
Author:     Ed Maste <emaste@FreeBSD.org>
AuthorDate: 2024-08-20 15:43:11 +0000
Commit:     Ed Maste <emaste@FreeBSD.org>
CommitDate: 2024-08-20 15:49:25 +0000

    Revert "x86: Enable Intel DMAR by default"

    A number of people have reported panics with it enabled by default,
    possibly due to broken ACPI tables, which we do not handle well.
    D46382 is a potential fix for this issue.

    Additionally DMAR is currently not compatible with bhyve passthrough
    (see comment #10 in PR280817), with a draft patch to address that in
    D25672.

    Revert to disabling DMAR by default pending the resolution of those
    two issues.

    This reverts commit 3192fc30230ae432b80cca783abc2dbea9d3f383.

    PR: 280817
    Sponsored by: The FreeBSD Foundation

 sys/x86/iommu/intel_drv.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)
hw.dmar.enable="0" in loader.conf is the quick and dirty workaround.
Enabling DMAR also breaks suspend-to-S3 on my Framework 13. During resume, the screen doesn't turn on and I can hear the fans spinning.
(In reply to Mark Johnston from comment #21) > Enabling DMAR also breaks suspend-to-S3 on my framework 13. Oh yeah, from Val Packett: https://reviews.freebsd.org/D22642
A commit in branch main references this bug:
URL: https://cgit.FreeBSD.org/src/commit/?id=45543d3424d46f84a5399879e190fc359dcefbd4
commit 45543d3424d46f84a5399879e190fc359dcefbd4
Author:     Konstantin Belousov <kib@FreeBSD.org>
AuthorDate: 2024-08-20 14:41:33 +0000
Commit:     Konstantin Belousov <kib@FreeBSD.org>
CommitDate: 2024-08-21 15:23:07 +0000

    DMAR: clear dmar_devs[unit] if attach failed

    This should stop attempts to use a unit which was not completely
    initialized, but referenced by ACPI DMAR table during scoped devices
    operations.

    PR: 280817
    Sponsored by: Advanced Micro Devices (AMD)
    Sponsored by: The FreeBSD Foundation
    MFC after: 1 week
    Differential revision: https://reviews.freebsd.org/D46382

 sys/x86/iommu/intel_drv.c | 11 +++++++++++
 1 file changed, 11 insertions(+)
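The idea named in that commit subject (clear the table slot when attach fails, so later lookups cannot reach a half-initialized unit) can be illustrated with a small userland model. This is only a sketch under stated assumptions: unit_table, struct unit_softc, unit_attach() and unit_find() are hypothetical stand-ins, and the actual change in sys/x86/iommu/intel_drv.c is not reproduced here.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_UNITS 4	/* hypothetical table size */

/* Hypothetical per-unit state; not the kernel's structure. */
struct unit_softc {
	int id;
	bool ready;	/* set only when attach ran to completion */
};

/* Table indexed by unit number, published early and walked by lookups. */
static struct unit_softc *unit_table[MAX_UNITS];

/*
 * Attach a unit. The slot is published first; if a later step fails,
 * the slot is cleared again so nobody can find a half-initialized unit.
 * init_ok simulates whether the rest of attach succeeds.
 */
static int
unit_attach(struct unit_softc *sc, int unit, bool init_ok)
{
	if (unit < 0 || unit >= MAX_UNITS)
		return (-1);
	sc->id = unit;
	sc->ready = false;
	unit_table[unit] = sc;
	if (!init_ok) {
		unit_table[unit] = NULL;	/* undo publication on failure */
		return (-1);
	}
	sc->ready = true;
	return (0);
}

/* Lookup skips empty slots instead of dereferencing them. */
static struct unit_softc *
unit_find(int id)
{
	int i;

	for (i = 0; i < MAX_UNITS; i++) {
		if (unit_table[i] == NULL)
			continue;
		if (unit_table[i]->id == id)
			return (unit_table[i]);
	}
	return (NULL);
}
```

In this model, a unit whose attach failed simply is not found, so callers get NULL from the lookup rather than a pointer to partially set-up state. Whether the device then works at all is a separate question, which is why the verbose dmesg is still needed.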
A commit in branch stable/14 references this bug:
URL: https://cgit.FreeBSD.org/src/commit/?id=d66c4853b84002c064bc314a0824a8667a0089c6
commit d66c4853b84002c064bc314a0824a8667a0089c6
Author:     Konstantin Belousov <kib@FreeBSD.org>
AuthorDate: 2024-08-20 14:41:33 +0000
Commit:     Konstantin Belousov <kib@FreeBSD.org>
CommitDate: 2024-08-28 00:26:33 +0000

    DMAR: clear dmar_devs[unit] if attach failed

    PR: 280817

    (cherry picked from commit 45543d3424d46f84a5399879e190fc359dcefbd4)

 sys/x86/iommu/intel_drv.c | 11 +++++++++++
 1 file changed, 11 insertions(+)
DMAR has now been disabled by default (i.e. hw.dmar.enable=0). It would be good to confirm that, after Kostik's 45543d3424d4, the panic does not occur with it re-enabled.
Presumed addressed by Kostik's change; Warner's laptop that demonstrated this issue is no longer functional. Others can test with hw.dmar.enable=1 set and submit a PR if additional issues are encountered. I'll see about posting a call for testing to -CURRENT.