I saw the following panic during boot on a system running something close to 12.2-RELEASE. It doesn't happen every time. However, I suspect I've hit the same bug a few other times without realizing it. The kernel normally reboots immediately, because swap is not yet configured at this point and so no crash dump can be saved.

Fatal trap 12: page fault while in kernel mode
cpuid = 26; apic id = 34
fault virtual address   = 0xd0
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff8125a009
stack pointer           = 0x28:0xfffffe0000b65f20
frame pointer           = 0x28:0xfffffe0000b65f50
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = resume, IOPL = 0
current process         = 11 (idle: cpu26)
trap number             = 12
panic: page fault
cpuid = 26
time = 1
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0000b65be0
vpanic() at vpanic+0x17b/frame 0xfffffe0000b65c30
panic() at panic+0x43/frame 0xfffffe0000b65c90
trap_fatal() at trap_fatal+0x391/frame 0xfffffe0000b65cf0
trap_pfault() at trap_pfault+0x4f/frame 0xfffffe0000b65d40
trap() at trap+0x286/frame 0xfffffe0000b65e50
calltrap() at calltrap+0x8/frame 0xfffffe0000b65e50
--- trap 0xc, rip = 0xffffffff8125a009, rsp = 0xfffffe0000b65f20, rbp = 0xfffffe0000b65f50 ---
_mca_init() at _mca_init+0x5d9/frame 0xfffffe0000b65f50
init_secondary_tail() at init_secondary_tail+0xfd/frame 0xfffffe0000b65f80
init_secondary() at init_secondary+0x2d1/frame 0xfffffe0000b65ff0
KDB: enter: panic
[ thread pid 11 tid 100029 ]
Stopped at      kdb_enter+0x37: movq    $0,0x12bc1f6(%rip)

The bug occurs because only one of my two CPUs reports support for the MCG_CMCI_P bit. On boot, which CPU the kernel queries for support is effectively random. If it queries the CPU without CMCI, it doesn't allocate memory for cmc_state, but it later calls cmci_setup() on the CPU that does support that bit.
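To make the failure mode concrete, here is a minimal user-space model of the logic described above. It is a sketch under stated assumptions, not FreeBSD's code: CPU 0 stands in for the BSP, and `first_faulting_ap()` and `null_index_fault_addr()` are hypothetical helpers. It also assumes that cmc_state is an array of per-CPU pointers indexed by cpuid. Under that assumption, loading cmc_state[26] through a NULL base touches address 26 * 8 = 0xd0 on amd64, which is consistent with both the "cpuid = 26" and the "fault virtual address = 0xd0" lines in the trace.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit 10 of IA32_MCG_CAP (MSR 0x179): CMCI is supported on this CPU. */
#define MCG_CAP_CMCI_P 0x00000400u

/* Pointer size on amd64, fixed so the arithmetic does not depend on the
 * host this model is compiled on. */
#define AMD64_PTR_SIZE 8u

/*
 * Hypothetical model of the boot-time decision. mcg_cap[0] is the BSP,
 * the remaining entries are APs. Returns the first AP that would use
 * CMCI state the BSP never allocated, or -1 if boot would survive.
 */
static int first_faulting_ap(const uint64_t *mcg_cap, int ncpu)
{
	/* Only the BSP's own MCG_CAP decides whether cmc_state exists. */
	bool cmc_state_allocated = (mcg_cap[0] & MCG_CAP_CMCI_P) != 0;

	for (int cpu = 1; cpu < ncpu; cpu++) {
		/* Each AP consults only its own MCG_CAP before using the state. */
		if ((mcg_cap[cpu] & MCG_CAP_CMCI_P) != 0 && !cmc_state_allocated)
			return cpu;
	}
	return -1;
}

/*
 * Address touched when loading element `cpuid` from a NULL array of
 * pointers (assumes cmc_state is indexed by cpuid, as stated above).
 */
static uint64_t null_index_fault_addr(unsigned cpuid)
{
	return (uint64_t)cpuid * AMD64_PTR_SIZE;
}
```

For example, with MCG_CAP values {0x0f000814, 0x0f000814, 0x0f000c14} the model returns 2, while putting 0x0f000c14 in the BSP slot makes it return -1.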
The following command shows the asymmetry between the CPUs:

$ for x in $(jot $(sysctl -n hw.ncpu) 0) ; do sudo cpucontrol -m 0x179 /dev/cpuctl$x; done | uniq -c
  16 MSR 0x179: 0x00000000 0x0f000c14
  16 MSR 0x179: 0x00000000 0x0f000814
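The two MCG_CAP values differ in exactly one bit: 0x0f000c14 ^ 0x0f000814 = 0x400, which is bit 10, MCG_CMCI_P in Intel's definition of IA32_MCG_CAP. Both values report 0x14 = 20 banks in bits 7:0. The sketch below decodes a cpucontrol -m line, assuming it prints the high 32 bits first and the low 32 bits second (consistent with the bank count appearing in the second word). The helper names are hypothetical; the macro names are modeled on FreeBSD's MCG_CAP_* constants but are defined locally so the block stands alone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* IA32_MCG_CAP (MSR 0x179) fields used here. */
#define MCG_CAP_COUNT  0x000000ffu  /* bits 7:0: number of MC banks */
#define MCG_CAP_CMCI_P 0x00000400u  /* bit 10: CMCI supported */

/* Rebuild the 64-bit MSR from the two words cpucontrol -m prints
 * (assumed order: high word, then low word). */
static uint64_t msr_from_words(uint32_t hi, uint32_t lo)
{
	return ((uint64_t)hi << 32) | lo;
}

/* Number of machine-check banks advertised by this CPU. */
static unsigned mcg_cap_banks(uint64_t cap)
{
	return (unsigned)(cap & MCG_CAP_COUNT);
}

/* True if this CPU advertises corrected machine-check interrupts. */
static bool mcg_cap_has_cmci(uint64_t cap)
{
	return (cap & MCG_CAP_CMCI_P) != 0;
}
```

Applied to the output above, msr_from_words(0x00000000, 0x0f000c14) has CMCI and 20 banks, and msr_from_words(0x00000000, 0x0f000814) has 20 banks but no CMCI.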
Created attachment 222184
Unconditionally allocate the cmci memory

This patch from kib@FreeBSD.org attempts to fix the problem by unconditionally allocating memory for cmc_state, regardless of the MCG_CAP_CMCI_P bit.
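The idea behind this patch can be modeled as moving the allocation out from under the capability test. The attachment's diff is not reproduced in this report, so the following is only a hypothetical sketch: `struct cmc_bank_state`, `bsp_alloc_cmc_state()` and `free_cmc_state()` are invented names, and the layout (an array of per-CPU pointers, each to an array of per-bank entries) is an assumption about how cmc_state is organized.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define MCG_CAP_COUNT 0x000000ffu  /* bits 7:0 of IA32_MCG_CAP: bank count */

/* Hypothetical per-bank CMCI bookkeeping. */
struct cmc_bank_state {
	int max_threshold;
	int last_intr;
};

/*
 * Allocate per-CPU, per-bank CMCI state without consulting
 * MCG_CAP_CMCI_P, so an AP that does report CMCI never finds NULL.
 * Returns NULL only on allocation failure.
 */
static struct cmc_bank_state **bsp_alloc_cmc_state(uint64_t bsp_cap, int ncpu)
{
	size_t nbanks = (size_t)(bsp_cap & MCG_CAP_COUNT);
	struct cmc_bank_state **state = calloc((size_t)ncpu, sizeof(*state));

	if (state == NULL)
		return NULL;
	for (int cpu = 0; cpu < ncpu; cpu++) {
		/* Request at least one entry: calloc(0, ...) may return NULL. */
		state[cpu] = calloc(nbanks > 0 ? nbanks : 1, sizeof(**state));
		if (state[cpu] == NULL) {
			/* Unwind the CPUs allocated so far. */
			while (cpu-- > 0)
				free(state[cpu]);
			free(state);
			return NULL;
		}
	}
	return state;
}

/* Release everything bsp_alloc_cmc_state() returned. */
static void free_cmc_state(struct cmc_bank_state **state, int ncpu)
{
	for (int cpu = 0; cpu < ncpu; cpu++)
		free(state[cpu]);
	free(state);
}
```

The trade-off of this approach is presumably a small allocation on machines whose CPUs never use CMCI, in exchange for never leaving an AP with a NULL pointer.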
I updated the BIOS from version 5.12, aka 2/24/2018 Rev 2.0b, to 5.14, aka 10/30/2020 Rev 3.4. That fixed the problem. Now all CPUs show the MCG_CMCI_P bit disabled.

$ for i in `seq 0 31`; do sudo cpucontrol -m 0x179 /dev/cpuctl${i}; done | uniq -c
  32 MSR 0x179: 0x00000000 0x0f000814
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=b5770470276268acef21368b3e77a325df883500

commit b5770470276268acef21368b3e77a325df883500
Author:     Mark Johnston <markj@FreeBSD.org>
AuthorDate: 2021-02-08 19:42:54 +0000
Commit:     Mark Johnston <markj@FreeBSD.org>
CommitDate: 2021-02-08 19:42:54 +0000

    mca: Handle inconsistent CMCI capability reporting

    A BIOS bug may apparently cause the BSP to report that it does not implement CMCI, with some APs reporting that they do. In this scenario, avoid a NULL pointer dereference that occurs in cmci_monitor() because cmc_state was not allocated by the BSP.

    PR:             253272
    Reported by:    asomers, mmacy
    Reviewed by:    kib (previous version)
    MFC after:      1 week

 sys/x86/x86/mca.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)
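The commit message describes the fix only at a high level: avoid the NULL dereference when the BSP never allocated cmc_state. One way to express such a guard is for each AP to enable CMCI only if it reports MCG_CAP_CMCI_P and the shared state actually exists. This is a hypothetical sketch; `ap_should_enable_cmci()` is an invented name, and the actual 18-line change to sys/x86/x86/mca.c may be structured differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MCG_CAP_CMCI_P 0x00000400u  /* bit 10 of IA32_MCG_CAP */

/*
 * Hypothetical AP-side check. cmc_state is whatever the BSP allocated,
 * or NULL if the BSP itself did not report CMCI.
 */
static bool ap_should_enable_cmci(uint64_t ap_cap, const void *cmc_state)
{
	if ((ap_cap & MCG_CAP_CMCI_P) == 0)
		return false;	/* this CPU does not implement CMCI */
	if (cmc_state == NULL)
		return false;	/* BSP skipped allocation: leave CMCI off here */
	return true;
}
```

Compared with allocating unconditionally, a guard like this leaves CMCI off on APs whose BSP lacks it instead of allocating memory that may never be used.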
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=8eebd9592e3daf80c2c743666614119d6c862186

commit 8eebd9592e3daf80c2c743666614119d6c862186
Author:     Mark Johnston <markj@FreeBSD.org>
AuthorDate: 2021-02-08 19:42:54 +0000
Commit:     Mark Johnston <markj@FreeBSD.org>
CommitDate: 2021-02-15 19:12:41 +0000

    mca: Handle inconsistent CMCI capability reporting

    A BIOS bug may apparently cause the BSP to report that it does not implement CMCI, with some APs reporting that they do. In this scenario, avoid a NULL pointer dereference that occurs in cmci_monitor() because cmc_state was not allocated by the BSP.

    PR:             253272
    Reported by:    asomers, mmacy
    Reviewed by:    kib (previous version)

    (cherry picked from commit b5770470276268acef21368b3e77a325df883500)

 sys/x86/x86/mca.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)
A commit in branch stable/12 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=dadf603f0f7b54c65fa5f16f552ae6da12f8210b

commit dadf603f0f7b54c65fa5f16f552ae6da12f8210b
Author:     Mark Johnston <markj@FreeBSD.org>
AuthorDate: 2021-02-08 19:42:54 +0000
Commit:     Mark Johnston <markj@FreeBSD.org>
CommitDate: 2021-02-15 19:47:04 +0000

    mca: Handle inconsistent CMCI capability reporting

    A BIOS bug may apparently cause the BSP to report that it does not implement CMCI, with some APs reporting that they do. In this scenario, avoid a NULL pointer dereference that occurs in cmci_monitor() because cmc_state was not allocated by the BSP.

    PR:             253272
    Reported by:    asomers, mmacy
    Reviewed by:    kib (previous version)

    (cherry picked from commit b5770470276268acef21368b3e77a325df883500)

 sys/x86/x86/mca.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)
A commit in branch releng/13.0 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=f560a8b1a4edd1b8a9f110ae2edaa7a3307e9034

commit f560a8b1a4edd1b8a9f110ae2edaa7a3307e9034
Author:     Mark Johnston <markj@FreeBSD.org>
AuthorDate: 2021-02-16 17:07:43 +0000
Commit:     Mark Johnston <markj@FreeBSD.org>
CommitDate: 2021-02-16 17:07:43 +0000

    mca: Handle inconsistent CMCI capability reporting

    A BIOS bug may apparently cause the BSP to report that it does not implement CMCI, with some APs reporting that they do. In this scenario, avoid a NULL pointer dereference that occurs in cmci_monitor() because cmc_state was not allocated by the BSP.

    Approved by:    re (gjb)
    PR:             253272
    Reported by:    asomers, mmacy
    Reviewed by:    kib (previous version)

    (cherry picked from commit b5770470276268acef21368b3e77a325df883500)
    (cherry picked from commit 8eebd9592e3daf80c2c743666614119d6c862186)

 sys/x86/x86/mca.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)