Bug 239678 - panic: Unrecoverable machine check exception (MCA: CPU 2 UNCOR DCACHE L1 DWR error)
Status: Closed Overcome By Events
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 11.2-STABLE
Hardware: amd64 Any
Importance: --- Affects Only Me
Assignee: Bugmeister
URL:
Keywords: crash, needs-qa
Depends on:
Blocks:
 
Reported: 2019-08-06 18:38 UTC by umproko5
Modified: 2025-01-25 05:04 UTC (History)

Description umproko5 2019-08-06 18:38:30 UTC
As per freenas support,

In the provided debug I see several identical core dumps, reporting crash on Machine Check Exception:

MCA: Bank 0, Status 0xb4002000c0000145                                                                                                  
MCA: Global Cap 0x0000000000000106, Status 0x0000000000000007                                                                           
MCA: Vendor "AuthenticAMD", ID 0x100f43, APIC ID 2                                                                                      
MCA: CPU 2 UNCOR DCACHE L1 DWR error                                                                                                    
MCA: Address 0x3840b0600                                                                                                                
panic: Unrecoverable machine check exception                                                                                            
cpuid = 2                                                                                                                               
KDB: stack backtrace:                                                                                                                   
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe03d6da8e40                                                          
vpanic() at vpanic+0x177/frame 0xfffffe03d6da8ea0                                                                                       
panic() at panic+0x43/frame 0xfffffe03d6da8f00                                                                                          
mca_intr() at mca_intr+0x9b/frame 0xfffffe03d6da8f20                                                                                    
mchk_calltrap() at mchk_calltrap+0x8/frame 0xfffffe03d6da8f20                                                                           
--- trap 0x1c, rip = 0xffffffff82a3c1db, rsp = 0xfffffe045bb246d0, rbp = 0xfffffe045bb247a0 ---                                         
svm_vmrun() at svm_vmrun+0x99b/frame 0xfffffe045bb247a0                                                                                 
vm_run() at vm_run+0x1fc/frame 0xfffffe045bb24880                                                                                       
vmmdev_ioctl() at vmmdev_ioctl+0x85f/frame 0xfffffe045bb24920                                                                           
devfs_ioctl_f() at devfs_ioctl_f+0x128/frame 0xfffffe045bb24980                                                                         
kern_ioctl() at kern_ioctl+0x26d/frame 0xfffffe045bb249f0                                                                               
sys_ioctl() at sys_ioctl+0x15c/frame 0xfffffe045bb24ac0                                                                                 
amd64_syscall() at amd64_syscall+0xa38/frame 0xfffffe045bb24bf0                                                                         
fast_syscall_common() at fast_syscall_common+0x101/frame 0xfffffe045bb24bf0                                                             
--- syscall (54, FreeBSD ELF64, sys_ioctl), rip = 0x8017f281a, rsp = 0x7fffde7f1e28, rbp = 0x7fffde7f1ee0 ---                           
KDB: enter: panic          

Usually MCA panics are the result of hardware issues. Considering that you are running desktop hardware, I am not exactly surprised. But what worries me is that in all cases the exception happened while the system was executing a virtual machine, so that may be either a trigger (and I don't know much about AMD MCA) or just an unrelated witness, simply because this system spends most of its CPU time running VMs.

You may try to report this issue to FreeBSD in case somebody really knows how to decode AMD MCA. But what I see now looks like a checksum error in the CPU L1 cache, which is a hardware problem.
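
For anyone who wants to check the decode by hand, here is a minimal sketch in Python that unpacks the Bank 0 status value from the dump. It assumes only the architectural x86 MCi_STATUS layout (flag bits 63 to 57, MCA error code in bits 15:0) and the compound "memory hierarchy" error code format 0000 0001 RRRR TTLL. The function and table names are made up for illustration and are not taken from the FreeBSD sources, and the vendor-specific bits (31:16 and 56:32) are deliberately left undecoded.

```python
# Architectural MCi_STATUS flag bits (assumed layout, shared by x86 vendors).
FLAGS = {
    63: "VAL",    # register holds a valid error record
    62: "OVER",   # an earlier error was overwritten
    61: "UC",     # error was not corrected by hardware
    60: "EN",     # reporting of this error was enabled
    59: "MISCV",  # MCi_MISC holds additional information
    58: "ADDRV",  # MCi_ADDR holds a valid address
    57: "PCC",    # processor context may be corrupt
}

# Fields of the memory hierarchy error code: 0000 0001 RRRR TTLL.
LEVELS = {0: "L0", 1: "L1", 2: "L2", 3: "LG"}
TYPES = {0: "ICACHE", 1: "DCACHE", 2: "GCACHE"}
REQUESTS = {0: "ERR", 1: "RD", 2: "WR", 3: "DRD", 4: "DWR",
            5: "IRD", 6: "PREFETCH", 7: "EVICT", 8: "SNOOP"}


def decode_status(status):
    """Hypothetical helper: split an MCi_STATUS value into readable parts."""
    flags = [name for bit, name in FLAGS.items() if (status >> bit) & 1]
    code = status & 0xFFFF
    result = {
        "flags": flags,
        "error_code": code,
        "model_specific": (status >> 16) & 0xFFFF,  # left undecoded
        "summary": None,
    }
    # Bit 12 is masked out because some CPUs use it as a filtering flag.
    if (code & 0xEF00) == 0x0100:
        request = REQUESTS.get((code >> 4) & 0xF, "reserved")
        ttype = TYPES.get((code >> 2) & 0x3, "reserved")
        level = LEVELS[code & 0x3]
        severity = "UNCOR" if "UC" in flags else "COR"
        result["summary"] = f"{severity} {ttype} {level} {request} error"
    return result


if __name__ == "__main__":
    decoded = decode_status(0xB4002000C0000145)
    print(decoded["flags"])
    print(decoded["summary"])
```

Under these assumptions the status value has VAL, UC, EN and ADDRV set and PCC clear, and the error code 0x0145 maps to a data cache, level 1, data write request, which agrees with the "UNCOR DCACHE L1 DWR error" line printed by the kernel.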
Comment 1 Conrad Meyer freebsd_committer freebsd_triage 2019-08-07 16:27:54 UTC
I don't do unpaid support work for iX.  If you can reproduce in FreeBSD, I'm interested.
Comment 2 Alexander Motin freebsd_committer freebsd_triage 2019-08-07 17:10:29 UTC
Conrad, your comment sounds unfair and impolite to me. We at iX do all we can for FreeBSD. We upstream everything possible. But we cannot maintain the whole OS ourselves. You can consider the system in question as FreeBSD 11.2 with some backports, but not in the MCA code. If you think the problem may already be fixed in a newer FreeBSD version, that is fine; otherwise some bit of shared expertise would be nice. And this is not a paid customer, so we are in the same situation as the FreeBSD project.
Comment 3 Mark Linimon freebsd_committer freebsd_triage 2025-01-25 05:04:31 UTC
^Triage: I'm sorry that this PR did not get addressed in a timely fashion.

By now, the version that it was created against is long out of support.
Please re-open if it is still a problem on a supported version.