Bug 200725 - Panic during port build under 10.1-Release Unrecoverable machine check exception
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 10.1-RELEASE
Hardware: amd64 Any
Importance: --- Affects Only Me
Assignee: freebsd-bugs mailing list
 
Reported: 2015-06-09 04:15 UTC by fossette
Modified: 2015-06-10 04:40 UTC

Attachments
Results of portmaster -L (28.32 KB, text/plain)
2015-06-09 04:15 UTC, fossette

Description fossette 2015-06-09 04:15:23 UTC
Created attachment 157558
Results of portmaster -L

Ever since I installed FreeBSD 10.1-RELEASE exclusively from port builds, I've experienced kernel panics from time to time, especially during big builds such as the GNU toolchain and Xorg.  I would resume the build, and the installation would eventually go through.  The panics never occurred while I was just using the system, i.e. not building a port.

I searched the Internet for a solution to fix this problem, but to no avail.  It was suggested in a forum that the CPU might be overheating.  My computer is brand new, and I changed the BIOS setting to have the fans run full time.

And then, I found how to read the core files...  Here's one:

# kgdb kernel /var/crash/vmcore.9
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
This GDB was configured as "amd64-marcel-freebsd"...

Unread portion of the kernel message buffer:
MCA: Bank 1, Status 0xbf80000000000124
MCA: Global Cap 0x0000000000000c09, Status 0x0000000000000005
MCA: Vendor "GenuineIntel", ID 0x306c3, APIC ID 7
MCA: CPU 7 UNCOR PCC DCACHE L0 WR error
MCA: Address 0x1119e9c80
MCA: Misc 0x86
MCA: Bank 1, Status 0xbf80000000000124
MCA: Global Cap 0x0000000000000c09, Status 0x0000000000000005
MCA: Vendor "GenuineIntel", ID 0x306c3, APIC ID 6
MCA: CPU 6 UNCOR PCC DCACHE L0 WR error
MCA: Address 0x1119e9c80
MCA: Misc 0x86
panic: Unrecoverable machine check exception
cpuid = 7
KDB: stack backtrace:
#0 0xffffffff80963000 at kdb_backtrace+0x60
#1 0xffffffff80928125 at panic+0x155
#2 0xffffffff80e3b89b at mca_intr+0x6b
#3 0xffffffff80d244b9 at trap+0x99
#4 0xffffffff80d0a782 at calltrap+0x8
Uptime: 6m42s
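
As a side note on reading the dump above: the `ID 0x306c3` value in the MCA lines is the processor signature reported by the CPUID instruction, which packs family, model and stepping into bit fields. The following is a minimal sketch of the standard Intel decoding, not code taken from FreeBSD; the function name `decode_signature` is made up for illustration.

```python
def decode_signature(eax):
    """Split a CPUID leaf 1 EAX signature into (family, model, stepping)."""
    stepping = eax & 0xF
    base_model = (eax >> 4) & 0xF
    base_family = (eax >> 8) & 0xF
    ext_model = (eax >> 16) & 0xF
    ext_family = (eax >> 20) & 0xFF

    # The extended family is added only when the base family is 0xF.
    family = base_family + ext_family if base_family == 0xF else base_family

    # The extended model forms the high nibble for base families 6 and 0xF.
    if base_family in (0x6, 0xF):
        model = (ext_model << 4) | base_model
    else:
        model = base_model
    return family, model, stepping


if __name__ == "__main__":
    family, model, stepping = decode_signature(0x306C3)
    print(f"Family {family} Model {model} Stepping {stepping}")
    # -> Family 6 Model 60 Stepping 3
```

Knowing the exact family and model helps when searching for vendor errata or for BIOS settings that apply to a specific processor generation.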

But now I can't even upgrade VirtualBox or Firefox, or install Eclipse to try it.

   portmaster virtualbox
and
   portmaster firefox
now systematically generate a kernel panic, so I'm stuck.

HELP!!!

The attachment lists what is installed on my system at this time.

Don't hesitate to ask for more details to help isolate the problem.  However, don't assume that I'm a UNIX expert.  Thanks for helping out!

Dominique.
Comment 1 Andriy Gapon freebsd_committer 2015-06-09 06:31:55 UTC
This looks strongly like a hardware problem.

mcelog decodes the above report as follows:
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 7 BANK 1 
MISC 86 ADDR 1119e9c80 
MCG status:RIPV MCIP 
MCi status:
Uncorrected error
Error enabled
MCi_MISC register valid
MCi_ADDR register valid
Processor context corrupt
SRAR

MCA: Data CACHE Level-0 Write Error

STATUS bf80000000000124 MCGSTATUS 5
MCGCAP c09 APICID 7 SOCKETID 0 
CPUID Vendor Intel Family 6 Model 60
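
For readers who want to check part of this decode by hand, here is a minimal sketch that unpacks the raw values from the report. It assumes the bit layout that the Intel SDM documents for IA32_MCi_STATUS and IA32_MCG_STATUS, covers only status bits 57 through 63 plus the cache-hierarchy form of the low 16-bit error code, and is neither mcelog's nor the kernel's actual implementation. All function and table names are made up for illustration.

```python
# Flag bits in IA32_MCi_STATUS (bits 57..63 only) and IA32_MCG_STATUS.
STATUS_FLAGS = [(63, "VAL"), (62, "OVER"), (61, "UC"), (60, "EN"),
                (59, "MISCV"), (58, "ADDRV"), (57, "PCC")]
MCG_FLAGS = [(0, "RIPV"), (1, "EIPV"), (2, "MCIP")]

# Sub-fields of the cache-hierarchy error code: 000F 0001 RRRR TTLL.
REQUESTS = {0: "ERR", 1: "RD", 2: "WR", 3: "DRD", 4: "DWR",
            5: "IRD", 6: "PREFETCH", 7: "EVICT", 8: "SNOOP"}
TYPES = {0: "I", 1: "D", 2: "G"}
LEVELS = {0: "L0", 1: "L1", 2: "L2", 3: "LG"}


def set_flags(value, table):
    """Return the names of the flags in `table` whose bit is set in `value`."""
    return [name for bit, name in table if (value >> bit) & 1]


def status_flags(status):
    return set_flags(status, STATUS_FLAGS)


def mcg_flags(mcg_status):
    return set_flags(mcg_status, MCG_FLAGS)


def decode_cache_error(status):
    """Decode the low 16 bits if they match the cache-hierarchy pattern."""
    code = status & 0xFFFF
    # Ignore the F bit (bit 12); bits 8..11 must be 0001 and bits 13..15 zero.
    if (code & 0xEF00) != 0x0100:
        return None
    request = REQUESTS.get((code >> 4) & 0xF, "?")
    kind = TYPES.get((code >> 2) & 0x3, "?")
    level = LEVELS[code & 0x3]
    return f"{kind}CACHE {level} {request} error"


if __name__ == "__main__":
    status = 0xBF80000000000124
    print(status_flags(status))
    # -> ['VAL', 'UC', 'EN', 'MISCV', 'ADDRV', 'PCC']
    print(mcg_flags(0x5))
    # -> ['RIPV', 'MCIP']
    print(decode_cache_error(status))
    # -> DCACHE L0 WR error
```

The UC and PCC flags together are what make this an uncorrected error with corrupt processor context, which is why the kernel panics instead of logging the event and continuing.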
Comment 2 fossette 2015-06-10 04:40:09 UTC
Thanks Andriy!

This HARDWARE ERROR didn't help me much at first, but after further Internet searches, and with the knowledge of high-speed processor timings from a previous job, I happened to find a nice workaround for this problem.  Indeed, it's hardware related but, IMO, the software should be more helpful in pointing to the solution.

When I purchased my computer just a few months back, I went through all the BIOS options offered to me.  Without going to extremes, I accepted the (modest) boost features on offer.  My FreeBSD is stored on an M.2 SSD drive, so I suspected a timing issue on the bus when signals are exchanged between the CPU, the DRAM and the SRAM.

So, I disabled every BIOS setting related to cache prefetch and CPU boost.  Everything now seems to run smoothly: I was able to build all the ports that I had problems with earlier.

Thanks again!  I'm a happy FreeBSD camper again!

Dominique.