Bug 234539

Summary: [PowerPC64] panic: FPU already enabled for thread
Product: Base System
Reporter: Sean Bruno <sbruno>
Component: kern
Assignee: freebsd-ppc (Nobody) <ppc>
Status: Closed FIXED
Severity: Affects Some People
CC: breno.leitao, leandro.lupori, leonardo.bianconi, luporl
Priority: ---
Keywords: crash, needs-qa
Version: CURRENT   
Hardware: powerpc   
OS: Any   

Description Sean Bruno freebsd_committer 2019-01-01 06:45:32 UTC
Not a lot I can get from a ppc64 crash other than the backtrace:

panic: FPU already enabled for thread
cpuid = 23
time = 1546311842
KDB: stack backtrace:
0xe000000201c47330: at .kdb_backtrace+0x5c
0xe000000201c47460: at .vpanic+0x1b4
0xe000000201c47520: at .panic+0x38
0xe000000201c475b0: at .trap+0xb64
0xe000000201c47770: at .powerpc_interrupt+0x290
0xe000000201c47810: user FPU trap by 0x81086e528: srr1=0x900000000000d032
            r1=0x3fffffffdfdfaea0 cr=0x22040024 xer=0 ctr=0x81086e51c r2=0x810891818
KDB: enter: panic
[ thread pid 17895 tid 101554 ]
Stopped at      .kdb_enter+0x60:        ld      r2, r1, 0x28

The system was in the middle of a package build run.
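
For reference, the srr1 value in the trap line above can be decoded by hand. The sketch below is only an illustration: the bit masks follow the MSR layout in the Power ISA, while the PSL_* names and the two helper functions are assumptions made for this example and are not copied from the kernel source.

```c
#include <stdbool.h>
#include <stdint.h>

/* MSR bit masks per the Power ISA; names are illustrative assumptions. */
#define PSL_SF 0x8000000000000000ULL /* 64-bit mode */
#define PSL_HV 0x1000000000000000ULL /* hypervisor state */
#define PSL_EE 0x8000ULL             /* external interrupts enabled */
#define PSL_PR 0x4000ULL             /* problem (user) state */
#define PSL_FP 0x2000ULL             /* floating point available */
#define PSL_ME 0x1000ULL             /* machine check enabled */
#define PSL_IR 0x20ULL               /* instruction relocation */
#define PSL_DR 0x10ULL               /* data relocation */
#define PSL_RI 0x2ULL                /* recoverable interrupt */

/* True if the saved MSR image has the FPU enabled. */
bool msr_fp_enabled(uint64_t msr)
{
    return (msr & PSL_FP) != 0;
}

/* True if the trap was taken from user mode. */
bool msr_from_user(uint64_t msr)
{
    return (msr & PSL_PR) != 0;
}
```

Applied to srr1=0x900000000000d032, the value breaks down into SF, HV, EE, PR, ME, IR, DR and RI, with FP clear. That matches a "user FPU trap": the thread ran in user mode with the FPU disabled in its MSR, so the panic message most likely refers to the kernel's own per-thread bookkeeping rather than to the MSR itself.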
Comment 1 Sean Bruno freebsd_committer 2019-01-09 23:33:08 UTC
Firmware that is running:

        ibm,firmware-versions {
                bmc-firmware-version = "1.10";
                buildroot = "2018.05.1-114-g1822255eab";
                capp-ucode = "p9-dd2-v4";
                hostboot = "p8-30b88ed-p580ec27";
                hostboot-binaries = "hw091818a.930";
                linux = "4.18.6-openpower1-p40b056c";
                machine-xml = "e0fae90-p90e7e34";
                occ = "p8-28f2cec";
                open-power = "palmetto-v2.1-134-g1ad4886";
                petitboot = "1.9.1";
                phandle = <0x10000087>;
                skiboot = "v6.1-124-g7dbf80d1db45";
Comment 2 Leonardo Bianconi 2019-02-01 15:08:33 UTC
I'm analyzing this issue. It was reproduced on a machine with 32G of RAM, but the same steps on a machine with 256G do not trigger it.

I'm reducing the OS memory and adding a trap in the code to identify divergences between PCB flags and SRR1 flags when leaving kernel space.

I'm also building on the 32G machine with the trap change in order to identify the code being executed.
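
The kind of check described here can be sketched as a small predicate. This is a hypothetical illustration, not the actual debugging patch: the function name, the PCB_FPU value and the assumption that the two flags must agree whenever the thread returns to user space are all made up for the example.

```c
#include <stdbool.h>
#include <stdint.h>

#define PSL_FP  0x2000ULL /* MSR floating-point-available bit (Power ISA) */
#define PCB_FPU 0x1u      /* assumed value for the "thread uses FPU" flag */

/*
 * Returns true when the per-thread flag and the saved MSR image (SRR1)
 * disagree about whether the FPU is enabled.
 */
bool fpu_state_diverges(unsigned int pcb_flags, uint64_t srr1)
{
    bool pcb_says_on = (pcb_flags & PCB_FPU) != 0;
    bool msr_says_on = (srr1 & PSL_FP) != 0;

    return pcb_says_on != msr_says_on;
}
```

A trap placed on the return-to-user path could fire whenever this predicate is true, which would point at the code that last modified one flag without the other.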
Comment 3 Sean Bruno freebsd_committer 2019-02-11 04:35:21 UTC
(In reply to Leonardo Bianconi from comment #2)
It does seem that the issue takes "longer" to occur with more RAM in the machine.  I was able to get a panic after 30 hours of package building.
Comment 4 Leandro Lupori freebsd_committer 2019-02-12 18:54:32 UTC
The change in https://reviews.freebsd.org/D19166 fixed the issue for Leonardo and me.
Comment 5 commit-hook freebsd_committer 2019-02-14 15:15:54 UTC
A commit references this bug:

Author: luporl
Date: Thu Feb 14 15:15:32 UTC 2019
New revision: 344123
URL: https://svnweb.freebsd.org/changeset/base/344123

  [PPC64] Fix mismatch between thread flags and MSR

  When sigreturn() restored a thread's context, SRR1 was being restored
  to its previous value, but pcb_flags was not being touched.

  This could cause a mismatch between the thread's MSR and its pcb_flags.
  For instance, when the thread used the FPU for the first time inside
  the signal handler, sigreturn() would clear SRR1, but not pcb_flags.
  Then, the thread would return with the FPU bit cleared in MSR and,
  the next time it tried to use the FPU, it would fail on a KASSERT
  that checked if the FPU was disabled.

  This change clears the FPU bit in both pcb_flags and frame->srr1,
  as the code that restores the context expects to use the FPU trap
  to re-enable it.

  PR:		234539
  Reported by:	sbruno
  Reviewed by:	jhibbits, sbruno
  Differential Revision:	https://reviews.freebsd.org/D19166
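
The sequence in the commit message can be modelled in user space. The sketch below is a simplified illustration under stated assumptions, not the kernel code: struct thread_model, fpu_trap(), sigreturn_old() and sigreturn_fixed() are hypothetical names, PCB_FPU is given an assumed value, and the KASSERT is represented by a false return value instead of a panic.

```c
#include <stdbool.h>
#include <stdint.h>

#define PSL_FP  0x2000ULL /* MSR floating-point-available bit (Power ISA) */
#define PCB_FPU 0x1u      /* assumed value for the "thread uses FPU" flag */

/* Hypothetical stand-in for the per-thread state involved in the bug. */
struct thread_model {
    uint64_t srr1;          /* MSR image restored on return to user */
    unsigned int pcb_flags; /* kernel bookkeeping for the thread */
};

/*
 * FPU-unavailable trap. Returns false where the kernel would panic with
 * "FPU already enabled for thread"; otherwise enables the FPU.
 */
bool fpu_trap(struct thread_model *td)
{
    if (td->pcb_flags & PCB_FPU)
        return false;
    td->pcb_flags |= PCB_FPU;
    td->srr1 |= PSL_FP;
    return true;
}

/* Old behaviour: restore the saved SRR1 and leave pcb_flags untouched. */
void sigreturn_old(struct thread_model *td, uint64_t saved_srr1)
{
    td->srr1 = saved_srr1;
}

/* Fixed behaviour: clear the FPU bit in both places. */
void sigreturn_fixed(struct thread_model *td, uint64_t saved_srr1)
{
    td->srr1 = saved_srr1 & ~PSL_FP;
    td->pcb_flags &= ~PCB_FPU;
}
```

In this model, a thread that first touches the FPU inside a signal handler takes one successful fpu_trap(). After sigreturn_old() its srr1 has FP clear while pcb_flags still has PCB_FPU set, so the next fpu_trap() hits the failing path. With sigreturn_fixed() both flags are clear and the next trap re-enables the FPU normally.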

Comment 6 Sean Bruno freebsd_committer 2019-02-16 22:34:16 UTC
Fantastic work.  This definitely is fixed now.