Bug 266730 - powerpc kernel crash on loadable modules that use copyin/copyout ifunc
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 13.1-STABLE
Hardware: powerpc Any
Importance: --- Affects Many People
Assignee: Alfredo Dal'Ava Junior
URL:
Keywords: regression
Depends on:
Blocks:
 
Reported: 2022-09-30 19:05 UTC by Alfredo Dal'Ava Junior
Modified: 2024-01-18 15:31 UTC (History)

See Also:


Description Alfredo Dal'Ava Junior freebsd_committer freebsd_triage 2022-09-30 19:05:50 UTC
At least the powerpc64 and powerpc64le kernels panic when the copyin/copyout functions are called from loadable kernel modules (such as pfsync, zfs and linuxulator).

The panic, exception 0x480 (instruction segment exception), occurs when the
functions are stored as pointers in the cpuset_copy_cb struct and called
through it. There is no crash when the functions are called directly (without the struct) or through a local wrapper function.


This affects FreeBSD 13.1/STABLE and 14/CURRENT.
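
For readers unfamiliar with the pattern, here is a minimal userland sketch of an ifunc whose address is stored in a struct of function pointers and then called through it. It is not the kernel code and not the LLVM test case; every demo_* and resolve_* name is hypothetical, and it assumes GCC or Clang on an ELF platform with ifunc support (for example FreeBSD, or Linux with glibc).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one CPU-specific copy implementation. */
static int
copy_generic(const void *src, void *dst, size_t len)
{
	memcpy(dst, src, len);
	return (0);
}

/* Resolver: run by the loader to pick the implementation to bind. */
static int (*resolve_demo_copy(void))(const void *, void *, size_t)
{
	return (copy_generic);
}

/* demo_copy is an ifunc: its target is chosen by resolve_demo_copy(). */
int demo_copy(const void *, void *, size_t)
    __attribute__((ifunc("resolve_demo_copy")));

/* Hypothetical callback table, mirroring the pattern in the report. */
struct demo_copy_cb {
	int (*cb_copy)(const void *, void *, size_t);
};

/* Calls the ifunc indirectly, through a pointer stored in a struct. */
int
demo_call_via_struct(const void *src, void *dst, size_t len)
{
	struct demo_copy_cb cb = { .cb_copy = demo_copy };

	return (cb.cb_copy(src, dst, len));
}
```

With a correctly behaving toolchain, demo_call_via_struct() copies the buffer and returns 0. In the trap shown under "Results", ctr equals the faulting address, which is consistent with an indirect call of this kind going through a bad pointer.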

How to reproduce:

1- Boot FreeBSD 13.1/STABLE 
2- kldload pfsync

Results:

fatal kernel trap:

   exception       = 0x480 (instruction segment exception)
   virtual address = 0x38bf00ec7fc3f378
   srr0            = 0x38bf00ec7fc3f378 (0x78bf00ec7fc3f378)
   srr1            = 0x8000000000009032
   current msr     = 0x8000000000009032
   lr              = 0xc008000051a143f4 (0x8000051a143f4)
   frame           = 0xc00800001b5afd50
   curthread       = 0xc0080000518330e0
          pid = 832, comm = ifconfig

panic: instruction segment exception trap
cpuid = 1
time = 1664564648
KDB: stack backtrace:
0xc00800001b5af970: at kdb_backtrace+0x60
0xc00800001b5afa80: at vpanic+0x1b8
0xc00800001b5afb30: at panic+0x44
0xc00800001b5afb60: at trap+0x324
0xc00800001b5afc90: at powerpc_interrupt+0x1cc
0xc00800001b5afd20: kernel ISE trap @ 0x38bf00ec7fc3f378 by 0x38bf00ec7fc3f378: srr1=0x8000000000009032
            r1=0xc00800001b5affd0 cr=0x28020a40 xer=0x20040000 ctr=0x38bf00ec7fc3f378 r2=0xc008000051a348e8 frame=0xc00800001b5afd50
0xc00800001b5affd0: at pfsyncioctl+0x368
0xc00800001b5b00f0: at ifioctl+0xc44
0xc00800001b5b0290: at soo_ioctl+0x1b4
0xc00800001b5b0320: at kern_ioctl+0x3d4
0xc00800001b5b03f0: at sys_ioctl+0x134
0xc00800001b5b0520: at syscall+0x194
0xc00800001b5b0620: at trap+0x5e8
0xc00800001b5b0750: at powerpc_interrupt+0x1cc
0xc00800001b5b07e0: user SC trap by 0x8013c5be0: srr1=0x800000000280f932
            r1=0xfffffbfffe0c0 cr=0x22251682 xer=0 ctr=0x8013c5bd0 r2=0x8014a2478 frame=0xc00800001b5b0810
KDB: enter: panic
[ thread pid 832 tid 100073 ]
Stopped at      kdb_enter+0x70: ori     r0, r0, 0x0
db>
Comment 1 Alfredo Dal'Ava Junior freebsd_committer freebsd_triage 2022-09-30 19:26:14 UTC
Links related to the issue:

First (naive) attempt at a fix: https://reviews.llvm.org/D133745
LLD reproduce tarball: https://github.com/llvm/llvm-project/issues/57722
Userland test case (an attempt to reproduce a similar issue): https://github.com/llvm/llvm-project/issues/57851

The problem isn't seen when the kernel is linked with LLD 9. The LLVM/LLD behavior change was traced to commit: https://reviews.llvm.org/rGdc06b0bc9ad055d06535462d91bfc2a744b2f589

The discussion with the LLVM community is still ongoing. The following temporary workaround was proposed on the FreeBSD side: 

https://reviews.freebsd.org/D36234
Comment 2 commit-hook freebsd_committer freebsd_triage 2022-10-03 12:03:37 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=db79bf75ac9eb1b5678ccbaebb45fb88c0e0e1e3

commit db79bf75ac9eb1b5678ccbaebb45fb88c0e0e1e3
Author:     Alfredo Dal'Ava Junior <alfredo@FreeBSD.org>
AuthorDate: 2022-10-03 14:51:05 +0000
Commit:     Alfredo Dal'Ava Junior <alfredo@FreeBSD.org>
CommitDate: 2022-10-03 15:03:09 +0000

    powerpc: cpuset: add local functions for copyin/copyout

    Add local functions to workaround an instruction segment trap (panic)
    when the indirect functions copyin and copyout are called by an external
    loadable kernel module (i.e. pfsync, zfs and linuxulator). The crash
    was triggered by change 47a57144af25a7bd768b29272d50a36fdf2874ba, but
    kernel binary linked with LLD 9 works fine. LLVM bisect points that LLD
    behavior changed after dc06b0bc9ad055d06535462d91bfc2a744b2f589.

    This is known to affect powerpc targets only and the final fix is still
    being discussed with the LLVM community.

    PR:     266730
    Reviewed by:    luporl, jhibbits (on IRC, previous version)
    MFC after:      2 days
    Sponsored by:   Instituto de Pesquisas Eldorado (eldorado.org.br)
    Differential Revision:  https://reviews.freebsd.org/D36234

 sys/kern/kern_cpuset.c | 36 ++++++++++++++++++++++++++++++++++--
 1 file changed, 34 insertions(+), 2 deletions(-)
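
A rough userland sketch of the idea behind this workaround, assuming the callback struct simply holds two function pointers. fake_copyin() and fake_copyout() stand in for the kernel routines, and all names here are hypothetical rather than taken from the commit; see D36234 for the actual change.

```c
#include <stddef.h>
#include <string.h>

/* Stand-ins for the kernel's copyin()/copyout(); plain memcpy here. */
static int
fake_copyin(const void *uaddr, void *kaddr, size_t len)
{
	memcpy(kaddr, uaddr, len);
	return (0);
}

static int
fake_copyout(const void *kaddr, void *uaddr, size_t len)
{
	memcpy(uaddr, kaddr, len);
	return (0);
}

/* Local wrappers: ordinary functions that call the routines directly. */
static int
local_copyin(const void *uaddr, void *kaddr, size_t len)
{
	return (fake_copyin(uaddr, kaddr, len));
}

static int
local_copyout(const void *kaddr, void *uaddr, size_t len)
{
	return (fake_copyout(kaddr, uaddr, len));
}

/* Hypothetical callback table; it stores the wrappers, not the originals. */
struct demo_copy_cb {
	int (*cb_copyin)(const void *, void *, size_t);
	int (*cb_copyout)(const void *, void *, size_t);
};

static const struct demo_copy_cb demo_cb = {
	.cb_copyin = local_copyin,
	.cb_copyout = local_copyout,
};

/* Copy src into a bounce buffer and back out to dst via the table. */
int
demo_roundtrip(const void *src, void *dst, size_t len)
{
	char kbuf[64];

	if (len > sizeof(kbuf))
		return (-1);
	if (demo_cb.cb_copyin(src, kbuf, len) != 0)
		return (-1);
	return (demo_cb.cb_copyout(kbuf, dst, len));
}
```

The point of the indirection is that the struct only ever holds the address of an ordinary local function, so a module calling through the struct never takes the address of the indirect function itself.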
Comment 3 commit-hook freebsd_committer freebsd_triage 2022-10-05 21:15:26 UTC
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=05f9810b31973fb0d5f07a6eb9a12a22f81c38ad

commit 05f9810b31973fb0d5f07a6eb9a12a22f81c38ad
Author:     Alfredo Dal'Ava Junior <alfredo@FreeBSD.org>
AuthorDate: 2022-10-03 14:51:05 +0000
Commit:     Alfredo Dal'Ava Junior <alfredo@FreeBSD.org>
CommitDate: 2022-10-06 00:14:19 +0000

    powerpc: cpuset: add local functions for copyin/copyout

    Add local functions to workaround an instruction segment trap (panic)
    when the indirect functions copyin and copyout are called by an external
    loadable kernel module (i.e. pfsync, zfs and linuxulator). The crash
    was triggered by change 47a57144af25a7bd768b29272d50a36fdf2874ba, but
    kernel binary linked with LLD 9 works fine. LLVM bisect points that LLD
    behavior changed after dc06b0bc9ad055d06535462d91bfc2a744b2f589.

    This is known to affect powerpc targets only and the final fix is still
    being discussed with the LLVM community.

    PR:     266730
    Reviewed by:    luporl, jhibbits (on IRC, previous version)
    MFC after:      2 days
    Sponsored by:   Instituto de Pesquisas Eldorado (eldorado.org.br)
    Differential Revision:  https://reviews.freebsd.org/D36234

    (cherry picked from commit db79bf75ac9eb1b5678ccbaebb45fb88c0e0e1e3)

 sys/kern/kern_cpuset.c | 36 ++++++++++++++++++++++++++++++++++--
 1 file changed, 34 insertions(+), 2 deletions(-)
Comment 4 John F. Carr 2023-07-25 00:46:35 UTC
I saw what may be the same crash on amd64 running 12.4-CURRENT, first boot after upgrading from 12.3.  I started the microcode_update service and the system promptly crashed.

#4  0xffffffff810d66af in trap_fatal (frame=<value optimized out>, 
    eva=<value optimized out>) at /data/freebsd/12/sys/amd64/amd64/trap.c:921
#5  0xffffffff810d66ff in trap_pfault (frame=0xfffffe002f78f9e0, 
    signo=<value optimized out>, ucode=<value optimized out>) at pcpu_aux.h:55
#6  0xffffffff810aec68 in calltrap ()
    at /data/freebsd/12/sys/amd64/amd64/exception.S:289
#7  0xffffffff810d2b73 in copyout_nosmap_std ()
    at /data/freebsd/12/sys/amd64/amd64/support.S:805
#8  0xffffffff80c29f2d in uiomove_faultflag (cp=0xfffffe002686a000, n=98, 
    uio=0xfffffe002f78fba0, nofault=<value optimized out>)
    at /data/freebsd/12/sys/kern/subr_uio.c:254
#9  0xffffffff80c32333 in pipe_read (fp=0xfffff80012598550, 
    uio=0xfffffe002f78fba0, active_cred=<value optimized out>, 
    flags=<value optimized out>, td=<value optimized out>)
    at /data/freebsd/12/sys/kern/sys_pipe.c:712
#10 0xffffffff80c2f3a5 in dofileread (td=<value optimized out>, fd=0, 
    fp=<value optimized out>, auio=0xfffffe002f78fba0, 
    offset=<value optimized out>, flags=<value optimized out>) at file.h:317
#11 0xffffffff80c2ef20 in sys_read (td=0xfffff8001cade740, uap=Unhandled dwarf expression opcode 0xa3
)
    at /data/freebsd/12/sys/kern/sys_generic.c:289
#12 0xffffffff810d7267 in amd64_syscall (td=0xfffff8001cade740, traced=0)
    at subr_syscall.c:144
#13 0xffffffff810af58e in fast_syscall_common ()
    at /data/freebsd/12/sys/amd64/amd64/exception.S:582

The active process was "logger".
Comment 5 Mark Linimon freebsd_committer freebsd_triage 2024-01-10 04:46:41 UTC
^Triage: assign to committer that resolved.

To John F. Carr: please let us know if you are still seeing the amd64 crash on a newer version of FreeBSD.
Comment 6 John F. Carr 2024-01-14 00:58:27 UTC
Funny you should ask...

Today I updated my amd64 13.2-STABLE system for the first time in a month.  It crashed with a similar error on the first boot.  A microcode update caused a page fault trying to send data to the logger.  core.txt contents below.  This panic hadn't happened before on this system.  Maybe the update to llvm17 affected code generation.



Updating CPU Microcode...


Fatal trap 12: page fault while in kernel mode
cpuid = 45; apic id = 3b
fault virtual address	= 0x388e97560000
fault code		= supervisor write data, page not present
instruction pointer	= 0x20:0xffffffff81088c43
stack pointer	        = 0x28:0xfffffe03a79d7ca0
frame pointer	        = 0x28:0xfffffe03a79d7ca0
code segment		= base rx0, limit 0xfffff, type 0x1b
			= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags	= interrupt enabled, resume, IOPL = 0
current process		= 145 (logger)
trap number		= 12
panic: page fault
cpuid = 45
time = 1705192820
KDB: stack backtrace:
#0 0xffffffff80c199b5 at kdb_backtrace+0x65
#1 0xffffffff80bcebf2 at vpanic+0x152
#2 0xffffffff80bce9f3 at panic+0x43
#3 0xffffffff8108c56c at trap_fatal+0x38c
#4 0xffffffff8108c5d7 at trap_pfault+0x67
#5 0xffffffff81060f08 at calltrap+0x8
#6 0xffffffff80c337d5 at uiomove_faultflag+0x135  [this is a call to copyout() -jfc]
#7 0xffffffff80c3de55 at pipe_read+0x2f5
#8 0xffffffff80c3a586 at dofileread+0x86
#9 0xffffffff80c3a0d2 at sys_read+0xc2
#10 0xffffffff8108ced0 at amd64_syscall+0x140
#11 0xffffffff8106181b at fast_syscall_common+0xf8
Uptime: 38s
Comment 7 Mark Linimon freebsd_committer freebsd_triage 2024-01-18 15:31:11 UTC
^Triage: create new PR 276426 to hold the amd64 content.