Bug 247432 - panic: general protection fault in ucp_start_pmc for uncore on E5504 processor
Summary: panic: general protection fault in ucp_start_pmc for uncore on E5504 processor
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 12.1-RELEASE
Hardware: amd64 Any
Importance: --- Affects Some People
Assignee: freebsd-bugs (Nobody)
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-06-19 20:15 UTC by dgmorris@earthlink.net
Modified: 2020-06-19 21:42 UTC
CC List: 1 user

See Also:


Description dgmorris@earthlink.net 2020-06-19 20:15:39 UTC
Internal testing of a Dell FreeBSD-based product includes a PMC test that, among other things, does the following:

        for i in $(pmccontrol -L | grep -v -e "IAF" -e "IAP" -e "TSC" -e "UNC" \
                 -e "UCF" -e "UCP" -e "SOFT"); do

                pmcstat -p $i ls
                process_cnt=$?

                # pmcstat exits with status 71 if the counter is
                # system-scoped rather than process-scoped; skip those.
                if [ $process_cnt -ne 0 ] && [ $process_cnt -ne 71 ]; then
                        atf_fail "PMC counter not working"
                fi
        done

This panics systems with an E5504 processor.

While reproducing this locally to narrow it down, I found that the following uncore events trigger the panic:
        mem_uncore_retired.local_dram
        mem_uncore_retired.other_core_l2_hitm
        mem_uncore_retired.remote_cache_local_home_hit
        mem_uncore_retired.remote_dram
        mem_uncore_retired.uncacheable

Panic information:

Fatal trap 9: general protection fault while in kernel mode
cpuid = 0; apic id = 00
instruction pointer     = 0x20:0xffffffff82c30604
stack pointer           = 0x28:0xfffffe0044204640
frame pointer           = 0x28:0xfffffe0044204640
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = resume, IOPL = 0
current process         = 1115 (pmcstat)
trap number             = 9
panic: general protection fault
cpuid = 0
time = 1592596633
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0044204350
vpanic() at vpanic+0x19d/frame 0xfffffe00442043a0
panic() at panic+0x43/frame 0xfffffe0044204400
trap_fatal() at trap_fatal+0x39c/frame 0xfffffe0044204460
trap() at trap+0x6c/frame 0xfffffe0044204570
calltrap() at calltrap+0x8/frame 0xfffffe0044204570
--- trap 0x9, rip = 0xffffffff82c30604, rsp = 0xfffffe0044204640, rbp = 0xfffffe0044204640 ---
ucp_start_pmc() at ucp_start_pmc+0xa4/frame 0xfffffe0044204640
pmc_hook_handler() at pmc_hook_handler+0xfda/frame 0xfffffe0044204700
sched_switch() at sched_switch+0x691/frame 0xfffffe00442047d0
mi_switch() at mi_switch+0xe2/frame 0xfffffe0044204800
sleepq_catch_signals() at sleepq_catch_signals+0x425/frame 0xfffffe0044204850
sleepq_wait_sig() at sleepq_wait_sig+0xf/frame 0xfffffe0044204880
_sleep() at _sleep+0x23a/frame 0xfffffe00442048f0
sbwait() at sbwait+0x4c/frame 0xfffffe0044204910
soreceive_generic() at soreceive_generic+0x286/frame 0xfffffe00442049e0
soreceive() at soreceive+0x44/frame 0xfffffe0044204a00
dofileread() at dofileread+0x95/frame 0xfffffe0044204a40
sys_read() at sys_read+0xc1/frame 0xfffffe0044204ab0
amd64_syscall() at amd64_syscall+0x364/frame 0xfffffe0044204bf0
fast_syscall_common() at fast_syscall_common+0x101/frame 0xfffffe0044204bf0
--- syscall (3, FreeBSD ELF64, sys_read), rip = 0x80095adfa, rsp = 0x7fffffffe3f8, rbp = 0x7fffffffe470 ---
Uptime: 1m4s
Dumping 435 out of 6085 MB:..4%..12%..23%..34%..41%..52%..63%..74%..81%..92%

__curthread () at /usr/src/sys/amd64/include/pcpu.h:234
234             __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (OFFSETOF_CURTHREAD));
(kgdb) bt
#0  __curthread () at /usr/src/sys/amd64/include/pcpu.h:234
#1  doadump (textdump=1) at /usr/src/sys/kern/kern_shutdown.c:371
#2  0xffffffff80bdf95d in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:451
#3  0xffffffff80bdfde9 in vpanic (fmt=<optimized out>, ap=<optimized out>) at /usr/src/sys/kern/kern_shutdown.c:877
#4  0xffffffff80bdfbe3 in panic (fmt=<unavailable>) at /usr/src/sys/kern/kern_shutdown.c:804
#5  0xffffffff810c93cc in trap_fatal (frame=0xfffffe0044204580, eva=0) at /usr/src/sys/amd64/amd64/trap.c:943
#6  0xffffffff810c87dc in trap (frame=0xfffffe0044204580) at /usr/src/sys/amd64/amd64/trap.c:221
#7  <signal handler called>
#8  0xffffffff82c30604 in wrmsr (msr=960, newval=<optimized out>) at /usr/src/sys/amd64/include/cpufunc.h:433
#9  ucp_start_pmc (cpu=<optimized out>, ri=0) at /usr/src/sys/dev/hwpmc/hwpmc_uncore.c:707
#10 0xffffffff82c2556a in pmc_process_csw_in (td=<optimized out>) at /usr/src/sys/dev/hwpmc/hwpmc_mod.c:1492
#11 pmc_hook_handler (td=0xfffff80009bf75e0, function=<optimized out>, arg=<optimized out>) at /usr/src/sys/dev/hwpmc/hwpmc_mod.c:2210
#12 0xffffffff80c119f1 in sched_switch (td=0xfffff80009bf75e0, newtd=<optimized out>, flags=<optimized out>) at /usr/src/sys/kern/sched_ule.c:2120
#13 0xffffffff80beb922 in mi_switch (flags=260, newtd=0x0) at /usr/src/sys/kern/kern_synch.c:452
#14 0xffffffff80c3c265 in sleepq_catch_signals (wchan=0xfffff800097c053c, pri=-1) at /usr/src/sys/kern/subr_sleepqueue.c:528
#15 0xffffffff80c3bd9f in sleepq_wait_sig (wchan=0xfffff8000fbaf500, pri=0) at /usr/src/sys/kern/subr_sleepqueue.c:719
#16 0xffffffff80beb34a in _sleep (ident=0xfffff800097c053c, lock=0xfffff800097c04c0, priority=360, wmesg=0xffffffff81258462 "sbwait", sbt=0, pr=0, flags=0)
    at /usr/src/sys/kern/kern_synch.c:215
#17 0xffffffff80c77cec in sbwait (sb=0x100000000) at /usr/src/sys/kern/uipc_sockbuf.c:267
#18 0xffffffff80c7d176 in soreceive_generic (so=<optimized out>, psa=0x0, uio=0xfffffe0044204a50, mp0=0x0, controlp=0x0, flagsp=0x0)
    at /usr/src/sys/kern/uipc_socket.c:1813
#19 0xffffffff80c7ef94 in soreceive (so=0xfffff8000fbaf500, psa=0x100000000, uio=0x0, mp0=0x3c0, controlp=0x43200f, flagsp=0x0)
    at /usr/src/sys/kern/uipc_socket.c:2563
#20 0xffffffff80c4c505 in fo_read (fp=<optimized out>, uio=<optimized out>, active_cred=0x0, flags=<optimized out>, td=<optimized out>)
    at /usr/src/sys/sys/file.h:313
#21 dofileread (td=<optimized out>, fd=5, fp=<optimized out>, auio=0xfffffe0044204a50, offset=5, flags=<optimized out>) at /usr/src/sys/kern/sys_generic.c:368
#22 0xffffffff80c4c081 in kern_readv (td=<optimized out>, fd=5, auio=<optimized out>) at /usr/src/sys/kern/sys_generic.c:289
#23 sys_read (td=0xfffff80009bf75e0, uap=<optimized out>) at /usr/src/sys/kern/sys_generic.c:205
#24 0xffffffff810c9f84 in syscallenter (td=0xfffff80009bf75e0) at /usr/src/sys/amd64/amd64/../../kern/subr_syscall.c:135
#25 amd64_syscall (td=0xfffff80009bf75e0, traced=0) at /usr/src/sys/amd64/amd64/trap.c:1186
#26 <signal handler called>
#27 0x000000080095adfa in ?? ()
Backtrace stopped: Cannot access memory at address 0x7fffffffe3f8
(kgdb) frame 9
#9  ucp_start_pmc (cpu=<optimized out>, ri=0) at /usr/src/sys/dev/hwpmc/hwpmc_uncore.c:707
707             wrmsr(SELECTSEL(uncore_cputype) + ri, evsel);
(kgdb) p ri
$1 = 0
(kgdb) p uncore_cputype
$2 = PMC_CPU_INTEL_COREI7
(kgdb) p evsel
$3 = 4399119
(kgdb) p/x evsel
$4 = 0x43200f
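To make the kgdb value easier to read, the sketch below splits evsel into byte-sized fields using POSIX sh arithmetic. It assumes the layout the Intel SDM describes for performance event-select registers (event code in bits 7:0, unit mask in bits 15:8, control flags from bit 16 up); the function name decode_evsel is hypothetical, and what bits 16, 17 and 22 mean in the uncore select register is not something this report establishes.

```sh
#!/bin/sh
# Hypothetical helper: decode an event-select value such as the evsel
# printed by kgdb. The byte layout (event in bits 7:0, unit mask in
# bits 15:8) is an assumption based on the Intel SDM PERFEVTSEL format.
decode_evsel() {
    v=$(($1))                       # accepts decimal or 0x-prefixed hex
    printf 'event=0x%02x umask=0x%02x' $(( v & 0xff )) $(( (v >> 8) & 0xff ))
    bits=""
    b=16
    # Collect the positions of any set bits from 16 through 31.
    while [ "$b" -le 31 ]; do
        if [ $(( (v >> b) & 1 )) -eq 1 ]; then
            bits="$bits $b"
        fi
        b=$((b + 1))
    done
    printf ' high_bits=%s\n' "${bits# }"
}

decode_evsel 0x43200f
```

For the value above it prints `event=0x0f umask=0x20 high_bits=16 17 22`; 0x43200f is the same number kgdb printed as 4399119 in decimal.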

Note that the msr value 960 passed to wrmsr() correctly corresponds to 0x3c0 (UCP_EVSEL0), which is what SELECTSEL(PMC_CPU_INTEL_COREI7) should return.
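The same check can be expressed as shell arithmetic. This is a sketch assuming, as the `SELECTSEL(uncore_cputype) + ri` expression at hwpmc_uncore.c:707 suggests, that the select MSR for counter index ri is UCP_EVSEL0 plus ri; the helper evsel_msr is hypothetical.

```sh
#!/bin/sh
# UCP_EVSEL0 as named in the report: MSR 0x3c0, stored here in decimal.
UCP_EVSEL0=$((0x3c0))

# Hypothetical helper: select-register MSR for uncore counter index ri,
# assuming the registers are consecutive starting at UCP_EVSEL0.
evsel_msr() {
    echo $((UCP_EVSEL0 + $1))
}

msr=$(evsel_msr 0)
printf 'ri=0 -> msr=%d (0x%x)\n' "$msr" "$msr"
```

Its output, `ri=0 -> msr=960 (0x3c0)`, matches the msr=960 argument to wrmsr in frame #8 of the backtrace.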

This reproduces 100% of the time for me on a Z600 Workstation with:
CPU: Intel(R) Xeon(R) CPU           E5504  @ 2.00GHz (1995.04-MHz K8-class CPU)
  Origin="GenuineIntel"  Id=0x106a5  Family=0x6  Model=0x1a  Stepping=5
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0x9ce3bd<SSE3,DTES64,MON,DS_CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,DCA,SSE4.1,SSE4.2,POPCNT>
  AMD Features=0x28100800<SYSCALL,NX,RDTSCP,LM>
  AMD Features2=0x1<LAHF>
  VT-x: PAT,HLT,MTF,PAUSE,EPT,VPID
  TSC: P-state invariant, performance statistics

I suspect it reproduces on any other E5504 system as well. This is a dual-socket motherboard with only one socket populated, but according to the Intel Software Developer's Manuals the uncore is part of the processor package, so I don't think that should matter (I'm mentioning it in case it rings a bell).

Older hardware, I know, but I figured it was worth reporting.
Comment 1 dgmorris@earthlink.net 2020-06-19 20:23:39 UTC
To be clear (in case it wasn't), here are the steps to reproduce the panic:

        kldload hwpmc
        pmcstat -p mem_uncore_retired.local_dram ls