Bug 268857 - pmcstat crashes on particular event/CPU combination
Status: Open
Alias: None
Product: Base System
Classification: Unclassified
Component: bin
Version: 13.1-STABLE
Hardware: amd64 Any
Importance: --- Affects Only Me
Assignee: freebsd-bugs (Nobody)
Keywords: crash
Depends on:
Blocks:
 
Reported: 2023-01-10 14:59 UTC by John F. Carr
Modified: 2023-08-01 21:07 UTC
Description John F. Carr 2023-01-10 14:59:49 UTC
The following command crashes on Zen CPUs but not older AMD CPUs:

$ pmcstat -P k8-ic-refill-from-l2 echo -n
initlog   0x9030000 "AMD_K8"
Segmentation fault (core dumped)

Perhaps "k8-ic-refill-from-l2" is not a valid event for Zen.  Even so, that is not easily discoverable, and an unsupported event name should produce an error message rather than crash the program.

lldb says

* thread #1, name = 'pmcstat', stop reason = breakpoint 1.1
    frame #0: 0x000000720ad83c02 libpmc.so.5`pmc_pmu_event_get_by_idx(cpuid=<unavailable>, idx=8350) at libpmc_pmu_util.c:293:2
   290 	
   291 		if ((pme = pmu_events_map_get(cpuid)) == NULL)
   292 			return (NULL);
-> 293 		assert(pme->table[idx].name);
   294 		return (pme->table[idx].name);
   295 	}
   296 	
(lldb) p pme
(const pmu_events_map *) $2 = 0x000000720af7f9f0
(lldb) p *pme
(const pmu_events_map) $3 = {
  cpuid = 0x000000720abe6054 "AuthenticAMD-23-[[:xdigit:]]+"
  version = 0x000000720ad0c2ad "v1"
  type = 0x000000720ad18386 "core"
  table = 0x000000720af70890
}

Array index idx=8350 is out of bounds and looking up pme->table[idx].name causes a segfault.  I would suggest a bounds check, but I don't see any array size field to compare against.
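
A minimal sketch of what such a check could look like, in the absence of a size field. It assumes the generated table ends with a sentinel entry whose name is NULL, which has not been verified against the libpmc sources. The struct and function names (example_event, example_table_len, example_event_name) are hypothetical and are not part of libpmc.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one row of a generated PMU events table. */
struct example_event {
	const char *name;
};

/*
 * Count the entries by walking to the sentinel.
 * ASSUMPTION: the table is terminated by an entry whose name is NULL.
 */
static size_t
example_table_len(const struct example_event *table)
{
	size_t n;

	assert(table != NULL);
	for (n = 0; table[n].name != NULL; n++)
		continue;
	return (n);
}

/* Return the event name, or NULL if idx is past the end of the table. */
static const char *
example_event_name(const struct example_event *table, size_t idx)
{
	if (idx >= example_table_len(table))
		return (NULL);
	return (table[idx].name);
}
```

With a two-entry table, example_event_name(table, 8350) would return NULL instead of reading past the end. The length walk is linear in the table size on every lookup, so a real fix would probably compute the length once or store it alongside the table.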

More specifically, pmcstat crashes on

CPU: AMD EPYC 7402P 24-Core Processor                (2794.84-MHz K8-class CPU)
  Origin="AuthenticAMD"  Id=0x830f10  Family=0x17  Model=0x31  Stepping=0
CPU: AMD Ryzen 5 PRO 2400GE w/ Radeon Vega Graphics  (3194.22-MHz K8-class CPU)
  Origin="AuthenticAMD"  Id=0x810f10  Family=0x17  Model=0x11  Stepping=0

but pmcstat does not crash on

CPU: AMD Opteron(tm) X3421 APU                       (2096.10-MHz K8-class CPU)
  Origin="AuthenticAMD"  Id=0x660f01  Family=0x15  Model=0x60  Stepping=1

I am reporting against 13.1-STABLE.  The bug was also present in CURRENT as of summer 2022.
Comment 1 commit-hook freebsd_committer freebsd_triage 2023-06-07 14:26:56 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=21f7397a61f7bff61a1221cc6340cd980a922540

commit 21f7397a61f7bff61a1221cc6340cd980a922540
Author:     Jessica Clarke <jrtc27@FreeBSD.org>
AuthorDate: 2023-06-07 14:21:18 +0000
Commit:     Jessica Clarke <jrtc27@FreeBSD.org>
CommitDate: 2023-06-07 14:24:29 +0000

    libpmc: Handle PMCALLOCATE log with PMC code on PMU event system

    On an arm64 system that reports as a Cortex A72 r0p3, running

      pmcstat -P CPU_CYCLES command

    works, but

      pmcstat -P cpu-cycles command

    does not. This is because the former uses the PMU event from the JSON
    source, resulting in pl_event in the log event being a small index
    (here, 5) into the generated events table, whilst the latter does not
    match any of the JSON events and falls back on PMC's own tables, mapping
    it to the PMC event 0x14111, i.e. PMC_EV_ARMV8_EVENT_11H. Then, when
    libpmc gets the PMCALLOCATE event, it tries to use the event as an index
    into the JSON-derived table, but doing so only makes sense for the
    former, whilst for the latter it will go way out of bounds and either
    read junk (which may trigger the != NULL assertion) or segfault. As far
    as I can tell we don't have anything lying around to tell us which of
    the two cases we're in, but we can exploit the fact that the first
    0x1000 PMC event codes are reserved, and that none of our PMU events
    tables reach that number of entries yet.

    PR:             268857
    Reviewed by:    mhorne
    MFC after:      1 month
    Differential Revision:  https://reviews.freebsd.org/D39592

 lib/libpmc/libpmc.c |  9 ++++++++-
 lib/libpmc/pmclog.c | 27 +++++++++++++++++++++------
 2 files changed, 29 insertions(+), 7 deletions(-)
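
The heuristic described in the commit message can be sketched roughly as follows. This is an illustration only, not the code from the commit: the constant and function names are hypothetical, and the only facts taken from the message are that the first 0x1000 PMC event codes are reserved and that no PMU events table reaches that many entries.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * ASSUMPTION: illustrative name for the threshold. The value comes from
 * the commit message (the first 0x1000 PMC event codes are reserved).
 */
#define EXAMPLE_PMC_RESERVED_CODES 0x1000u

/*
 * Decide how to interpret the event value from a PMCALLOCATE log record.
 * Values below the reserved range are taken to be indices into the
 * JSON-derived PMU events table; anything else is taken to be a PMC
 * event code that must be looked up in PMC's own tables.
 */
static bool
example_is_pmu_index(uint32_t pl_event)
{
	return (pl_event < EXAMPLE_PMC_RESERVED_CODES);
}
```

Under this sketch, the value 5 from the CPU_CYCLES case is treated as a table index, while 0x14111 (PMC_EV_ARMV8_EVENT_11H) is treated as a PMC event code. The idx=8350 (0x209e) from the original report would also fall on the PMC-code side and would no longer be used as a table index.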
Comment 2 commit-hook freebsd_committer freebsd_triage 2023-08-01 21:07:32 UTC
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=051f41ddb517a9d3f6872678ccc3d0b6c0fffca1

commit 051f41ddb517a9d3f6872678ccc3d0b6c0fffca1
Author:     Jessica Clarke <jrtc27@FreeBSD.org>
AuthorDate: 2023-06-07 14:21:18 +0000
Commit:     Jessica Clarke <jrtc27@FreeBSD.org>
CommitDate: 2023-08-01 20:42:53 +0000

    libpmc: Handle PMCALLOCATE log with PMC code on PMU event system

    On an arm64 system that reports as a Cortex A72 r0p3, running

      pmcstat -P CPU_CYCLES command

    works, but

      pmcstat -P cpu-cycles command

    does not. This is because the former uses the PMU event from the JSON
    source, resulting in pl_event in the log event being a small index
    (here, 5) into the generated events table, whilst the latter does not
    match any of the JSON events and falls back on PMC's own tables, mapping
    it to the PMC event 0x14111, i.e. PMC_EV_ARMV8_EVENT_11H. Then, when
    libpmc gets the PMCALLOCATE event, it tries to use the event as an index
    into the JSON-derived table, but doing so only makes sense for the
    former, whilst for the latter it will go way out of bounds and either
    read junk (which may trigger the != NULL assertion) or segfault. As far
    as I can tell we don't have anything lying around to tell us which of
    the two cases we're in, but we can exploit the fact that the first
    0x1000 PMC event codes are reserved, and that none of our PMU events
    tables reach that number of entries yet.

    PR:             268857
    Reviewed by:    mhorne
    MFC after:      1 month
    Differential Revision:  https://reviews.freebsd.org/D39592

    (cherry picked from commit 21f7397a61f7bff61a1221cc6340cd980a922540)

 lib/libpmc/libpmc.c |  9 ++++++++-
 lib/libpmc/pmclog.c | 27 +++++++++++++++++++++------
 2 files changed, 29 insertions(+), 7 deletions(-)