Bug 204686 - panic: critical_exit: td_critnest == 0 running pmcstat (possible older HW issue)
Status: Closed (Unable to Reproduce)
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: CURRENT
Hardware: arm64, OS: Any
Importance: --- Affects Some People
Assignee: Andrew Turner
Blocks: 203349
Reported: 2015-11-19 17:42 UTC by Andrew Turner
Modified: 2017-06-16 13:44 UTC
CC List: 2 users

Description Andrew Turner 2015-11-19 17:42:12 UTC
This is related to hwpmc: I was running "pmcstat -S INST_RETIRED -O sample.out" while ld was running.

root@cavium:~ # hwpmc: SOFT/16/64/0x67<INT,USR,SYS,REA,WRI> ARMV8/6/32/0x1ff<INT,USR,SYS,EDG,THR,REA,WRI,INV,QUA>
panic: critical_exit: td_critnest == 0
cpuid = 15
KDB: stack backtrace:
db_trace_self() at db_trace_self_wrapper+0x28
         pc = 0xffffff80004d6434  lr = 0xffffff800006d888
         sp = 0xffffff87cc0c4590  fp = 0xffffff87cc0c46b0

db_trace_self_wrapper() at vpanic+0x170
         pc = 0xffffff800006d888  lr = 0xffffff800025ad54
         sp = 0xffffff87cc0c46c0  fp = 0xffffff87cc0c4740

vpanic() at kassert_panic+0x160
         pc = 0xffffff800025ad54  lr = 0xffffff800025abe0
         sp = 0xffffff87cc0c4750  fp = 0xffffff87cc0c4810

kassert_panic() at critical_exit+0xc8
         pc = 0xffffff800025abe0  lr = 0xffffff8000261510
         sp = 0xffffff87cc0c4820  fp = 0xffffff87cc0c4830

critical_exit() at spinlock_exit+0x10
         pc = 0xffffff8000261510  lr = 0xffffff80004ddf10
         sp = 0xffffff87cc0c4840  fp = 0xffffff87cc0c4850

spinlock_exit() at pmc_select_cpu+0x80
         pc = 0xffffff80004ddf10  lr = 0xffffff8047a5ba0c
         sp = 0xffffff87cc0c4860  fp = 0xffffff87cc0c4870

pmc_select_cpu() at pmc_syscall_handler+0x15f8
         pc = 0xffffff8047a5ba0c  lr = 0xffffff8047a5f824
         sp = 0xffffff87cc0c4880  fp = 0xffffff87cc0c49e0

pmc_syscall_handler() at do_el0_sync+0x478
         pc = 0xffffff8047a5f824  lr = 0xffffff80004e7e5c
         sp = 0xffffff87cc0c49f0  fp = 0xffffff87cc0c4aa0

do_el0_sync() at handle_el0_sync+0x58
         pc = 0xffffff80004e7e5c  lr = 0xffffff80004d79d4
         sp = 0xffffff87cc0c4ab0  fp = 0x0000007ffffff500

KDB: enter: panic
[ thread pid 8261 tid 100471 ]
Stopped at      kdb_enter+0x40:
db>
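The panic message says that critical_exit() found the current thread's critical-section nesting count (td_critnest) already at zero, i.e. an exit with no matching critical_enter() on that thread. Reading the backtrace frames as "callee() at caller+offset", the path is pmc_select_cpu() calling spinlock_exit(), which calls critical_exit(). The sketch below is a minimal user-space model of that invariant only, not the kernel code: struct model_thread, model_critical_enter() and model_critical_exit() are hypothetical names, the KASSERT is replaced by a return code so the failing case can be exercised safely, and the real critical_exit() does more (for example it handles preemption that was deferred while the section was held).

```c
#include <stdio.h>

/*
 * Hypothetical, simplified stand-in for struct thread: only the
 * critical-section nesting counter is modelled.
 */
struct model_thread {
	int td_critnest;	/* depth of nested critical sections */
};

/* Model of critical_enter(): one level deeper. */
static void
model_critical_enter(struct model_thread *td)
{
	td->td_critnest++;
}

/*
 * Model of critical_exit(): returns 0 on a balanced exit, and -1 in the
 * case where the kernel's assertion would fire and panic with
 * "critical_exit: td_critnest == 0" (an exit with no matching enter).
 */
static int
model_critical_exit(struct model_thread *td)
{
	if (td->td_critnest == 0) {
		fprintf(stderr, "critical_exit: td_critnest == 0\n");
		return (-1);
	}
	td->td_critnest--;
	return (0);
}
```

Balanced enter/exit pairs always leave the counter at zero, so reaching the failing branch requires either an extra exit or a counter that was changed behind the code's back; the report itself does not say which of these happened here.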
Comment 1 zbb 2015-12-09 12:04:23 UTC
I encountered similar problems with FreeBSD on ARM64 while using hwpmc.
Some of the errors that I found are listed below:

* panic: Unknown kernel exception 0 esr_el1 2000000
* panic: data abort in critical section or under mutex
* panic: VFP exception in the kernel
* panic: Unknown kernel exception 21 esr_el1 86000006

This can easily be reproduced, for example by running:
$ pmcstat -S CPU_CYCLES -O cpu_cycles.pmc

then waiting ~30 seconds or more and pressing Ctrl+C.

Platform: ThunderX CRB (single socket)
SVN rev: 291651
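The reproduction steps above (start pmcstat, wait about 30 seconds, then press Ctrl+C) can be scripted so they are easy to repeat. The following is a minimal POSIX C sketch, not part of the original report: run_then_interrupt() is a hypothetical helper name, and sending SIGINT with kill(2) to the pmcstat process only approximates Ctrl+C, which the terminal delivers to the whole foreground process group.

```c
#define _POSIX_C_SOURCE 200809L

#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: start argv[0] (searched in PATH) with the given
 * arguments, let it run for `secs` seconds, then send it SIGINT (the
 * signal Ctrl+C normally delivers) and reap it.
 * Returns the raw wait status, or -1 if fork() or waitpid() fails.
 */
static int
run_then_interrupt(char *const argv[], unsigned int secs)
{
	pid_t pid;
	int status;

	pid = fork();
	if (pid == -1)
		return (-1);
	if (pid == 0) {
		/* Child: do not inherit SIGINT as ignored (e.g. from a
		 * non-interactive parent shell), then run the target. */
		signal(SIGINT, SIG_DFL);
		execvp(argv[0], argv);
		_exit(127);	/* exec failed, e.g. command not found */
	}

	/* Parent: let the child run, then interrupt it. */
	sleep(secs);
	kill(pid, SIGINT);
	if (waitpid(pid, &status, 0) == -1)
		return (-1);
	return (status);
}
```

For example, passing { "pmcstat", "-S", "CPU_CYCLES", "-O", "cpu_cycles.pmc", NULL } with secs set to 30 mirrors the steps in this comment. The helper only reports how pmcstat exited; whether the kernel panicked still has to be checked on the console.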

Examples:

root@thunderx_crb4:~ #   x0:           5b4fc4
  x1:                0
  x2:                0
  x3: ffffff800080d048
  x4: ffffff87cc051cd0
  x5: ffffff87cc051510
  x6:         40761000
  x7:                4
  x8: ffffff800082be00
  x9:                1
 x10:                4
 x11:                0
 x12:             2af8
 x13:         7ffe7ec0
 x14:                b
 x15:             296c
 x16:         7ffe7d60
 x17:                b
 x18: ffffff87cc051640
 x19:                8
 x20:                7
 x21:         40461000
 x22: ffffffc08b5d2438
 x23: ffffffc04c8009a0
 x24:                0
 x25:               68
 x26:                0
 x27:              168
 x28:                0
 x29:         4045d000
 x30:         4045d000
  sp: ffffff87cc051640
  lr: ffffffc01a4dd200
 elr: ffffffc01a4dd200
spsr:         20000085
panic: Unknown kernel exception 0 esr_el1 2000000

cpuid = 0
KDB: stack backtrace:
db_trace_self() at db_trace_self_wrapper+0x28
	 pc = 0xffffff80005a6d04  lr = 0xffffff8000070b84
	 sp = 0xffffff87cc051220  fp = 0xffffff87cc051340

db_trace_self_wrapper() at vpanic+0x170
	 pc = 0xffffff8000070b84  lr = 0xffffff80002bc468
	 sp = 0xffffff87cc051350  fp = 0xffffff87cc0513d0

vpanic() at panic+0x4c
	 pc = 0xffffff80002bc468  lr = 0xffffff80002bc2f4
	 sp = 0xffffff87cc0513e0  fp = 0xffffff87cc051460

panic() at do_el1h_sync+0x128
	 pc = 0xffffff80002bc2f4  lr = 0xffffff80005b87d4
	 sp = 0xffffff87cc051470  fp = 0xffffff87cc051490

do_el1h_sync() at handle_el1h_sync+0x68
	 pc = 0xffffff80005b87d4  lr = 0xffffff80005a8068
	 sp = 0xffffff87cc0514a0  fp = 0xffffff87cc0515b0

handle_el1h_sync() at 0xffffffc01a4dd1fc
	 pc = 0xffffff80005a8068  lr = 0xffffffc01a4dd1fc
	 sp = 0xffffff87cc0515c0  fp = 0x000000004045d000

KDB: enter: panic
[ thread pid 1022 tid 100194 ]
Stopped at      kdb_enter+0x40:
db>


root@thunderx_crb4:~ #   x0:                0
  x1:                0
  x2:                0
  x3: ffffff800080d048
  x4: ffffff87cc146cd0
  x5: ffffff87cc146510
  x6:         40761000
  x7:              100
  x8: b9041a6951000529
  x9:                1
 x10:                4
 x11:                0
 x12:             2af8
 x13:         8004190c
 x14:                b
 x15:             2923
 x16:         80041afd
 x17:                b
 x18: ffffff87cc146640
 x19: ffffffc00e71cd00
 x20: ffffff8000722e50
 x21: ffffff80005aeb08
 x22: ffffff87cc146610
 x23:                0
 x24:                0
 x25:               68
 x26:                0
 x27:              168
 x28:                0
 x29: ffffff87cc1466d0
 x30: ffffff87cc1466d0
  sp: ffffff87cc146640
  lr: ffffff80002c6e1c
 elr: ffffff80002c6e38
spsr:         60000085
 far: b9041a6951000851
 esr:         96000004
timeout stopping cpus
panic: data abort in critical section or under mutex
cpuid = 0
KDB: stack backtrace:
db_trace_self() at db_trace_self_wrapper+0x28
	 pc = 0xffffff80005a6d04  lr = 0xffffff8000070b84
	 sp = 0xffffff87cc146190  fp = 0xffffff87cc1462b0

db_trace_self_wrapper() at vpanic+0x170
	 pc = 0xffffff8000070b84  lr = 0xffffff80002bc468
	 sp = 0xffffff87cc1462c0  fp = 0xffffff87cc146340

vpanic() at panic+0x4c
	 pc = 0xffffff80002bc468  lr = 0xffffff80002bc2f4
	 sp = 0xffffff87cc146350  fp = 0xffffff87cc1463d0

panic() at data_abort+0x1f0
	 pc = 0xffffff80002bc2f4  lr = 0xffffff80005b8a74
	 sp = 0xffffff87cc1463e0  fp = 0xffffff87cc146490

data_abort() at handle_el1h_sync+0x68
	 pc = 0xffffff80005b8a74  lr = 0xffffff80005a8068
	 sp = 0xffffff87cc1464a0  fp = 0xffffff87cc1465b0

handle_el1h_sync() at _sleep+0x2f8
	 pc = 0xffffff80005a8068  lr = 0xffffff80002c6e18
	 sp = 0xffffff87cc1465c0  fp = 0xffffff87cc1466d0

_sleep() at kqueue_kevent+0xd18
	 pc = 0xffffff80002c6e18  lr = 0xffffff8000268ae8
	 sp = 0xffffff87cc1466e0  fp = 0xffffffc00c3a8200

KDB: enter: panic
[ thread pid 1027 tid 100243 ]
Stopped at      kdb_enter+0x40:
db>


root@thunderx_crb4:~ # pmcstat -S CPU_CYCLES -O cpu_cycles.pmc 
^C  x0:           5b4fc4
  x1:                0
  x2:                0
  x3: ffffff800080d048
  x4: ffffff87cc079cd0
  x5: ffffff87cc079510
  x6:         40761000
  x7:              100
  x8: ffffff800082be00
  x9:                1
 x10:                4
 x11:                0
 x12:             2af8
 x13:         7ffae9a1
 x14:                b
 x15:             2714
 x16:         7ffae969
 x17:                b
 x18: ffffff87cc079640
 x19:                8
 x20:                7
 x21:         40461000
 x22: ffffffc087bb2eb8
 x23: ffffffc018c714d0
 x24:                0
 x25:               68
 x26:                0
 x27:              168
 x28:                0
 x29:         4045d000
 x30:         4045d000
  sp: ffffff87cc079640
  lr: ffffffc00e272480
 elr: ffffffc00e272480
spsr:         20000085
 esr:         1fe00000
panic: VFP exception in the kernel
cpuid = 0
KDB: stack backtrace:
db_trace_self() at db_trace_self_wrapper+0x28
	 pc = 0xffffff80005a6d04  lr = 0xffffff8000070b84
	 sp = 0xffffff87cc079220  fp = 0xffffff87cc079340

db_trace_self_wrapper() at vpanic+0x170
	 pc = 0xffffff8000070b84  lr = 0xffffff80002bc468
	 sp = 0xffffff87cc079350  fp = 0xffffff87cc0793d0

vpanic() at panic+0x4c
	 pc = 0xffffff80002bc468  lr = 0xffffff80002bc2f4
	 sp = 0xffffff87cc0793e0  fp = 0xffffff87cc079460

panic() at do_el1h_sync+0x10c
	 pc = 0xffffff80002bc2f4  lr = 0xffffff80005b87b8
	 sp = 0xffffff87cc079470  fp = 0xffffff87cc079490

do_el1h_sync() at handle_el1h_sync+0x68
	 pc = 0xffffff80005b87b8  lr = 0xffffff80005a8068
	 sp = 0xffffff87cc0794a0  fp = 0xffffff87cc0795b0

handle_el1h_sync() at 0xffffffc00e27247c
	 pc = 0xffffff80005a8068  lr = 0xffffffc00e27247c
	 sp = 0xffffff87cc0795c0  fp = 0x000000004045d000

KDB: enter: panic
[ thread pid 810 tid 100202 ]
Stopped at      kdb_enter+0x40:
db>
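Each panic in this comment carries an ESR_EL1 value, either in the panic message or on the esr: line of the register dump, and decoding them makes the failures easier to compare. This is a sketch assuming the ESR_ELx field layout from the Arm Architecture Reference Manual (EC in bits [31:26], IL in bit 25, ISS in bits [24:0], with the fault status code in the low six ISS bits for aborts); the function names are hypothetical and the EC name table covers only the classes seen in this report.

```c
#include <stdint.h>

/* Exception Class: ESR bits [31:26]. */
static unsigned int
esr_ec(uint64_t esr)
{
	return ((unsigned int)((esr >> 26) & 0x3f));
}

/* Instruction Length: ESR bit 25 (1 = 32-bit instruction). */
static unsigned int
esr_il(uint64_t esr)
{
	return ((unsigned int)((esr >> 25) & 0x1));
}

/* For instruction/data aborts: IFSC/DFSC in ISS bits [5:0]. */
static unsigned int
esr_fsc(uint64_t esr)
{
	return ((unsigned int)(esr & 0x3f));
}

/* Names only for the exception classes that appear in this report. */
static const char *
esr_ec_name(unsigned int ec)
{
	switch (ec) {
	case 0x00:
		return ("unknown reason");
	case 0x07:
		return ("trapped SIMD/FP access");
	case 0x21:
		return ("instruction abort, same EL");
	case 0x25:
		return ("data abort, same EL");
	default:
		return ("other");
	}
}
```

Applied to the values above: 0x2000000 has EC 0x00 (unknown reason); 0x86000006 has EC 0x21 (instruction abort without a change in exception level) with IFSC 0x06, a level 2 translation fault; 0x96000004 has EC 0x25 (data abort at the same level) with DFSC 0x04, a level 0 translation fault; and 0x1fe00000 has EC 0x07 (trapped SIMD/FP access), matching "VFP exception in the kernel". The "21" in "Unknown kernel exception 21" therefore appears to be the EC printed in hexadecimal.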
Comment 2 Andrew Turner 2016-06-22 14:12:16 UTC
Have you seen any of these recently on Pass 2.0+ hardware? I can trigger them on Pass 1.1, but not on the other hardware I have tried, which leads me to think it may be a hardware issue.