Bug 204376

Summary: system heavily loaded while at db> prompt
Product: Base System Reporter: Ed Maste <emaste>
Component: kern Assignee: freebsd-bugs (Nobody) <bugs>
Status: New ---    
Severity: Affects Only Me CC: cem, ngie
Priority: ---    
Version: CURRENT   
Hardware: arm64   
OS: Any   

Description Ed Maste freebsd_committer freebsd_triage 2015-11-08 18:02:10 UTC
I don't have a way to directly quantify this, but have observed it as a side-effect. I have a Cavium ThunderX 96-core system beside me in a hotel room. While operating normally the system fans are reasonably quiet, but when the system panics they start increasing in speed as the system heats up. After a short while they are producing enough noise that I cannot leave the system running, ending my debug session.
Comment 1 Enji Cooper freebsd_committer freebsd_triage 2015-11-08 23:38:43 UTC
It's not just arm64; amd64 does/did a horrible job at yielding when in the debugger (part of the reason why we have a script which goes and suspends test VMs at $work if/when they panic).

Conrad had a patch out for amd64 a few months ago which yielded a bit while in the debugger, but IIRC there were issues at the time. I'll let him comment on it though.

It would be nice if dropping into the debugger didn't spin all the CPUs at ~100% though.
Comment 2 Conrad Meyer freebsd_committer freebsd_triage 2015-11-09 02:23:15 UTC
If ARM is anything like amd64, it just spinwaits in IPI_STOP (waiting for the CPU
to be re-enabled).  On amd64 you could reduce it to 2 CPUs spinning pretty easily
(hlt any non-panic and non-BSP core -- they'll never be needed until reboot).
But that still leaves 2 CPUs spinning.

The patch attempted to hlt all non-panic CPUs in IPI_STOP, but leave interrupts
enabled so they could be woken again.  This does Not Work Well in panic context
(I forget the details, but if you've panicked you really don't want normal interrupt
code running on the non-ddb CPU(s)).
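
As a rough illustration of the "hlt any non-panic and non-BSP core" idea from
the first paragraph above, here is a small user-space model in C. It is not
kernel code and does not come from the patch; the names cpu_action,
stopped_cpu_action and busy_cpus are made up for this sketch. It assumes the
two CPUs left busy are the panicking CPU (which runs ddb) and the BSP.

```c
/* Hypothetical model only; none of these names exist in the FreeBSD tree. */
enum cpu_action {
	CPU_RUNS_DDB,	/* panicking CPU: runs the debugger, stays busy */
	CPU_SPINS,	/* BSP: assumed to keep spin-waiting in the stop handler */
	CPU_HALTS	/* every other core: could be halted until reboot */
};

/* Decide what a single CPU would do once the system has panicked. */
static enum cpu_action
stopped_cpu_action(unsigned cpu, unsigned panic_cpu, unsigned bsp_cpu)
{
	if (cpu == panic_cpu)
		return (CPU_RUNS_DDB);
	if (cpu == bsp_cpu)
		return (CPU_SPINS);
	return (CPU_HALTS);
}

/* Count the CPUs that are still burning cycles under this policy. */
static unsigned
busy_cpus(unsigned ncpus, unsigned panic_cpu, unsigned bsp_cpu)
{
	unsigned busy = 0;

	for (unsigned cpu = 0; cpu < ncpus; cpu++) {
		if (stopped_cpu_action(cpu, panic_cpu, bsp_cpu) != CPU_HALTS)
			busy++;
	}
	return (busy);
}
```

Under these assumptions, a 96-core system that panics on a non-BSP core would
have busy_cpus(96, panic_cpu, bsp_cpu) equal to 2, instead of all 96 cores
spinning. If the panic happens on the BSP itself, only one CPU stays busy.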
Comment 3 Ed Maste freebsd_committer freebsd_triage 2015-11-18 16:42:00 UTC
Remove mention of specific system type from headline - this issue affects multiple architectures