Bug 177876 - [mips] kernel stack overflow panic on mips64, EdgeRouter Lite
Status: Open
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 10.0-CURRENT
Hardware: Any Any
Importance: Normal Affects Only Me
Assignee: freebsd-mips (Nobody)
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-04-15 22:50 UTC by Joe Holden
Modified: 2018-05-29 14:12 UTC
CC List: 4 users

See Also:


Description Joe Holden 2013-04-15 22:50:00 UTC
Under heavy CPU load, after an hour or so, FreeBSD will panic and reboot with the following:

panic: kernel stack overflow - trapframe at 0xffffffff80835eb0
cpuid = 0

The panic is identical every time and happens only under heavy CPU load, which may well occur in the field.

Ideally I'd like someone to advise me on how to debug the problem; failing that, it should be easy enough to recreate.

It may well be a hardware issue, as these boxes run very hot, bordering on their thermal design limits.

How-To-Repeat: Start compiling a port.
Comment 1 Joe Holden 2013-04-15 23:42:19 UTC
Having rebuilt the kernel with more debugging enabled and brought my tree up to HEAD, I find the kernel consistently drops to db on boot:

FreeBSD 10.0-CURRENT #5 r249529: Mon Apr 15 23:28:04 BST 2013
    root@abby.lhr1.as41113.net:/usr/obj/mips.mips64/usr/src/sys/OCTEON-ERL mips
gcc version 4.2.1 20070831 patched [FreeBSD]
cpu:0-Trap cause = 2 (TLB miss (load or instr. fetch) - kernel mode)
[ thread pid 0 tid 0 ]
Stopped at      0xffffffff80268bdc:     lb      v0,0(s2)
db> bt
Tracing pid 0 tid 0 td 0xffffffff808a25f0
ffffffff8067fd98+40 (?,?,?,?) ra ffffffff8013ba24 sp 98000000009305b0 sz 16
ffffffff8013b898+18c (0,?,ffffffff,?) ra ffffffff8013b068 sp 98000000009305c0 sz 48
ffffffff8013abe0+488 (?,?,?,?) ra ffffffff8013b334 sp 98000000009305f0 sz 192
ffffffff8013b240+f4 (?,?,?,?) ra ffffffff8013ed88 sp 98000000009306b0 sz 16
ffffffff8013ebe0+1a8 (?,?,?,?) ra ffffffff8031e180 sp 98000000009306c0 sz 816
ffffffff8031dfe0+1a0 (?,?,?,?) ra ffffffff80696238 sp 98000000009309f0 sz 64
trap+18a0 (?,?,?,?) ra ffffffff8068104c sp 9800000000930a30 sz 288
MipsKernGenException+15c (0,bef000,ffffffff,3) ra ffffffff80268bdc sp 9800000000930b50 sz 368
ffffffff80268b70+6c (?,?,?,?) ra ffffffff80248754 sp 9800000000930cc0 sz 48
ffffffff80248548+20c (?,?,?,?) ra ffffffff80100134 sp 9800000000930cf0 sz 32
ffffffff80100080+b4 (?,?,?,?) ra 0 sp 9800000000930d10 sz 0
pid 0
db>

Not sure if that means anything to anyone!

Cheers,
Joe
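
As a rough aid to reading the trace in this comment, the sz column can be summed to estimate how much kernel stack the call chain occupies. The Python sketch below is an illustration only and not part of the original report: it assumes that sz is a decimal byte count and sp a hexadecimal stack pointer, and the names BACKTRACE, FRAME_RE and stack_usage are made up for this example.

```python
import re

# Frames copied from the db> backtrace in this comment, one frame per line.
BACKTRACE = """\
ffffffff8067fd98+40 (?,?,?,?) ra ffffffff8013ba24 sp 98000000009305b0 sz 16
ffffffff8013b898+18c (0,?,ffffffff,?) ra ffffffff8013b068 sp 98000000009305c0 sz 48
ffffffff8013abe0+488 (?,?,?,?) ra ffffffff8013b334 sp 98000000009305f0 sz 192
ffffffff8013b240+f4 (?,?,?,?) ra ffffffff8013ed88 sp 98000000009306b0 sz 16
ffffffff8013ebe0+1a8 (?,?,?,?) ra ffffffff8031e180 sp 98000000009306c0 sz 816
ffffffff8031dfe0+1a0 (?,?,?,?) ra ffffffff80696238 sp 98000000009309f0 sz 64
trap+18a0 (?,?,?,?) ra ffffffff8068104c sp 9800000000930a30 sz 288
MipsKernGenException+15c (0,bef000,ffffffff,3) ra ffffffff80268bdc sp 9800000000930b50 sz 368
ffffffff80268b70+6c (?,?,?,?) ra ffffffff80248754 sp 9800000000930cc0 sz 48
ffffffff80248548+20c (?,?,?,?) ra ffffffff80100134 sp 9800000000930cf0 sz 32
ffffffff80100080+b4 (?,?,?,?) ra 0 sp 9800000000930d10 sz 0
"""

# Assumption: every frame ends with "sp <hex> sz <decimal>".
# \s+ also tolerates a line wrap between the fields.
FRAME_RE = re.compile(r"sp\s+([0-9a-f]+)\s+sz\s+(\d+)")


def stack_usage(text):
    """Return (frame count, sum of sz fields, highest sp minus lowest sp)."""
    frames = [(int(sp, 16), int(sz)) for sp, sz in FRAME_RE.findall(text)]
    if not frames:
        raise ValueError("no frames found in backtrace text")
    total = sum(sz for _, sz in frames)
    pointers = [sp for sp, _ in frames]
    return len(frames), total, max(pointers) - min(pointers)


count, total, span = stack_usage(BACKTRACE)
print(f"{count} frames, {total} bytes by sz, {span} bytes by sp range")
```

For this trace both figures come to 1888 bytes across 11 frames, which suggests that this particular trap was not taken with the stack close to exhausted.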
Comment 2 Mark Linimon freebsd_committer freebsd_triage 2013-04-21 20:29:06 UTC
Responsible Changed
From-To: freebsd-bugs->freebsd-mips

Over to maintainer(s).
Comment 3 Joe Holden 2013-04-22 03:58:00 UTC
Warner fixed the TLB miss problem, but since about then the following happens when booting (from either NFS or USB) with a completely fresh world and src tree and no special make options or optimisations:

Kernel config: http://sprunge.us/EVjO

Trying to mount root from nfs: []...
NFS ROOT: 172.16.8.3:/nfs/bsd/fbsd/erl
warning: no time-of-day clock registered, system time will not be set accurately
warning: no time-of-day clock registered, system time will not be set accurately
start_init: trying /sbin/init
Cannot map anonymous memory
Out of memory
Enter full pathname of shell or RETURN for /bin/sh:
Cannot map anonymous memory
Out of memory
Cannot map anonymous memory
Out of memory
Enter full pathname of shell or RETURN for /bin/sh:

Usual procedure to cross-build from amd64:

make buildworld buildkernel KERNCONF=OCTEON-ERL TARGET=mips64 TARGET_ARCH=mips TARGET_CPUTYPE=octeon WITHOUT_MODULES="cxgbe mwlfw mwl ralfw ral runfw run"

src.conf just contains NO_FSCHG=
Comment 4 sson 2014-05-14 15:41:13 UTC
The following set of patches increases the kernel thread stack size to 16K by using a 16K page size for just the kernel stack. Unlike my previous patch set, it doesn't require additional wired TLB entries. I have been using this patch set for a few months on my ERL, with an NFS mount, for 'buildworld' and for port building, and have not seen the 'kernel stack overflow' panic. It does add a bit of MIPS64-dependent code in the VM layer. Maybe this should be moved to the pmap layer at some point.

The patch set:

http://people.freebsd.org/~sson/mips/kstack/kstack_large_page_1.diff
http://people.freebsd.org/~sson/mips/kstack/kstack_large_page_2.diff
http://people.freebsd.org/~sson/mips/kstack/kstack_large_page_3.diff

or one large patch:

http://people.freebsd.org/~sson/mips/kstack/kstack_large_page.diff

To enable it, "option KSTACK_LARGE_PAGE" needs to be added to the kernel conf file.

-stacey. (sson@)
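
The size arithmetic behind a 16K kernel stack page can be sketched as follows. This is an illustrative Python sketch and not code from the patch set: it assumes a 4 KiB base page and the classic MIPS PageMask scheme, in which hardware page sizes are 4 KiB times a power of four. BASE_PAGE, is_mips_page_size and base_pages are hypothetical names used only here.

```python
BASE_PAGE = 4096  # assumed base page size in bytes (not stated in this report)


def is_mips_page_size(nbytes):
    """True if nbytes is BASE_PAGE times a power of four (4K, 16K, 64K, ...).

    Assumption: the classic PageMask scheme; some CPUs support other sizes.
    """
    if nbytes < BASE_PAGE or nbytes % BASE_PAGE:
        return False
    ratio = nbytes // BASE_PAGE
    while ratio % 4 == 0:
        ratio //= 4
    return ratio == 1


def base_pages(stack_bytes):
    """Number of base pages covered by a stack of stack_bytes bytes."""
    if stack_bytes % BASE_PAGE:
        raise ValueError("stack size is not a multiple of the base page size")
    return stack_bytes // BASE_PAGE


for size in (8 * 1024, 16 * 1024):
    print(size, is_mips_page_size(size), base_pages(size))
```

Under these assumptions a 16K stack fits in one hardware page equal to four base pages, whereas 8K is not a single hardware page size.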
Comment 5 Konstantin Belousov freebsd_committer 2014-07-06 08:44:39 UTC
(In reply to sson from comment #4)

Yes, the more appropriate place for vm_kstack_?alloc seems to be
arch/arch/vm_machdep.c.  The non-MIPS architectures should probably
either reference or include the common implementation.

Ideally, the patch series would be split into an MI part, which restructures
vm_glue.c and adds an MI implementation used by all architectures.  The later changes for MIPS would then do whatever is needed in the MD code.
Comment 6 Eitan Adler freebsd_committer freebsd_triage 2018-05-28 19:49:16 UTC
batch change:

For bugs that match the following
-  Status Is In progress 
AND
- Untouched since 2018-01-01.
AND
- Affects Base System OR Documentation

DO:

Reset to open status.


Note:
I did a quick pass, but if you are getting this email it might be worthwhile to double-check whether this bug ought to be closed.