Under heavy CPU load, after an hour or so, FreeBSD panics and reboots with the following:

    panic: kernel stack overflow - trapframe at 0xffffffff80835eb0
    cpuid = 0

The panic is identical every time and only happens under heavy CPU load, which of course may well happen in the field. Ideally I'd like someone to advise me on how to debug the problem; failing that, it should be easy enough to recreate. It may well be a hardware issue, as these boxes do run very hot, bordering on their thermal design limits.

How-To-Repeat:
Start compiling a port.
After rebuilding the kernel with more debugging options and bringing my tree up to HEAD, the kernel consistently drops into ddb on boot:

    FreeBSD 10.0-CURRENT #5 r249529: Mon Apr 15 23:28:04 BST 2013
        root@abby.lhr1.as41113.net:/usr/obj/mips.mips64/usr/src/sys/OCTEON-ERL mips
    gcc version 4.2.1 20070831 patched [FreeBSD]
    cpu:0-Trap cause = 2 (TLB miss (load or instr. fetch) - kernel mode)
    [ thread pid 0 tid 0 ]
    Stopped at 0xffffffff80268bdc: lb v0,0(s2)
    db> bt
    Tracing pid 0 tid 0 td 0xffffffff808a25f0
    ffffffff8067fd98+40 (?,?,?,?) ra ffffffff8013ba24 sp 98000000009305b0 sz 16
    ffffffff8013b898+18c (0,?,ffffffff,?) ra ffffffff8013b068 sp 98000000009305c0 sz 48
    ffffffff8013abe0+488 (?,?,?,?) ra ffffffff8013b334 sp 98000000009305f0 sz 192
    ffffffff8013b240+f4 (?,?,?,?) ra ffffffff8013ed88 sp 98000000009306b0 sz 16
    ffffffff8013ebe0+1a8 (?,?,?,?) ra ffffffff8031e180 sp 98000000009306c0 sz 816
    ffffffff8031dfe0+1a0 (?,?,?,?) ra ffffffff80696238 sp 98000000009309f0 sz 64
    trap+18a0 (?,?,?,?) ra ffffffff8068104c sp 9800000000930a30 sz 288
    MipsKernGenException+15c (0,bef000,ffffffff,3) ra ffffffff80268bdc sp 9800000000930b50 sz 368
    ffffffff80268b70+6c (?,?,?,?) ra ffffffff80248754 sp 9800000000930cc0 sz 48
    ffffffff80248548+20c (?,?,?,?) ra ffffffff80100134 sp 9800000000930cf0 sz 32
    ffffffff80100080+b4 (?,?,?,?) ra 0 sp 9800000000930d10 sz 0
    pid 0
    db>

Not sure if that means anything to anyone!

Cheers,
Joe
Responsible Changed
From-To: freebsd-bugs->freebsd-mips
Over to maintainer(s).
So the TLB miss problem was fixed by Warner, but since about then the following happens when booting (from either NFS or USB), with a completely fresh world and src tree and no special make options or optimisations.

Kernel config: http://sprunge.us/EVjO

    Trying to mount root from nfs: []...
    NFS ROOT: 172.16.8.3:/nfs/bsd/fbsd/erl
    warning: no time-of-day clock registered, system time will not be set accurately
    warning: no time-of-day clock registered, system time will not be set accurately
    start_init: trying /sbin/init
    Cannot map anonymous memory
    Out of memory
    Enter full pathname of shell or RETURN for /bin/sh: Cannot map anonymous memory
    Out of memory
    Cannot map anonymous memory
    Out of memory
    Enter full pathname of shell or RETURN for /bin/sh:

Usual procedure to cross-build from amd64:

    make buildworld buildkernel KERNCONF=OCTEON-ERL TARGET=mips64 TARGET_ARCH=mips TARGET_CPUTYPE=octeon WITHOUT_MODULES="cxgbe mwlfw mwl ralfw ral runfw run"

src.conf just contains NO_FSCHG=
The following set of patches increases the kernel thread stack size to 16K by using a 16K page size for just the kernel stack. Unlike my previous patch set, it doesn't require additional wired TLB entries. I have been using this patch set for a few months on my ERL, with an NFS mount, to run 'buildworld' and to build ports, and have not seen the 'kernel stack overflow' panic. It does add a bit of MIPS64-dependent code to the VM layer; maybe this should be moved to the pmap layer at some point.

The patch set:

http://people.freebsd.org/~sson/mips/kstack/kstack_large_page_1.diff
http://people.freebsd.org/~sson/mips/kstack/kstack_large_page_2.diff
http://people.freebsd.org/~sson/mips/kstack/kstack_large_page_3.diff

or as one large patch:

http://people.freebsd.org/~sson/mips/kstack/kstack_large_page.diff

"option KSTACK_LARGE_PAGE" needs to be added to the kernel config file to enable it.

-stacey. (sson@)
(In reply to sson from comment #4)

Yes, the more appropriate place for vm_kstack_?alloc seems to be arch/arch/vm_machdep.c. The non-MIPS architectures should probably either reference or include the common implementation. Ideally, the patch series would be split into an MI part, which restructures vm_glue.c and adds an MI implementation used by all architectures, followed by MIPS changes that do whatever is needed in MD code.
batch change:

For bugs that match the following
- Status Is In progress
AND
- Untouched since 2018-01-01
AND
- Affects Base System OR Documentation

DO:
Reset to open status.

Note:
I did a quick pass, but if you are getting this email it might be worthwhile to double check to see if this bug ought to be closed.
^Triage: close as Overcome By Events. I'm sorry this PR never got looked at. In the meantime, FreeBSD has dropped MIPS support, so there is nothing to do here.