Bug 193277 - Kernel crash after a few minutes/hours when applications want to write something to disk.
Summary: Kernel crash after a few minutes/hours when applications want to write somethi...
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 10.0-STABLE
Hardware: amd64 Any
Importance: --- Affects Only Me
Assignee: freebsd-bugs (Nobody)
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-09-03 09:46 UTC by maciej.gabryszak
Modified: 2015-06-23 18:38 UTC
0 users

See Also:


Attachments
core.txt.0 (85.14 KB, text/plain)
2014-09-03 09:46 UTC, maciej.gabryszak

Description maciej.gabryszak 2014-09-03 09:46:58 UTC
Created attachment 146726
core.txt.0

# Two servers have disks:
2x ST3000DM001-9YN166 CC4B
+ PF
+ CARP
+ sometimes bhyve to test
+ vlans
+ bridge interface
+ tap interfaces for bhyve
+ Tested with both ZFS and UFS; it is now running on UFS + gmirror


# Two other (bigger) servers have:
4x INTEL SSDSC2BA100G3 5DV10265 (SSD)
7x ATA HGST HUS724040AL AA70    (SATA)
+ without PF
+ CARP
+ bhyve
+ vlans
+ bridge interface
+ tap interfaces for bhyve
+ ZFS

I'm testing bhyve, but servers without bhyve crash too.

The servers are Supermicro machines.





core.txt.0

Fri Aug 29 19:59:54 CEST 2014

FreeBSD rack02.xx 10.0-RELEASE FreeBSD 10.0-RELEASE #0 r260789: Thu Jan 16 22:34:59 UTC 2014     root@snap.freebsd.org:/usr/obj/usr/src/sys/GENERIC  amd64

panic: spin lock held too long

GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...

Unread portion of the kernel message buffer:
<118>Stopping powerd.
<118>Waiting for PIDS: 2913.
<118>Stopping devd.
<118>Waiting for PIDS: 2578.
<118>Writing entropy file:.
<118>.
<118>Terminated
<118>Aug 29 19:54:04 rack02 syslogd: exiting on signal 15
spin lock 0xffffffff814fa030 (smp rendezvous) held by 0xfffff8035506d000 (tid 100157) too long
panic: spin lock held too long
cpuid = 2
KDB: stack backtrace:
#0 0xffffffff808e7dd0 at kdb_backtrace+0x60
#1 0xffffffff808af8b5 at panic+0x155
#2 0xffffffff8089cb71 at _mtx_lock_spin_cookie+0x241
#3 0xffffffff80c7ef54 at smp_targeted_tlb_shootdown+0xf4
#4 0xffffffff80c80922 at pmap_invalidate_all+0x232
#5 0xffffffff80c8736f at pmap_remove_pages+0x6af
#6 0xffffffff80b104c0 at vmspace_exit+0xa0
#7 0xffffffff8087c6af at exit1+0x65f
#8 0xffffffff808b2eef at sigexit+0xb7f
#9 0xffffffff808b35a9 at postsig+0x349
#10 0xffffffff808f6e97 at ast+0x437
#11 0xffffffff80c765c9 at doreti_ast+0x1f
Uptime: 1d2h40m52s
(ada0:ahcich0:0:0:0): STANDBY_IMMEDIATE. ACB: e0 00 00 00 00 40 00 00 00 00 00 00
(ada0:ahcich0:0:0:0): CAM status: CCB request is in progress
(ada0:ahcich0:0:0:0): Error 5, Retries exhausted
(ada0:ahcich0:0:0:0): Spin-down disk failed
(ada1:ahcich1:0:0:0): STANDBY_IMMEDIATE. ACB: e0 00 00 00 00 40 00 00 00 00 00 00
(ada1:ahcich1:0:0:0): CAM status: CCB request is in progress
(ada1:ahcich1:0:0:0): Error 5, Retries exhausted
(ada1:ahcich1:0:0:0): Spin-down disk failed
Dumping 1590 out of 32706 MB:..2%..11%..21%..31%..41%..51%..61%..71%..81%..91%

Reading symbols from /boot/kernel/zfs.ko.symbols...done.
Loaded symbols for /boot/kernel/zfs.ko.symbols
Reading symbols from /boot/kernel/opensolaris.ko.symbols...done.
Loaded symbols for /boot/kernel/opensolaris.ko.symbols
Reading symbols from /boot/kernel/if_tap.ko.symbols...done.
Loaded symbols for /boot/kernel/if_tap.ko.symbols
Reading symbols from /boot/kernel/pf.ko.symbols...done.
Loaded symbols for /boot/kernel/pf.ko.symbols
Reading symbols from /boot/kernel/bridgestp.ko.symbols...done.
Loaded symbols for /boot/kernel/bridgestp.ko.symbols
Reading symbols from /boot/kernel/carp.ko.symbols...done.
Loaded symbols for /boot/kernel/carp.ko.symbols
Reading symbols from /boot/kernel/if_bridge.ko.symbols...done.
Loaded symbols for /boot/kernel/if_bridge.ko.symbols
Reading symbols from /boot/kernel/if_em.ko.symbols...done.
Loaded symbols for /boot/kernel/if_em.ko.symbols
Reading symbols from /boot/kernel/ipmi.ko.symbols...done.
Loaded symbols for /boot/kernel/ipmi.ko.symbols
Reading symbols from /boot/kernel/smbus.ko.symbols...done.
Loaded symbols for /boot/kernel/smbus.ko.symbols
Reading symbols from /boot/kernel/vmm.ko.symbols...done.
Loaded symbols for /boot/kernel/vmm.ko.symbols
Reading symbols from /boot/kernel/nmdm.ko.symbols...done.
Loaded symbols for /boot/kernel/nmdm.ko.symbols
Reading symbols from /boot/kernel/pfsync.ko.symbols...done.
Loaded symbols for /boot/kernel/pfsync.ko.symbols
Reading symbols from /boot/kernel/pflog.ko.symbols...done.
Loaded symbols for /boot/kernel/pflog.ko.symbols
Reading symbols from /boot/kernel/if_lagg.ko.symbols...done.
Loaded symbols for /boot/kernel/if_lagg.ko.symbols
Reading symbols from /boot/kernel/ums.ko.symbols...done.
Loaded symbols for /boot/kernel/ums.ko.symbols
#0  doadump (textdump=<value optimized out>) at pcpu.h:219
219	pcpu.h: No such file or directory.
	in pcpu.h
(kgdb) #0  doadump (textdump=<value optimized out>) at pcpu.h:219
#1  0xffffffff808af530 in kern_reboot (howto=16644)
    at /usr/src/sys/kern/kern_shutdown.c:447
#2  0xffffffff808af8f4 in panic (fmt=<value optimized out>)
    at /usr/src/sys/kern/kern_shutdown.c:754
#3  0xffffffff8089cb71 in _mtx_lock_spin_cookie (c=<value optimized out>, 
    tid=<value optimized out>, opts=<value optimized out>, 
    file=<value optimized out>, line=<value optimized out>)
    at /usr/src/sys/kern/kern_mutex.c:554
#4  0xffffffff80c7ef54 in smp_targeted_tlb_shootdown (mask={__bits = {105}}, 
    vector=244, pmap=0x0, addr1=0, addr2=0)
    at /usr/src/sys/amd64/amd64/mp_machdep.c:1179
#5  0xffffffff80c80922 in pmap_invalidate_all (pmap=<value optimized out>)
    at /usr/src/sys/amd64/amd64/pmap.c:1532
#6  0xffffffff80c8736f in pmap_remove_pages (pmap=0xfffff80009388138)
    at /usr/src/sys/amd64/amd64/pmap.c:5302
#7  0xffffffff80b104c0 in vmspace_exit (td=0xfffff80837ede920)
    at /usr/src/sys/vm/vm_map.c:399
#8  0xffffffff8087c6af in exit1 (td=0xfffff80837ede920, 
    rv=<value optimized out>) at /usr/src/sys/kern/kern_exit.c:321
#9  0xffffffff808b2eef in sigexit (td=<value optimized out>, 
    sig=<value optimized out>) at /usr/src/sys/kern/kern_sig.c:2935
#10 0xffffffff808b35a9 in postsig (sig=<value optimized out>)
    at /usr/src/sys/kern/kern_sig.c:2822
#11 0xffffffff808f6e97 in ast (framep=<value optimized out>)
    at /usr/src/sys/kern/subr_trap.c:271
#12 0xffffffff80c765c9 in doreti_ast ()
    at /usr/src/sys/amd64/amd64/exception.S:677
#13 0x00000000ffffffff in ?? ()
#14 0x00007fffffffd674 in ?? ()
#15 0x0000000000000000 in ?? ()
Current language:  auto; currently minimal
(kgdb)
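
Frame #4 of the backtrace above shows `mask={__bits = {105}}`, the set of CPUs targeted by the TLB shootdown. A minimal sketch for decoding that value, assuming the printed number is the first word of the CPU set and that bit n stands for CPU n (the helper name `cpus_in_mask` is hypothetical, used only for illustration):

```python
def cpus_in_mask(bits: int) -> list[int]:
    """Return the indices of the bits set in one word of a CPU mask.

    Assumption: bit n of the word corresponds to CPU n.
    """
    return [cpu for cpu in range(bits.bit_length()) if (bits >> cpu) & 1]


# Value taken from frame #4: mask={__bits = {105}}
print(cpus_in_mask(105))  # [0, 3, 5, 6]
```

Under that assumption the shootdown was aimed at CPUs 0, 3, 5 and 6. The panicking CPU (`cpuid = 2`) is not in the set, which is what one would expect for a request sent to other CPUs.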
Comment 1 maciej.gabryszak 2014-09-03 09:53:56 UTC
I tried turning off powerd, but it didn't change anything.
Comment 2 Andriy Gapon freebsd_committer freebsd_triage 2014-09-03 13:17:16 UTC
Please execute the following commands in kgdb:
tid 100157
bt
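
The thread ID 100157 is the one named as the lock holder in the panic message ("held by 0xfffff8035506d000 (tid 100157)"). A minimal sketch for pulling those fields out of a saved core.txt line, assuming the message always has the shape seen in this report (the regular expression and the `lock_holder` helper are illustrative, not part of any FreeBSD tool):

```python
import re

# Pattern modelled on the single panic line quoted in this report.
PANIC_RE = re.compile(
    r"spin lock (?P<lock>0x[0-9a-f]+) \((?P<name>[^)]*)\) "
    r"held by (?P<thread>0x[0-9a-f]+) \(tid (?P<tid>\d+)\) too long"
)


def lock_holder(line: str):
    """Return lock address, lock name, holder thread pointer and tid, or None."""
    m = PANIC_RE.search(line)
    if m is None:
        return None
    return {
        "lock": m["lock"],
        "name": m["name"],
        "thread": m["thread"],
        "tid": int(m["tid"]),
    }


line = ("spin lock 0xffffffff814fa030 (smp rendezvous) held by "
        "0xfffff8035506d000 (tid 100157) too long")
info = lock_holder(line)
print(info["name"], info["tid"])  # smp rendezvous 100157
```

The `tid` value is what gets passed to the kgdb `tid` command. The `thread` pointer can be cross-checked against the `td=` arguments printed in that thread's backtrace.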
Comment 3 maciej.gabryszak 2014-09-03 17:58:05 UTC
(kgdb) tid 100157
[Switching to thread 91 (Thread 100157)]#0  0xffffffff80c7f478 in cpustop_handler () at /usr/src/sys/amd64/amd64/mp_machdep.c:1432
1432    /usr/src/sys/amd64/amd64/mp_machdep.c: No such file or directory.
        in /usr/src/sys/amd64/amd64/mp_machdep.c
Current language:  auto; currently minimal
Comment 4 maciej.gabryszak 2014-09-03 17:58:52 UTC
(kgdb) bt
#0  0xffffffff80c7f478 in cpustop_handler () at /usr/src/sys/amd64/amd64/mp_machdep.c:1432
#1  0xffffffff80c7f43f in ipi_nmi_handler () at /usr/src/sys/amd64/amd64/mp_machdep.c:1417
#2  0xffffffff80c8db52 in trap (frame=0xfffffe083a761f30) at /usr/src/sys/amd64/amd64/trap.c:211
#3  0xffffffff80c757d3 in nmi_calltrap () at /usr/src/sys/amd64/amd64/exception.S:505
#4  0xffffffff80c7f0f9 in smp_targeted_tlb_shootdown (mask={__bits = {0}}, vector=<value optimized out>, pmap=<value optimized out>, addr1=<value optimized out>, 
    addr2=0) at /usr/src/sys/amd64/amd64/mp_machdep.c:1204
#5  0xffffffff80c80922 in pmap_invalidate_all (pmap=<value optimized out>) at /usr/src/sys/amd64/amd64/pmap.c:1532
#6  0xffffffff80c815de in pmap_release (pmap=0xfffff800196ed838) at /usr/src/sys/amd64/amd64/pmap.c:2511
#7  0xffffffff80b10532 in vmspace_exit (td=<value optimized out>) at /usr/src/sys/vm/vm_map.c:330
#8  0xffffffff8087c6af in exit1 (td=0xfffff8035506d000, rv=<value optimized out>) at /usr/src/sys/kern/kern_exit.c:321
#9  0xffffffff8087c04e in sys_sys_exit (td=<value optimized out>, uap=<value optimized out>) at /usr/src/sys/kern/kern_exit.c:121
#10 0xffffffff80c8ef87 in amd64_syscall (td=0xfffff8035506d000, traced=0) at subr_syscall.c:134
#11 0xffffffff80c7567b in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:391
#12 0x0000000800db031a in ?? ()
Previous frame inner to this frame (corrupt stack?)
Comment 5 maciej.gabryszak 2014-09-03 21:31:10 UTC
I reinstalled one server with FreeBSD 10.1-PRERELEASE.
The server crashed after 10 minutes without producing a core file.
It happened while I was transferring data between two other servers. This server runs PF with forwarding options.
Comment 6 maciej.gabryszak 2014-09-04 18:17:23 UTC
I tested forwarding through PF. When I send a backup with rsync, using the server as a firewall, it crashes after 1-2 seconds.

rdr pass on VLAN300 proto tcp from x.x.x.x to x.x.x.x port number_port -> 172.x.0.x port 22