Bug 176984

Summary: misc/dahdi-kmod: FreeBSD crashes periodically with RedFone FB2 and dahdi_dynamic_ethmf driver
Product: Ports & Packages Reporter: bsd
Component: Individual Port(s)Assignee: Max Khon <fjoe>
Status: Closed Feedback Timeout    
Severity: Affects Only Me CC: cs, ktrace, w.schwarzenfeld
Priority: Normal    
Version: Latest   
Hardware: Any   
OS: Any   

Description bsd 2013-03-15 08:00:00 UTC
	I using following packages from ports
		dahdi-2.4.0rc5_6
		dahdi-kmod26-2.6.1.r10738
		asterisk18-1.8.20.1
	with DualPort FoneBridge FB2 TDMoE adapter. Kernel crashes periodically (1-2 times/day).
	Before crash system freeze for few seconds, after freeze message like 
<5>Span DYN/ethmf/nfe0/00:50:c2:65:e1:d2/0: Expected seq no 57710, but received 59510 instead
	appears in dmesg and immediately after this kernel crashes.

 	This is info about crash dump:
pbx# cat core.txt.50 
pbx.rfei.ru dumped core - see /var/crash/vmcore.50

Thu Mar 14 17:26:52 MSK 2013

FreeBSD pbx.rfei.ru 9.1-RELEASE FreeBSD 9.1-RELEASE #0 r243825: Tue Dec  4 09:23:10 UTC 2012     root@farrell.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC  amd64

panic: spin lock held too long

GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...

Unread portion of the kernel message buffer:
<6>TDMoX: New master: DYN/ethmf/nfe0/00:50:c2:65:e1:d2/0
<5>Span DYN/ethmf/nfe0/00:50:c2:65:e1:d2/0: Expected seq no 57710, but received 59510 instead
<6>TDMoX: New master: DYN/ethmf/nfe0/00:50:c2:65:e1:d2/0
<5>Span DYN/ethmf/nfe0/00:50:c2:65:e1:d2/1: Expected seq no 57710, but received 59510 instead
spin lock 0xffffffff81652fa0 (ethmf_lock) held by 0xfffffe0002e00470 (tid 100038) too long
panic: spin lock held too long
cpuid = 0
KDB: stack backtrace:
#0 0xffffffff809208a6 at kdb_backtrace+0x66
#1 0xffffffff808ea8be at panic+0x1ce
#2 0xffffffff808d8a9c at _mtx_lock_spin_failed+0x3c
#3 0xffffffff808d8c66 at _mtx_lock_spin+0x1c6
#4 0xffffffff808d9184 at _mtx_lock_spin_flags+0x94
#5 0xffffffff81651351 at rcu_read_lock+0x21
#6 0xffffffff81651663 at ztdethmf_transmit+0x53
#7 0xffffffff8164e35f at __dahdi_dynamic_run+0x1af
#8 0xffffffff816231c0 at _process_masterspan+0x5c0
#9 0xffffffff81623765 at coretimer_func+0x175
#10 0xffffffff808fecf2 at softclock+0x342
#11 0xffffffff808be8d4 at intr_event_execute_handlers+0x104
#12 0xffffffff808c0076 at ithread_loop+0xa6
#13 0xffffffff808bb9ef at fork_exit+0x11f
#14 0xffffffff80bc368e at fork_trampoline+0xe
GEOM_MIRROR: Device main: rebuilding provider ada0 stopped.
Uptime: 8m59s
Dumping 660 out of 4074 MB:..3%..13%..22%..32%..42%..51%..61%..71%..83%..93%

Reading symbols from /boot/kernel/geom_mirror.ko...Reading symbols from /boot/kernel/geom_mirror.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/geom_mirror.ko
Reading symbols from /boot/kernel/nullfs.ko...Reading symbols from /boot/kernel/nullfs.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/nullfs.ko
Reading symbols from /usr/local/lib/dahdi/dahdi.ko...done.
Loaded symbols for /usr/local/lib/dahdi/dahdi.ko
Reading symbols from /boot/kernel/netgraph.ko...Reading symbols from /boot/kernel/netgraph.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/netgraph.ko
Reading symbols from /usr/local/lib/dahdi/dahdi_dynamic.ko...done.
Loaded symbols for /usr/local/lib/dahdi/dahdi_dynamic.ko
Reading symbols from /usr/local/lib/dahdi/dahdi_dynamic_ethmf.ko...done.
Loaded symbols for /usr/local/lib/dahdi/dahdi_dynamic_ethmf.ko
Reading symbols from /usr/local/lib/dahdi/ng_dahdi_netdev.ko...done.
Loaded symbols for /usr/local/lib/dahdi/ng_dahdi_netdev.ko
Reading symbols from /boot/kernel/ng_ether.ko...Reading symbols from /boot/kernel/ng_ether.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/ng_ether.ko
#0  doadump (textdump=Variable "textdump" is not available.
) at pcpu.h:224
224	pcpu.h: No such file or directory.
	in pcpu.h
(kgdb) #0  doadump (textdump=Variable "textdump" is not available.
) at pcpu.h:224
#1  0xffffffff808ea3a1 in kern_reboot (howto=260)
    at /usr/src/sys/kern/kern_shutdown.c:448
#2  0xffffffff808ea897 in panic (fmt=0x1 <Address 0x1 out of bounds>)
    at /usr/src/sys/kern/kern_shutdown.c:636
#3  0xffffffff808d8a9c in _mtx_lock_spin_failed (m=Variable "m" is not available.
)
    at /usr/src/sys/kern/kern_mutex.c:514
#4  0xffffffff808d8c66 in _mtx_lock_spin (m=0xffffffff81652fa0, 
    tid=18446741874729418752, opts=Variable "opts" is not available.
) at /usr/src/sys/kern/kern_mutex.c:556
#5  0xffffffff808d9184 in _mtx_lock_spin_flags (m=0xffffffff81652fa0, opts=0, 
    file=0xffffffff81652698 "/usr/ports/misc/dahdi-kmod26/work/dahdi-freebsd-2.6.1-r10738/bsd-kmod/dahdi_dynamic_ethmf/../../drivers/dahdi/dahdi_dynamic_ethmf.c", line=152) at /usr/src/sys/kern/kern_mutex.c:259
#6  0xffffffff81651351 in rcu_read_lock ()
   from /usr/local/lib/dahdi/dahdi_dynamic_ethmf.ko
#7  0xffffffff81651663 in ztdethmf_transmit ()
   from /usr/local/lib/dahdi/dahdi_dynamic_ethmf.ko
#8  0xffffffff8164e35f in __dahdi_dynamic_run ()
   from /usr/local/lib/dahdi/dahdi_dynamic.ko
#9  0xffffffff8164f3e0 in dahdi_dynamic_receive ()
   from /usr/local/lib/dahdi/dahdi_dynamic.ko
#10 0x00000000000000f8 in ?? ()
#11 0x000000000000001f in ?? ()
#12 0x000000000000001f in ?? ()
#13 0xffffff8000238a30 in ?? ()
#14 0xffffffff816231c0 in _process_masterspan ()
   from /usr/local/lib/dahdi/dahdi.ko
#15 0xfffffe0004fab090 in ?? ()
#16 0xffffff8000238a30 in ?? ()
#17 0x52f63248dae7c326 in ?? ()
#18 0x000000000000021b in ?? ()
#19 0x00000000000003e7 in ?? ()
#20 0x00000000000003e8 in ?? ()
#21 0xffffff80f688f7b0 in ?? ()
#22 0x0000000000083671 in ?? ()
#23 0x0000000000000000 in ?? ()
#24 0xffffff8000238a70 in ?? ()
#25 0xffffffff81623765 in coretimer_func () from /usr/local/lib/dahdi/dahdi.ko
#26 0xffffffff816298e0 in _mod_metadata_md_dahdi_dummy_on_kernel ()
   from /usr/local/lib/dahdi/dahdi.ko
#27 0xffffffff81629900 in _mod_metadata_md_dahdi_on_firmware ()
   from /usr/local/lib/dahdi/dahdi.ko
#28 0xffffffff81629940 in dahdimodule_sys_init ()
   from /usr/local/lib/dahdi/dahdi.ko
#29 0xffffffff81629960 in _mod_metadata_md_dahdi ()
   from /usr/local/lib/dahdi/dahdi.ko
#30 0xffffffff8162a1e0 in ng_dahdi_iface_typestruct ()
   from /usr/local/lib/dahdi/dahdi.ko
#31 0xffffffff8162a220 in ng_dahdi_ifacemodule_sys_init ()
   from /usr/local/lib/dahdi/dahdi.ko
#32 0xffffffff8162a240 in _mod_metadata_md_ng_dahdi_iface ()
   from /usr/local/lib/dahdi/dahdi.ko
#33 0xffffffff81629980 in _mod_metadata_md_dahdi_on_kernel ()
   from /usr/local/lib/dahdi/dahdi.ko
#34 0xffffffff81629a00 in __Tunable_init_10305_sys_init ()
   from /usr/local/lib/dahdi/dahdi.ko
#35 0xffffffff81629a80 in __Tunable_init_10302_sys_init ()
   from /usr/local/lib/dahdi/dahdi.ko
#36 0xffffffff81629b00 in __Tunable_init_10299_sys_init ()
   from /usr/local/lib/dahdi/dahdi.ko
#37 0xffffffff81629b80 in __Tunable_init_10297_sys_init ()
   from /usr/local/lib/dahdi/dahdi.ko
#38 0xffffffff8162a060 in M_DAHDI () from /usr/local/lib/dahdi/dahdi.ko
#39 0xffffffff8162a0c0 in sysctl___dahdi_echocan ()
   from /usr/local/lib/dahdi/dahdi.ko
#40 0xffffffff81629c70 in find_master_work_work_sysinit_sys_init ()
   from /usr/local/lib/dahdi/dahdi.ko
#41 0xffffffff81629cb0 in ecfactory_list_lock_mtx_sysinit_sys_init ()
   from /usr/local/lib/dahdi/dahdi.ko
#42 0xffffffff81629cf0 in registration_mutex_mtx_sysinit_sys_init ()
   from /usr/local/lib/dahdi/dahdi.ko
#43 0xffffffff81629d30 in chan_lock_mtx_sysinit_sys_init ()
   from /usr/local/lib/dahdi/dahdi.ko
#44 0xffffffff81629d70 in zone_lock_mtx_sysinit_sys_init ()
   from /usr/local/lib/dahdi/dahdi.ko
#45 0xffffffff81629db0 in dahdi_timer_lock_mtx_sysinit_sys_init ()
   from /usr/local/lib/dahdi/dahdi.ko
#46 0xffffffff81629df0 in global_dialparamslock_mtx_sysinit_sys_init ()
   from /usr/local/lib/dahdi/dahdi.ko
#47 0xffffffff8162a110 in sysctl___dahdi () from /usr/local/lib/dahdi/dahdi.ko
#48 0xffffffff8162a4a0 in fcstab () from /usr/local/lib/dahdi/dahdi.ko
#49 0x0000000000000000 in ?? ()
#50 0x0000000000000000 in ?? ()
#51 0xffffffff816277f0 in .rodata.str1.1 () from /usr/local/lib/dahdi/dahdi.ko
#52 0xffffffff81629020 in .rodata.str1.8 () from /usr/local/lib/dahdi/dahdi.ko
#53 0xffffffff81627960 in .rodata.str1.1 () from /usr/local/lib/dahdi/dahdi.ko
#54 0xffffffff81623db0 in _dahdi_transmit ()
   from /usr/local/lib/dahdi/dahdi.ko
(kgdb)

Fix: 

Unknown.
How-To-Repeat: 	Install Asterisk, dahdi-2.4 and dahdi-kmod-2.6 from ports and bring up link to PBX via FoneBridge FB2.
	I can give ssh+console access to problem machine, and to another machine that connected over serial port to first.
Comment 1 Edwin Groothuis freebsd_committer freebsd_triage 2013-03-17 15:00:29 UTC
Responsible Changed
From-To: freebsd-ports-bugs->fjoe

Over to maintainer (via the GNATS Auto Assign Tool)
Comment 2 bsd 2013-03-20 09:21:32 UTC
It's highly probably that problem is due to loss of TDMoE packets.
I found workaround: Crashes goes away when connecting FB2 directly to 
server without Ethernet switches.
Comment 3 Carlo Strub freebsd_committer freebsd_triage 2014-09-11 20:13:14 UTC
Is this PR still relevant?
Comment 4 bsd 2014-09-11 20:28:24 UTC
Hello.
I don't sure is this problem still exists in latest FreeBSD. I using 9.1-RELEASE. Without ethernet switches it's working much better, but I got kernel panic twice (19 Aug 2014 and 7 Feb 2014) with following stack trace:

spin lock 0xffffffff816443c0 (dahdi_timer_lock) held by
0xfffffe0098129000 (tid 125869) too long
panic: spin lock held too long
cpuid = 1
KDB: stack backtrace:
#0 0xffffffff809208a6 at kdb_backtrace+0x66
#1 0xffffffff808ea8be at panic+0x1ce
#2 0xffffffff808d8a9c at _mtx_lock_spin_failed+0x3c
#3 0xffffffff808d8c66 at _mtx_lock_spin+0x1c6
#4 0xffffffff808d9184 at _mtx_lock_spin_flags+0x94
#5 0xffffffff8162323a at _process_masterspan+0x63a
#6 0xffffffff816233c1 at _dahdi_receive+0x111
#7 0xffffffff8164f290 at dahdi_dynamic_receive+0x210
#8 0xffffffff816512d5 at ztdethmf_rcv+0x2d5
#9 0xffffffff81653137 at ng_dahdi_netdev_rcvdata+0xa7
#10 0xffffffff81647ca0 at ng_apply_item+0x220
#11 0xffffffff81646efe at ng_snd_item+0x2ce
#12 0xffffffff809a35a7 at ether_demux+0x127
#13 0xffffffff809a38a4 at ether_nh_input+0x1f4
#14 0xffffffff809adafb at netisr_dispatch_src+0x20b
#15 0xffffffff80c0e1e3 at nfe_int_task+0x9e3
#16 0xffffffff8092cf55 at taskqueue_run_locked+0x85
#17 0xffffffff8092ded6 at taskqueue_thread_loop+0x46
Uptime: 128d23h47m25s

I have two kernel core dumps for this problem.
Comment 5 Carlo Strub freebsd_committer freebsd_triage 2014-09-12 04:27:58 UTC
Thanks for your update.
Comment 6 Victor Kustov 2015-02-10 11:28:29 UTC
Confirm same in 9.3-STABLE
Comment 7 Victor Kustov 2015-04-08 15:15:24 UTC
spin lock 0xc5d52bb0 (dahdi_timer_lock) held by 0xc5s5d000 (tid 100146) too long
panic: spin lock held too long
cpuid = 0
KDB: stack backtrace;
#0 0xc0b1ef6f at kdb_backtrace+0x4f
#1 0xc0ae5b8f at panic+0x16f
#2 0xc0ad048f at _mtx_lock_spin_failed+0x3f
#3 0xc0ad0f05 at _mtx_lock_spin+0x165
#4 0xc0ad1721 at _mtx_lock_spin_flags+0x71
#5 0xc5d3165a at _process_masterspan+0x62a
#6 0xc5d317c2 at _dahdi_receive+0x112
#7 0xc5d4e97c at dahdi_dynamic_receive+0x21c
#8 0xc5c69190 at zldeth_rcv+0x60
#9 0xc5c6cf43 at ng_dahdi_netdev_rcvdata+0xb3
#10 0xc174c911 at ng_apply_item+0x1f1
#11 0xc174bb4f at ng_snd_item+0x28f
#12 0xc0babc4f at ether_demux+0x20f
#13 0xc0bac88f at ether_nb_input+0x38f
#14 0xc0bb51af at netisr_dispatch_src+0x0f
#15 0xc0bb5450 at netisr_dispatch+0x20
#16 0xc0bab6f9 at ether_input+0x19
#17 0xc0cdf8e2 at rl_rxeof+0x252

dahdi-2.4.0rc5_6               DAHDI userland utilities and libraries
dahdi-kmod26-2.6.1.r10747_1    Digium/Asterisk Hardware Device Interface

FreeBSD 9.3-STABLE #15 r275474
Comment 8 Victor Kustov 2015-04-10 17:52:18 UTC
Once more traps.

 dahdi_netdev(dahdi@rl0): new netgraph node

 Fatal trap 12: page fault while in kernel mode
 dahdi_netdev(dahdi@rl0): ether 00:04:61:9f
 cpuid = 0; apic id = 00
 fault virtual address   = 0x0
 fault code              = supervisor read, page not present
 instruction pointer     = 0x20:0xc5ef1bea
 stack pointer           = 0x28:0xe82bf6b8
 frame pointer           = 0x28:0xe82bf910
 code segment            = base rx0, limit 0xfffff, type 0x1b
 = DPL 0, pres 1, def32 1, gran 1
 processor eflags        = interrupt enabled, resume, IOPL = 0
 current process         = 7586 (dahdi_cfg)
 trap number             = 12
 panic: page fault
 cpuid = 0
 KDB: stack backtrace:
 #0 0xc0b209ef at kdb_backtrace+0x4f
 #1 0xc0ae771f at panic+0x16f
 #2 0xc0fb0df3 at trap_fatal+0x323
 #3 0xc0fb0ef8 at trap_pfault+0xf8
 #4 0xc0fb22bb at trap+0x59b
 #5 0xc0f9adbc at calltrap+0x6
 #6 0xc5e9223b at dahdi_dynamic_ioctl+0x4eb
 #7 0xc60da222 at dahdi_unlocked_ioctl+0x162
 #8 0xc60cfabb at dahdi_device_ioctl+0x3b
 #9 0xc09bd53a at devfs_ioctl_f+0x10a
 #10 0xc0b33630 at kern_ioctl+0x2a0
 #11 0xc0b337e8 at sys_ioctl+0x178
 #12 0xc0fb16e3 at syscall+0x443
 #13 0xc0f9ae51 at Xint0x80_syscall+0x21
Comment 9 Walter Schwarzenfeld freebsd_triage 2018-01-12 01:52:17 UTC
 9.1-RELEASE is EOL. Is this still relevant?
Comment 10 Max Khon freebsd_committer freebsd_triage 2018-01-12 03:01:05 UTC
I think it is as it is not specific to the FreeBSD release.
Comment 11 Mark Linimon freebsd_committer freebsd_triage 2020-07-25 23:36:02 UTC
^Triage: close as OBE.  To submitter: please let us know if we have closed this in error.