Bug 116172 - [tun] [nd6] [panic] Network / ipv6 recursive mutex panic
Summary: [tun] [nd6] [panic] Network / ipv6 recursive mutex panic
Status: Open
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 7.0-CURRENT
Hardware: Any Any
Importance: Normal Affects Only Me
Assignee: freebsd-bugs (Nobody)
URL:
Keywords: crash
Depends on:
Blocks:
 
Reported: 2007-09-07 07:50 UTC by peter
Modified: 2022-10-17 12:17 UTC
Description peter 2007-09-07 07:50:01 UTC
At reboot, machine panics with:

panic: _mtx_lock_sleep: recursed on non-recursive mutex rtentry @ ../../../net/route.c:197

KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
panic() at panic+0x1e4
_mtx_lock_sleep() at _mtx_lock_sleep+0x112
_mtx_lock_flags() at _mtx_lock_flags+0x7e
rtalloc1() at rtalloc1+0x1fe
nd6_lookup() at nd6_lookup+0x5d
nd6_is_addr_neighbor() at nd6_is_addr_neighbor+0x33
nd6_output() at nd6_output+0x1e9
ip6_output() at ip6_output+0x1206
tcp_output() at tcp_output+0x1151
tcp_usr_disconnect() at tcp_usr_disconnect+0x74
soclose() at soclose+0x359
fdrop() at fdrop+0xdc
closef() at closef+0x1eb
fdfree() at fdfree+0x10d
exit1() at exit1+0x2bc
sys_exit() at sys_exit+0xe
syscall() at syscall+0x1bc
Xfast_syscall() at Xfast_syscall+0xab
--- syscall (1, FreeBSD ELF64, sys_exit), rip = 0x800dd087c, rsp = 0x7fffffffc4e8, rbp = 0 ---

(kgdb) where
#0  doadump () at pcpu.h:194
#1  0xffffffff80294af5 in boot (howto=260) at ../../../kern/kern_shutdown.c:412
#2  0xffffffff80294f22 in panic (fmt=Variable "fmt" is not available.) at ../../../kern/kern_shutdown.c:571
#3  0xffffffff8028a8e2 in _mtx_lock_sleep (m=Variable "m" is not available.) at ../../../kern/kern_mutex.c:310
#4  0xffffffff8028a96e in _mtx_lock_flags (m=Variable "m" is not available.) at ../../../kern/kern_mutex.c:186
#5  0xffffffff8032be5e in rtalloc1 (dst=0xffffffffa49083e0, report=0, ignflags=0) at ../../../net/route.c:197
#6  0xffffffff8036b96d in nd6_lookup (addr6=0xffffff0003f92da8, create=0, ifp=0xffffff0003c82800) at ../../../netinet6/nd6.c:819
#7  0xffffffff8036bc73 in nd6_is_addr_neighbor (addr=0xffffff0003f92da0, ifp=0xffffff0003c82800) at ../../../netinet6/nd6.c:998
#8  0xffffffff8036c189 in nd6_output (ifp=0xffffff0003c82800, origifp=0xffffff0003c82800, m0=0xffffff0003618d00, dst=0xffffff0003f92da0, rt0=0xffffff0003ca15a0) at ../../../netinet6/nd6.c:1960
#9  0xffffffff80369866 in ip6_output (m0=Variable "m0" is not available.) at ../../../netinet6/ip6_output.c:927
#10 0xffffffff803478e1 in tcp_output (tp=0xffffff00072451f0) at ../../../netinet/tcp_output.c:1104
#11 0xffffffff80351544 in tcp_usr_disconnect (so=Variable "so" is not available.) at ../../../netinet/tcp_usrreq.c:576
#12 0xffffffff802e8c69 in soclose (so=0xffffff0007182ae0) at ../../../kern/uipc_socket.c:642
#13 0xffffffff8026cbcc in fdrop (fp=0xffffff00071fc690, td=0xffffff00072ac340) at file.h:297
#14 0xffffffff8026dfcb in closef (fp=0xffffff00071fc690, td=0xffffff00072ac340) at ../../../kern/kern_descrip.c:1983
#15 0xffffffff8026eafd in fdfree (td=0xffffff00072ac340) at ../../../kern/kern_descrip.c:1693
#16 0xffffffff8027786c in exit1 (td=0xffffff00072ac340, rv=65280) at ../../../kern/kern_exit.c:272
#17 0xffffffff8027870e in sys_exit (td=Variable "td" is not available.) at ../../../kern/kern_exit.c:98
#18 0xffffffff80414ebc in syscall (frame=0xffffffffa4908c70) at ../../../amd64/amd64/trap.c:836
#19 0xffffffff803fe1ab in Xfast_syscall () at ../../../amd64/amd64/exception.S:275
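
The panic message says that the thread in rtalloc1() tried to lock an rtentry mutex it already owned, and that mutex was not created as recursive. The following is only a userland analogy in Python, not kernel code: threading.Lock stands in for a non-recursive mutex and threading.RLock for a recursive one, and try_relock is a hypothetical helper name.

```python
import threading


def try_relock(lock):
    """Acquire `lock`, then try to acquire it again from the same thread.

    Returns True if the nested acquisition succeeds, False otherwise.
    """
    lock.acquire()
    try:
        # Non-blocking, so a non-recursive lock reports failure instead of
        # deadlocking. The kernel in this report panicked at the equivalent
        # point rather than returning an error.
        nested = lock.acquire(blocking=False)
        if nested:
            lock.release()
        return nested
    finally:
        lock.release()


if __name__ == "__main__":
    print("non-recursive lock re-acquired:", try_relock(threading.Lock()))
    print("recursive lock re-acquired:", try_relock(threading.RLock()))
```

The nested acquisition fails for the plain lock and succeeds for the re-entrant one. The backtrace alone does not show where the outer lock on the same rtentry was taken.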

There is an active ssh session over ipv6 at the time of the reboot.

The IPv6 routing table is a bit strange, but it does what I need.  There are
overlapping routes of different prefix lengths.

sk0: (default ipv4 gateway, has no active IPv6 activity)
fxp0: inet6 2001:470:1f01:523:1::1 prefixlen 80
tun0: inet6 2001:470:1f01:523:1::1 --> 2001:470:1f01:523::1 prefixlen 128
Note: identical local address on tun0 vs fxp0.


relevant parts of rc.conf:
ipv6_enable="YES"
ipv6_network_interfaces="tun0 fxp0"
ipv6_default_interface="fxp0"
ipv6_ifconfig_fxp0="2001:470:1f01:523:1::1 prefixlen 80"
ipv6_ifconfig_tun0="2001:470:1f01:523:1::1 2001:470:1f01:523::1 prefixlen 128"
ipv6_defaultrouter="2001:470:1f01:523::1"
ipv6_gateway_enable="YES"

start_if.tun0 creates tun0 and runs a custom ipv6 tunnel program.

There is an ssh connection between the two ends of tun0, i.e. from
2001:470:1f01:523:1::1 to
2001:470:1f01:523::1
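
To make the "overlapping routes of different prefix lengths" concrete, here is a small Python sketch of longest-prefix matching. The three route entries are an assumption reconstructed from the ifconfig and rc.conf lines above, not a dump of the real routing table, and lookup is a hypothetical helper.

```python
import ipaddress

# Assumed route entries, reconstructed from the configuration in this report.
ROUTES = [
    (ipaddress.ip_network("::/0"), "default via 2001:470:1f01:523::1"),
    (ipaddress.ip_network("2001:470:1f01:523:1::/80"), "fxp0 on-link prefix"),
    (ipaddress.ip_network("2001:470:1f01:523::1/128"), "tun0 peer host route"),
]


def lookup(dst):
    """Return the (network, label) entry with the longest matching prefix."""
    addr = ipaddress.ip_address(dst)
    matches = [(net, label) for net, label in ROUTES if addr in net]
    return max(matches, key=lambda entry: entry[0].prefixlen)


if __name__ == "__main__":
    # The ssh destination is the tun0 peer, which is also the default router.
    print(lookup("2001:470:1f01:523::1"))
    # A host on the fxp0 side of the /80.
    print(lookup("2001:470:1f01:523:1::2"))
```

Under these assumptions the ssh destination resolves to the tun0 host route, which is the route being torn down when the tunnel process exits.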

Fix: 

I'm currently unsure what the key trigger is.  I suspect that there
is a route reference count race with killing the tun0 process, killing
the ssh, and routing teardown.

I will figure out an exact recipe to trigger it if the above isn't
enough.  I wanted to document it before 7.0 - I've already forgotten
about it for 2 weeks.
How-To-Repeat: 
Set up overlapping routes with a tunnel.  Open an ssh session over it.  Reboot.
This is a 100% reliable panic for me.  Every reboot causes it.
Comment 1 Remko Lodder freebsd_committer freebsd_triage 2007-09-07 09:45:35 UTC
Responsible Changed
From-To: freebsd-bugs->freebsd-net

Over to maintainer.
Comment 2 peter 2007-10-19 03:32:25 UTC
I've narrowed down the panic trigger.

I have two userland processes doing a tun0 tunnel.  The purpose is to do 
IPv6 encapsulated in UDP.

The local end has an open ssh connection to the remote end.

If I reboot at that instant, both the ssh and the tunnel driver receive 
a SIGTERM at the same time.

The death of the tun0 driver causes the tun0 interface to be torn down 
and the routes cleaned up.

At the same time, the death of the ssh process causes a tcp6 FIN to be 
sent.  This triggers the panic described above.

A simple workaround is to do a 'ssh -4' to the remote end rather than 
going over the tunnel.  This avoids the simultaneous tun0 route teardown 
and ssh teardown.  It only seems to be a problem with ssh to the 
precise remote endpoint.  ssh over the tunnel to other machines does 
not cause the panic when the machine is rebooted.

Sample tun0 driver to trigger the panic:  
http://people.freebsd.org/~peter/qd_tun.c   - quick & dirty tunnel :-) 

Assign some IPv6 addresses to each end with ifconfig, ssh to the other 
end, reboot(8) locally, and wait for the kaboom!

-Peter
Comment 3 Bryan Drewery 2008-02-12 15:38:29 UTC
Hi,

I've been getting what I believe to be the same panic on my FreeBSD
6.2-p9 machines.
The key difference is that I am NOT using a tunnel. This is *native* IPv6.
Not enabling IPv6 at boot solves the problem for me, but is not a real
solution.

(kgdb) bt
#0  doadump () at pcpu.h:165
#1  0xc056b85e in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:409
#2  0xc056bb68 in panic (fmt=0xc07501cf "%s") at /usr/src/sys/kern/kern_shutdown.c:565
#3  0xc072406d in trap_fatal (frame=0xe70c9838, eva=0) at /usr/src/sys/i386/i386/trap.c:837
#4  0xc0723785 in trap (frame=
      {tf_fs = -1067122680, tf_es = -418643928, tf_ds = 40, tf_edi = -995259392, tf_esi = -947956992, tf_ebp = -418604932, tf_isp = -418604956, tf_ebx = -928072192, tf_edx = -947956992, tf_ecx = 4, tf_eax = 4, tf_trapno = 12, tf_err = 0, tf_eip = -1067904549, tf_cs = 32, tf_eflags = 65539, tf_esp = -947956992, tf_ss = -418604896}) at /usr/src/sys/i386/i386/trap.c:270
#5  0xc0710b0a in calltrap () at /usr/src/sys/i386/i386/exception.s:139
#6  0xc05911db in turnstile_setowner (ts=0xc8aebe00, owner=0x4) at /usr/src/sys/kern/subr_turnstile.c:432
#7  0xc0591507 in turnstile_wait (lock=0xc4e93ed0, owner=0x4) at /usr/src/sys/kern/subr_turnstile.c:591
#8  0xc0560167 in _mtx_lock_sleep (m=0xc4e93ed0, tid=3347010304, opts=0, file=0x0, line=0) at /usr/src/sys/kern/kern_mutex.c:579
#9  0xc0650912 in nd6_output (ifp=0xc4ad8c00, origifp=0x4, m0=0xc923c600, dst=0xc4d0c9dc, rt0=0xc4e9f000)
    at /usr/src/sys/netinet6/nd6.c:2004
#10 0xc0649c58 in ip6_output (m0=0xe70c9a78, opt=0x0, ro=0xe70c9a78, flags=0, im6o=0x0, ifpp=0x0, inp=0xc970a0b4)
    at /usr/src/sys/netinet6/ip6_output.c:994
#11 0xc06267e4 in tcp_output (tp=0xc92d13a0) at /usr/src/sys/netinet/tcp_output.c:1059
#12 0xc062ed52 in tcp_usr_send (so=0xc93c36f4, flags=0, m=0xc950b700, nam=0x0, control=0x0, td=0xc77f5300)
    at /usr/src/sys/netinet/tcp_usrreq.c:698
#13 0xc05ad988 in sosend (so=0xc93c36f4, addr=0x0, uio=0xe70c9cb0, top=0xc950b700, control=0x0, flags=0, td=0xc77f5300)
    at /usr/src/sys/kern/uipc_socket.c:836
#14 0xc05998c8 in soo_write (fp=0x4, uio=0xe70c9cb0, active_cred=0xc8a77180, flags=0, td=0xc77f5300) at /usr/src/sys/kern/sys_socket.c:118
#15 0xc0592ff0 in dofilewrite (td=0xc77f5300, fd=4, fp=0xc90d6288, auio=0xe70c9cb0, offset=Unhandled dwarf expression opcode 0x93
) at file.h:252
#16 0xc0592e27 in kern_writev (td=0xc77f5300, fd=21, auio=0x4) at /usr/src/sys/kern/sys_generic.c:402
#17 0xc0592cf9 in write (td=0x4, uap=0x4) at /usr/src/sys/kern/sys_generic.c:326
#18 0xc0724423 in syscall (frame=
      {tf_fs = -1078001605, tf_es = -1078001605, tf_ds = -1078001605, tf_edi = 137218624, tf_esi = 137218624, tf_ebp = 154, tf_isp = -418603676, tf_ebx = 270840404, tf_edx = 137166848, tf_ecx = 272127008, tf_eax = 4, tf_trapno = 22, tf_err = 2, tf_eip = 272063395, tf_cs = 51, tf_eflags = 530, tf_esp = -1077942612, tf_ss = 59}) at /usr/src/sys/i386/i386/trap.c:983
#19 0xc0710b5f in Xint0x80_syscall () at /usr/src/sys/i386/i386/exception.s:200
#20 0x00000033 in ?? ()


I did not use rc.conf to set up my IPv6 either; here is my
script for activating IPv6 on boot:

ifconfig em0 inet6 2610:88:1::3:2 prefixlen 112
ifconfig em0 inet6 2610:88:1::3:3 prefixlen 112
route add -inet6 default 2610:88:1::3:0

What's odd is that these panics only began after my datacenter had a fire
in their 'router cage' and replaced the router. I'm not sure how this
is related, if at all, but the panics definitely only began on that
date, on two separate machines.

Bryan Drewery
Administrator
Xzibition Data Communications
Comment 4 gregg 2008-02-12 15:47:28 UTC
Have you tried making the gateway 2610:88:1::3:1?

-Gregg
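
The comment does not say why a different gateway might help. One possible reading (an assumption) is that the configured gateway 2610:88:1::3:0 is the all-zeros address of the /112 on em0, which RFC 4291 defines as the Subnet-Router anycast address rather than an ordinary host address. A minimal Python check of that arithmetic, with is_network_address as a hypothetical helper:

```python
import ipaddress

# Addresses taken from the boot script in comment 3 and the suggestion above.
iface = ipaddress.ip_interface("2610:88:1::3:2/112")
gateway = ipaddress.ip_address("2610:88:1::3:0")
suggested = ipaddress.ip_address("2610:88:1::3:1")


def is_network_address(addr, network):
    """True if `addr` is the lowest (all-zeros host part) address of `network`."""
    return addr == network.network_address


if __name__ == "__main__":
    print("on-link prefix:", iface.network)
    print("configured gateway is the all-zeros address:",
          is_network_address(gateway, iface.network))
    print("suggested gateway is the all-zeros address:",
          is_network_address(suggested, iface.network))
```

This only shows how the two candidate gateways relate to the prefix; it says nothing about whether the address choice is related to the panic.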

Comment 5 Rebecca Cran freebsd_committer freebsd_triage 2008-02-25 20:54:01 UTC
Hi,

Some changes were made to sys/net/route.c and sys/netinet6/nd6.c in 
October to fix some routing related panics that were occurring on 
7-CURRENT and 6.3.  I know this fixed an issue where the system would 
panic when the gateway disappeared, but I think it may also have fixed 
the recursive mutex panic too.  Are you still seeing this panic with 
more recent sources (e.g. 7.0-RC2)?

Cheers,
Bruce
Comment 6 Remko Lodder freebsd_committer freebsd_triage 2008-02-25 21:07:53 UTC
State Changed
From-To: open->feedback

Bruce asked for feedback 

Comment 7 Gavin Atkinson freebsd_committer freebsd_triage 2008-02-26 10:36:53 UTC
Just to note that kern/113457 (closed as a dupe of this one) shows a second
way of recreating the problem.
Comment 8 Volker Werth freebsd_committer freebsd_triage 2008-05-14 22:10:24 UTC
State Changed
From-To: feedback->suspended


Unfortunately we haven't received feedback, and it is still unclear 
whether this issue has been solved by updating to a recent version. 
Suspending for now until we either see feedback or this ticket can 
be closed within a reasonable timeframe. 


Comment 9 Volker Werth freebsd_committer freebsd_triage 2008-05-14 22:10:24 UTC
Responsible Changed
From-To: freebsd-net->vwe


track
Comment 10 Peter Wemm freebsd_committer freebsd_triage 2008-09-11 10:13:02 UTC
State Changed
From-To: suspended->open

Reopen. Problem not solved, happened again yesterday. 


Comment 11 Peter Wemm freebsd_committer freebsd_triage 2008-09-11 10:13:02 UTC
Responsible Changed
From-To: vwe->bz

Reopen. Problem not solved, happened again yesterday.
Comment 12 Bjoern A. Zeeb freebsd_committer freebsd_triage 2014-05-18 06:03:48 UTC
Responsible Changed
From-To: bz->gnn

I shall not use bugzilla (at least until we will have a CLI).
Comment 13 Eitan Adler freebsd_committer freebsd_triage 2017-12-31 08:00:51 UTC
For bugs matching the following criteria:

Status: In Progress Changed: (is less than) 2014-06-01

Reset to default assignee and clear in-progress tags.

Mail being skipped
Comment 14 Graham Perrin freebsd_committer freebsd_triage 2022-10-17 12:17:19 UTC
Keyword: 

    crash

– in lieu of summary line prefix: 

    [panic]

* bulk change for the keyword
* summary lines may be edited manually (not in bulk). 

Keyword descriptions and search interface: 

    <https://bugs.freebsd.org/bugzilla/describekeywords.cgi>