Bug 71073

Summary: [vlan] a media ioctl to vlan(4) causes panic if parent is bge(4)
Product: Base System
Reporter: Yar Tikhiy <yar>
Component: kern
Assignee: Andre Oppermann <andre>
Status: Closed FIXED
Severity: Affects Only Me
CC: andre
Priority: Normal
Version: 6.0-CURRENT
Hardware: Any
OS: Any

Description Yar Tikhiy 2004-08-28 12:40:16 UTC
	Upon receiving a SIOCGIFMEDIA ioctl, vlan(4) grabs the vlan
	mutex and calls the bge(4) parent's if_ioctl handler.  The
	call chain ends up in vlan_link_state(), which tries to obtain
	the same non-recursive vlan mutex again and so panics the system.
	The stack trace is as follows:

#24 0xc049de23 in panic (
    fmt=0xc05d4b68 "_mtx_lock_sleep: recursed on non-recursive mutex %s @ %s:%d\n") at /usr/src/sys/kern/kern_shutdown.c:542
#25 0xc0496232 in _mtx_lock_sleep (m=0xc064f500, td=0xc0d889a0, opts=0,
    file=0xc05dead1 "/usr/src/sys/net/if_vlan.c", line=819)
    at /usr/src/sys/kern/kern_mutex.c:436
#26 0xc0495f50 in _mtx_lock_flags (m=0xc064f500, opts=0,
    file=0xc05dead1 "/usr/src/sys/net/if_vlan.c", line=819)
    at /usr/src/sys/kern/kern_mutex.c:253
#27 0xc0502657 in vlan_link_state (ifp=0xc0e5e000, link=1)
    at /usr/src/sys/net/if_vlan.c:819
#28 0xc04500d1 in miibus_linkchg (dev=0xc0df8000)
    at /usr/src/sys/dev/mii/mii.c:270
#29 0xc04508f1 in mii_phy_update (sc=0xc0e7a580, cmd=3) at miibus_if.h:62
#30 0xc044d38b in brgphy_service (sc=0xc0e7a580, mii=0xc0e76940, cmd=3)
    at /usr/src/sys/dev/mii/brgphy.c:391
#31 0xc04502cf in mii_pollstat (mii=0xc0e76940)
    at /usr/src/sys/dev/mii/mii.c:384
#32 0xc0445583 in bge_ifmedia_sts (ifp=0x0, ifmr=0xc5836c60)
    at /usr/src/sys/dev/bge/if_bge.c:3411
#33 0xc0501553 in ifmedia_ioctl (ifp=0xc0e5e000, ifr=0x0, ifm=0xc0e76940,
    cmd=0) at /usr/src/sys/net/if_media.c:281
#34 0xc04457bb in bge_ioctl (ifp=0xc0e5e000, command=3223873848, data=0x0)
    at /usr/src/sys/dev/bge/if_bge.c:3489
#35 0xc05027d3 in vlan_ioctl (ifp=0xc0f04000, cmd=0, data=0xc5836c60 "vlan99")
    at /usr/src/sys/net/if_vlan.c:872
#36 0xc04fd348 in ifhwioctl (cmd=3223873848, ifp=0xc0f04000,
    data=0xc5836c60 "vlan99", td=0x0) at /usr/src/sys/net/if.c:1288
#37 0xc04fd44f in ifioctl (so=0xc0f0eb64, cmd=3223873848,
    data=0xc5836c60 "vlan99", td=0xc0d889a0) at /usr/src/sys/net/if.c:1341
#38 0xc04c39b1 in soo_ioctl (fp=0x0, cmd=0, data=0xc5836c60,
    active_cred=0xc0d7be00, td=0x0) at /usr/src/sys/kern/sys_socket.c:202
#39 0xc04be444 in ioctl (td=0xc0d889a0, uap=0xc5836d14) at file.h:258

	I found that using fxp(4) instead of bge(4) as the parent
	did not result in a system panic.
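
The recursion described above can be reproduced in miniature in userland. The following is only a sketch under stated assumptions: it uses a POSIX error-checking mutex instead of the kernel's mtx(9), every `sketch_*` name is hypothetical, and the "unlocked" variant shows one conceivable way to avoid the recursion, not the change that actually went into the tree (this PR does not identify it).

```c
#define _XOPEN_SOURCE 700	/* exposes PTHREAD_MUTEX_ERRORCHECK on some libcs */

#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical userland stand-in for the vlan mutex; not kernel code. */
static pthread_mutex_t sketch_vlan_mtx;

/* Create the mutex as error-checking so a relock reports EDEADLK. */
static void
sketch_init(void)
{
	pthread_mutexattr_t attr;
	int error;

	error = pthread_mutexattr_init(&attr);
	assert(error == 0);
	error = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	assert(error == 0);
	error = pthread_mutex_init(&sketch_vlan_mtx, &attr);
	assert(error == 0);
	pthread_mutexattr_destroy(&attr);
}

/* Stand-in for vlan_link_state(): it takes the vlan lock itself. */
static int
sketch_link_state(void)
{
	int error;

	error = pthread_mutex_lock(&sketch_vlan_mtx);
	if (error != 0)
		return (error);	/* EDEADLK if this thread already holds it */
	/* ... propagate the parent's link state to the vlan children ... */
	pthread_mutex_unlock(&sketch_vlan_mtx);
	return (0);
}

/* Stand-in for the parent's ioctl path that ends in a link-state callback. */
static int
sketch_parent_ioctl(void)
{
	return (sketch_link_state());
}

/* Pattern from the report: call into the parent with the lock held. */
static int
sketch_vlan_ioctl_locked(void)
{
	int error;

	pthread_mutex_lock(&sketch_vlan_mtx);
	error = sketch_parent_ioctl();	/* re-enters sketch_link_state() */
	pthread_mutex_unlock(&sketch_vlan_mtx);
	return (error);
}

/* Assumed alternative: drop the lock before calling into the parent. */
static int
sketch_vlan_ioctl_unlocked(void)
{
	pthread_mutex_lock(&sketch_vlan_mtx);
	/* ... look up the parent interface while holding the lock ... */
	pthread_mutex_unlock(&sketch_vlan_mtx);
	return (sketch_parent_ioctl());
}
```

After `sketch_init()`, `sketch_vlan_ioctl_locked()` returns EDEADLK where the kernel's non-recursive mutex would panic, while `sketch_vlan_ioctl_unlocked()` returns 0. Build with `-pthread` if your libc requires it.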

Fix:
	Unfortunately, I don't yet understand which code is actually
	incorrect, bge's or vlan's.  I hope andre@ (CC'd) can shed
	light on the issue.  Andre, would you mind dropping a word?
	That seems to be your code ;-)
How-To-Repeat:
	Just boot a system with vlan interfaces set up in rc.conf
	to attach to a bge.  The system will panic as soon as the
	rc script `netif' tries to display the post-configuration
	status of the interfaces.
Comment 1 Andre Oppermann 2004-09-11 00:52:18 UTC
Responsible Changed
From-To: freebsd-bugs->andre

Take over.
Comment 2 Andre Oppermann 2004-09-16 21:41:06 UTC
State Changed
From-To: open->analyzed

Bill Paul and I are working on this.  A possible fix is in the
works.
Comment 3 Andre Oppermann 2005-09-14 16:40:03 UTC
State Changed
From-To: analyzed->closed

The problem has been fixed in sys/net/if_vlan.c rev. 1.69.
Comment 4 Yar Tikhiy 2005-09-14 23:14:20 UTC
The bad news is that if_vlan.c rev. 1.69 could hardly have fixed this
problem.  First, I saw this problem *after* rev. 1.69 had been
committed.  Second, this problem had to do with vlan-bge interoperation
over media control and link state, not with multicast stuff.

The good news is that I've just heard of vlan(4) working OK over
bge(4) in RELENG_6.  Alas, I cannot confirm that myself right now
because my test machine with bge(4) has been dismantled.  I hope
I'll be able to stick my bge(4) in another spare machine RSN.

Perhaps we should put this PR in a state like "feedback" for now,
eh?

-- 
Yar
Comment 5 Andre Oppermann 2005-10-11 16:59:11 UTC
Yar Tikhiy wrote:
> 
> The bad news is that if_vlan.c rev. 1.69 could hardly have fixed this
> problem.  First, I saw this problem *after* rev. 1.69 had been
> committed.  Second, this problem had to do with vlan-bge interoperation
> over media control and link state, not with multicast stuff.
> 
> The good news is that I've just heard of vlan(4) working OK over
> bge(4) in RELENG_6.  Alas, I cannot confirm that myself right now
> because my test machine with bge(4) has been dismantled.  I hope
> I'll be able to stick my bge(4) in another spare machine RSN.
> 
> Perhaps we should put this PR in a state like "feedback" for now,
> eh?

I've got two bge interfaces in my main FreeBSD development box and
they work just fine, including vlan and link state changes.  If it
was not the change I singled out, then some other change must have
fixed it.  However, that doesn't change the observation that it now
works reliably.  Which is good.  :-)

-- 
Andre
Comment 6 Yar Tikhiy 2005-10-11 17:43:50 UTC
On Tue, Oct 11, 2005 at 05:59:11PM +0200, Andre Oppermann wrote:
> > 
> > The bad news is that if_vlan.c rev. 1.69 could hardly have fixed this
> > problem.  First, I saw this problem *after* rev. 1.69 had been
> > committed.  Second, this problem had to do with vlan-bge interoperation
> > over media control and link state, not with multicast stuff.
> > 
> > The good news is that I've just heard of vlan(4) working OK over
> > bge(4) in RELENG_6.  Alas, I cannot confirm that myself right now
> > because my test machine with bge(4) has been dismantled.  I hope
> > I'll be able to stick my bge(4) in another spare machine RSN.
> > 
> > Perhaps we should put this PR in a state like "feedback" for now,
> > eh?
> 
> I've got two bge interfaces in my main FreeBSD development box and
> they work just fine, including vlan and link state changes.  If it
> was not the change I singled out, then some other change must have
> fixed it.  However, that doesn't change the observation that it now
> works reliably.  Which is good.  :-)

That's great news, since I still hope to put my bge back to work
and development (I really ought to; I got it as a donation :-)
Thanks a lot!

-- 
Yar