Bug 153594 - [wlan] netif/devd race
Summary: [wlan] netif/devd race
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: wireless
Version: 8.2-PRERELEASE
Hardware: Any Any
Importance: Normal Affects Only Me
Assignee: freebsd-wireless (Nobody)
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-01-01 01:10 UTC by Raphael Kubo da Costa
Modified: 2019-01-27 11:19 UTC

See Also:


Attachments
devd-80211.diff (507 bytes, patch)
2011-01-17 20:27 UTC, Bernhard Schmidt

Description Raphael Kubo da Costa 2011-01-01 01:10:12 UTC
My wireless network card is an Intel PRO/Wireless 5100, and I'm using the iwn driver.

/etc/rc.conf contains the following:
  wlans_iwn0="wlan0"
  ifconfig_wlan0="WPA SYNCDHCP"

And /etc/wpa_supplicant.conf has the appropriate settings for some access points.

When the system boots, the network comes up correctly. However, whenever I restart it via '/etc/rc.d/netif restart' and then ping my access point, about 10 packets get through before the network goes down, and 'ifconfig wlan0' shows it scanning for different APs (or even for the same AP on different channels). Once the connection to the AP is re-established, it drops again after a few seconds.

If I run '/etc/rc.d/netif restart' again, the connection stops dropping.

How-To-Repeat: /etc/rc.d/netif restart
Comment 1 Mark Linimon freebsd_committer freebsd_triage 2011-01-03 19:10:15 UTC
Responsible Changed
From-To: freebsd-bugs->freebsd-net

Over to maintainer(s).
Comment 2 Bernhard Schmidt freebsd_committer freebsd_triage 2011-01-03 19:59:38 UTC
State Changed
From-To: open->suspended

This is a known issue. There is a race between devd and our rc subsystem when wpa_supplicant is involved, which effectively results in wpa_supplicant being started twice. Both instances try to take over the wlan device, which results in what you are seeing.

I have no idea how to fix this right now, so this has to wait until I can think of a proper fix.

As a workaround, don't use netif restart; run 'kldunload if_iwn; kldload if_iwn' instead.


Comment 3 Bernhard Schmidt freebsd_committer freebsd_triage 2011-01-03 19:59:38 UTC
Responsible Changed
From-To: freebsd-net->bschmidt

over to me
Comment 4 Eugene Grosbein 2011-01-04 08:08:24 UTC
> There is race in devd and our rc-subsystem if wpa_supplicant is involved 
> effectively resulting in starting wpa_supplicant twice. Both instances try
> to take over the wlan device which results in what you are seeing.
> I have no idea how to fix this right now, so this has to wait until I'm able
> to think of proper fix.

Perhaps, wrapping wpa_supplicant invocation into "lockf -t0" would help
to eliminate race?

Eugene Grosbein
Comment 5 Bernhard Schmidt freebsd_committer freebsd_triage 2011-01-04 09:06:05 UTC
On Tuesday, January 04, 2011 09:08:24 Eugene Grosbein wrote:
> > There is race in devd and our rc-subsystem if wpa_supplicant is involved
> > effectively resulting in starting wpa_supplicant twice. Both instances try
> > to take over the wlan device which results in what you are seeing.
> > I have no idea how to fix this right now, so this has to wait until I'm
> > able to think of proper fix.
> 
> Perhaps, wrapping wpa_supplicant invocation into "lockf -t0" would help
> to eliminate race?

Possibly, but I don't think this is the way to go.

Currently wpa_supplicant has this code:
        /*
         * Mark the interface as down to ensure wpa_supplicant has exclusive
         * access to the net80211 state machine, do this before opening the
         * route socket to avoid a false event that the interface disappeared.
         */
        if (getifflags(drv, &flags) == 0)
                (void) setifflags(drv, flags &~ IFF_UP);

The effect of this code is that an event is sent to any already running 
wpa_supplicant instances, which then terminate. This does indeed work if 
there is enough delay between invocations. If the delay is small (~100ms or 
so), though, the event probably doesn't get delivered. I think we should 
start looking for a possible solution at that point: figuring out why the 
event doesn't get delivered (probably because the interface is not yet up at 
that point) will get us closer to a proper solution.

-- 
Bernhard
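
For readers unfamiliar with the flag manipulation quoted above, here is a minimal sketch of the same read-modify-write on the interface flags. It is not the wpa_supplicant code: flags_without_up(), get_if_flags(), set_if_flags() and mark_interface_down() are hypothetical names made up for illustration, and the sketch assumes that handling only the low 16 flag bits in ifr_flags is sufficient.

```c
#define _DEFAULT_SOURCE /* glibc hides struct ifreq in strict -std= modes */
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/ioctl.h>
#include <net/if.h>
#include <string.h>
#include <unistd.h>

/* Pure helper: the flag word with IFF_UP cleared, other bits untouched. */
int
flags_without_up(int flags)
{
	return flags & ~IFF_UP;
}

/* Hypothetical stand-in for getifflags(): read the interface flags. */
int
get_if_flags(int sock, const char *ifname, int *flags)
{
	struct ifreq ifr;

	memset(&ifr, 0, sizeof(ifr));
	strncpy(ifr.ifr_name, ifname, sizeof(ifr.ifr_name) - 1);
	if (ioctl(sock, SIOCGIFFLAGS, &ifr) < 0)
		return -1;
	*flags = ifr.ifr_flags & 0xffff;
	return 0;
}

/* Hypothetical stand-in for setifflags(): write the low 16 flag bits. */
int
set_if_flags(int sock, const char *ifname, int flags)
{
	struct ifreq ifr;

	memset(&ifr, 0, sizeof(ifr));
	strncpy(ifr.ifr_name, ifname, sizeof(ifr.ifr_name) - 1);
	ifr.ifr_flags = (short)(flags & 0xffff);
	return ioctl(sock, SIOCSIFFLAGS, &ifr) < 0 ? -1 : 0;
}

/*
 * Mark ifname down. Returns 0 on success, -1 on any failure (no such
 * interface, insufficient privileges, no socket).
 */
int
mark_interface_down(const char *ifname)
{
	int flags, rc = -1;
	int sock = socket(AF_INET, SOCK_DGRAM, 0);

	if (sock < 0)
		return -1;
	if (get_if_flags(sock, ifname, &flags) == 0)
		rc = set_if_flags(sock, ifname, flags_without_up(flags));
	close(sock);
	return rc;
}
```

Calling mark_interface_down("wlan0") as root would be the rough equivalent of 'ifconfig wlan0 down'. Whether an already running wpa_supplicant notices the change in time is exactly the open question in this comment.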
Comment 6 Eugene Grosbein 2011-01-04 09:09:15 UTC
On 04.01.2011 15:06, Bernhard Schmidt wrote:

>> Perhaps, wrapping wpa_supplicant invocation into "lockf -t0" would help
>> to eliminate race?
> 
> Possibly, but I don't think this is the way to go.
> 
> Currently wpa_supplicant has this code:
>         /*
>          * Mark the interface as down to ensure wpa_supplicant has exclusive
>          * access to the net80211 state machine, do this before opening the
>          * route socket to avoid a false event that the interface disappeared.
>          */
>         if (getifflags(drv, &flags) == 0)
>                 (void) setifflags(drv, flags &~ IFF_UP);
> 
> This code works such that it will send an event to already running 
> wpa_supplicant instances which will then terminate. This does indeed work if 
> there's enough delay between invocations, though, if there is just a small 
> delay (~100ms or something), that event doesn't get passed probably. I think 
> we should start looking into possible solution at that point, trying to figure 
> out why the event doesn't get passed (probably because the interface is 
> not yet up at that point) will get us closer to proper solution.

Proper fine-grained locking has always been a good solution for race problems :-)
How about using flock(2) in wpa_supplicant source code?

Eugene Grosbein
Comment 7 Bernhard Schmidt freebsd_committer freebsd_triage 2011-01-04 09:39:47 UTC
On Tuesday, January 04, 2011 10:09:15 Eugene Grosbein wrote:
> On 04.01.2011 15:06, Bernhard Schmidt wrote:
> >> Perhaps, wrapping wpa_supplicant invocation into "lockf -t0" would help
> >> to eliminate race?
> > 
> > Possibly, but I don't think this is the way to go.
> > 
> > Currently wpa_supplicant has this code:
> >         /*
> >         
> >          * Mark the interface as down to ensure wpa_supplicant has
> >          exclusive * access to the net80211 state machine, do this
> >          before opening the * route socket to avoid a false event that
> >          the interface disappeared. */
> >         
> >         if (getifflags(drv, &flags) == 0)
> >         
> >                 (void) setifflags(drv, flags &~ IFF_UP);
> > 
> > This code works such that it will send an event to already running
> > wpa_supplicant instances which will then terminate. This does indeed work
> > if there's enough delay between invocations, though, if there is just a
> > small delay (~100ms or something), that event doesn't get passed
> > probably. I think we should start looking into possible solution at that
> > point, trying to figure out why the event doesn't get passed
> > (probably because the interface is not yet up at that point) will get us
> > closer to proper solution.
> 
> Proper fine-grained locking was always good solution for race problem :-)
> How about using flock(2) in wpa_supplicant source code?

I don't see any flock'able resource shared between instances, do you?

-- 
Bernhard
Comment 8 Eugene Grosbein 2011-01-04 09:44:36 UTC
On 04.01.2011 15:39, Bernhard Schmidt wrote:

>> Proper fine-grained locking was always good solution for race problem :-)
>> How about using flock(2) in wpa_supplicant source code?
> 
> I don't see any flock'able resource shared between instances, do you?

Just use pidfile(3) :-)
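
A minimal sketch of the lock-file idea behind this suggestion, written with plain open(2) and flock(2) rather than pidfile(3) so that it stays portable. acquire_instance_lock() and the lock path are hypothetical; nothing here is taken from wpa_supplicant or from the patch attached to this bug.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/*
 * Hypothetical helper: try to become the only holder of an exclusive lock
 * on lock_path. Returns a descriptor that must stay open for as long as the
 * lock is needed, or -1 if another holder exists or the file cannot be
 * opened. The kernel drops the lock when the descriptor is closed or the
 * process exits, so a crashed instance cannot leave a stale lock behind.
 */
int
acquire_instance_lock(const char *lock_path)
{
	int fd = open(lock_path, O_RDWR | O_CREAT, 0600);

	if (fd < 0)
		return -1;
	/* LOCK_NB: fail immediately instead of waiting for the holder. */
	if (flock(fd, LOCK_EX | LOCK_NB) != 0) {
		int saved = errno;

		close(fd);
		errno = saved;
		return -1;
	}
	return fd;
}
```

A second instance started for the same interface would call the helper with the same per-interface path (for example a file under /var/run, an assumption here), get -1 back, and exit before touching the wlan device.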
Comment 9 Bernhard Schmidt freebsd_committer freebsd_triage 2011-01-17 20:27:36 UTC
Hi,

can you give the attached patch a shot? Just apply it to /etc/devd.conf and 
restart devd. This should fix the issue with netif restart.

Thanks.

-- 
Bernhard
Comment 10 Bernhard Schmidt freebsd_committer freebsd_triage 2011-01-18 10:23:32 UTC
State Changed
From-To: suspended->feedback

feedback requested
Comment 11 Raphael Kubo da Costa 2011-01-19 00:41:32 UTC
On 01/17/2011 18:27, Bernhard Schmidt wrote:
> Hi,
>
> can you give attached patch a shot? Just apply it to /etc/devd.conf and
> restart devd. This should fix the issue with netif restart.
>
> Thanks.

Hi,

I applied the patch, then stopped devd and netif (in this order). After 
that, I started devd and netif (in this order).

I did not lose packets when pinging a remote host, nor did I lose any 
after ~2 netif restarts. After the third restart, I started losing more 
packets than before, and the problem persisted after another restart.

I then stopped devd again, then stopped netif again, started both again 
and the problem disappeared. So it seems not to have completely vanished.

Should I revert the patch?
Comment 12 Bernhard Schmidt freebsd_committer freebsd_triage 2011-01-19 07:14:32 UTC
On Wednesday, January 19, 2011 01:41:32 Raphael Kubo da Costa wrote:
> On 01/17/2011 18:27, Bernhard Schmidt wrote:
> > Hi,
> > 
> > can you give attached patch a shot? Just apply it to /etc/devd.conf
> > and restart devd. This should fix the issue with netif restart.
> > 
> > Thanks.
> 
> Hi,
> 
> I applied the patch, then stopped devd and netif (in this order).
> After that, I started devd and netif (in this order).
> 
> I did not lose packets when pinging a remote host, nor did I lose any
> after ~2 netif restarts. In the third time, I started losing more
> packets than before, and the problem persisted after another restart.
> 
> I then stopped devd again, then stopped netif again, started both
> again and the problem disappeared. So it seems not to have
> completely vanished.
> 
> Should I revert the patch?

While the 'packet loss' occurs, can you run 'ps xauw | grep wpa'? If 
there aren't 2 instances of wpa_supplicant running, that's a new issue.

-- 
Bernhard
Comment 13 Raphael Kubo da Costa 2011-01-20 00:40:22 UTC
On 01/19/2011 05:14, Bernhard Schmidt wrote:
> On Wednesday, January 19, 2011 01:41:32 Raphael Kubo da Costa wrote:
>> On 01/17/2011 18:27, Bernhard Schmidt wrote:
>>> Hi,
>>>
>>> can you give attached patch a shot? Just apply it to /etc/devd.conf
>>> and restart devd. This should fix the issue with netif restart.
>>>
>>> Thanks.
>>
>> Hi,
>>
>> I applied the patch, then stopped devd and netif (in this order).
>> After that, I started devd and netif (in this order).
>>
>> I did not lose packets when pinging a remote host, nor did I lose any
>> after ~2 netif restarts. In the third time, I started losing more
>> packets than before, and the problem persisted after another restart.
>>
>> I then stopped devd again, then stopped netif again, started both
>> again and the problem disappeared. So it seems not to have
>> completely vanished.
>>
>> Should I revert the patch?
>
> While the 'packet loss' occurs, can you do a 'ps xauw | grep wpa'? if
> there aren't 2 instances of wpa_supplicant running, that's a new issue.

Indeed, there are 2 wpa_supplicant instances running when the packet 
losses occur. If I stop both devd and netif and start netif, I get one 
single wpa_supplicant instance and no packet loss.
Comment 14 Bernhard Schmidt freebsd_committer freebsd_triage 2012-06-19 07:56:47 UTC
State Changed
From-To: feedback->open

feedback received 


Comment 15 Bernhard Schmidt freebsd_committer freebsd_triage 2012-06-19 07:56:47 UTC
Responsible Changed
From-To: bschmidt->freebsd-wireless

back to pool
Comment 16 Eitan Adler freebsd_committer freebsd_triage 2018-05-28 19:43:33 UTC
batch change:

For bugs that match the following
-  Status Is In progress 
AND
- Untouched since 2018-01-01.
AND
- Affects Base System OR Documentation

DO:

Reset to open status.


Note:
I did a quick pass but if you are getting this email it might be worthwhile to double check to see if this bug ought to be closed.
Comment 17 Andriy Voskoboinyk freebsd_committer freebsd_triage 2019-01-21 04:55:44 UTC
Fixed in base r343249.