Bug 144755 - [wlan] netif/devd race
Summary: [wlan] netif/devd race
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: wireless (show other bugs)
Version: 8.0-STABLE
Hardware: Any Any
: Normal Affects Only Me
Assignee: freebsd-wireless (Nobody)
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-03-15 06:10 UTC by edwin
Modified: 2018-12-18 03:56 UTC (History)
1 user (show)

See Also:


Attachments
iwi_node_ref.diff (1.14 KB, patch)
2010-08-05 21:08 UTC, bschmidt
no flags Details | Diff
patch.txt (1.02 KB, text/plain; charset=us-ascii)
2010-08-05 21:55 UTC, Alex Kozlov
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description edwin 2010-03-15 06:10:08 UTC
Machine panics when issueing /etc/rc.d/netif restart.

wlan0: ieee80211_new_state_locked: pending SCAN -> AUTH transition lost

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0xc4fe81d5
fault code              = supervisor read, page not present
instruction pointer     = 0x20:0xc4fa9b0c
stack pointer           = 0x28:0xdf834b7c
frame pointer           = 0x28:0xdf834c34
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 0 (iwi0 taskq)
trap number             = 12
panic: page fault
cpuid = 0
Uptime: 30m31s
Physical memory: 750 MB
Dumping 78 MB: (CTRL-C to abort)  (CTRL-C to abort)  63 47 31 15


    at /usr/home/edwin/svn/build/sys/i386/i386/trap.c:938
#4  0xc0bc1ce0 in trap_pfault (frame=0xdf834b3c, usermode=0, eva=3305013717)
    at /usr/home/edwin/svn/build/sys/i386/i386/trap.c:851
#5  0xc0bc2625 in trap (frame=0xdf834b3c)
    at /usr/home/edwin/svn/build/sys/i386/i386/trap.c:533
#6  0xc0ba4c0b in calltrap ()
    at /usr/home/edwin/svn/build/sys/i386/i386/exception.s:165
#7  0xc4fa9b0c in iwi_auth_and_assoc (sc=0xc4298800, vap=0xc4a61000)
    at /usr/home/edwin/svn/build/sys/modules/iwi/../../dev/iwi/if_iwi.c:2854
#8  0xc4faa1f9 in iwi_newstate (vap=0xc4a61000, nstate=IEEE80211_S_AUTH,
    arg=192)
    at /usr/home/edwin/svn/build/sys/modules/iwi/../../dev/iwi/if_iwi.c:1001
#9  0xc09767a1 in ieee80211_newstate_cb (xvap=0xc4a61000, npending=1)
    at /usr/home/edwin/svn/build/sys/net80211/ieee80211_proto.c:1654
#10 0xc08c52a2 in taskqueue_run (queue=0xc4b61cc0)
    at /usr/home/edwin/svn/build/sys/kern/subr_taskqueue.c:239
#11 0xc08c54ad in taskqueue_thread_loop (arg=0xc4fad074)
    at /usr/home/edwin/svn/build/sys/kern/subr_taskqueue.c:360
#12 0xc0862231 in fork_exit (callout=0xc08c53f0 <taskqueue_thread_loop>,
    arg=0xc4fad074, frame=0xdf834d38)
    at /usr/home/edwin/svn/build/sys/kern/kern_fork.c:843
#13 0xc0ba4c80 in fork_trampoline ()
    at /usr/home/edwin/svn/build/sys/i386/i386/exception.s:270
(kgdb)


(kgdb) frame 7
#7  0xc4fa9b0c in iwi_auth_and_assoc (sc=0xc4298800, vap=0xc4a61000)
    at /usr/home/edwin/svn/build/sys/modules/iwi/../../dev/iwi/if_iwi.c:2854
2854            rs.mode = mode;
(kgdb) list
2849            if (error != 0)
2850                    goto done;
2851
2852            /* the rate set has already been "negotiated" */
2853            memset(&rs, 0, sizeof rs);
2854            rs.mode = mode;
2855            rs.type = IWI_RATESET_TYPE_NEGOTIATED;
2856            rs.nrates = ni->ni_rates.rs_nrates;
2857            if (rs.nrates > IWI_RATESET_SIZE) {
2858                    DPRINTF(("Truncating negotiated rate set from %u\n",
(kgdb) info local
ic = (struct ieee80211com *) 0xc4fad000
ifp = (struct ifnet *) 0xc43f2000
ni = (struct ieee80211_node *) 0xc4fe8000
config = {bluetooth_coexistence = 0 '\0', reserved1 = 0 '\0',
  answer_pbreq = 0 '\0', allow_invalid_frames = 0 '\0',
  multicast_enabled = 1 '\001', drop_unicast_unencrypted = 0 '\0',
  disable_unicast_decryption = 1 '\001', drop_multicast_unencrypted = 0 '\0',
  disable_multicast_decryption = 1 '\001', antenna = 0 '\0',
  include_crc = 0 '\0', use_protection = 1 '\001',
  protection_ctsonly = 0 '\0', enable_multicast_filtering = 0 '\0',
  bluetooth_threshold = 0 '\0', silence_threshold = 0 '\0',
  allow_beacon_and_probe_resp = 0 '\0', allow_mgt = 0 '\0',
  noise_reported = 0 '\0', reserved5 = 0 '\0'}
assoc = Variable "assoc" is not available.
(kgdb) p ni
$1 = (struct ieee80211_node *) 0xc4fe8000
(kgdb) p *ni
Cannot access memory at address 0xc4fe8000


Information available:

[~/svn/build] edwin@vaio>svn info
[...]
Revision: 205159
Last Changed Author: jilles
Last Changed Rev: 205150
Last Changed Date: 2010-03-15 00:07:40 +1100 (Mon, 15 Mar 2010)


In http://www.mavetju.org/~edwin/vmcore.1:

-rw-r--r--  1 1001  80     22423 Mar 15 05:43 core.txt.1.bz2
-rw-r--r--  1 1001  80       464 Mar 15 05:43 info.1
-rw-r--r--  1 1001  80  18104049 Mar 15 06:01 kernel.debug.bz2
-rw-r--r--  1 1001  80  17194295 Mar 15 05:51 vmcore.1.bz2

Fix: 

I am able to reproduce this on demand, please let me know which
steps you want me to take to troubleshoot this or test patches if needed.
Comment 1 Mark Linimon freebsd_committer freebsd_triage 2010-03-15 06:19:52 UTC
Responsible Changed
From-To: freebsd-bugs->freebsd-fs

Over to maintainer(s).
Comment 2 Mark Linimon freebsd_committer freebsd_triage 2010-03-15 08:31:21 UTC
Responsible Changed
From-To: freebsd-fs->freebsd-net

oops. 

pointed out by: brucec
Comment 3 Joey Mingrone 2010-04-30 15:43:22 UTC
I'm seeing this as well after an upgrade to 8.0-RELEASE-p2 from 7.2-RELEASE.
Comment 4 edwin 2010-07-28 04:33:42 UTC
Please note that this issue is still happening on 8.1-STABLE #2 r210522

-- 
Edwin Groothuis		Website: http://www.mavetju.org/
edwin@mavetju.org	Weblog:  http://www.mavetju.org/weblog/
Comment 5 bschmidt 2010-07-28 11:11:17 UTC
On Wed, Jul 28, 2010 at 06:00, Edwin Groothuis <edwin@mavetju.org> wrote:
> The following reply was made to PR kern/144755; it has been noted by GNAT=
S.
>
> From: Edwin Groothuis <edwin@mavetju.org>
> To: bug-followup@FreeBSD.org
> Cc:
> Subject: Re: kern/144755: [iwi] [panic] iwi panic when issuing /etc/rc.d/=
netif restart on 8-STABLE r205159
> Date: Wed, 28 Jul 2010 13:33:42 +1000
>
> =A0Please note that this issue is still happening on 8.1-STABLE #2 r21052=
2


Can you check if wpa_supplicant gets started twice after
"/etc/rc.d/netif restart"?

I have the feeling that this is a more or less known issue, a race
between devd (/etc/pccard_ether) and /etc/rc.d/netif. There is a small
window (a few ms) where this can happen and our net80211 isn't capable
of handling two running wpa_supplicants on the same interface. Which
leads to all kind of weird issues, .e.g. panics.

--=20
Bernhard
Comment 6 edwin 2010-08-04 23:31:27 UTC
Hello Bernhard!

On Wed, Jul 28, 2010 at 12:11:17PM +0200, Bernhard Schmidt wrote:
> Can you check if wpa_supplicant gets started twice after
> "/etc/rc.d/netif restart"?

According to core.txt there are two wpa_supplicant processes, but
I don't know if it is caused by a fork of itself or if it is caused
by something started twice:

    0  7084     1   0  56  0  5192     0 select Ds    ??  3686:38.00 [wpa_suppli
    0  7085     1   0  57  0  5192     0 select Ds    ??  2617:11.00 [wpa_suppli

Based on the PPID of 1 for both, I would say it gets started twice.

root     wpa_supplicant  7085 root /             2 drwxr-xr-x     512  r
root     wpa_supplicant  7085   wd /             2 drwxr-xr-x     512  r
root     wpa_supplicant  7085 text /usr     4526084 -r-xr-xr-x  295932  r
root     wpa_supplicant  7085    0 /dev          7 crw-rw-rw-    null rw
root     wpa_supplicant  7085    1 /dev          7 crw-rw-rw-    null rw
root     wpa_supplicant  7085    2 /dev          7 crw-rw-rw-    null rw
root     wpa_supplicant  7085    3* internet dgram udp c52b1000
root     wpa_supplicant  7085    4* route raw 0 c527f4d4
root     wpa_supplicant  7085    5 /dev         15 crw-------     bpf rw

root     wpa_supplicant  7084 root /             2 drwxr-xr-x     512  r
root     wpa_supplicant  7084   wd /             2 drwxr-xr-x     512  r
root     wpa_supplicant  7084 text /usr     4526084 -r-xr-xr-x  295932  r
root     wpa_supplicant  7084    0 /dev          7 crw-rw-rw-    null rw
root     wpa_supplicant  7084    1 /dev          7 crw-rw-rw-    null rw
root     wpa_supplicant  7084    2 /dev          7 crw-rw-rw-    null rw
root     wpa_supplicant  7084    3* internet dgram udp c4f520dc
root     wpa_supplicant  7084    4* route raw 0 c527f670
root     wpa_supplicant  7084    5* local stream c4f508bc
root     wpa_supplicant  7084    6 /var      23622 -rw-------       4  w
root     wpa_supplicant  7084    7* local stream c4f5035c <-> c4f50408
root     wpa_supplicant  7084    8 /dev         15 crw-------     bpf rw

> I have the feeling that this is a more or less known issue, a race
> between devd (/etc/pccard_ether) and /etc/rc.d/netif. There is a small
> window (a few ms) where this can happen and our net80211 isn't capable
> of handling two running wpa_supplicants on the same interface. Which
> leads to all kind of weird issues, .e.g. panics.

Nice! Is there a workaround or patch available for testing?

Edwin

-- 
Edwin Groothuis		Website: http://www.mavetju.org/
edwin@mavetju.org	Weblog:  http://www.mavetju.org/weblog/
Comment 7 bschmidt 2010-08-05 09:08:42 UTC
On Thu, Aug 5, 2010 at 00:31, Edwin Groothuis <edwin@mavetju.org> wrote:
> Hello Bernhard!
>
> On Wed, Jul 28, 2010 at 12:11:17PM +0200, Bernhard Schmidt wrote:
>> Can you check if wpa_supplicant gets started twice after
>> "/etc/rc.d/netif restart"?
>
> According to core.txt there are two wpa_supplicant processes, but
> I don't know if it is caused by a fork of itself or if it is caused
> by something started twice:
>
> =A0 =A00 =A07084 =A0 =A0 1 =A0 0 =A056 =A00 =A05192 =A0 =A0 0 select Ds =
=A0 =A0?? =A03686:38.00 [wpa_suppli
> =A0 =A00 =A07085 =A0 =A0 1 =A0 0 =A057 =A00 =A05192 =A0 =A0 0 select Ds =
=A0 =A0?? =A02617:11.00 [wpa_suppli
>
> Based on the PPID of 1 for both, I would say it gets started twice.

Indeed.

>
> [..]
>> I have the feeling that this is a more or less known issue, a race
>> between devd (/etc/pccard_ether) and /etc/rc.d/netif. There is a small
>> window (a few ms) where this can happen and our net80211 isn't capable
>> of handling two running wpa_supplicants on the same interface. Which
>> leads to all kind of weird issues, .e.g. panics.
>
> Nice! Is there a workaround or patch available for testing?

A temporary workaround is to add a "sleep 1" in /etc/pccard_ether,
something I'd rather not commit. I haven't yet found a clean solution
for this.. any pointers welcome. :)

--=20
Bernhard
Comment 8 bschmidt 2010-08-05 21:08:38 UTC
On Thu, Aug 5, 2010 at 00:31, Edwin Groothuis <edwin@mavetju.org> wrote:
> [..]
> Nice! Is there a workaround or patch available for testing?

Please give the attached patch a try. It does not prevent
wpa_supplicant from starting twice (for that you can define
ctrl_interface= in wpa_supplicant.conf), but should no longer panic.

Thanks.

-- 
Bernhard
Comment 9 Alex Kozlov 2010-08-05 21:55:43 UTC
On Thu, Aug 05, 2010 at 08:10:08PM +0000, Bernhard Schmidt wrote:
> The following reply was made to PR kern/144755; it has been noted by GNATS.
> 
> From: Bernhard Schmidt <bschmidt@techwires.net>
> To: Edwin Groothuis <edwin@mavetju.org>
> Cc: bug-followup@freebsd.org
> Subject: Re: kern/144755: [iwi] [panic] iwi panic when issuing /etc/rc.d/netif 
> 	restart on 8-STABLE r205159
> Date: Thu, 5 Aug 2010 22:08:38 +0200
> 
>  --0015175cda7eab5eec048d191d67
>  Content-Type: text/plain; charset=ISO-8859-1
>  
>  On Thu, Aug 5, 2010 at 00:31, Edwin Groothuis <edwin@mavetju.org> wrote:
>  > [..]
>  > Nice! Is there a workaround or patch available for testing?
>  
>  Please give the attached patch a try. It does not prevent
>  wpa_supplicant from starting twice (for that you can define
>  ctrl_interface= in wpa_supplicant.conf), but should no longer panic.
I can reproduce this panic for if_rum, similiar patch also helps.

wlan0: ieee80211_new_state_locked: pending RUN -> SCAN transition lost
wlan0: ieee80211_new_state_locked: pending RUN -> SCAN transition lost

Fatal trap 12: page fault while in kernel mode
fault virtual address	= 0xffff
fault code		= supervisor read, page not present
instruction pointer	= 0x20:0xc0900d42
stack pointer	        = 0x28:0xc4f05bac
frame pointer	        = 0x28:0xc4f05bb8
code segment		= base rx0, limit 0xfffff, type 0x1b
			= DPL 0, pres 1, def32 1, gran 1
processor eflags	= interrupt enabled, resume, IOPL = 0
current process		= 0 (rum0 taskq)
trap number		= 12
panic: page fault
KDB: stack backtrace:
db_trace_self_wrapper(c0669547,c06cb000,c0660c6a,c4f05a5c,c4f05a5c,...) at 0xc0436706 = db_trace_self_wrapper+0x26
panic(c0660c6a,c0680c79,c4f05b6c,1,1,...) at 0xc04b898d = panic+0xed
trap_fatal(c06c9740,f000,1,0,c04c0ef6,...) at 0xc06474bd = trap_fatal+0x23d
trap_pfault(0,c066c5e8,2d7,0,c06c9220,...) at 0xc064787a = trap_pfault+0x27a
trap(c4f05b6c) at 0xc06481ab = trap+0x39b
calltrap() at 0xc062d4ac = calltrap+0x6
--- trap 0xc, eip = 0xc0900d42, esp = 0xc4f05bac, ebp = 0xc4f05bb8 ---
ieee80211_getcapinfo(c5caa000,ffff,c08f415a,c5caa874,c5463d00,...) at 0xc0900d42 = ieee80211_getcapinfo+0x71
ieee80211_beacon_construct(c62a8000,18,676,c50f5c00,c54e3988,...) at 0xc090308d = ieee80211_beacon_construct+0x67
ieee80211_beacon_alloc(c62a8000,c5caa874,6,2c5,5,...) at 0xc09039a0 = ieee80211_beacon_alloc+0x93
rum_newstate(c5caa000,5,ffffffff,652,c5362014,...) at 0xc9a9b55f = rum_newstate+0x259
ieee80211_newstate_cb(c5caa000,4,0,c0695c9c,0,...) at 0xc0906eb8 = ieee80211_newstate_cb+0x7a
taskqueue_run(c537db00,c537db18,0,c0661905,0,...) at 0xc04ef61a = taskqueue_run+0x8a
taskqueue_thread_loop(c5362074,c4f05d38,0,0,0,...) at 0xc04efd74 = taskqueue_thread_loop+0x44
fork_exit(c04efd30,c5362074,c4f05d38) at 0xc048e868 = fork_exit+0x88
fork_trampoline() at 0xc062d524 = fork_trampoline+0x8


--
Adios
Comment 10 Bernhard Schmidt freebsd_committer freebsd_triage 2010-08-11 10:52:05 UTC
Responsible Changed
From-To: freebsd-net->bschmidt

Over to me.
Comment 11 Bernhard Schmidt freebsd_committer freebsd_triage 2012-06-19 08:22:07 UTC
Responsible Changed
From-To: bschmidt->freebsd-wireless

back to pool, see also kern/153594
Comment 12 Eitan Adler freebsd_committer freebsd_triage 2018-05-28 19:49:56 UTC
batch change:

For bugs that match the following
-  Status Is In progress 
AND
- Untouched since 2018-01-01.
AND
- Affects Base System OR Documentation

DO:

Reset to open status.


Note:
I did a quick pass but if you are getting this email it might be worthwhile to double check to see if this bug ought to be closed.
Comment 13 Andriy Voskoboinyk freebsd_committer freebsd_triage 2018-12-18 03:56:38 UTC
(In reply to edwin from comment #0)
This one was fixed in base r258612

(In reply to Alex Kozlov from comment #9)
and this one should be completely fixed in base r296237.