144755 – [wlan] netif/devd race

Bug 144755 - [wlan] netif/devd race

Summary: [wlan] netif/devd race

Status:	Closed FIXED

Alias:	None

Product:	Base System
Classification:	Unclassified
Component:	wireless (show other bugs)
Version:	8.0-STABLE
Hardware:	Any Any

Importance:	Normal Affects Only Me
Assignee:	freebsd-wireless (Nobody)

URL:
Keywords:

Depends on:
Blocks:

Reported:	2010-03-15 06:10 UTC by edwin
Modified:	2018-12-18 03:56 UTC (History)
CC List:	1 user (show)

See Also:

Attachments
iwi_node_ref.diff (1.14 KB, patch) 2010-08-05 21:08 UTC, bschmidt	no flags	Details \| Diff
patch.txt (1.02 KB, text/plain; charset=us-ascii) 2010-08-05 21:55 UTC, Alex Kozlov	no flags	Details
View All Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description edwin 2010-03-15 06:10:08 UTC

Machine panics when issueing /etc/rc.d/netif restart.

wlan0: ieee80211_new_state_locked: pending SCAN -> AUTH transition lost

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0xc4fe81d5
fault code              = supervisor read, page not present
instruction pointer     = 0x20:0xc4fa9b0c
stack pointer           = 0x28:0xdf834b7c
frame pointer           = 0x28:0xdf834c34
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 0 (iwi0 taskq)
trap number             = 12
panic: page fault
cpuid = 0
Uptime: 30m31s
Physical memory: 750 MB
Dumping 78 MB: (CTRL-C to abort)  (CTRL-C to abort)  63 47 31 15


    at /usr/home/edwin/svn/build/sys/i386/i386/trap.c:938
#4  0xc0bc1ce0 in trap_pfault (frame=0xdf834b3c, usermode=0, eva=3305013717)
    at /usr/home/edwin/svn/build/sys/i386/i386/trap.c:851
#5  0xc0bc2625 in trap (frame=0xdf834b3c)
    at /usr/home/edwin/svn/build/sys/i386/i386/trap.c:533
#6  0xc0ba4c0b in calltrap ()
    at /usr/home/edwin/svn/build/sys/i386/i386/exception.s:165
#7  0xc4fa9b0c in iwi_auth_and_assoc (sc=0xc4298800, vap=0xc4a61000)
    at /usr/home/edwin/svn/build/sys/modules/iwi/../../dev/iwi/if_iwi.c:2854
#8  0xc4faa1f9 in iwi_newstate (vap=0xc4a61000, nstate=IEEE80211_S_AUTH,
    arg=192)
    at /usr/home/edwin/svn/build/sys/modules/iwi/../../dev/iwi/if_iwi.c:1001
#9  0xc09767a1 in ieee80211_newstate_cb (xvap=0xc4a61000, npending=1)
    at /usr/home/edwin/svn/build/sys/net80211/ieee80211_proto.c:1654
#10 0xc08c52a2 in taskqueue_run (queue=0xc4b61cc0)
    at /usr/home/edwin/svn/build/sys/kern/subr_taskqueue.c:239
#11 0xc08c54ad in taskqueue_thread_loop (arg=0xc4fad074)
    at /usr/home/edwin/svn/build/sys/kern/subr_taskqueue.c:360
#12 0xc0862231 in fork_exit (callout=0xc08c53f0 <taskqueue_thread_loop>,
    arg=0xc4fad074, frame=0xdf834d38)
    at /usr/home/edwin/svn/build/sys/kern/kern_fork.c:843
#13 0xc0ba4c80 in fork_trampoline ()
    at /usr/home/edwin/svn/build/sys/i386/i386/exception.s:270
(kgdb)


(kgdb) frame 7
#7  0xc4fa9b0c in iwi_auth_and_assoc (sc=0xc4298800, vap=0xc4a61000)
    at /usr/home/edwin/svn/build/sys/modules/iwi/../../dev/iwi/if_iwi.c:2854
2854            rs.mode = mode;
(kgdb) list
2849            if (error != 0)
2850                    goto done;
2851
2852            /* the rate set has already been "negotiated" */
2853            memset(&rs, 0, sizeof rs);
2854            rs.mode = mode;
2855            rs.type = IWI_RATESET_TYPE_NEGOTIATED;
2856            rs.nrates = ni->ni_rates.rs_nrates;
2857            if (rs.nrates > IWI_RATESET_SIZE) {
2858                    DPRINTF(("Truncating negotiated rate set from %u\n",
(kgdb) info local
ic = (struct ieee80211com *) 0xc4fad000
ifp = (struct ifnet *) 0xc43f2000
ni = (struct ieee80211_node *) 0xc4fe8000
config = {bluetooth_coexistence = 0 '\0', reserved1 = 0 '\0',
  answer_pbreq = 0 '\0', allow_invalid_frames = 0 '\0',
  multicast_enabled = 1 '\001', drop_unicast_unencrypted = 0 '\0',
  disable_unicast_decryption = 1 '\001', drop_multicast_unencrypted = 0 '\0',
  disable_multicast_decryption = 1 '\001', antenna = 0 '\0',
  include_crc = 0 '\0', use_protection = 1 '\001',
  protection_ctsonly = 0 '\0', enable_multicast_filtering = 0 '\0',
  bluetooth_threshold = 0 '\0', silence_threshold = 0 '\0',
  allow_beacon_and_probe_resp = 0 '\0', allow_mgt = 0 '\0',
  noise_reported = 0 '\0', reserved5 = 0 '\0'}
assoc = Variable "assoc" is not available.
(kgdb) p ni
$1 = (struct ieee80211_node *) 0xc4fe8000
(kgdb) p *ni
Cannot access memory at address 0xc4fe8000


Information available:

[~/svn/build] edwin@vaio>svn info
[...]
Revision: 205159
Last Changed Author: jilles
Last Changed Rev: 205150
Last Changed Date: 2010-03-15 00:07:40 +1100 (Mon, 15 Mar 2010)


In http://www.mavetju.org/~edwin/vmcore.1:

-rw-r--r--  1 1001  80     22423 Mar 15 05:43 core.txt.1.bz2
-rw-r--r--  1 1001  80       464 Mar 15 05:43 info.1
-rw-r--r--  1 1001  80  18104049 Mar 15 06:01 kernel.debug.bz2
-rw-r--r--  1 1001  80  17194295 Mar 15 05:51 vmcore.1.bz2

Fix: 

I am able to reproduce this on demand, please let me know which
steps you want me to take to troubleshoot this or test patches if needed.

Comment 1 Mark Linimon freebsd_committer

2010-03-15 06:19:52 UTC

Responsible Changed
From-To: freebsd-bugs->freebsd-fs

Over to maintainer(s).

Comment 2 Mark Linimon freebsd_committer

2010-03-15 08:31:21 UTC

Responsible Changed
From-To: freebsd-fs->freebsd-net

oops. 

pointed out by: brucec

Comment 3 Joey Mingrone 2010-04-30 15:43:22 UTC

I'm seeing this as well after an upgrade to 8.0-RELEASE-p2 from 7.2-RELEASE.

Comment 4 edwin 2010-07-28 04:33:42 UTC

Please note that this issue is still happening on 8.1-STABLE #2 r210522

-- 
Edwin Groothuis		Website: http://www.mavetju.org/
edwin@mavetju.org	Weblog:  http://www.mavetju.org/weblog/

Comment 5 bschmidt 2010-07-28 11:11:17 UTC

On Wed, Jul 28, 2010 at 06:00, Edwin Groothuis <edwin@mavetju.org> wrote:
> The following reply was made to PR kern/144755; it has been noted by GNAT=
S.
>
> From: Edwin Groothuis <edwin@mavetju.org>
> To: bug-followup@FreeBSD.org
> Cc:
> Subject: Re: kern/144755: [iwi] [panic] iwi panic when issuing /etc/rc.d/=
netif restart on 8-STABLE r205159
> Date: Wed, 28 Jul 2010 13:33:42 +1000
>
> =A0Please note that this issue is still happening on 8.1-STABLE #2 r21052=
2

Can you check if wpa_supplicant gets started twice after
"/etc/rc.d/netif restart"?

I have the feeling that this is a more or less known issue, a race
between devd (/etc/pccard_ether) and /etc/rc.d/netif. There is a small
window (a few ms) where this can happen and our net80211 isn't capable
of handling two running wpa_supplicants on the same interface. Which
leads to all kind of weird issues, .e.g. panics.

--=20
Bernhard

Comment 6 edwin 2010-08-04 23:31:27 UTC

Hello Bernhard!

On Wed, Jul 28, 2010 at 12:11:17PM +0200, Bernhard Schmidt wrote:
> Can you check if wpa_supplicant gets started twice after
> "/etc/rc.d/netif restart"?

According to core.txt there are two wpa_supplicant processes, but
I don't know if it is caused by a fork of itself or if it is caused
by something started twice:

    0  7084     1   0  56  0  5192     0 select Ds    ??  3686:38.00 [wpa_suppli
    0  7085     1   0  57  0  5192     0 select Ds    ??  2617:11.00 [wpa_suppli

Based on the PPID of 1 for both, I would say it gets started twice.

root     wpa_supplicant  7085 root /             2 drwxr-xr-x     512  r
root     wpa_supplicant  7085   wd /             2 drwxr-xr-x     512  r
root     wpa_supplicant  7085 text /usr     4526084 -r-xr-xr-x  295932  r
root     wpa_supplicant  7085    0 /dev          7 crw-rw-rw-    null rw
root     wpa_supplicant  7085    1 /dev          7 crw-rw-rw-    null rw
root     wpa_supplicant  7085    2 /dev          7 crw-rw-rw-    null rw
root     wpa_supplicant  7085    3* internet dgram udp c52b1000
root     wpa_supplicant  7085    4* route raw 0 c527f4d4
root     wpa_supplicant  7085    5 /dev         15 crw-------     bpf rw

root     wpa_supplicant  7084 root /             2 drwxr-xr-x     512  r
root     wpa_supplicant  7084   wd /             2 drwxr-xr-x     512  r
root     wpa_supplicant  7084 text /usr     4526084 -r-xr-xr-x  295932  r
root     wpa_supplicant  7084    0 /dev          7 crw-rw-rw-    null rw
root     wpa_supplicant  7084    1 /dev          7 crw-rw-rw-    null rw
root     wpa_supplicant  7084    2 /dev          7 crw-rw-rw-    null rw
root     wpa_supplicant  7084    3* internet dgram udp c4f520dc
root     wpa_supplicant  7084    4* route raw 0 c527f670
root     wpa_supplicant  7084    5* local stream c4f508bc
root     wpa_supplicant  7084    6 /var      23622 -rw-------       4  w
root     wpa_supplicant  7084    7* local stream c4f5035c <-> c4f50408
root     wpa_supplicant  7084    8 /dev         15 crw-------     bpf rw

> I have the feeling that this is a more or less known issue, a race
> between devd (/etc/pccard_ether) and /etc/rc.d/netif. There is a small
> window (a few ms) where this can happen and our net80211 isn't capable
> of handling two running wpa_supplicants on the same interface. Which
> leads to all kind of weird issues, .e.g. panics.

Nice! Is there a workaround or patch available for testing?

Edwin

-- 
Edwin Groothuis		Website: http://www.mavetju.org/
edwin@mavetju.org	Weblog:  http://www.mavetju.org/weblog/

Comment 7 bschmidt 2010-08-05 09:08:42 UTC

On Thu, Aug 5, 2010 at 00:31, Edwin Groothuis <edwin@mavetju.org> wrote:
> Hello Bernhard!
>
> On Wed, Jul 28, 2010 at 12:11:17PM +0200, Bernhard Schmidt wrote:
>> Can you check if wpa_supplicant gets started twice after
>> "/etc/rc.d/netif restart"?
>
> According to core.txt there are two wpa_supplicant processes, but
> I don't know if it is caused by a fork of itself or if it is caused
> by something started twice:
>
> =A0 =A00 =A07084 =A0 =A0 1 =A0 0 =A056 =A00 =A05192 =A0 =A0 0 select Ds =
=A0 =A0?? =A03686:38.00 [wpa_suppli
> =A0 =A00 =A07085 =A0 =A0 1 =A0 0 =A057 =A00 =A05192 =A0 =A0 0 select Ds =
=A0 =A0?? =A02617:11.00 [wpa_suppli
>
> Based on the PPID of 1 for both, I would say it gets started twice.

Indeed.

>
> [..]
>> I have the feeling that this is a more or less known issue, a race
>> between devd (/etc/pccard_ether) and /etc/rc.d/netif. There is a small
>> window (a few ms) where this can happen and our net80211 isn't capable
>> of handling two running wpa_supplicants on the same interface. Which
>> leads to all kind of weird issues, .e.g. panics.
>
> Nice! Is there a workaround or patch available for testing?

A temporary workaround is to add a "sleep 1" in /etc/pccard_ether,
something I'd rather not commit. I haven't yet found a clean solution
for this.. any pointers welcome. :)

--=20
Bernhard

Comment 8 bschmidt 2010-08-05 21:08:38 UTC

On Thu, Aug 5, 2010 at 00:31, Edwin Groothuis <edwin@mavetju.org> wrote:
> [..]
> Nice! Is there a workaround or patch available for testing?

Please give the attached patch a try. It does not prevent
wpa_supplicant from starting twice (for that you can define
ctrl_interface= in wpa_supplicant.conf), but should no longer panic.

Thanks.

-- 
Bernhard

Comment 9 Alex Kozlov 2010-08-05 21:55:43 UTC

On Thu, Aug 05, 2010 at 08:10:08PM +0000, Bernhard Schmidt wrote:
> The following reply was made to PR kern/144755; it has been noted by GNATS.
> 
> From: Bernhard Schmidt <bschmidt@techwires.net>
> To: Edwin Groothuis <edwin@mavetju.org>
> Cc: bug-followup@freebsd.org
> Subject: Re: kern/144755: [iwi] [panic] iwi panic when issuing /etc/rc.d/netif 
> 	restart on 8-STABLE r205159
> Date: Thu, 5 Aug 2010 22:08:38 +0200
> 
>  --0015175cda7eab5eec048d191d67
>  Content-Type: text/plain; charset=ISO-8859-1
>  
>  On Thu, Aug 5, 2010 at 00:31, Edwin Groothuis <edwin@mavetju.org> wrote:
>  > [..]
>  > Nice! Is there a workaround or patch available for testing?
>  
>  Please give the attached patch a try. It does not prevent
>  wpa_supplicant from starting twice (for that you can define
>  ctrl_interface= in wpa_supplicant.conf), but should no longer panic.
I can reproduce this panic for if_rum, similiar patch also helps.

wlan0: ieee80211_new_state_locked: pending RUN -> SCAN transition lost
wlan0: ieee80211_new_state_locked: pending RUN -> SCAN transition lost

Fatal trap 12: page fault while in kernel mode
fault virtual address	= 0xffff
fault code		= supervisor read, page not present
instruction pointer	= 0x20:0xc0900d42
stack pointer	        = 0x28:0xc4f05bac
frame pointer	        = 0x28:0xc4f05bb8
code segment		= base rx0, limit 0xfffff, type 0x1b
			= DPL 0, pres 1, def32 1, gran 1
processor eflags	= interrupt enabled, resume, IOPL = 0
current process		= 0 (rum0 taskq)
trap number		= 12
panic: page fault
KDB: stack backtrace:
db_trace_self_wrapper(c0669547,c06cb000,c0660c6a,c4f05a5c,c4f05a5c,...) at 0xc0436706 = db_trace_self_wrapper+0x26
panic(c0660c6a,c0680c79,c4f05b6c,1,1,...) at 0xc04b898d = panic+0xed
trap_fatal(c06c9740,f000,1,0,c04c0ef6,...) at 0xc06474bd = trap_fatal+0x23d
trap_pfault(0,c066c5e8,2d7,0,c06c9220,...) at 0xc064787a = trap_pfault+0x27a
trap(c4f05b6c) at 0xc06481ab = trap+0x39b
calltrap() at 0xc062d4ac = calltrap+0x6
--- trap 0xc, eip = 0xc0900d42, esp = 0xc4f05bac, ebp = 0xc4f05bb8 ---
ieee80211_getcapinfo(c5caa000,ffff,c08f415a,c5caa874,c5463d00,...) at 0xc0900d42 = ieee80211_getcapinfo+0x71
ieee80211_beacon_construct(c62a8000,18,676,c50f5c00,c54e3988,...) at 0xc090308d = ieee80211_beacon_construct+0x67
ieee80211_beacon_alloc(c62a8000,c5caa874,6,2c5,5,...) at 0xc09039a0 = ieee80211_beacon_alloc+0x93
rum_newstate(c5caa000,5,ffffffff,652,c5362014,...) at 0xc9a9b55f = rum_newstate+0x259
ieee80211_newstate_cb(c5caa000,4,0,c0695c9c,0,...) at 0xc0906eb8 = ieee80211_newstate_cb+0x7a
taskqueue_run(c537db00,c537db18,0,c0661905,0,...) at 0xc04ef61a = taskqueue_run+0x8a
taskqueue_thread_loop(c5362074,c4f05d38,0,0,0,...) at 0xc04efd74 = taskqueue_thread_loop+0x44
fork_exit(c04efd30,c5362074,c4f05d38) at 0xc048e868 = fork_exit+0x88
fork_trampoline() at 0xc062d524 = fork_trampoline+0x8


--
Adios

Comment 10 Bernhard Schmidt freebsd_committer

2010-08-11 10:52:05 UTC

Responsible Changed
From-To: freebsd-net->bschmidt

Over to me.

Comment 11 Bernhard Schmidt freebsd_committer

2012-06-19 08:22:07 UTC

Responsible Changed
From-To: bschmidt->freebsd-wireless

back to pool, see also kern/153594

Comment 12 Eitan Adler freebsd_committer

2018-05-28 19:49:56 UTC

batch change:

For bugs that match the following
-  Status Is In progress 
AND
- Untouched since 2018-01-01.
AND
- Affects Base System OR Documentation

DO:

Reset to open status.


Note:
I did a quick pass but if you are getting this email it might be worthwhile to double check to see if this bug ought to be closed.

Comment 13 Andriy Voskoboinyk freebsd_committer

2018-12-18 03:56:38 UTC

(In reply to edwin from comment #0)
This one was fixed in base r258612

(In reply to Alex Kozlov from comment #9)
and this one should be completely fixed in base r296237.