Bug 252844 - wpa_supplicant(8): CPU% is high and can't associate with wireless network
Summary: wpa_supplicant(8): CPU% is high and can't associate with wireless network
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern (show other bugs)
Version: CURRENT
Hardware: Any Any
: --- Affects Many People
Assignee: Cy Schubert
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-01-20 04:41 UTC by shu
Modified: 2021-01-20 17:14 UTC (History)
4 users (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description shu 2021-01-20 04:41:49 UTC
My laptop has one wireless NIC and I use wpa_supplicant to connect wireless.
After git commit 81728a538d24f483d0986850fa3f51d5d84d8f26, the wireless network cannot be associated.

lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
	options=680003<RXCSUM,TXCSUM,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6>
	inet6 ::1 prefixlen 128
	inet6 fe80::1%lo0 prefixlen 64 scopeid 0x1
	inet 127.0.0.1 netmask 0xff000000
	groups: lo
	nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
wlan0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
	ether 9c:da:3e:a1:8a:fb
	groups: wlan
	ssid "" channel 1 (2412 MHz 11g)
	regdomain FCC country US authmode WPA1+WPA2/802.11i privacy MIXED
	deftxkey UNDEF txpower 30 bmiss 10 scanvalid 60 protmode CTS wme
	roaming MANUAL bintval 0
	parent interface: iwm0
	media: IEEE 802.11 Wireless Ethernet autoselect (autoselect)
	status: no carrier
	nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
     
The wpa_supplicant's CPU is high.
root  901 100.0  0.1 18584  7916  -  Rs   11:35    38:47.24 /usr/sbin/wpa_supplicant -s -B -i wlan0 -c /etc/wpa_supplicant.conf -D bsd -P /var/run/wpa_supplicant/wlan0.pid

I trace wpa_supplicant, kdump tell me it loop works on
   901 100532 wpa_supplicant 0.108685 CALL  select(0x6,0x800e3d100,0x800e3d180,0x800e3d200,0x7fffffffe9e0)
   901 100532 wpa_supplicant 0.108689 RET   select 1
   901 100532 wpa_supplicant 0.108693 CALL  read(0x4,0x800e08010,0)
   901 100532 wpa_supplicant 0.108696 RET   read 0
0x4 is PF_ROUTE socket, and read(2)'len is 0, very weird.

I check wpa_supplicant's code, and find read length is got by sysctl
static size_t                                                                    
rtbuf_len(void)                                                                  
{                                                                                
    size_t len;                                                                  
                                                                                 
    int mib[6] = {CTL_NET, AF_ROUTE, 0, AF_INET, NET_RT_DUMP, 0};                
                                                                                 
    if (sysctl(mib, 6, NULL, &len, NULL, 0) < 0) {                               
        wpa_printf(MSG_WARNING, "%s failed: %s", __func__,                       
               strerror(errno));                                                 
        len = 2048;                                                              
    }                                                                            
                                                                                 
    return len;                                                                  
}
What is returned is the total length of current AF_INET route items.

When system is started, rc.d/netif starts network interfaces in "ifconfig -l" order,
very lucky, lo0 is before wlan0, and ipv4_up lo0 adds 127.0.0.1 to lo0, subsequently wlan0's up call wpa_supplicant and use sysctl to get rtbuf_len. 

respond to lo0 ipv4_up operation, 
before the git commit, network calls in_control->in_addprefix->rtinit, and then AF_INET family has route items, and sysctl return non zero.
after the git commit, in_addprefix calls in_handle_ifaddr_route, but in_handle_ifaddr_route don't add loopback routes, so sysctl returns zero cause the problem, maybe util rc.d/routing's start, loopback routes can be added.
Comment 1 shu 2021-01-20 04:54:26 UTC
https://reviews.freebsd.org/D28249
Comment 2 david 2021-01-20 14:05:48 UTC
I have this symptom as well, and note that issuing "service netif restart wlan0" (manually) after wpa_supplicant' CPU utilization goes high appears to allow wlan0 (iwn(4), in my case) to work.
Comment 3 Cy Schubert freebsd_committer freebsd_triage 2021-01-20 14:49:42 UTC
Do you or anyone subscribed to this PR have wlan0 as a member of a lagg, e.g. lagg0?
Comment 4 david 2021-01-20 14:59:00 UTC
I do not (have wlan0 as a member of a LAGG).  (I tried it for a while a couple of years ago, with em0, but the wireless & wired networks (at work) were quite separate, so I'd lose connections to other machines, which is a lot of what I do with the laptop.  So I stopped trying to use LAGG.)

I note, too, that I actually normally run stable/12 on this laptop, and do not see the problem while doing that.  It's only when I boot to head (as I track both stable/12 and head daily) that I see the issue.

For reference, here are the last couple of uname strings for stable/12:

FreeBSD g1-55.catwhisker.org 12.2-STABLE FreeBSD 12.2-STABLE #934 stable/12-c243240-g369a4023e67: Tue Jan 19 03:33:41 PST 2021     root@g1-55.catwhisker.org:/common/S1/obj/usr/src/amd64.amd64/sys/CANARY  amd64 1202505 1202505
FreeBSD g1-55.catwhisker.org 12.2-STABLE FreeBSD 12.2-STABLE #935 stable/12-c243244-g7033fd68f7e: Wed Jan 20 03:34:45 PST 2021     root@g1-55.catwhisker.org:/common/S1/obj/usr/src/amd64.amd64/sys/CANARY  amd64 1202505 1202505


and head:

FreeBSD localhost 13.0-ALPHA1 FreeBSD 13.0-ALPHA1 #127 main-c256109-gc987d6a67766-dirty: Tue Jan 19 05:05:12 PST 2021     root@g1-55.catwhisker.org:/common/S4/obj/usr/src/amd64.amd64/sys/CANARY  amd64 1300136 1300136
FreeBSD g1-55.catwhisker.org 13.0-ALPHA1 FreeBSD 13.0-ALPHA1 #128 bc7ee8e5bc55-dirty: Wed Jan 20 04:42:20 PST 2021     root@localhost:/common/S4/obj/usr/src/amd64.amd64/sys/CANARY  amd64 1300136 1300136
Comment 5 Cy Schubert freebsd_committer freebsd_triage 2021-01-20 15:18:17 UTC
I was only able to reproduce the problem when wlan0 was a member of lagg0. However the patch does fix that issue.
Comment 6 Cy Schubert freebsd_committer freebsd_triage 2021-01-20 15:21:46 UTC
Looks good. I'll commit it and create a pull request to our upstream hostap.
Comment 7 commit-hook freebsd_committer freebsd_triage 2021-01-20 15:46:11 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=d70886d063166786ded0007af8cdcbf57b7b4827

commit d70886d063166786ded0007af8cdcbf57b7b4827
Author:     Cy Schubert <cy@FreeBSD.org>
AuthorDate: 2021-01-20 15:20:22 +0000
Commit:     Cy Schubert <cy@FreeBSD.org>
CommitDate: 2021-01-20 15:45:18 +0000

    wpa_supplicant uses PF_ROUTE to return the routing table in order to
    determine the length of the routing table buffer. As of 81728a538d24
    wpa_supplicant is started before the routing table has been populated
    resulting in the length of zero to be returned. This causes
    wpa_supplicant to loop endlessly. (The workaround is to kill and restart
    wpa_supplicant as by the time it is restarted the routing table is
    populated.)

    (Personally, I was not able to reproduce this unless wlan0 was a member of
    lagg0. However, others experienced this problem on standalone wlan0.)

    PR:             252844
    Submitted by:   shu <ankohuu _ outlook.com>
    Reported by:    shu <ankohuu _ outlook.com>
    Reviewed by:    cy
    X-MFC with:     81728a538d24f483d0986850fa3f51d5d84d8f26
    Differential Revision:  https://reviews.freebsd.org/D28249

 contrib/wpa/src/drivers/driver_bsd.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
Comment 8 Cy Schubert freebsd_committer freebsd_triage 2021-01-20 15:47:18 UTC
Committed. Thanks.
Comment 9 commit-hook freebsd_committer freebsd_triage 2021-01-20 17:14:25 UTC
A commit references this bug:

Author: cy
Date: Wed Jan 20 17:14:17 UTC 2021
New revision: 562150
URL: https://svnweb.freebsd.org/changeset/ports/562150

Log:
  This is the ports version of src commit
  d70886d063166786ded0007af8cdcbf57b7b4827.

  wpa_supplicant uses PF_ROUTE to return the routing table in order to
  determine the length of the routing table buffer. As of 81728a538d24
  wpa_supplicant is started before the routing table has been populated
  resulting in the length of zero to be returned. This causes
  wpa_supplicant to loop endlessly. (The workaround is to kill and restart
  wpa_supplicant as by the time it is restarted the routing table is
  populated.)

  (Personally, I was not able to reproduce this unless wlan0 was a member of
  lagg0. However, others experienced this problem on standalone wlan0.)

  PR:		252844
  Submitted by:	shu <ankohuu _ outlook.com>
  Reported by:	shu <ankohuu _ outlook.com>
  Reviewed by:	cy
  Differential Revision:	https://reviews.freebsd.org/D28249

Changes:
  head/net/hostapd/Makefile
  head/net/hostapd/files/patch-src_drivers_driver__bsd.c
  head/security/wpa_supplicant/Makefile
  head/security/wpa_supplicant/files/patch-src_drivers_driver__bsd.c