Bug 209198 - iwn sometimes loses link on dhclient start
Summary: iwn sometimes loses link on dhclient start
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: wireless
Version: CURRENT
Hardware: amd64 Any
Importance: --- Affects Only Me
Assignee: freebsd-wireless (Nobody)
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-05-02 12:45 UTC by david
Modified: 2019-01-21 18:49 UTC
CC List: 2 users

See Also:


Attachments
dmesg output after "service netif restart wlan0" and logging in (88.16 KB, text/plain)
2016-05-02 12:45 UTC, david
/var/log/messages from the boot in question (319.33 KB, text/plain)
2016-05-02 12:46 UTC, david
/var/log/console.log from the boot in question (11.47 KB, text/plain)
2016-05-02 12:47 UTC, david
Verbose-boot dmesg from head/amd64@r299529M (109.40 KB, text/plain)
2016-05-12 12:59 UTC, david
/var/log/messages from head/amd64@r299529M (160.30 KB, text/plain)
2016-05-12 13:00 UTC, david

Description david 2016-05-02 12:45:11 UTC
Created attachment 169878
dmesg output after "service netif restart wlan0" and logging in

Over the last several months, as I reboot my laptop to the "head" slice for its daily update-in-place (by source rebuild), I've noticed that the wlan0 NIC (using iwn hardware) almost always "Just Works" when I boot stable/10. In head, though, there is only about a 50% probability that wpa_supplicant will start, bring up wlan0, and keep link up after dhclient starts. The rest of the time, link drops once dhclient starts, and dhclient keeps retrying even though wlan0's link is down.

In the latter case, I have found a circumvention (of sorts): after dhclient finally times out and gives up, log in (as root) on ttyv1, then issue "service netif restart wlan0".  In my experience, this always makes it work again. (As I use xdm, started via getty out of /etc/ttys, I then also issue "service xdm stop".)

Per suggestion, I have augmented /etc/rc.conf with 'wlandebug_wlan0="state+scan+auth+assoc+crypto"' ... and while I did not encounter the above-described issue on the first "head" boot this morning, I did on the second, when the laptop was running:

FreeBSD g1-252.catwhisker.org 11.0-CURRENT FreeBSD 11.0-CURRENT #406  r298919M/298920:1100106: Mon May  2 04:42:04 PDT 2016     root@g1-252.catwhisker.org:/common/S4/obj/usr/src/sys/CANARY  amd64

For each of the "head" boots this morning, I booted verbosely; I have retained a few log files from the second boot (where link was dropped), and managed to attach one of them.  (I didn't see a way to attach multiple files -- other than concatenating them into one single mess, which seemed a bit counterproductive.)

Note that the SSID of interest to me is "lmdhw-net".
Comment 1 david 2016-05-02 12:46:27 UTC
Created attachment 169879
/var/log/messages from the boot in question
Comment 2 david 2016-05-02 12:47:06 UTC
Created attachment 169880
/var/log/console.log from the boot in question
Comment 3 Andriy Voskoboinyk freebsd_committer freebsd_triage 2016-05-02 17:15:48 UTC
r298925 may help with initial DOWN->UP->DOWN transitions; however

> May  2 11:50:51  kernel: wlan0: scan_curchan_task: loop start; scandone=0
> May  2 11:50:51  kernel: wlan0: scan_curchan_task: chan  11n ->   1g [active, dwell min 20ms max 200ms]
> May  2 11:50:51  kernel: wlan0: scan_curchan_task: waiting
(hangs here)

looks like a driver misconfiguration; for now, https://reviews.freebsd.org/D6176 may work around this issue (until a more proper fix is found).
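For anyone checking their own logs for the same symptom, here is a minimal sketch of a filter that flags an interface whose last scan_curchan_task message is "waiting". It assumes the wlandebug line format quoted above; the function names are hypothetical. A healthy scan also prints "waiting" and then carries on, so this only catches the case where nothing follows it.

```python
import re

# Matches the net80211 scan debug lines quoted above, e.g.
# "May  2 11:50:51  kernel: wlan0: scan_curchan_task: waiting"
SCAN_LINE = re.compile(
    r"kernel: (?P<ifname>\w+): scan_curchan_task: (?P<msg>.*)$"
)


def last_scan_state(lines):
    """Return the last scan_curchan_task message seen for each interface."""
    last = {}
    for line in lines:
        m = SCAN_LINE.search(line)
        if m:
            last[m.group("ifname")] = m.group("msg").strip()
    return last


def stalled_interfaces(lines):
    """Interfaces whose most recent scan message is 'waiting' (heuristic)."""
    states = last_scan_state(lines)
    return sorted(ifname for ifname, msg in states.items() if msg == "waiting")
```

Usage would be along the lines of `stalled_interfaces(open("/var/log/messages"))`, run after the hang has been observed.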
Comment 4 david 2016-05-10 12:27:34 UTC
Thanks for the suggestion, Andriy.  I didn't try it for the first few days, as I wasn't seeing the failure (so it wouldn't have been possible for me to tell if the change had a useful effect).

After getting failures again yesterday and the day before, I verified that the diff will apply (and just noticed that I left the "--dry-run" in there, so I didn't apply it yet. :-( ).  Sigh.  (And I didn't get failures today, anyway.)
Comment 5 david 2016-05-12 12:59:23 UTC
Created attachment 170225
Verbose-boot dmesg from head/amd64@r299529M
Comment 6 david 2016-05-12 13:00:17 UTC
Created attachment 170226
/var/log/messages from head/amd64@r299529M
Comment 7 david 2016-05-12 13:02:22 UTC
OK; had a recurrence of the symptoms after applying the D6176 patch for yesterday's rebuild of head (after a couple of "normal" reboots of head with that patch).

Today's upgrade was from:
FreeBSD g1-252.catwhisker.org 11.0-CURRENT FreeBSD 11.0-CURRENT #414  r299423M/299425:1100107: Wed May 11 04:43:40 PDT 2016     root@g1-252.catwhisker.org:/common/S4/obj/usr/src/sys/CANARY  amd64

to:
FreeBSD g1-252.catwhisker.org 11.0-CURRENT FreeBSD 11.0-CURRENT #415  r299529M/299529:1100107: Thu May 12 05:33:45 PDT 2016     root@g1-252.catwhisker.org:/common/S4/obj/usr/src/sys/CANARY  amd64

I've saved /var/log/messages, as well as what I have of dmesg (from a verbose boot), and I managed to attach them.
Comment 8 commit-hook freebsd_committer freebsd_triage 2016-05-21 23:22:19 UTC
A commit references this bug:

Author: avos
Date: Sat May 21 23:21:42 UTC 2016
New revision: 300383
URL: https://svnweb.freebsd.org/changeset/base/300383

Log:
  net80211: send RTM_IEEE80211_SCAN event when scan was cancelled.

  wpa_supplicant(8) expects to see 'scan complete' event after every
  scan command; in case, when event is not sent it will hang for
  indefinite time.

  PR:		209198

Changes:
  head/sys/net80211/ieee80211_scan_sw.c
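
The behaviour described in the log above can be illustrated with a toy model. This is a sketch only, not the net80211 code: ToyScanner, wait_for_scan and the notify_on_cancel flag are hypothetical names, and a Python queue stands in for the routing-socket event. It shows why a listener that waits for a completion event after every scan stalls when a cancelled scan reports nothing.

```python
import queue

SCAN_COMPLETE = "scan complete"  # stands in for the RTM_IEEE80211_SCAN event


class ToyScanner:
    """Toy scan engine that reports to a listener through an event queue."""

    def __init__(self, notify_on_cancel):
        self.events = queue.Queue()
        self.notify_on_cancel = notify_on_cancel
        self.scanning = False

    def start(self):
        self.scanning = True

    def finish(self):
        # Normal completion always notifies the listener.
        self.scanning = False
        self.events.put(SCAN_COMPLETE)

    def cancel(self):
        # Without the notification, the listener never learns the scan ended.
        self.scanning = False
        if self.notify_on_cancel:
            self.events.put(SCAN_COMPLETE)


def wait_for_scan(scanner, timeout):
    """Listener side: True if the completion event arrives within timeout."""
    try:
        return scanner.events.get(timeout=timeout) == SCAN_COMPLETE
    except queue.Empty:
        return False
```

With notify_on_cancel=False, a start() followed by cancel() leaves wait_for_scan() to time out; with notify_on_cancel=True it returns promptly. The timeout exists only so the model terminates.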
Comment 9 Andriy Voskoboinyk freebsd_committer freebsd_triage 2016-05-21 23:25:11 UTC
The workaround should work now.
Comment 10 commit-hook freebsd_committer freebsd_triage 2016-05-26 11:12:49 UTC
A commit references this bug:

Author: avos
Date: Thu May 26 11:12:36 UTC 2016
New revision: 300732
URL: https://svnweb.freebsd.org/changeset/base/300732

Log:
  iwn: add watchdog for scanning.

  Restart device if scanning was not done in time.

  Tested by:	david@catwhisker.org

  PR:		209198
  Differential Revision:	https://reviews.freebsd.org/D6176

Changes:
  head/sys/dev/iwn/if_iwn.c
  head/sys/dev/iwn/if_iwnvar.h
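
The idea in the log above (restart the device if a scan does not finish in time) can be sketched as a countdown that is armed when a scan starts and disarmed when it completes. This is an illustration under assumed names, not the if_iwn.c implementation: ToyScanWatchdog, the tick-based timing and the restart callback are all hypothetical.

```python
class ToyScanWatchdog:
    """Countdown watchdog: calls restart() if a scan outlives its budget."""

    def __init__(self, timeout_ticks, restart):
        self.timeout_ticks = timeout_ticks
        self.restart = restart  # callback standing in for a device restart
        self.remaining = 0      # 0 means the watchdog is disarmed

    def scan_started(self):
        # Arm the watchdog for a fresh scan.
        self.remaining = self.timeout_ticks

    def scan_done(self):
        # Scan completed in time; disarm.
        self.remaining = 0

    def tick(self):
        # Called periodically; fires once when the budget runs out.
        if self.remaining == 0:
            return
        self.remaining -= 1
        if self.remaining == 0:
            self.restart()
```

A caller would invoke scan_started() when issuing the scan command, scan_done() from the completion path, and tick() from a periodic timer. The watchdog fires at most once per armed scan.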
Comment 11 Oleksandr Tymoshenko freebsd_committer freebsd_triage 2019-01-21 18:49:24 UTC
There is a commit referencing this PR, but it's still not closed and has been inactive for some time. Closing the PR as fixed but feel free to re-open it if the issue hasn't been completely resolved.

Thanks