Created attachment 169878 [details] dmesg output after "service netif restart wlan0" and logging in

Over the last several months, as I reboot my laptop to the "head" slice for its daily update-in-place (by source rebuild), I've noticed that while the wlan0 NIC (using iwn hardware) almost always "Just Works" when I boot stable/10, in head there is only about a 50% probability that wpa_supplicant will start, bring up wlan0, and keep link up after dhclient starts. The rest of the time, link drops once dhclient starts, and dhclient keeps retrying even though wlan0's link has dropped.

In the latter case, I have found a circumvention (of sorts): after dhclient finally times out and gives up, log in (as root) on ttyv1, then issue "service netif restart wlan0". In my experience, this always makes it work again. (As I use xdm, started via getty out of /etc/ttys, I then also issue "service xdm stop".)

Per suggestion, I have augmented /etc/rc.conf with 'wlandebug_wlan0="state+scan+auth+assoc+crypto"' ... and while I did not encounter the above-described issue on the first "head" boot this morning, I did on the second, when the laptop was running:

FreeBSD g1-252.catwhisker.org 11.0-CURRENT FreeBSD 11.0-CURRENT #406 r298919M/298920:1100106: Mon May 2 04:42:04 PDT 2016 root@g1-252.catwhisker.org:/common/S4/obj/usr/src/sys/CANARY amd64

For each of the "head" boots this morning, I booted verbosely; I have retained a few log files from the second boot (where link was dropped), and managed to attach one of them. (I didn't see a way to attach multiple files -- other than concatenating them into one single mess, which seemed a bit counterproductive.)

Note that the SSID of interest to me is "lmdhw-net".
Created attachment 169879 [details] /var/log/messages from the boot in question
Created attachment 169880 [details] /var/log/console.log from the boot in question
r298925 may help with the initial DOWN->UP->DOWN transitions; however,

> May 2 11:50:51 kernel: wlan0: scan_curchan_task: loop start; scandone=0
> May 2 11:50:51 kernel: wlan0: scan_curchan_task: chan 11n -> 1g [active, dwell min 20ms max 200ms]
> May 2 11:50:51 kernel: wlan0: scan_curchan_task: waiting

(hangs here) looks like a driver misconfiguration. For now, https://reviews.freebsd.org/D6176 may work around this issue (until a more proper fix is found).
Thanks for the suggestion, Andriy. I didn't try it for the first few days, as I wasn't seeing the failure (so it wouldn't have been possible for me to tell if the change had a useful effect). After getting failures again yesterday and the day before, I verified that the diff will apply, and just noticed that I left the "--dry-run" in there, so I haven't actually applied it yet. :-( Sigh. (And I didn't get failures today, anyway.)
Created attachment 170225 [details] Verbose-boot dmesg from head/amd64@r299529M
Created attachment 170226 [details] /var/log/messages from head/amd64@r299529M
OK; had a recurrence of the symptoms after applying the D6176 patch for yesterday's rebuild of head (after a couple of "normal" reboots of head with that patch).

Today's upgrade was from:

FreeBSD g1-252.catwhisker.org 11.0-CURRENT FreeBSD 11.0-CURRENT #414 r299423M/299425:1100107: Wed May 11 04:43:40 PDT 2016 root@g1-252.catwhisker.org:/common/S4/obj/usr/src/sys/CANARY amd64

to:

FreeBSD g1-252.catwhisker.org 11.0-CURRENT FreeBSD 11.0-CURRENT #415 r299529M/299529:1100107: Thu May 12 05:33:45 PDT 2016 root@g1-252.catwhisker.org:/common/S4/obj/usr/src/sys/CANARY amd64

I've saved /var/log/messages, as well as what I have of dmesg (from a verbose boot), and I managed to attach them.
A commit references this bug:

Author: avos
Date: Sat May 21 23:21:42 UTC 2016
New revision: 300383
URL: https://svnweb.freebsd.org/changeset/base/300383

Log:
  net80211: send RTM_IEEE80211_SCAN event when scan was cancelled.

  wpa_supplicant(8) expects to see 'scan complete' event after every scan command; in case, when event is not sent it will hang for indefinite time.

  PR: 209198

Changes:
  head/sys/net80211/ieee80211_scan_sw.c
The workaround should work now.
A commit references this bug:

Author: avos
Date: Thu May 26 11:12:36 UTC 2016
New revision: 300732
URL: https://svnweb.freebsd.org/changeset/base/300732

Log:
  iwn: add watchdog for scanning.

  Restart device if scanning was not done in time.

  Tested by: david@catwhisker.org
  PR: 209198
  Differential Revision: https://reviews.freebsd.org/D6176

Changes:
  head/sys/dev/iwn/if_iwn.c
  head/sys/dev/iwn/if_iwnvar.h
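For readers following along, the idea behind a scan watchdog ("restart device if scanning was not done in time") can be sketched roughly as below. This is an illustrative, user-space sketch only and is not the code from r300732: the struct, the function names, and the timeout value (scan_softc, scan_start, scan_done, device_restart, scan_watchdog_tick, SCAN_TIMEOUT_TICKS) are all hypothetical and do not come from if_iwn.c.

```c
#include <stdbool.h>

/* Assumed timeout for this example; the real driver's value may differ. */
#define SCAN_TIMEOUT_TICKS 5

/* Hypothetical per-device state; field names are illustrative only. */
struct scan_softc {
	bool scanning;   /* a scan request is outstanding */
	int  scan_timer; /* ticks left before the watchdog fires; 0 = disarmed */
	int  resets;     /* how many times the watchdog restarted the device */
};

/* Arm the watchdog when a scan is started. */
void
scan_start(struct scan_softc *sc)
{
	sc->scanning = true;
	sc->scan_timer = SCAN_TIMEOUT_TICKS;
}

/* Disarm the watchdog when the scan completes normally. */
void
scan_done(struct scan_softc *sc)
{
	sc->scanning = false;
	sc->scan_timer = 0;
}

/* Stand-in for a device restart: clear scan state and count the reset. */
void
device_restart(struct scan_softc *sc)
{
	sc->scanning = false;
	sc->scan_timer = 0;
	sc->resets++;
}

/*
 * Called once per timer tick. Returns true if the scan timed out and the
 * device was restarted on this tick, false otherwise.
 */
bool
scan_watchdog_tick(struct scan_softc *sc)
{
	if (sc->scan_timer == 0)
		return (false);	/* watchdog is disarmed */
	if (--sc->scan_timer > 0)
		return (false);	/* still within the allowed time */
	device_restart(sc);
	return (true);
}
```

The point of the design is that a scan which never reports completion (the "waiting" hang in the log above) no longer stalls indefinitely: the timer expires and the device is restarted, while a scan that completes in time simply disarms the timer.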
There is a commit referencing this PR, but it's still not closed and has been inactive for some time. Closing the PR as fixed but feel free to re-open it if the issue hasn't been completely resolved. Thanks