Created attachment 169878 [details]
dmesg output after "service netif restart wlan0" and logging in
Over the last several months, as I reboot my laptop to the "head" slice for its daily update-in-place (by source rebuild), I've noticed that the wlan0 NIC (using iwn hardware) almost always "Just Works" when I boot stable/10. In head, however, there is only about a 50% probability that wpa_supplicant will start, bring up wlan0, and keep link up after dhclient starts. The rest of the time, link drops once dhclient starts, and dhclient keeps retrying even though wlan0's link is down.
In the latter case, I have found a circumvention (of sorts): after dhclient finally times out and gives up, log in (as root) on ttyv1, then issue "service netif restart wlan0". In my experience, this always makes it work again. (As I use xdm, started via getty out of /etc/ttys, I then also issue "service xdm stop".)
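For anyone wanting to repeat the circumvention above, here is a minimal sketch of the two commands wrapped in a shell function. The function name recover_wlan, the IFACE default, and the DRY_RUN switch are illustrative assumptions and not part of the report; only the two "service" invocations come from the description above.

```shell
#!/bin/sh
# Sketch of the manual workaround; intended to be run as root on FreeBSD
# after dhclient has timed out. IFACE and DRY_RUN are hypothetical knobs.
IFACE="${IFACE:-wlan0}"

run() {
    # With DRY_RUN=1, print the command instead of executing it.
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

recover_wlan() {
    # Restart the interface, as described in the report.
    run service netif restart "$IFACE"
    # Only relevant when xdm is in use, as it is on the reporter's laptop.
    run service xdm stop
}
```

Setting DRY_RUN=1 before calling recover_wlan prints the commands without running them, which may be useful for checking the sequence on a machine that is not affected.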
Per suggestion, I have augmented /etc/rc.conf with 'wlandebug_wlan0="state+scan+auth+assoc+crypto"' ... and while I did not encounter the above-described issue on the first "head" boot this morning, I did on the second, when the laptop was running:
FreeBSD g1-252.catwhisker.org 11.0-CURRENT FreeBSD 11.0-CURRENT #406 r298919M/298920:1100106: Mon May 2 04:42:04 PDT 2016 firstname.lastname@example.org:/common/S4/obj/usr/src/sys/CANARY amd64
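For reference, the debugging setting mentioned above would appear in /etc/rc.conf as the following line. The value is copied from this report; the comment is added here for illustration.

```shell
# /etc/rc.conf fragment: net80211 debug flags for wlan0, as used in this report
wlandebug_wlan0="state+scan+auth+assoc+crypto"
```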
For each of the "head" boots this morning, I booted verbosely; I have retained a few log files from the second boot (where link was dropped), and managed to attach one of them. (I didn't see a way to attach multiple files -- other than concatenating them into one single mess, which seemed a bit counterproductive.)
Note that the SSID of interest to me is "lmdhw-net".
Created attachment 169879 [details]
/var/log/messages from the boot in question
Created attachment 169880 [details]
/var/log/console.log from the boot in question
r298925 may help with initial DOWN->UP->DOWN transitions; however
> May 2 11:50:51 kernel: wlan0: scan_curchan_task: loop start; scandone=0
> May 2 11:50:51 kernel: wlan0: scan_curchan_task: chan 11n -> 1g [active, dwell min 20ms max 200ms]
> May 2 11:50:51 kernel: wlan0: scan_curchan_task: waiting
looks like a driver misconfiguration; for now, https://reviews.freebsd.org/D6176 may work around this issue (until a more proper fix is found).
Thanks for the suggestion, Andriy. I didn't try it for the first few days, as I wasn't seeing the failure (so it wouldn't have been possible for me to tell if the change had a useful effect).
After getting failures again yesterday and the day before, I verified that the diff will apply (and just noticed that I left the "--dry-run" in there, so I didn't apply it yet. :-( ). Sigh. (And I didn't get failures today, anyway.)
Created attachment 170225 [details]
Verbose-boot dmesg from head/amd64@r299529M
Created attachment 170226 [details]
/var/log/messages from head/amd64@r299529M
OK; had a recurrence of the symptoms after applying the D6176 patch for yesterday's rebuild of head (after a couple of "normal" reboots of head with that patch).
Today's upgrade was from:
FreeBSD g1-252.catwhisker.org 11.0-CURRENT FreeBSD 11.0-CURRENT #414 r299423M/299425:1100107: Wed May 11 04:43:40 PDT 2016 email@example.com:/common/S4/obj/usr/src/sys/CANARY amd64
to:
FreeBSD g1-252.catwhisker.org 11.0-CURRENT FreeBSD 11.0-CURRENT #415 r299529M/299529:1100107: Thu May 12 05:33:45 PDT 2016 firstname.lastname@example.org:/common/S4/obj/usr/src/sys/CANARY amd64
I've saved /var/log/messages, as well as what I have of dmesg (from a verbose boot), and I managed to attach them.
A commit references this bug:
Date: Sat May 21 23:21:42 UTC 2016
New revision: 300383
net80211: send RTM_IEEE80211_SCAN event when scan was cancelled.
wpa_supplicant(8) expects to see 'scan complete' event after every
scan command; in case, when event is not sent it will hang for
Now workaround should work.
A commit references this bug:
Date: Thu May 26 11:12:36 UTC 2016
New revision: 300732
iwn: add watchdog for scanning.
Restart device if scanning was not done in time.
Tested by: email@example.com
Differential Revision: https://reviews.freebsd.org/D6176
There is a commit referencing this PR, but it's still not closed and has been inactive for some time. Closing the PR as fixed but feel free to re-open it if the issue hasn't been completely resolved.