Bug 211520

Summary: iwn controller panic in 11.0-BETA3
Product: Base System Reporter: Randy Westlund <rwestlun>
Component: wirelessAssignee: freebsd-wireless mailing list <wireless>
Status: New ---    
Severity: Affects Only Me CC: david, lme
Priority: ---    
Version: 11.0-BETA3   
Hardware: amd64   
OS: Any   

Description Randy Westlund 2016-08-02 04:43:21 UTC
My system hung during normal usage while using wifi (graphical display still
up, but nothing responding).  After rebooting, this was in /var/log/messages:

Aug  2 00:25:13 mako kernel: iwn0: iwn_intr: fatal firmware error
Aug  2 00:25:13 mako kernel: firmware error log:
Aug  2 00:25:13 mako kernel: error type      = "UNKNOWN" (0x00001047)
Aug  2 00:25:13 mako kernel: program counter = 0x000043F0
Aug  2 00:25:13 mako kernel: source line     = 0x00001423
Aug  2 00:25:13 mako kernel: error data      = 0x0010A00080102087
Aug  2 00:25:13 mako kernel: branch link     = 0x000043F2000043F2
Aug  2 00:25:13 mako kernel: interrupt link  = 0x0000D6BE00000000
Aug  2 00:25:13 mako kernel: time            = 1417374444
Aug  2 00:25:13 mako kernel: driver status:
Aug  2 00:25:13 mako kernel: tx ring  0: qid=0  cur=112 queued=0
Aug  2 00:25:13 mako kernel: tx ring  1: qid=1  cur=0   queued=0
Aug  2 00:25:13 mako kernel: tx ring  2: qid=2  cur=136 queued=1
Aug  2 00:25:13 mako kernel: tx ring  3: qid=3  cur=41  queued=0
Aug  2 00:25:13 mako kernel: tx ring  4: qid=4  cur=0   queued=0
Aug  2 00:25:13 mako kernel: tx ring  5: qid=5  cur=0   queued=0
Aug  2 00:25:13 mako kernel: tx ring  6: qid=6  cur=0   queued=0
Aug  2 00:25:13 mako kernel: tx ring  7: qid=7  cur=0   queued=0
Aug  2 00:25:13 mako kernel: tx ring  8: qid=8  cur=0   queued=0
Aug  2 00:25:13 mako kernel: tx ring  9: qid=9  cur=145 queued=0
Aug  2 00:25:13 mako kernel: tx ring 10: qid=10 cur=251 queued=2
Aug  2 00:25:13 mako kernel: tx ring 11: qid=11 cur=0   queued=0
Aug  2 00:25:13 mako kernel: tx ring 12: qid=12 cur=0   queued=0
Aug  2 00:25:13 mako kernel: tx ring 13: qid=13 cur=0   queued=0
Aug  2 00:25:13 mako kernel: tx ring 14: qid=14 cur=0   queued=0
Aug  2 00:25:13 mako kernel: tx ring 15: qid=15 cur=0   queued=0
Aug  2 00:25:13 mako kernel: tx ring 16: qid=16 cur=0   queued=0
Aug  2 00:25:13 mako kernel: tx ring 17: qid=17 cur=0   queued=0
Aug  2 00:25:13 mako kernel: tx ring 18: qid=18 cur=0   queued=0
Aug  2 00:25:13 mako kernel: tx ring 19: qid=19 cur=0   queued=0
Aug  2 00:25:13 mako kernel: rx ring: cur=53
Aug  2 00:25:13 mako kernel: iwn0: iwn_panicked: controller panicked, iv_state = 5; restarting
Aug  2 00:25:13 mako kernel: wlan0: link state changed to DOWN
Aug  2 00:25:13 mako kernel: lagg0: link state changed to DOWN
Aug  2 00:25:13 mako wpa_supplicant[366]: wlan0: CTRL-EVENT-DISCONNECTED bssid=c4:04:15:90:f5:fd reason=0
Aug  2 00:25:13 mako wpa_supplicant[366]: ioctl[SIOCS80211, op=20, val=0, arg_len=7]: Can't assign requested address
Aug  2 00:25:13 mako kernel: iwn0: iwn_read_firmware: ucode rev=0x12a80601
Aug  2 00:25:15 mako wpa_supplicant[366]: wlan0: Trying to associate with c4:04:15:90:f5:fd (SSID='neural_network' freq=2412 MHz)
Aug  2 00:25:15 mako wpa_supplicant[366]: wlan0: Associated with c4:04:15:90:f5:fd
Aug  2 00:25:15 mako kernel: wlan0: link state changed to UP
Aug  2 00:25:15 mako kernel: lagg0: link state changed to UP
Aug  2 00:25:15 mako dhclient[732]: send_packet: No buffer space available
Aug  2 00:25:15 mako wpa_supplicant[366]: wlan0: WPA: Key negotiation completed with c4:04:15:90:f5:fd [PTK=CCMP GTK=CCMP]
Aug  2 00:25:15 mako wpa_supplicant[366]: wlan0: CTRL-EVENT-CONNECTED - Connection to c4:04:15:90:f5:fd completed [id=2 id_str=]


I'm running 11.0-BETA3, amd64.  Network device:

iwn0@pci0:3:0:0:        class=0x028000 card=0x13118086 chip=0x00858086 rev=0x34 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = 'Centrino Advanced-N 6205 [Taylor Peak]'
    class      = network
Comment 1 Randy Westlund 2016-08-02 04:51:43 UTC
And here's the relevant ifconfig output for good measure. It's in a failover lagg with em0.

wlan0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        ether 3c:97:0e:46:70:ca
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        media: IEEE 802.11 Wireless Ethernet MCS mode 11ng
        status: associated
        ssid neural_network channel 1 (2412 MHz 11g ht/40+) bssid c4:04:15:90:f5:fd
        regdomain FCC country US authmode WPA2/802.11i privacy ON
        deftxkey UNDEF AES-CCM 2:128-bit txpower 30 bmiss 10 scanvalid 60
        protmode CTS ampdulimit 64k ampdudensity 8 -amsdutx amsdurx shortgi
        -stbc wme roaming MANUAL
        groups: wlan
Comment 2 Lars Engels freebsd_committer 2016-08-02 06:16:45 UTC
Same here with iwn and em.

It's easy to reproduce. Just run "service netif restart" and you get an insta-freeze. But I don't see the firmware error messages in /var/log/messages
Unfortunately nothing is shown on the console when the system freezes.
Comment 3 Randy Westlund 2016-08-02 06:21:46 UTC
Sounds like a different thing; I can restart networking without trouble. It's only frozen on me once, after several hours for working just fine.  I don't know what triggered it.
Comment 4 david 2018-08-12 12:30:03 UTC
I am seeing something similar (to the original report) in:
FreeBSD g1-215.catwhisker.org 12.0-ALPHA1 FreeBSD 12.0-ALPHA1 #268  r337568M/337568:1200076: Fri Aug 10 05:27:01 PDT 2018     root@g1-215.catwhisker.org:/common/S3/obj/usr/src/amd64.amd64/sys/CANARY  amd64


Here's an excerpt from /var/log/messages showing the most recent one as of the moment:

<13>1 2018-08-12T11:57:19.406030+00:00 localhost dhclient 58253 - - Exiting /etc/dhclient-exit-hooks (REBOOT) with exit_status 0
<29>1 2018-08-12T11:57:21.959167+00:00 localhost wpa_supplicant 57362 - - wlan0: CTRL-EVENT-DISCONNECTED bssid=04:18:d6:22:22:1f reason=0
<2>1 2018-08-12T11:57:21.959314+00:00 localhost kernel - - - iwn0: iwn_intr: fatal firmware error
<2>1 2018-08-12T11:57:21.959335+00:00 localhost kernel - - - firmware error log:
<2>1 2018-08-12T11:57:21.959352+00:00 localhost kernel - - -   error type      = "UNKNOWN" (0x0000001D)
<2>1 2018-08-12T11:57:21.959368+00:00 localhost kernel - - -   program counter = 0x000006B4
<2>1 2018-08-12T11:57:21.959384+00:00 localhost kernel - - -   source line     = 0x00007B96
<2>1 2018-08-12T11:57:21.959399+00:00 localhost kernel - - -   error data      = 0x0000000203630000
<2>1 2018-08-12T11:57:21.959414+00:00 localhost kernel - - -   branch link     = 0x0000067A000007B8
<2>1 2018-08-12T11:57:21.959430+00:00 localhost kernel - - -   interrupt link  = 0x0000153200006874
<2>1 2018-08-12T11:57:21.959446+00:00 localhost kernel - - -   time            = 3517494799
<2>1 2018-08-12T11:57:21.959461+00:00 localhost kernel - - - driver status:
<2>1 2018-08-12T11:57:21.959477+00:00 localhost kernel - - -   tx ring  0: qid=0  cur=140 queued=2  
<2>1 2018-08-12T11:57:21.959492+00:00 localhost kernel - - -   tx ring  1: qid=1  cur=0   queued=0  
<2>1 2018-08-12T11:57:21.959508+00:00 localhost kernel - - -   tx ring  2: qid=2  cur=0   queued=0  
<2>1 2018-08-12T11:57:21.959524+00:00 localhost kernel - - -   tx ring  3: qid=3  cur=4   queued=0  
<2>1 2018-08-12T11:57:21.959539+00:00 localhost kernel - - -   tx ring  4: qid=4  cur=55  queued=0  
<2>1 2018-08-12T11:57:21.959555+00:00 localhost kernel - - -   tx ring  5: qid=5  cur=0   queued=0  
<2>1 2018-08-12T11:57:21.959572+00:00 localhost kernel - - -   tx ring  6: qid=6  cur=0   queued=0  
<2>1 2018-08-12T11:57:21.959589+00:00 localhost kernel - - -   tx ring  7: qid=7  cur=0   queued=0  
<2>1 2018-08-12T11:57:21.959605+00:00 localhost kernel - - -   tx ring  8: qid=8  cur=0   queued=0  
<2>1 2018-08-12T11:57:21.959620+00:00 localhost kernel - - -   tx ring  9: qid=9  cur=0   queued=0  
<2>1 2018-08-12T11:57:21.959640+00:00 localhost kernel - - -   tx ring 10: qid=10 cur=0   queued=0  
<2>1 2018-08-12T11:57:21.959656+00:00 localhost kernel - - -   tx ring 11: qid=11 cur=0   queued=0  
<2>1 2018-08-12T11:57:21.959672+00:00 localhost kernel - - -   tx ring 12: qid=12 cur=0   queued=0  
<2>1 2018-08-12T11:57:21.959688+00:00 localhost kernel - - -   tx ring 13: qid=13 cur=0   queued=0  
<2>1 2018-08-12T11:57:21.959704+00:00 localhost kernel - - -   tx ring 14: qid=14 cur=0   queued=0  
<2>1 2018-08-12T11:57:21.959722+00:00 localhost kernel - - -   tx ring 15: qid=15 cur=0   queued=0  
<2>1 2018-08-12T11:57:21.959737+00:00 localhost kernel - - -   tx ring 16: qid=16 cur=0   queued=0  
<2>1 2018-08-12T11:57:21.959752+00:00 localhost kernel - - -   tx ring 17: qid=17 cur=0   queued=0  
<2>1 2018-08-12T11:57:21.959768+00:00 localhost kernel - - -   tx ring 18: qid=18 cur=0   queued=0  
<2>1 2018-08-12T11:57:21.959784+00:00 localhost kernel - - -   tx ring 19: qid=19 cur=0   queued=0  
<2>1 2018-08-12T11:57:21.959802+00:00 localhost kernel - - -   rx ring: cur=34
<2>1 2018-08-12T11:57:21.959819+00:00 localhost kernel - - - iwn0: iwn_panicked: controller panicked, iv_state = 5; restarting
<6>1 2018-08-12T11:57:21.959836+00:00 localhost kernel - - - wlan0: link state changed to DOWN
<27>1 2018-08-12T11:57:21.959260+00:00 localhost wpa_supplicant 57362 - - ioctl[SIOCS80211, op=20, val=0, arg_len=7]: Can't assign requested address
<2>1 2018-08-12T11:57:21.960898+00:00 localhost kernel - - - iwn0: iwn_read_firmware: ucode rev=0x09dd0401
<29>1 2018-08-12T11:57:22.950315+00:00 localhost wpa_supplicant 57362 - - wlan0: Trying to associate with 04:18:d6:22:22:1f (SSID='lmdhw-net' freq=2462 MHz)
<29>1 2018-08-12T11:57:22.959678+00:00 localhost wpa_supplicant 57362 - - wlan0: Associated with 04:18:d6:22:22:1f
<6>1 2018-08-12T11:57:22.959810+00:00 localhost kernel - - - wlan0: link state changed to UP
<27>1 2018-08-12T11:57:22.960136+00:00 localhost dhclient 57392 - - send_packet: No buffer space available
<29>1 2018-08-12T11:57:23.968790+00:00 localhost wpa_supplicant 57362 - - wlan0: WPA: Key negotiation completed with 04:18:d6:22:22:1f [PTK=CCMP GTK=CCMP]
<29>1 2018-08-12T11:57:23.968875+00:00 localhost wpa_supplicant 57362 - - wlan0: CTRL-EVENT-CONNECTED - Connection to 04:18:d6:22:22:1f completed [id=27 id_str=]
<13>1 2018-08-12T11:57:29.969299+00:00 localhost dhclient 58293 - - /etc/dhclient-enter-hooks invoked with reason REBOOT


I have the impression that trying to pass more traffic through the NIC increases the frequency with which this occurs; for me, it's usually about 6 - 10 per hour.

pciconf says:
iwn0@pci0:3:0:0:        class=0x028000 card=0x11218086 chip=0x422b8086 rev=0x35 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = 'Centrino Ultimate-N 6300'
    class      = network

And ifconfig -v:
wlan0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        ether 3c:a9:f4:a2:ca:f0
        inet 172.17.1.215 netmask 0xffff0000 broadcast 172.17.255.255 
        groups: wlan 
        ssid lmdhw-net channel 11 (2462 MHz 11g) bssid 04:18:d6:22:22:1f
        regdomain FCC country US anywhere -ecm authmode WPA2/802.11i -wps
        -tsn privacy ON deftxkey UNDEF
        AES-CCM 2:128-bit powersavemode OFF powersavesleep 100 txpower 30
        txpowmax 50.0 -dotd rtsthreshold 2346 fragthreshold 2346 bmiss 10
        11a     ucast NONE    mgmt  6 Mb/s mcast  6 Mb/s maxretry 6
        11b     ucast NONE    mgmt  1 Mb/s mcast  1 Mb/s maxretry 6
        11g     ucast NONE    mgmt  1 Mb/s mcast  1 Mb/s maxretry 6
        turboA  ucast NONE    mgmt  6 Mb/s mcast  6 Mb/s maxretry 6
        turboG  ucast NONE    mgmt  1 Mb/s mcast  1 Mb/s maxretry 6
        sturbo  ucast NONE    mgmt  6 Mb/s mcast  6 Mb/s maxretry 6
        11na    ucast NONE    mgmt 12 MCS  mcast 12 MCS  maxretry 6
        11ng    ucast NONE    mgmt  2 MCS  mcast  2 MCS  maxretry 6
        half    ucast NONE    mgmt  3 Mb/s mcast  3 Mb/s maxretry 6
        quarter ucast NONE    mgmt  1 Mb/s mcast  1 Mb/s maxretry 6
        11acg   ucast NONE    mgmt  1 Mb/s mcast  1 Mb/s maxretry 6
        11ac    ucast NONE    mgmt  6 Mb/s mcast  6 Mb/s maxretry 6
        scanvalid 60 -bgscan bgscanintvl 300 bgscanidle 250
        roam:11a     rssi    7dBm rate 12 Mb/s
        roam:11b     rssi    7dBm rate  1 Mb/s
        roam:11g     rssi    7dBm rate  5 Mb/s
        roam:turboA  rssi    7dBm rate 12 Mb/s
        roam:turboG  rssi    7dBm rate 12 Mb/s
        roam:sturbo  rssi    7dBm rate 12 Mb/s
        roam:11na    rssi    7dBm  MCS  1    
        roam:11ng    rssi    7dBm  MCS  1    
        roam:half    rssi    7dBm rate  6 Mb/s
        roam:quarter rssi    7dBm rate  3 Mb/s
        roam:11acg   rssi    7dBm rate 64 Mb/s
        roam:11ac    rssi    7dBm rate 64 Mb/s
        -pureg protmode CTS -ht htcompat ampdu ampdulimit 64k ampdudensity 8
        -amsdutx amsdurx shortgi htprotmode RTSCTS -puren -smps -rifs -stbc
        -ldpc -vht -vht40 -vht80 -vht80p80 -vht160 wme -burst -dwds
        roaming MANUAL bintval 100
        AC_BE cwmin  4 cwmax 10 aifs  3 txopLimit   0 -acm ack
              cwmin  4 cwmax 10 aifs  3 txopLimit   0 -acm
        AC_BK cwmin  4 cwmax 10 aifs  7 txopLimit   0 -acm ack
              cwmin  4 cwmax 10 aifs  7 txopLimit   0 -acm
        AC_VI cwmin  3 cwmax  4 aifs  2 txopLimit  94 -acm ack
              cwmin  3 cwmax  4 aifs  2 txopLimit  94 -acm
        AC_VO cwmin  2 cwmax  3 aifs  2 txopLimit  47 -acm ack
              cwmin  2 cwmax  3 aifs  2 txopLimit  47 -acm
        media: IEEE 802.11 Wireless Ethernet OFDM/54Mbps mode 11g
        status: associated
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>

Further: the panics are very rare for me when I'm running stable/11 -- e.g.:
FreeBSD g1-215.catwhisker.org 11.2-STABLE FreeBSD 11.2-STABLE #707  r337682M/337682:1102501: Sun Aug 12 03:47:31 PDT 2018     root@g1-215.catwhisker.org:/common/S1/obj/usr/src/sys/CANARY  amd64

At this point, I believe it's fair to characterize the head (or ALPHA1_ behavior as a "regression" (vs. the stable/11 behavior), at least in my case.
Comment 5 david 2019-02-11 03:47:45 UTC
I'm now seeing these in stable/11, as of r343948.  I was running r343926 yesterday, which had some issues with iwn(4), but I don't know if this issue was involved.  (The laptop was attempting some backups while I was elsewhere.)  I am fairly sure I did not see it often in stable/11 at r343901.

stable/12 and head exhibit the issue so frequently that this is the main reason I haven't migrated my systems from stable/11 to stable/12.
Comment 6 david 2019-02-12 12:41:06 UTC
I was able to run an entire day on stable/11 @r343992 without seeing any of these controller/firmware panics.