Bug 227602 - [regression] r304629 broke ralink USB hostap mode with authmode WPA2
Summary: [regression] r304629 broke ralink USB hostap mode with authmode WPA2
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: usb (show other bugs)
Version: CURRENT
Hardware: Any Any
: --- Affects Many People
Assignee: Hans Petter Selasky
URL:
Keywords: regression
Depends on:
Blocks:
 
Reported: 2018-04-18 05:32 UTC by Damjan Jovanovic
Modified: 2018-11-06 04:14 UTC (History)
3 users (show)

See Also:
koobs: mfc-stable11+
koobs: mfc-stable10+


Attachments
usbdump log with r304629 (95.22 KB, application/octet-stream)
2018-04-27 08:50 UTC, Damjan Jovanovic
no flags Details
usbdump log without r304629 (103.79 KB, application/octet-stream)
2018-04-27 08:59 UTC, Damjan Jovanovic
no flags Details
usbdump log of the rare case when it works with r304629 (104.31 KB, application/octet-stream)
2018-04-29 09:00 UTC, Damjan Jovanovic
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Damjan Jovanovic 2018-04-18 05:32:40 UTC
My WiFi USB dongle, using the ralink driver, works reproducibly and reliably in hostap mode in 10.3, but never works in CURRENT, giving an error of "run0: could not load 8051 microcode" when hostapd starts, and becoming invisible to WiFi clients.

I've confirmed it works well in GitHub commit 74ee552c5dacc20b6dde64cbb8a44e8c8ce975d0 (from around the 10.3 release), and have begun bisecting for the regression.

In CURRENT it does work in authmode "open", but not with WPA2. The microcode loads the first time, but doesn't load the second time when hostapd starts. It loads successfully both times in 74ee552c5dacc20b6dde64cbb8a44e8c8ce975d0.


How I test it:

# ifconfig wlan0 create wlandev run0 wlanmode hostap
# ifconfig wlan0 inet 192.168.0.1 netmask 255.255.255.0 ssid home
# service hostapd start


/etc/rc.conf:

hostapd_enable="YES"


/etc/hostapd.conf:

interface=wlan0
debug=1
ctrl_interface=/var/run/hostapd
ctrl_interface_group=wheel
ssid=home
wpa=2
wpa_passphrase=abc123
wpa_key_mgmt=WPA-PSK
wpa_pairwise=CCMP


My USB dongle:

# usbconfig dump_device_desc

ugen0.5: <Ralink 802.11 n WLAN> at usbus0, cfg=0 md=HOST spd=HIGH (480Mbps) pwr=ON (450mA)

  bLength = 0x0012 
  bDescriptorType = 0x0001 
  bcdUSB = 0x0200 
  bDeviceClass = 0x0000  <Probed by interface class>
  bDeviceSubClass = 0x0000 
  bDeviceProtocol = 0x0000 
  bMaxPacketSize0 = 0x0040 
  idVendor = 0x148f 
  idProduct = 0x3070 
  bcdDevice = 0x0101 
  iManufacturer = 0x0001  <Ralink>
  iProduct = 0x0002  <802.11 n WLAN>
  iSerialNumber = 0x0003  <1.0>
  bNumConfigurations = 0x0001
Comment 1 Damjan Jovanovic 2018-04-23 20:32:07 UTC
I bisected this regression to SVN commit 304629, which is this commit in GitHub:

b33c688c747fa7cc20a9f21b3e6f4e137647a4cc is the first bad commit
commit b33c688c747fa7cc20a9f21b3e6f4e137647a4cc
Author: hselasky <hselasky@FreeBSD.org>
Date:   Mon Aug 22 19:32:50 2016 +0000

    Don't separate the status stage of the XHCI USB control transfers into
    its own job because this breaks the simplified QEMU XHCI TRB parser,
    which expects the complete USB control transfer as a series of back to
    back TRBs. The old behaviour is kept under #ifdef in case this change
    breaks enumeration of any USB devices.
    
    PR:             212021
    MFC after:      1 week

:040000 040000 8fc5db306b826d39d5b25322b5f8f4aae71cd450 05ffa3294c4cbbfb29a04bcf29a53f305313a937 M	sys



Upgrading importance and adding author to CC.
Comment 2 Damjan Jovanovic 2018-04-27 05:29:16 UTC
Undoing the bad commit (304629) on top of CURRENT gets my WiFi dongle working again.

Can someone please find a permanent fix for this bug though? It must affect other USB devices. It also affects the 11.1 release, if not more of them.
Comment 3 Hans Petter Selasky freebsd_committer freebsd_triage 2018-04-27 07:01:00 UTC
Hi,

Can you collect "usbdump -i usbusX -f Y -w log.pcap" traces when plugging your device in both cases, w/ and w/o reverting r304629 ?

Plug your device one time, and record X.Y numbers after ugen.

Unplug your device and start usbdump as shown above. Then plug device again, and save the log after a few seconds.

Upload binary logs to this PR.

--HPS
Comment 4 Hans Petter Selasky freebsd_committer freebsd_triage 2018-04-27 07:07:04 UTC
Beware that this might be a bug in your USB device, that it only works with EHCI and not XHCI.
Comment 5 Damjan Jovanovic 2018-04-27 08:50:56 UTC
Created attachment 192839 [details]
usbdump log with r304629

Thank you so much. Here is the usbdump taken at commit r304629 itself, where I got that "could not load microcode" error.
Comment 6 Damjan Jovanovic 2018-04-27 08:59:06 UTC
Created attachment 192840 [details]
usbdump log without r304629

Here is the usbdump taken on CURRENT with commit r304629 reverted, where hostapd is working and my phone successfully connected.

I think only the 2 blue USB ports on the back of my PC are USB 3. The dongle was only tested with a black USB port on the front.
Comment 7 Hans Petter Selasky freebsd_committer freebsd_triage 2018-04-28 10:02:07 UTC
Hi,

In the log-with-r304629.pcap I clearly see ERR=IOERROR which is not present in the other log file.

I suspect that your XHCI controller needs a quirk, I.E. the old behaviour.

If I were to guess, the problem is that your XHCI controller executes the status stage instantly after the so-called USB data stage, because the patch make the DMA jobs back to back. The ralink dongle doesn't handle this and IOERROR usually means no response. Typically this patch makes the status stage execute after the next SOF. This should be handled automagically by the XHCI controller.

Can you post the output of "pciconf -lv" for your XHCI PCI device?

I guess we can make the #ifdef into a system tunable, and then you can set it in /boot/loader.conf ?

--HPS
Comment 8 Damjan Jovanovic 2018-04-29 06:23:13 UTC
(In reply to Hans Petter Selasky from comment #7)

Thank you for your help. Here is my pciconf -lv:

hostb0@pci0:0:0:0:	class=0x060000 card=0x50001458 chip=0x0c008086 rev=0x06 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '4th Gen Core Processor DRAM Controller'
    class      = bridge
    subclass   = HOST-PCI
pcib1@pci0:0:1:0:	class=0x060400 card=0x50001458 chip=0x0c018086 rev=0x06 hdr=0x01
    vendor     = 'Intel Corporation'
    device     = 'Xeon E3-1200 v3/4th Gen Core Processor PCI Express x16 Controller'
    class      = bridge
    subclass   = PCI-PCI
xhci0@pci0:0:20:0:	class=0x0c0330 card=0x50071458 chip=0x8c318086 rev=0x05 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '8 Series/C220 Series Chipset Family USB xHCI'
    class      = serial bus
    subclass   = USB
none0@pci0:0:22:0:	class=0x078000 card=0x1c3a1458 chip=0x8c3a8086 rev=0x04 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '8 Series/C220 Series Chipset Family MEI Controller'
    class      = simple comms
ehci0@pci0:0:26:0:	class=0x0c0320 card=0x50061458 chip=0x8c2d8086 rev=0x05 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '8 Series/C220 Series Chipset Family USB EHCI'
    class      = serial bus
    subclass   = USB
hdac1@pci0:0:27:0:	class=0x040300 card=0xa0021458 chip=0x8c208086 rev=0x05 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '8 Series/C220 Series Chipset High Definition Audio Controller'
    class      = multimedia
    subclass   = HDA
pcib2@pci0:0:28:0:	class=0x060400 card=0x50011458 chip=0x8c108086 rev=0xd5 hdr=0x01
    vendor     = 'Intel Corporation'
    device     = '8 Series/C220 Series Chipset Family PCI Express Root Port'
    class      = bridge
    subclass   = PCI-PCI
pcib3@pci0:0:28:2:	class=0x060400 card=0x50011458 chip=0x8c148086 rev=0xd5 hdr=0x01
    vendor     = 'Intel Corporation'
    device     = '8 Series/C220 Series Chipset Family PCI Express Root Port'
    class      = bridge
    subclass   = PCI-PCI
pcib4@pci0:0:28:3:	class=0x060401 card=0x50011458 chip=0x244e8086 rev=0xd5 hdr=0x01
    vendor     = 'Intel Corporation'
    device     = '82801 PCI Bridge'
    class      = bridge
    subclass   = PCI-PCI
ehci1@pci0:0:29:0:	class=0x0c0320 card=0x50061458 chip=0x8c268086 rev=0x05 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '8 Series/C220 Series Chipset Family USB EHCI'
    class      = serial bus
    subclass   = USB
isab0@pci0:0:31:0:	class=0x060100 card=0x50011458 chip=0x8c508086 rev=0x05 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = 'B85 Express LPC Controller'
    class      = bridge
    subclass   = PCI-ISA
ahci0@pci0:0:31:2:	class=0x010601 card=0xb0051458 chip=0x8c028086 rev=0x05 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '8 Series/C220 Series Chipset Family 6-port SATA Controller 1 [AHCI mode]'
    class      = mass storage
    subclass   = SATA
none1@pci0:0:31:3:	class=0x0c0500 card=0x50011458 chip=0x8c228086 rev=0x05 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '8 Series/C220 Series Chipset Family SMBus Controller'
    class      = serial bus
    subclass   = SMBus
vgapci0@pci0:1:0:0:	class=0x030000 card=0x00000000 chip=0x0f0110de rev=0xa1 hdr=0x00
    vendor     = 'NVIDIA Corporation'
    device     = 'GF108 [GeForce GT 620]'
    class      = display
    subclass   = VGA
hdac0@pci0:1:0:1:	class=0x040300 card=0x00000000 chip=0x0e0810de rev=0xa1 hdr=0x00
    vendor     = 'NVIDIA Corporation'
    device     = 'GF119 HDMI Audio Controller'
    class      = multimedia
    subclass   = HDA
re0@pci0:3:0:0:	class=0x020000 card=0xe0001458 chip=0x816810ec rev=0x06 hdr=0x00
    vendor     = 'Realtek Semiconductor Co., Ltd.'
    device     = 'RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller'
    class      = network
    subclass   = ethernet
pcib5@pci0:4:0:0:	class=0x060401 card=0x88921458 chip=0x244e8086 rev=0x41 hdr=0x01
    vendor     = 'Intel Corporation'
    device     = '82801 PCI Bridge'
    class      = bridge
    subclass   = PCI-PCI
Comment 9 Damjan Jovanovic 2018-04-29 06:42:54 UTC
Apparently r304629 was added because qemu needed these back to back DMA jobs for its emulated XHCI controller (in bug 212021). But qemu is cross platform, just like my USB dongle. How does Windows get both of them to work? How does Linux? Other BSDs? Do they all use these back to back DMA jobs, or does only FreeBSD use them?

Also I can - very rarely - get the WiFi dongle to work even with r304629 applied. It only works 19.67% of the time. Could there be some kind of timing issue or race condition, that is being exacerbated by r304629? Let me try get a usbdump of one of these rare cases. I think it should be very revealing.
Comment 10 Damjan Jovanovic 2018-04-29 09:00:58 UTC
Created attachment 192894 [details]
usbdump log of the rare case when it works with r304629

Here is the usbdump of the rare case when it fully works at commit r304629 itself. Does it reveal anything useful, compared to the broken case in attachment 192839 [details]?
Comment 11 Hans Petter Selasky freebsd_committer freebsd_triage 2018-04-29 09:47:11 UTC
> How does Linux? 

Linux is more good at adding quirks when needed.

This does not only affect QEMU, but also VMWARE and all kinds of hypervisors.

--HPS
Comment 12 Hans Petter Selasky freebsd_committer freebsd_triage 2018-04-29 09:50:29 UTC
Hi,

The XHCI controller you have already has some quirks in FreeBSD, if you look at sys/dev/usb/controller/xhci_pci.c:

        case 0x8c318086:        /* Lynx Point */

I'll add another one if I find some time later today for this issue.

--HPS
Comment 13 commit-hook freebsd_committer freebsd_triage 2018-04-30 07:30:45 UTC
A commit references this bug:

Author: hselasky
Date: Mon Apr 30 07:30:38 UTC 2018
New revision: 333100
URL: https://svnweb.freebsd.org/changeset/base/333100

Log:
  Improve fix in r304629 by allowing configuration of the behaviour
  through a SYSCTL instead of a compile time define.

  Add quirk by default for all LynxPoint XHCI controllers.

  PR:		227602
  MFC after:	3 days
  Sponsored by:	Mellanox Technologies

Changes:
  head/sys/dev/usb/controller/xhci.c
  head/sys/dev/usb/controller/xhci.h
  head/sys/dev/usb/controller/xhci_pci.c
Comment 14 Hans Petter Selasky freebsd_committer freebsd_triage 2018-04-30 07:31:22 UTC
Please try the patch submitted to head.

Let me know if it doesn't work.
Comment 15 commit-hook freebsd_committer freebsd_triage 2018-05-03 07:29:52 UTC
A commit references this bug:

Author: hselasky
Date: Thu May  3 07:29:08 UTC 2018
New revision: 333199
URL: https://svnweb.freebsd.org/changeset/base/333199

Log:
  MFC r333100:
  Improve fix in r304629 by allowing configuration of the behaviour
  through a SYSCTL instead of a compile time define.

  Add quirk by default for all LynxPoint XHCI controllers.

  PR:		227602
  Sponsored by:	Mellanox Technologies

Changes:
_U  stable/11/
  stable/11/sys/dev/usb/controller/xhci.c
  stable/11/sys/dev/usb/controller/xhci.h
  stable/11/sys/dev/usb/controller/xhci_pci.c
Comment 16 commit-hook freebsd_committer freebsd_triage 2018-05-03 07:39:02 UTC
A commit references this bug:

Author: hselasky
Date: Thu May  3 07:38:46 UTC 2018
New revision: 333203
URL: https://svnweb.freebsd.org/changeset/base/333203

Log:
  MFC r333100:
  Improve fix in r304629 by allowing configuration of the behaviour
  through a SYSCTL instead of a compile time define.

  Add quirk by default for all LynxPoint XHCI controllers.

  PR:		227602
  Sponsored by:	Mellanox Technologies

Changes:
_U  stable/10/
  stable/10/sys/dev/usb/controller/xhci.c
  stable/10/sys/dev/usb/controller/xhci.h
  stable/10/sys/dev/usb/controller/xhci_pci.c
Comment 17 Damjan Jovanovic 2018-05-16 18:13:31 UTC
It works with that commit. Thank you so much!

          .-'"""""'-.        
        .'           `.    
       /   .      .    \   
      :                 : 
      |                 |  
      :   \        /    :    
       \   `.____.'    /            
        `.           .'     
          `-._____.-'