Bug 132080 - [patch] [usb] [rum] [panic] Kernel panic after NOMEM caused by rum card
Status: Closed Overcome By Events
Alias: None
Product: Base System
Classification: Unclassified
Component: usb
Version: 7.1-STABLE
Hardware: Any Any
Importance: Normal Affects Only Me
Assignee: freebsd-usb (Nobody)
Reported: 2009-02-25 01:20 UTC by Alexander Melkov
Modified: 2019-01-13 05:38 UTC

Attachments
file.diff (403 bytes, patch)
2009-02-25 01:20 UTC, Alexander Melkov

Description Alexander Melkov 2009-02-25 01:20:02 UTC
I have a <rum> device, a Cisco-Linksys Compact Wireless-G USB Adapter, running in hostap mode (i.e. as a Wi-Fi access point).
It malfunctions from time to time, sometimes several times a day; when it does, the kernel logs "rum0: could not transmit buffer: NOMEM".
Right after that message the kernel crashes on a read from a near-null address.

System log:
Feb 23 19:46:22 melkov kernel: rum0: could not transmit buffer: NOMEM
Feb 23 19:46:22 melkov kernel:
Feb 23 19:46:22 melkov kernel:
Feb 23 19:46:22 melkov kernel: Fatal trap 12: page fault while in kernel mode
Feb 23 19:46:22 melkov kernel: fault virtual address    = 0x290
Feb 23 19:46:22 melkov kernel: fault code               = supervisor read data, page not present
Feb 23 19:46:22 melkov kernel: instruction pointer      = 0x8:0xffffffff80430b0d
Feb 23 19:46:22 melkov kernel: stack pointer            = 0x10:0xfffffffef21caa80
Feb 23 19:46:22 melkov kernel: frame pointer            = 0x10:0xffffff00430cd080
Feb 23 19:46:22 melkov kernel: code segment             = base rx0, limit 0xfffff, type 0x1b
Feb 23 19:46:22 melkov kernel: = DPL 0, pres 1, long 1, def32 0, gran 1
Feb 23 19:46:22 melkov kernel: processor eflags = interrupt enabled, resume, IOPL = 0
Feb 23 19:46:22 melkov kernel: current process          = 20 (swi6: Giant taskq)
Feb 23 19:46:22 melkov kernel: trap number              = 12
Feb 23 19:46:22 melkov kernel: panic: page fault

(kgdb) bt 12
#0  doadump () at pcpu.h:195
#1  0x0000000000000004 in ?? ()
#2  0xffffffff804c8fc4 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:418
#3  0xffffffff804c937c in panic (fmt=0xffffffff8081f82b "%s") at /usr/src/sys/kern/kern_shutdown.c:574
#4  0xffffffff8077b48f in trap_fatal (frame=0xffffff0001514000, eva=Variable "eva" is not available.
) at /usr/src/sys/amd64/amd64/trap.c:764
#5  0xffffffff8077b865 in trap_pfault (frame=0xfffffffef21ca9d0, usermode=0) at /usr/src/sys/amd64/amd64/trap.c:680
#6  0xffffffff8077c195 in trap (frame=0xfffffffef21ca9d0) at /usr/src/sys/amd64/amd64/trap.c:449
#7  0xffffffff80764fee in calltrap () at /usr/src/sys/amd64/amd64/exception.S:209
#8  0xffffffff80430b0d in usb_transfer_complete (xfer=0xffffff0004d83800) at /usr/src/sys/dev/usb/usbdi.c:949
#9  0xffffffff8043119b in usbd_transfer (xfer=0xffffff0004d83800) at /usr/src/sys/dev/usb/usbdi.c:320
#10 0xffffffff804127d1 in rum_start (ifp=0xffffff0004e12000) at /usr/src/sys/dev/usb/if_rum.c:1360
#11 0xffffffff804fdb60 in taskqueue_run (queue=0xffffff00014d6780) at /usr/src/sys/kern/subr_taskqueue.c:282
(More stack frames follow...)

#8  0xffffffff80430b0d in usb_transfer_complete (xfer=0xffffff0004d83800) at /usr/src/sys/dev/usb/usbdi.c:949
949                     STAILQ_REMOVE_HEAD(&pipe->queue, next);
(kgdb) p pipe->queue
$12 = {stqh_first = 0x0, stqh_last = 0xffffff00430cd0a0}

Apparently usb_transfer_complete() attempts STAILQ_REMOVE_HEAD on an empty pipe->queue.
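
As an illustration only (a userland toy, not the usbdi.c code), the failure mode can be shown in miniature with the sys/queue.h macros. STAILQ_REMOVE_HEAD reads the link field of the first element without checking it for NULL, so on an empty queue it reads at a small offset from address zero, which would be consistent with the fault address above. The names toy_xfer, toy_queue and toy_dequeue are made up for this sketch; it only shows the emptiness check that the macro itself does not perform.

```c
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical stand-in for a USB transfer; not the kernel's struct. */
struct toy_xfer {
	int id;
	STAILQ_ENTRY(toy_xfer) next;
};

STAILQ_HEAD(toy_queue, toy_xfer);

/*
 * Remove and return the head, or NULL if the queue is empty.
 * STAILQ_REMOVE_HEAD dereferences the first element unconditionally,
 * so the emptiness test has to come first.
 */
static struct toy_xfer *
toy_dequeue(struct toy_queue *q)
{
	struct toy_xfer *x;

	if (STAILQ_EMPTY(q))
		return (NULL);
	x = STAILQ_FIRST(q);
	STAILQ_REMOVE_HEAD(q, next);
	return (x);
}
```

A guard like this would only hide the symptom; the queue being empty at completion time is still the underlying inconsistency.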

The USBD_NOMEM error code that rum_txeof() complains about probably comes from an unsuccessful call to bus_dmamap_create() within usbd_transfer(); I didn't investigate this.

Fix: usbd_start_transfer() has two error-handling branches. The first calls usb_insert_transfer(), which ensures that pipe->queue is non-empty when usb_transfer_complete() runs later.
I've added a similar call to the second error-handling branch.

==> usbdi.c.patch <==
How-To-Repeat: Insert a Ralink-based card into a USB slot and set up a WPA2 PSK access point according to the Handbook.
Connect to this access point from another station and generate heavy traffic, periodically connecting and disconnecting.
Comment 1 Hans Petter Selasky 2009-02-25 07:10:50 UTC
Hi,

I am aware of the RUM timeouts you are seeing. They most likely happen 
because the WLAN channel is set while the RUM device is in the running 
state, due to WLAN re-keying or something like that. Maybe you can 
confirm that the time from start of device with heavy download until it gets 
the first timeout is 10 minutes?

This issue is actually a RUM firmware issue. If it has frames pending for TX 
and we set the channel, the chip will simply reset or do strange things.

Possible RUM workaround: Set the same WLAN channel only once.
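
A rough sketch of what "set the same channel only once" could look like, assuming the driver remembers the last channel it programmed. The struct and function names are hypothetical and the counter stands in for the real hardware write; none of this is taken from if_rum.c.

```c
#include <stdbool.h>

/* Hypothetical driver state; field names are illustrative only. */
struct chan_state {
	int current;	/* last channel programmed, -1 if none yet */
	int hw_writes;	/* number of times the hardware was reprogrammed */
};

/*
 * Program the channel only if it differs from the current one.
 * Returns true when the hardware was actually touched.
 */
static bool
chan_set(struct chan_state *s, int chan)
{
	if (s->current == chan)
		return (false);	/* already on this channel, skip */
	s->current = chan;
	s->hw_writes++;		/* stands in for the real register write */
	return (true);
}
```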

I think Andrew Thompson is working on this. I have some patches in the USB P4 
project for 8-current which fix the problem to a level where TX will 
disappear for 4 seconds every 10 minutes, but there will be no device 
timeout! The final patch should solve the problem completely, but I need some 
help to figure out when we should ignore set_channel requests.

--HPS
Comment 2 melkov 2009-02-25 09:25:34 UTC
Hello!

> Maybe you can 
> confirm that the time from start of device with heavy download until it gets 
> the first timeout is 10 minutes?

I've unplugged and reinserted the USB card, then restarted Wi-Fi and started network activity.
There was no timeout within 21 minutes.

There were 3 consecutive device timeouts yesterday (now that my system no longer reboots).
Feb 24 10:42:49 melkov kernel: rum0: could not transmit buffer: NOMEM
Feb 24 10:42:49 melkov kernel: rum0: could not transmit buffer: NOMEM
Feb 24 10:42:54 melkov kernel: rum0: device timeout
Feb 24 10:57:26 melkov kernel: rum0: could not transmit buffer: NOMEM
Feb 24 10:57:26 melkov kernel: rum0: could not transmit buffer: NOMEM
Feb 24 10:57:31 melkov kernel: rum0: device timeout
Feb 24 11:05:02 melkov kernel: rum0: could not transmit buffer: NOMEM
Feb 24 11:05:02 melkov kernel: rum0: could not transmit buffer: NOMEM
Feb 24 11:05:06 melkov kernel: rum0: device timeout

Between the timeouts I manually restarted the hostapd daemon to get Wi-Fi service working again.
Notably, timeout #3 came 7.5 minutes after #2.

--Alexander
Comment 3 Eitan Adler freebsd_committer freebsd_triage 2017-12-31 07:59:49 UTC
For bugs matching the following criteria:

Status: In Progress Changed: (is less than) 2014-06-01

Reset to default assignee and clear in-progress tags.

Mail being skipped
Comment 4 Andriy Voskoboinyk freebsd_committer freebsd_triage 2019-01-13 05:38:18 UTC
The code path / message does not exist in recent driver versions; also, the driver should be stable enough after base r305544 (I'm using it from time to time; no such problems seen during the last few years).