Bug 165622

Summary: [ndis][panic][patch] Unregistered use of FPU in kernel on amd64
Product: Base System
Reporter: Vladyslav Movchan <vladislav.movchan>
Component: kern
Assignee: Oleksandr Tymoshenko <gonzo>
Status: Closed FIXED
Severity: Affects Some People
CC: AWilcox, avilla, danfe, gonzo, jhb, kib, net
Priority: Normal
Keywords: crash, needs-qa, patch
Version: CURRENT
Flags: gonzo: mfc-stable11+, gonzo: mfc-stable10-
Hardware: Any
OS: Any
Attachments:
Description      Flags
file.diff        none
fpu_patch2.txt   none
fpu_patch3.txt   none
Fixed version    none

Description Vladyslav Movchan 2012-03-02 13:50:09 UTC
Some miniport drivers (Windows NIC drivers) can use the FPU in the kernel, which causes an "Unregistered use of FPU in kernel" panic.

I've seen this only on amd64; i386 does not seem to be affected.
The same is mentioned in these two messages:
http://lists.freebsd.org/pipermail/svn-src-all/2010-March/021770.html
http://lists.freebsd.org/pipermail/svn-src-all/2010-March/021773.html

The point at which the panic occurs depends on the driver. In my case it happens during the first attempt to transmit a packet.


Panic message:
panic: Unregistered use of FPU in kernel
cpuid = 3
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
kdb_backtrace() at kdb_backtrace+0x37
panic() at panic+0x1cd
trap() at trap+0x71f
calltrap() at calltrap+0x8
--- trap 0x16, rip = 0xffffffff83c37e0e, rsp = 0xffffff80d116d7c0, rbp = 0xffffff80075df520 ---
__stop_set_sysinit_set() at __stop_set_sysinit_set+0xe6ee
dmapbase() at 0xfffffe0040fb5180
(null)() at 0xffffff80075df520
dmapbase() at 0xfffffe0040f83d00
dmapbase() at 0xfffffe0040f83e00
dmapbase() at 0xfffffe0040f8b080
dmapbase() at 0xfffffe0040f8b180
dmapbase() at 0xfffffe0040f8b280
..
dmapbase() at 0xfffffe0040fb4600
dmapbase() at 0xfffffe0040fb4700
dmapbase() at 0xfffffe0040fb4800
dmapbase() at 0xfffffe0040fb4900
dmapbase() at 0xfffffe0040fb4a00

Backtrace:
#0  doadump (textdump=-787033584) at /usr/src/sys/kern/kern_shutdown.c:268
268             if (textdump && textdump_pending) {
(kgdb) #0  doadump (textdump=-787033584) at /usr/src/sys/kern/kern_shutdown.c:268
#1  0xffffffff802de36c in db_fncall (dummy1=Variable "dummy1" is not available.
)
    at /usr/src/sys/ddb/db_command.c:573
#2  0xffffffff802de6a1 in db_command (last_cmdp=0xffffffff80dcd560, cmd_table=Variable "cmd_table" is not available.

) at /usr/src/sys/ddb/db_command.c:449
#3  0xffffffff802de8f0 in db_command_loop ()
    at /usr/src/sys/ddb/db_command.c:502
#4  0xffffffff802e0ad4 in db_trap (type=Variable "type" is not available.
) at /usr/src/sys/ddb/db_main.c:229
#5  0xffffffff806d3bb1 in kdb_trap (type=3, code=0, tf=0xffffff80d116d440)
    at /usr/src/sys/kern/subr_kdb.c:629
#6  0xffffffff80942f68 in trap (frame=0xffffff80d116d440)
    at /usr/src/sys/amd64/amd64/trap.c:591
#7  0xffffffff8092c1cf in calltrap ()
    at /usr/src/sys/amd64/amd64/exception.S:228
#8  0xffffffff806d36ab in kdb_enter (why=0xffffffff80a3fb16 "panic",
    msg=0x80 <Address 0x80 out of bounds>) at cpufunc.h:63
#9  0xffffffff8069aea6 in panic (fmt=Variable "fmt" is not available.
)
    at /usr/src/sys/kern/kern_shutdown.c:633
#10 0xffffffff809434ef in trap (frame=Variable "frame" is not available.
) at /usr/src/sys/amd64/amd64/trap.c:478
#11 0xffffffff8092c1cf in calltrap ()
    at /usr/src/sys/amd64/amd64/exception.S:228
#12 0xffffffff83c37e0e in ndis_Rtenic64_sys_drv_data_start ()
   from /boot/modules/Rtenic64_sys.ko
#13 0xffffffffffffffff in ?? ()
#14 0xffffffff80a40a28 in __func__.17200 ()
#15 0x0000074200000001 in ?? ()
#16 0xffffffff80a42b69 in link_elf_methods ()
#17 0xffffff80d116d800 in ?? ()
#18 0x0000000000000000 in ?? ()
#19 0x0000000000000000 in ?? ()
#20 0x0000000000000000 in ?? ()
..
#79 0x0000000000000000 in ?? ()
#80 0x0000000000000000 in ?? ()
#81 0x0000000000000000 in ?? ()
#82 0x0000000000000000 in ?? ()
#83 0xffffff80d116dad0 in ?? ()
#84 0xfffffe007b2b1348 in ?? ()
#85 0xffffff80075de000 in ?? ()
#86 0xfffffe0008410800 in ?? ()
#87 0xffffff80075deff0 in ?? ()
#88 0xffffff80075de000 in ?? ()
#89 0xffffff80075de000 in ?? ()
#90 0xffffff80d116dad0 in ?? ()
#91 0x0000000000000000 in ?? ()
#92 0xffffffff83c3d3ce in ndis_Rtenic64_sys_drv_data_start ()
   from /boot/modules/Rtenic64_sys.ko
#93 0xffffff80d1160000 in ?? ()
#94 0xfffffe0000000000 in ?? ()
#95 0xffffff8000000000 in ?? ()
#96 0xffffff80d116d900 in ?? ()
#97 0xffffffff83c3d284 in ndis_Rtenic64_sys_drv_data_start ()
   from /boot/modules/Rtenic64_sys.ko
#98 0xffffffff840e07b9 in x86_64_call1 ()
    at /usr/src/sys/modules/ndis/../../compat/ndis/winx64_wrap.S:130
#99 0xfffffe007b2b1328 in ?? ()
#100 0xfffffe007b154000 in ?? ()
#101 0xffffff80075deff8 in ?? ()
#102 0xfffffe007b2b1328 in ?? ()
#103 0xffffff80d116dad0 in ?? ()
#104 0xffffffff840d3e87 in ndis_intrhand (dpc=Variable "dpc" is not available.
)
at /usr/src/sys/modules/ndis/../../compat/ndis/subr_ndis.c:2234
Previous frame inner to this frame (corrupt stack?)
(kgdb)


Core/process which triggers this panic:
cpuid        = 3
dynamic pcpu = 0xffffff807f453100
curthread    = 0xfffffe0008bbd000: pid 1547 "Windows DPC 0"
curpcb       = 0xffffff80d116dd00
fpcurthread  = none
idlethread   = 0xfffffe00046b0480: tid 100006 "idle: cpu3"
curpmap      = 0xffffffff80df4c50
tssp         = 0xffffffff82609718
commontssp   = 0xffffffff82609718
rsp0         = 0xffffff80d116dd00
gs32p        = 0xffffffff82607870
ldt          = 0xffffffff826078b0
tss          = 0xffffffff826078a0
spin locks held:

Fix: The attached patch fixed this problem for me. The original version of this patch was written by Paul B Mahol (many thanks to Paul!)

Patch attached with submission follows:
How-To-Repeat: Use ndis on amd64:
kldload a module created with ndisgen; assign an IP address; try to ping something.

I believe not every miniport driver is affected, but the drivers of both devices I've tested could trigger this panic.
Comment 1 Mark Linimon freebsd_committer freebsd_triage 2012-03-12 07:26:21 UTC
Responsible Changed
From-To: freebsd-bugs->freebsd-net

Over to maintainer(s).
Comment 2 Vladyslav Movchan 2012-03-12 15:20:53 UTC
I've reimplemented the original patch to maintain a cache of fpu_kern_ctx
elements, reducing the number of allocations/deallocations done via
fpu_kern_alloc_ctx/fpu_kern_free_ctx.
The performance gain of this change is hard to measure because of a
deadlock in the ndis code (which makes stress-testing difficult or
impossible), but when the same change was tested with
https://github.com/NDISulator/ it gave about 10% higher
bandwidth with a 1Gbps NIC (which was CPU bound).
Comment 3 A. Wilcox 2015-01-30 23:07:42 UTC
This patch still applies cleanly to 11.0-CURRENT and is still needed for at least the bcm43xx 5.100.235.19 NDIS 5 driver.
Comment 4 John Baldwin freebsd_committer freebsd_triage 2015-02-27 17:23:21 UTC
Comment on attachment 122422 [details]
fpu_patch3.txt

Did you consider removing the O(n) lookup by keeping separate "busy" and "free" lists?  You could then also assert that the busy list was empty in the windrv_libfini() routine.
Comment 5 Vladyslav Movchan 2015-03-04 09:19:44 UTC
Created attachment 153770 [details]
Fixed version
Comment 6 Vladyslav Movchan 2015-03-04 09:21:24 UTC
(In reply to John Baldwin from comment #4)
Thanks for the advice.
I've reimplemented the patch. It now maintains separate "free" and "busy" lists, which changes the lookup of an unused element from O(n) to O(1).
I also switched from a singly-linked list to doubly-linked lists so that an arbitrary element can be removed in O(1) (necessary for the "busy" list).
I've also added an assertion to windrv_libfini() to make sure the busy list is empty.
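
For readers following along, the free/busy scheme described above can be sketched in userland C with the <sys/queue.h> LIST macros. This is only an illustration and not the code from the attachment: struct fake_fpu_ctx, struct ctx_cache and the cache_* functions are hypothetical names, calloc() stands in for fpu_kern_alloc_ctx(), and the mutex that would protect both lists is omitted.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/queue.h>

/* Hypothetical stand-in for the kernel's struct fpu_kern_ctx. */
struct fake_fpu_ctx {
	int unused;
};

/* One cached context; LIST_ENTRY gives O(1) removal from either list. */
struct ctx_entry {
	LIST_ENTRY(ctx_entry) link;
	struct fake_fpu_ctx ctx;
};

LIST_HEAD(ctx_list, ctx_entry);

/* Hypothetical cache: entries are on exactly one of the two lists. */
struct ctx_cache {
	struct ctx_list free_head;
	struct ctx_list busy_head;
};

static void
cache_init(struct ctx_cache *c)
{
	LIST_INIT(&c->free_head);
	LIST_INIT(&c->busy_head);
}

/* Take the first free entry (O(1)) or allocate a new one; mark it busy. */
static struct ctx_entry *
cache_get(struct ctx_cache *c)
{
	struct ctx_entry *e = LIST_FIRST(&c->free_head);

	if (e != NULL) {
		LIST_REMOVE(e, link);
	} else {
		e = calloc(1, sizeof(*e));	/* stand-in for a real allocation */
		if (e == NULL)
			return (NULL);
	}
	LIST_INSERT_HEAD(&c->busy_head, e, link);
	return (e);
}

/* Move an arbitrary busy entry back to the free list, also O(1). */
static void
cache_put(struct ctx_cache *c, struct ctx_entry *e)
{
	LIST_REMOVE(e, link);
	LIST_INSERT_HEAD(&c->free_head, e, link);
}

/* Tear down: nothing may still be busy; returns the number of entries freed. */
static int
cache_fini(struct ctx_cache *c)
{
	struct ctx_entry *e;
	int freed = 0;

	assert(LIST_EMPTY(&c->busy_head));
	while ((e = LIST_FIRST(&c->free_head)) != NULL) {
		LIST_REMOVE(e, link);
		free(e);
		freed++;
	}
	return (freed);
}
```

A caller would pair cache_get() with cache_put() around each call into the driver and call cache_fini() once at unload. The doubly-linked LIST is what lets cache_put() unlink an entry from the middle of the busy list without walking it.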
Comment 7 Alexey Dokuchaev freebsd_committer freebsd_triage 2016-11-23 12:59:27 UTC
With the NDIS'ified WinXP Broadcom BCM943228HMB WiFi driver, I observed a similar panic on a fresh FreeBSD/amd64 12.0-CURRENT r307735:308608M.  The attached patch fixed it.
Comment 8 Eitan Adler freebsd_committer freebsd_triage 2018-05-23 10:27:53 UTC
batch change of PRs untouched in 2018 marked "in progress" back to open.
Comment 9 Oleksandr Tymoshenko freebsd_committer freebsd_triage 2019-01-19 05:48:48 UTC
(Adding kib@ to Cc since he was part of the original discussion.)

Looks like this patch is still relevant. I reviewed the patch and it seems OK to me, and since it was also tested by other users I'd like to commit it. But I am not an NDIS/amd64 expert and would like a second opinion. Kostik, could you take a look and let me know whether it can be committed as-is or whether some additional work is required?

Thanks
Comment 10 Konstantin Belousov freebsd_committer freebsd_triage 2019-01-19 08:32:57 UTC
(In reply to Oleksandr Tymoshenko from comment #9)
I do not have a strong opinion here; if you want to commit this, go ahead.

I do find the list of unused FPU contexts strange: why not simply call fpu_kern_alloc() as needed?  I suspect that the cost of allocation is lower than the cost of contending on the mutex.
Comment 11 Vladyslav Movchan 2019-01-19 17:25:48 UTC
Thanks for the feedback and attention to this bug.
I can't explain why using a mutex-protected local cache of fpu_kern_ctx structures was faster than allocating/deallocating these structures when necessary, but in the tests mentioned in comment 2 it was. The difference was clearly visible.
Unfortunately I can't re-run the same tests, as it seems I no longer own a 1Gbps card usable with NDIS - almost 7 years have passed since.
Comment 12 commit-hook freebsd_committer freebsd_triage 2019-01-22 03:53:53 UTC
A commit references this bug:

Author: gonzo
Date: Tue Jan 22 03:53:43 UTC 2019
New revision: 343298
URL: https://svnweb.freebsd.org/changeset/base/343298

Log:
  [ndis] Fix unregistered use of FPU by NDIS in kernel on amd64

  amd64 miniport drivers are allowed to use FPU which triggers "Unregistered use
  of FPU in kernel" panic.

  Wrap all variants of MSCALL with fpu_kern_enter/fpu_kern_leave.  To reduce
  amount of allocations/deallocations done via
  fpu_kern_alloc_ctx/fpu_kern_free_ctx maintain cache of fpu_kern_ctx elements.

  Based on the patch by Paul B Mahol

  PR:		165622
  Submitted by:	Vlad Movchan <vladislav.movchan@gmail.com>
  MFC after:	1 month

Changes:
  head/sys/compat/ndis/kern_windrv.c
  head/sys/compat/ndis/pe_var.h
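
As a rough illustration of the "wrap every call with enter/leave" idea in the log above, here is a userland model in C. It is not the committed code: model_fpu_enter(), model_fpu_leave(), driver_entry() and wrapped_call() are hypothetical stand-ins (the kernel uses fpu_kern_enter()/fpu_kern_leave() around MSCALL), and a counter replaces the panic the kernel would raise on unregistered FPU use.

```c
#include <assert.h>
#include <stddef.h>

/* Model state: depth > 0 means "FPU use is currently registered". */
static int fpu_depth;
/* Counts driver FPU use outside a registered section (kernel would panic). */
static int unregistered;

/* Hypothetical stand-ins for fpu_kern_enter()/fpu_kern_leave(). */
static void
model_fpu_enter(void)
{
	fpu_depth++;
}

static void
model_fpu_leave(void)
{
	assert(fpu_depth > 0);
	fpu_depth--;
}

/* Stand-in for a miniport entry point that touches floating-point state. */
static double
driver_entry(double x)
{
	if (fpu_depth == 0)
		unregistered++;
	return (x * 0.5);
}

typedef double (*driver_fn)(double);

/* Model of a wrapped call: the driver only ever runs between enter and leave. */
static double
wrapped_call(driver_fn fn, double arg)
{
	double ret;

	model_fpu_enter();
	ret = fn(arg);
	model_fpu_leave();
	return (ret);
}
```

Calling wrapped_call(driver_entry, 4.0) leaves the unregistered counter at zero, whereas calling driver_entry() directly increments it, which corresponds to the panic path reported in this PR.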
Comment 13 commit-hook freebsd_committer freebsd_triage 2019-03-23 22:44:53 UTC
A commit references this bug:

Author: gonzo
Date: Sat Mar 23 22:44:12 UTC 2019
New revision: 345459
URL: https://svnweb.freebsd.org/changeset/base/345459

Log:
  MFC r343298:

  [ndis] Fix unregistered use of FPU by NDIS in kernel on amd64

  amd64 miniport drivers are allowed to use FPU which triggers "Unregistered use
  of FPU in kernel" panic.

  Wrap all variants of MSCALL with fpu_kern_enter/fpu_kern_leave.  To reduce
  amount of allocations/deallocations done via
  fpu_kern_alloc_ctx/fpu_kern_free_ctx maintain cache of fpu_kern_ctx elements.

  Based on the patch by Paul B Mahol

  PR:		165622
  Submitted by:	Vlad Movchan <vladislav.movchan@gmail.com>

Changes:
_U  stable/12/
  stable/12/sys/compat/ndis/kern_windrv.c
  stable/12/sys/compat/ndis/pe_var.h
Comment 14 commit-hook freebsd_committer freebsd_triage 2019-04-25 00:58:27 UTC
A commit references this bug:

Author: gonzo
Date: Thu Apr 25 00:58:12 UTC 2019
New revision: 346656
URL: https://svnweb.freebsd.org/changeset/base/346656

Log:
  MFC r343298:

  [ndis] Fix unregistered use of FPU by NDIS in kernel on amd64

  amd64 miniport drivers are allowed to use FPU which triggers "Unregistered use
  of FPU in kernel" panic.

  Wrap all variants of MSCALL with fpu_kern_enter/fpu_kern_leave.  To reduce
  amount of allocations/deallocations done via
  fpu_kern_alloc_ctx/fpu_kern_free_ctx maintain cache of fpu_kern_ctx elements.

  Based on the patch by Paul B Mahol

  PR:		165622
  Submitted by:	Vlad Movchan <vladislav.movchan@gmail.com>

Changes:
_U  stable/11/
  stable/11/sys/compat/ndis/kern_windrv.c
  stable/11/sys/compat/ndis/pe_var.h