Bug 165630 - [ndis][panic][patch] IRQL_NOT_GREATER_THAN
Summary: [ndis][panic][patch] IRQL_NOT_GREATER_THAN
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern (show other bugs)
Version: 10.0-CURRENT
Hardware: Any Any
: Normal Affects Only Me
Assignee: freebsd-bugs mailing list
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-03-02 15:30 UTC by Vlad Movchan
Modified: 2015-07-16 09:10 UTC (History)
1 user (show)

See Also:
bugmeister: mfc-stable8?
bugmeister: mfc-stable9?
bugmeister: mfc-stable10?


Attachments
file.diff (648 bytes, patch)
2012-03-02 15:30 UTC, Vlad Movchan
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Vlad Movchan 2012-03-02 15:30:06 UTC
There is a race condition inside of sys/compat/ndis/subr_hal.c:KfRaiseIrql() function which could lead to "IRQL_NOT_GREATER_THAN" panic.

Race could happen if after execution of this line:
oldirql = KeGetCurrentIrql();

kthread got preempted and later appeared on another CPU core which has another "irq level".
In this case "oldirql != DISPATCH_LEVEL" check will have no much sense as on current CPU core "irq level" could be different. As result we will skip sched_pin + mtx_lock lines when it was necessary execute them, or execute then when it necessary to skip.


To fix this race we pin to current CPU core before obtaining "irq level", then we made "oldirql != DISPATCH_LEVEL" check and later we "unpin" (if we find that previous sched_pin() was not actually necessary).


Panic message:
panic: IRQL_NOT_GREATER_THAN
cpuid = 3
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
kdb_backtrace() at kdb_backtrace+0x37
panic() at panic+0x1cd
KfLowerIrql() at KfLowerIrql+0x62
ntoskrnl_workitem_thread() at ntoskrnl_workitem_thread+0xab
fork_exit() at fork_exit+0x189
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xffffff80d111acf0, rbp = 0 ---
KDB: enter: panic


Backtrace:
#0  doadump (textdump=-787372224) at /usr/src/sys/kern/kern_shutdown.c:268
268             if (textdump && textdump_pending) {
(kgdb) #0  doadump (textdump=-787372224) at /usr/src/sys/kern/kern_shutdown.c:268
#1  0xffffffff802de36c in db_fncall (dummy1=Variable "dummy1" is not available.
)
    at /usr/src/sys/ddb/db_command.c:573
#2  0xffffffff802de6a1 in db_command (last_cmdp=0xffffffff80dcd560, cmd_table=Variable "cmd_table" is not available.

) at /usr/src/sys/ddb/db_command.c:449
#3  0xffffffff802de8f0 in db_command_loop ()
    at /usr/src/sys/ddb/db_command.c:502
#4  0xffffffff802e0ad4 in db_trap (type=Variable "type" is not available.
) at /usr/src/sys/ddb/db_main.c:229
#5  0xffffffff806d3bb1 in kdb_trap (type=3, code=0, tf=0xffffff80d111a970)
    at /usr/src/sys/kern/subr_kdb.c:629
#6  0xffffffff80942f68 in trap (frame=0xffffff80d111a970)
    at /usr/src/sys/amd64/amd64/trap.c:591
#7  0xffffffff8092c1cf in calltrap ()
    at /usr/src/sys/amd64/amd64/exception.S:228
#8  0xffffffff806d36ab in kdb_enter (why=0xffffffff80a3fb16 "panic",
    msg=0x80 <Address 0x80 out of bounds>) at cpufunc.h:63
#9  0xffffffff8069aea6 in panic (fmt=Variable "fmt" is not available.
)
    at /usr/src/sys/kern/kern_shutdown.c:633
#10 0xffffffff840d6ff2 in KfLowerIrql (oldirql=Variable "oldirql" is not available.
)
    at /usr/src/sys/modules/ndis/../../compat/ndis/subr_hal.c:417
#11 0xffffffff840dba5b in ntoskrnl_workitem_thread (arg=Variable "arg" is not available.
)
    at /usr/src/sys/modules/ndis/../../compat/ndis/subr_ntoskrnl.c:2766
#12 0xffffffff8066b039 in fork_exit (
    callout=0xffffffff840db9b0 <ntoskrnl_workitem_thread>,
    arg=0xfffffe00206a9cc0, frame=0xffffff80d111ac40)
    at /usr/src/sys/kern/kern_fork.c:992
#13 0xffffffff8092c6fe in fork_trampoline ()
    at /usr/src/sys/amd64/amd64/exception.S:602
#14 0x0000000000000000 in ?? ()
#15 0x0000000000000000 in ?? ()
#16 0x0000000000000001 in ?? ()
#17 0x0000000000000000 in ?? ()
#18 0x0000000000000000 in ?? ()
#19 0x0000000000000000 in ?? ()
#20 0x0000000000000000 in ?? ()
#21 0x0000000000000000 in ?? ()
#22 0x0000000000000000 in ?? ()
#23 0x0000000000000000 in ?? ()
#24 0x0000000000000000 in ?? ()
#25 0x0000000000000000 in ?? ()
#26 0x0000000000000000 in ?? ()
#27 0x0000000000000000 in ?? ()
#28 0x0000000000000000 in ?? ()
#29 0x0000000000000000 in ?? ()
#30 0x0000000000000000 in ?? ()
#31 0x0000000000000000 in ?? ()
#32 0x0000000000000000 in ?? ()
#33 0x0000000000000000 in ?? ()
#34 0x0000000000000000 in ?? ()
#35 0x0000000000000000 in ?? ()
#36 0x0000000000000000 in ?? ()
#37 0x0000000000000000 in ?? ()
#38 0xffffffff8240f200 in tdq_cpu ()
#39 0x0000000000000000 in ?? ()
#40 0xffffffff8240f1c0 in affinity ()
#41 0xfffffe0020971900 in ?? ()
#42 0xffffff80d111a980 in ?? ()
#43 0xffffff80d111a928 in ?? ()
#44 0xfffffe00082bf900 in ?? ()
#45 0xffffffff806c64ac in sched_switch (td=0xfffffe00206a9cc0,
    newtd=0xffffffff840db9b0, flags=Variable "flags" is not available.
) at /usr/src/sys/kern/sched_ule.c:1890
Previous frame inner to this frame (corrupt stack?)
(kgdb)

Fix: Attached patch fixed this problem for me.

Patch attached with submission follows:
How-To-Repeat: There are no special conditions necessary to reproduce this panic - just load ndis interface with traffic and wait for the race.
I've used benchmarks/iperf running 10 parallel threads ("-P 10" flag) with tcp window set to 1Mb ("-w 1m" flag). Machine with ndis interface was a "client" of iperf (so it was mostly sending). 

Adding bpf(4) consumers (trafshow or tcpdump) allows to trigger this race/panic faster.
Using faster NIC (1Gbps NIC instead of wifi) allows to trigger this race/panic faster.
Comment 1 Bruce Cran freebsd_committer 2012-03-03 17:36:46 UTC
Responsible Changed
From-To: freebsd-bugs->brucec

Take.
Comment 2 dfilter service freebsd_committer 2012-03-04 17:09:26 UTC
Author: brucec
Date: Sun Mar  4 17:08:43 2012
New Revision: 232509
URL: http://svn.freebsd.org/changeset/base/232509

Log:
  Fix race condition in KfRaiseIrql().
  
  After getting the current irql, if the kthread gets preempted and
  subsequently runs on a different CPU, the saved irql could be wrong.
  
  Also, correct the panic string.
  
  PR:		kern/165630
  Submitted by:	Vladislav Movchan <vladislav.movchan at gmail.com>

Modified:
  head/sys/compat/ndis/subr_hal.c

Modified: head/sys/compat/ndis/subr_hal.c
==============================================================================
--- head/sys/compat/ndis/subr_hal.c	Sun Mar  4 17:00:46 2012	(r232508)
+++ head/sys/compat/ndis/subr_hal.c	Sun Mar  4 17:08:43 2012	(r232509)
@@ -392,16 +392,18 @@ KfRaiseIrql(uint8_t irql)
 {
 	uint8_t			oldirql;
 
+	sched_pin();
 	oldirql = KeGetCurrentIrql();
 
 	/* I am so going to hell for this. */
 	if (oldirql > irql)
-		panic("IRQL_NOT_LESS_THAN");
+		panic("IRQL_NOT_LESS_THAN_OR_EQUAL");
 
-	if (oldirql != DISPATCH_LEVEL) {
-		sched_pin();
+	if (oldirql != DISPATCH_LEVEL) 
 		mtx_lock(&disp_lock[curthread->td_oncpu]);
-	}
+	else
+		sched_unpin();	
+
 /*printf("RAISE IRQL: %d %d\n", irql, oldirql);*/
 
 	return (oldirql);
_______________________________________________
svn-src-all@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-all
To unsubscribe, send any mail to "svn-src-all-unsubscribe@freebsd.org"
Comment 3 Bruce Cran freebsd_committer 2012-03-04 17:10:42 UTC
State Changed
From-To: open->patched

Fixed in r232509.
Comment 4 Glen Barber freebsd_committer 2012-09-13 03:06:23 UTC
Bruce,

Any chance you can MFC r232509, changed for this PR?

http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/165630

Glen
Comment 5 Mark Linimon freebsd_committer freebsd_triage 2013-07-03 01:50:32 UTC
Responsible Changed
From-To: brucec->freebsd-bugs

commit bit has been taken in for safekeeping.
Comment 6 commit-hook freebsd_committer 2015-07-16 09:08:54 UTC
A commit references this bug:

Author: brueffer
Date: Thu Jul 16 09:08:37 UTC 2015
New revision: 285624
URL: https://svnweb.freebsd.org/changeset/base/285624

Log:
  MFC: r232509 by brucec

  Fix race condition in KfRaiseIrql().

  After getting the current irql, if the kthread gets preempted and
  subsequently runs on a different CPU, the saved irql could be wrong.

  Also, correct the panic string.

  PR:		kern/165630
  Submitted by:	Vladislav Movchan <vladislav.movchan at gmail.com>

Changes:
_U  stable/9/sys/
  stable/9/sys/compat/ndis/subr_hal.c
Comment 7 Christian Brueffer freebsd_committer 2015-07-16 09:10:28 UTC
Merged to stable/9; thanks!