Bug 143573 - [em] em(4) NIC crashes intermittently
Summary: [em] em(4) NIC crashes intermittently
Status: Closed Overcome By Events
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: unspecified
Hardware: Any Any
Importance: Normal Affects Only Me
Assignee: jfv
URL:
Keywords: IntelNetworking
Depends on:
Blocks:
 
Reported: 2010-02-05 08:00 UTC by Earl R. Lapus
Modified: 2015-07-01 14:28 UTC
Description Earl R. Lapus 2010-02-05 08:00:08 UTC
PCI-X dual-port Intel NICs crash the box (see crash dump below). The box is running as a firewall. The crashes don't occur with different NICs in the same setup (pf enabled, IPv4/IPv6 enabled, gif0 enabled).

----------------------------------------
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...

Unread portion of the kernel message buffer:


Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 01
fault virtual address	= 0xffff804039eb8170
fault code		= supervisor read data, page not present
instruction pointer	= 0x20:0xffffffff808c01b9
stack pointer	        = 0x28:0xffffff802c9fba40
frame pointer	        = 0x28:0xffffff00174c1100
code segment		= base rx0, limit 0xfffff, type 0x1b
			= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags	= interrupt enabled, resume, IOPL = 0
current process		= 0 (em0 taskq)
trap number		= 12
panic: page fault
cpuid = 1
Uptime: 49m0s
Physical memory: 2033 MB
Dumping 1359 MB:panic: bufwrite: buffer is not busy???
cpuid = 1
 1344 1328 1312 1296 1280 1264 1248 1232

Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 01
fault virtual address	= 0xffffffffffff80ff
fault code		= supervisor read data, page not present
instruction pointer	= 0x20:0xffffff0001a94ab0
stack pointer	        = 0x28:0xffffff80000d1b30
frame pointer	        = 0x28:0x0
code segment		= base rx0, limit 0xfffff, type 0x1b
			= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags	= interrupt enabled, resume, IOPL = 0
current process		= 12 (irq22: em3 atapci0)
trap number		= 12
panic: page fault
cpuid = 1
 1216 1200 1184 1168 1152 1136 1120 1104 1088 1072 1056 1040 1024 1008 992 976 960 944 928 912 896 880 864 848 832 816 800 784 768 752 736 720 704 688 672 656 640 624 608 592 576 560 544 528 512 496 480 464 448 432 416 400 384 368 352 336 320 304 288 272 256 240 224 208 192 176 160 144 128 112 96 80 64 48 32 16

#0  doadump () at pcpu.h:223
223	pcpu.h: No such file or directory.
	in pcpu.h
(kgdb) where
#0  doadump () at pcpu.h:223
#1  0x0000000000000004 in ?? ()
#2  0xffffffff805bbc29 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:416
#3  0xffffffff805bc022 in panic (fmt=0x104 <Address 0x104 out of bounds>) at /usr/src/sys/kern/kern_shutdown.c:579
#4  0xffffffff808c8600 in trap_fatal (frame=0xffffff0001a94ab0, eva=Variable "eva" is not available.
) at /usr/src/sys/amd64/amd64/trap.c:852
#5  0xffffffff808c89d5 in trap_pfault (frame=0xffffff802c9fb990, usermode=0) at /usr/src/sys/amd64/amd64/trap.c:768
#6  0xffffffff808c92cc in trap (frame=0xffffff802c9fb990) at /usr/src/sys/amd64/amd64/trap.c:494
#7  0xffffffff808af9b3 in calltrap () at /usr/src/sys/amd64/amd64/exception.S:224
#8  0xffffffff808c01b9 in pmap_kextract (va=254734608629831) at /usr/src/sys/amd64/amd64/pmap.c:1048
#9  0xffffffff808ae5c8 in bus_dmamap_load_mbuf_sg (dmat=0xffffff0001ad0a00, map=0xffffffff80ca6700, m0=Variable "m0" is not available.
) at /usr/src/sys/amd64/amd64/busdma_machdep.c:653
#10 0xffffffff8036b582 in em_get_buf (adapter=0xffffff80003b8000, i=845) at /usr/src/sys/dev/e1000/if_em.c:4041
#11 0xffffffff8036ea2d in em_rxeof (adapter=0xffffff80003b8000, count=99) at /usr/src/sys/dev/e1000/if_em.c:4439
#12 0xffffffff80373f5e in em_handle_rxtx (context=Variable "context" is not available.
) at /usr/src/sys/dev/e1000/if_em.c:1660
#13 0xffffffff805f665b in taskqueue_run (queue=0xffffff0001c22580) at /usr/src/sys/kern/subr_taskqueue.c:239
#14 0xffffffff805f68c5 in taskqueue_thread_loop (arg=Variable "arg" is not available.
) at /usr/src/sys/kern/subr_taskqueue.c:360
#15 0xffffffff805939cd in fork_exit (callout=0xffffffff805f6880 <taskqueue_thread_loop>, arg=0xffffff80003bc7d0, frame=0xffffff802c9fbc80)
    at /usr/src/sys/kern/kern_fork.c:843
#16 0xffffffff808afe0e in fork_trampoline () at /usr/src/sys/amd64/amd64/exception.S:561
#17 0x0000000000000000 in ?? ()
#18 0x0000000000000000 in ?? ()
#19 0x0000000000000000 in ?? ()
#20 0x0000000000000000 in ?? ()
#21 0x0000000000000000 in ?? ()
#22 0x0000000000000000 in ?? ()
#23 0x0000000000000000 in ?? ()
#24 0x0000000000000000 in ?? ()
#25 0x0000000000000000 in ?? ()
#26 0x0000000000000000 in ?? ()
#27 0x0000000000000000 in ?? ()
#28 0x0000000000000000 in ?? ()
#29 0x0000000000000000 in ?? ()
#30 0x0000000000000000 in ?? ()
#31 0x0000000000000000 in ?? ()
#32 0x0000000000000000 in ?? ()
#33 0x0000000000000000 in ?? ()
#34 0x0000000000000000 in ?? ()
#35 0x0000000000000000 in ?? ()
#36 0x0000000000000000 in ?? ()
#37 0x0000000000000000 in ?? ()
#38 0x0000000000000000 in ?? ()
#39 0x0000000000000000 in ?? ()
#40 0x0000000000000000 in ?? ()
#41 0x0000000000ed9000 in ?? ()
#42 0xffffffff80c6c600 in tdq_cpu ()
#43 0x0000000000000001 in ?? ()
#44 0xffffff0001a94ab0 in ?? ()
#45 0xffffffff80c6b980 in affinity ()
#46 0xffffffff80c6b980 in affinity ()
#47 0xffffff802c9fb5f8 in ?? ()
#48 0xffffff0001a94ab0 in ?? ()
#49 0xffffffff805df020 in sched_switch (td=0xffffff80003bc7d0, newtd=Variable "newtd" is not available.
) at /usr/src/sys/kern/sched_ule.c:1858
#50 0x0000000000000000 in ?? ()
#51 0x0000000000000000 in ?? ()
---Type <return> to continue, or q <return> to quit---q
Quit
(kgdb) quit

How-To-Repeat: happens intermittently... no known steps to recreate.
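
The fault virtual address in the first trap message appears to follow from the va passed to pmap_kextract() in frame #8. The following is a minimal sketch, not part of the original report. It assumes that the amd64 recursive page-directory map (PDmap) of that era starts at 0xffff804000000000 and that each 8-byte entry covers 2 MB (a shift of 21). The names PDMAP_BASE, PDR_SHIFT, PDE_INDEX_MASK and pde_address are hypothetical.

```python
# Sketch only: constants are assumptions about the FreeBSD amd64 pmap layout,
# not values taken from the bug report.
PDMAP_BASE = 0xFFFF804000000000   # assumed base of the recursive page-directory map
PDR_SHIFT = 21                    # assumed: one page-directory entry maps 2 MB
PDE_INDEX_MASK = (1 << 27) - 1    # assumed: 27 index bits (bits 21..47 of va)
PDE_SIZE = 8                      # bytes per page-directory entry


def pde_address(va: int) -> int:
    """Return the address of the page-directory entry that would describe va."""
    index = (va >> PDR_SHIFT) & PDE_INDEX_MASK
    return PDMAP_BASE + index * PDE_SIZE


va = 254734608629831              # va argument shown in frame #8
print(hex(va))
print(hex(pde_address(va)))
```

Under those assumptions the second value equals the reported fault virtual address, 0xffff804039eb8170. That would suggest the page fault came from looking up the page-directory entry for a bogus receive-buffer address, not from touching the buffer itself.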
Comment 1 Mark Linimon freebsd_committer freebsd_triage 2010-02-05 08:41:04 UTC
Responsible Changed
From-To: freebsd-bugs->freebsd-net

Over to maintainer(s).
Comment 2 Earl R. Lapus 2010-02-05 08:57:11 UTC
Additional note: the kernel's /usr/src/sys/dev/e1000 directory was
updated to CURRENT (as of Feb 2 2010) after the first crash, but it
still kept crashing at the same spot.
Comment 3 spry 2010-02-06 23:59:49 UTC
This looks like these:
http://forums.freebsd.org/archive/index.php/t-10475.html
http://lists.freebsd.org/pipermail/freebsd-stable/2010-January/054369.html

How about this?
http://www.mail-archive.com/commits@crater.dragonflybsd.org/msg03494.html


-- 
cheers
mars
-----
Joan Crawford  - "I, Joan Crawford, I believe in the dollar.
Everything I earn, I spend." -
http://www.brainyquote.com/quotes/authors/j/joan_crawford.html
Comment 4 Ivan Voras freebsd_committer 2010-02-07 00:41:08 UTC
On 7 February 2010 00:59, Mars G Miro <spry@anarchy.in.the.ph> wrote:
> This looks like these:
> http://forums.freebsd.org/archive/index.php/t-10475.html
> http://lists.freebsd.org/pipermail/freebsd-stable/2010-January/054369.html
>
> How about this?
> http://www.mail-archive.com/commits@crater.dragonflybsd.org/msg03494.html
>

I don't know - from the looks of the traces it seems like all of these
point to a different bug...
Comment 5 Earl R. Lapus 2010-02-12 09:26:56 UTC
It also crashes at the same spot when using igb(4).

------------------------------------------------------------------------------------------------------
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...

Unread portion of the kernel message buffer:
cessor eflags   = interrupt enabled, resume, IOPL = 0
current process         = 12 (irq260: igb1)
trap number             = 12
panic: page fault
cpuid = 0
Uptime: 7m49s
Physical memory: 2031 MB
Dumping 1284 MB: (CTRL-C to abort)  1269 1253 1237 1221 1205 1189 1173 1157 1141 1125 1109 1093 1077 1061 1045 1029 1013 997 981 965 949 933 917 901 885 869 853 837 821 805 789 773 757 741 725 709 693 677 661 645 629 613 597 581 565 549 533 517 501 485 469 453 437 421 405 389 373 357 341 325 309 293 277 261 245 229 213 197 181 165 149 133 117 101 85 69 53 37 21 5

Reading symbols from /boot/kernel/if_igb.ko...Reading symbols from /boot/kernel/if_igb.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/if_igb.ko
#0  doadump () at pcpu.h:223
223     pcpu.h: No such file or directory.
        in pcpu.h
(kgdb) where
#0  doadump () at pcpu.h:223
#1  0x0000000000000004 in ?? ()
#2  0xffffffff805ba099 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:416
#3  0xffffffff805ba492 in panic (fmt=0x104 <Address 0x104 out of bounds>) at /usr/src/sys/kern/kern_shutdown.c:579
#4  0xffffffff808c6a70 in trap_fatal (frame=0xffffff000142fab0, eva=Variable "eva" is not available.
) at /usr/src/sys/amd64/amd64/trap.c:852
#5  0xffffffff808c6e45 in trap_pfault (frame=0xffffff80000be910, usermode=0) at /usr/src/sys/amd64/amd64/trap.c:768
#6  0xffffffff808c773c in trap (frame=0xffffff80000be910) at /usr/src/sys/amd64/amd64/trap.c:494
#7  0xffffffff808ade23 in calltrap () at /usr/src/sys/amd64/amd64/exception.S:224
#8  0xffffffff808be629 in pmap_kextract (va=18446465905857658880) at /usr/src/sys/amd64/amd64/pmap.c:1048
#9  0xffffffff808aca38 in bus_dmamap_load_mbuf_sg (dmat=0xffffff00014a9780, map=0xffffffff80ca4880, m0=Variable "m0" is not available.
) at /usr/src/sys/amd64/amd64/busdma_machdep.c:653
#10 0xffffffff80eb44bc in igb_get_buf (rxr=0xffffff0001488800, i=Variable "i" is not available.
) at /usr/src/sys/modules/igb/../../dev/e1000/if_igb.c:3467
#11 0xffffffff80eb4bd5 in igb_rxeof (rxr=0xffffff0001488800, count=99) at /usr/src/sys/modules/igb/../../dev/e1000/if_igb.c:4134
#12 0xffffffff80eb4f75 in igb_msix_rx (arg=Variable "arg" is not available.
) at /usr/src/sys/modules/igb/../../dev/e1000/if_igb.c:1455
#13 0xffffffff80593e90 in intr_event_execute_handlers (p=Variable "p" is not available.
) at /usr/src/sys/kern/kern_intr.c:1165
#14 0xffffffff805953c7 in ithread_loop (arg=0xffffff0001486b20) at /usr/src/sys/kern/kern_intr.c:1178
#15 0xffffffff80591e3d in fork_exit (callout=0xffffffff80595340 <ithread_loop>, arg=0xffffff0001486b20, frame=0xffffff80000bec80)
    at /usr/src/sys/kern/kern_fork.c:843
#16 0xffffffff808ae27e in fork_trampoline () at /usr/src/sys/amd64/amd64/exception.S:561
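
The va shown for pmap_kextract() in frame #8 of this igb(4) trace can be checked for plausibility. On amd64 with 48-bit virtual addresses, a valid (canonical) address has bits 63 through 47 all equal. The following is a minimal sketch with a hypothetical helper, is_canonical, that is not part of the original report:

```python
def is_canonical(va: int) -> bool:
    """True if bits 63..47 of a 64-bit address are all zeros or all ones."""
    top = (va >> 47) & 0x1FFFF    # the 17 bits that must agree
    return top == 0 or top == 0x1FFFF


va = 18446465905857658880         # va argument shown in frame #8
rxr = 0xFFFFFF0001488800          # rxr pointer from frame #10, for comparison

print(hex(va), is_canonical(va))
print(hex(rxr), is_canonical(rxr))
```

The va prints as 0xffff030200000000 and fails the check, while the rxr pointer passes. So the buffer address being translated was probably not a valid kernel pointer.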
Comment 6 Andre Oppermann freebsd_committer 2010-08-23 15:36:45 UTC
Responsible Changed
From-To: freebsd-net->jfv

Over to maintainer.
Comment 7 Sean Bruno freebsd_committer 2015-07-01 14:28:10 UTC
The em(4) function referenced in the stack trace, em_handle_rxtx(), was retired, and its functionality has been split across multiple handler functions in stable/10 and head.

If this is still a problem on stable/10, reopen this ticket and we'll take a deeper look.

lem(4) still has a lem_handle_rxtx(), which makes me think this issue might still apply there.