208389 – Netmap Panic

Summary: Netmap Panic

Status:	Closed Overcome By Events

Alias:	None

Product:	Base System
Classification:	Unclassified
Component:	kern
Version:	CURRENT
Hardware:	Any Any

Importance:	--- Affects Only Me
Assignee:	freebsd-net (Nobody)

URL:
Keywords:

Depends on:
Blocks:

Reported:	2016-03-30 03:46 UTC by Shawn Webb
Modified:	2019-02-05 10:46 UTC
CC List:	6 users

See Also:

Description Shawn Webb 2016-03-30 03:46:14 UTC

When you run `tcpdump -ni netmap:em0` and em0 is not in Netmap mode and you exit tcpdump (by hitting ^C), FreeBSD will panic.

Picture of panicking box here: https://goo.gl/photos/1fdTaMBFdit6ZkrP8

For some reason, doing a dump at the ddb prompt didn't produce a core.txt. Here's info.txt:

Dump header from device: /dev/ada0s1b
  Architecture: amd64
  Architecture Version: 2
  Dump Length: 710184960
  Blocksize: 512
  Dumptime: Tue Mar 29 15:37:43 2016
  Hostname: [sanitized]
  Magic: FreeBSD Kernel Dump
  Version String: FreeBSD 11.0-CURRENT-HBSD #73 [DEVEL:HardenedBSD-CURRENT-amd64:427] d31c1ca(HEAD): Sun Mar 20 09:25:50 EDT 2016
    jenkins@nyi-01.build.hardenedbsd.org:/usr/obj/jenkins/workspace/HardenedBSD
  Panic String:
  Dump Parity: 3422855522
  Bounds: 0
  Dump Status: good

Comment 1 Shawn Webb 2016-03-30 05:23:32 UTC

If there's a commit that happens due to this bug report, please set the Reported By line to G2, Inc.

Comment 2 Shawn Webb 2016-03-30 08:14:11 UTC

I've confirmed that the bug exists on FreeBSD 10-STABLE, but not on FreeBSD 10.3-RELEASE.

Comment 3 Sean Bruno freebsd_committer 2016-03-31 16:04:59 UTC

Shawn:

Do you know which "em" driver you're using here?  Is it an lem(4) or em(4) device?

Probably check with pciconf -lv

Comment 4 Shawn Webb 2016-04-01 19:41:15 UTC

On one box, it's em0, on another, it's ue0. Same backtrace.

Comment 5 Jim Thompson 2016-04-01 20:10:10 UTC

Works fine on recent -CURRENT (r297237M), (Thinkpad x230, em0).

Comment 6 Shawn Webb 2016-04-01 20:16:04 UTC

The ue device I tested was a Belkin USB 2.0 Ethernet Adapter F4U047.

Comment 7 Sean Bruno freebsd_committer 2016-04-01 20:22:07 UTC

(In reply to Shawn Webb from comment #4)
When you get a chance, can you run pciconf -lv on the test host?

I want to see what h/w you have for your em(4) device.

Comment 8 Shawn Webb 2016-04-01 20:24:38 UTC

em0@pci0:0:25:0:	class=0x020000 card=0x02761028 chip=0x10de8086 rev=0x02 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '82567LM-3 Gigabit Network Connection'
    class      = network
    subclass   = ethernet
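
As a reading aid for the pciconf output above: the chip= field packs the PCI device ID into the upper 16 bits and the vendor ID into the lower 16 bits (0x8086 is Intel). The following is a minimal Python sketch of that split; the function name split_pci_id is made up for illustration and is not part of any FreeBSD tool.

```python
def split_pci_id(value):
    """Split a 32-bit pciconf ID such as chip=0x10de8086 into (vendor, device)."""
    raw = int(value, 16)
    vendor_id = raw & 0xFFFF           # low 16 bits: vendor ID
    device_id = (raw >> 16) & 0xFFFF   # high 16 bits: device ID
    return vendor_id, device_id


vendor, device = split_pci_id("0x10de8086")
print(f"vendor=0x{vendor:04x} device=0x{device:04x}")  # vendor=0x8086 device=0x10de
```

The same split applies to the card= field, which holds the subsystem device and vendor IDs.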

Comment 9 Sean Bruno freebsd_committer 2016-04-01 20:32:57 UTC

(In reply to Shawn Webb from comment #0)

Hmmm ... any other setup for this panic?  Doesn't seem to happen for me, but by default I can't run netmap on em0.

 # tcpdump -ni netmap:em0
635.856748 [ 609] netmap_obj_malloc         netmap_ring request size 65792 too large
635.864913 [1464] netmap_mem2_rings_create  Cannot allocate RX_ring
635.871790 nm_open [608] NIOCREGIF failed: Cannot allocate memory em0
tcpdump: netmap open: cannot access netmap:em0: Cannot allocate memory
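
One possible reading of the 65792-byte request in the log above, offered as an inference rather than something stated in this report: if a netmap ring is a fixed header followed by one slot per descriptor, then a 256-byte header and 16-byte slots would make 65792 bytes correspond to 4096 slots. Both sizes below are assumptions about the netmap structures, and the helper names are hypothetical.

```python
# Assumed sizes (not taken from this report):
RING_HEADER_BYTES = 256  # assumed size of the ring header before the slot array
SLOT_BYTES = 16          # assumed size of one ring slot


def ring_request_size(num_slots):
    """Bytes needed for one ring with num_slots slots, under the assumptions above."""
    return RING_HEADER_BYTES + num_slots * SLOT_BYTES


def slots_for_request(request_bytes):
    """Invert ring_request_size: how many slots a request of this size implies."""
    return (request_bytes - RING_HEADER_BYTES) // SLOT_BYTES


print(slots_for_request(65792))   # 4096
print(ring_request_size(1024))    # 16640
```

Under these assumptions, the failure would be consistent with the interface being configured for more descriptors than the netmap ring allocator accepts; the 1024-slot configuration shown in comment 12 would need far less memory per ring.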

Comment 10 Shawn Webb 2016-04-01 20:34:40 UTC

Reinstalling now with vanilla FreeBSD 11.0-CURRENT. Will try again soon.

Comment 11 Shawn Webb 2016-04-01 20:40:25 UTC

Looks like it won't panic on netmap:em0, but will panic on netmap:ue0. I'm using USB ethernet devices since that's what I have for this dev box (a desktop with only one physical onboard NIC, but two USB NICs).

Comment 12 Shirkdog 2016-04-01 21:23:16 UTC

I have observed a similar issue, on a build of HBSD 11 

11.0-CURRENT-HBSD FreeBSD 11.0-CURRENT-HBSD #0 352417c(hardened/current/master): Mon Mar 14 13:04:31 UTC 2016 

Intel PCIe card (dual card)
[1] em1: <Intel(R) PRO/1000 Network Connection 7.6.1-k> port 0xe000-0xe01f mem 0xf7d40000-0xf7d5ffff,0xf7d20000-0xf7d3ffff irq 17 at device 0.1 on pci1
[1] em1: Using an MSI interrupt
[1] em1: Ethernet address: 68:05:ca:XX:XX:XX
[1] em1: netmap queues/slots: TX 1/1024, RX 1/1024  

em1@pci0:1:0:1: class=0x020000 card=0x115e8086 chip=0x105e8086 rev=0x06 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '82571EB Gigabit Ethernet Controller'
    class      = network
    subclass   = ethernet

tcpdump prints the following (when other traffic should exist, including the SSH session I am using):

tcpdump -i netmap:em1 -nns 0 -Xxvvvvetttt
tcpdump: listening on netmap:em1, link-type EN10MB (Ethernet), capture size 262144 bytes

2016-04-01 17:00:07.595078 00:00:00:00:00:00 > 00:00:00:00:00:00, 802.3, length 177: LLC, dsap Null (0x00) Individual, ssap Null (0x00) Command, ctrl 0x0000: Information, send seq 0, rcv seq 0, Flags [Command], length 163
        0x0000:  0000 0000 0000 0000 0000 0000 0000 0000  ................
        0x0010:  0000 0000 0000 0000 0000 0000 0000 0000  ................
        0x0020:  0000 0000 0000 0000 0000 0000 0000 0000  ................
        0x0030:  0000 0000 0000 0000 0000 0000 0000 0000  ................
        0x0040:  0000 0000 0000 0000 0000 0000 0000 0000  ................
        0x0050:  0000 0000 0000 0000 0000 0000 0000 0000  ................
        0x0060:  0000 0000 0000 0000 0000 0000 0000 0000  ................
        0x0070:  0000 0000 0000 0000 0000 0000 0000 0000  ................
        0x0080:  0000 0000 0000 0000 0000 0000 0000 0000  ................
        0x0090:  0000 0000 0000 0000 0000 0000 0000 0000  ................
        0x00a0:  0000 00                                  ...             
                                                                

Steps to reproduce:

ifconfig em1 up
tcpdump -i netmap:em1 -nns 0


Output from panic/dump

Unread portion of the kernel message buffer:
[267] panic: Memory modified after free 0xfffff800c4468000(2048) val=ffffffff @ 0xfffff800c4468000
[267] 
[267] cpuid = 0
[267] KDB: stack backtrace:
[267] db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe02337f2620
[267] vpanic() at vpanic+0x182/frame 0xfffffe02337f26a0
[267] panic() at panic+0x43/frame 0xfffffe02337f2700
[267] trash_ctor() at trash_ctor+0x48/frame 0xfffffe02337f2710
[267] mb_ctor_pack() at mb_ctor_pack+0x2a/frame 0xfffffe02337f2750
[267] uma_zalloc_arg() at uma_zalloc_arg+0x4e0/frame 0xfffffe02337f27b0
[267] m_getjcl() at m_getjcl+0x39/frame 0xfffffe02337f27f0
[267] em_init_locked() at em_init_locked+0xd62/frame 0xfffffe02337f28c0
[267] em_netmap_reg() at em_netmap_reg+0x1c8/frame 0xfffffe02337f2910
[267] netmap_do_unregif() at netmap_do_unregif+0x130/frame 0xfffffe02337f2940
[267] netmap_dtor() at netmap_dtor+0x64/frame 0xfffffe02337f2960
[267] devfs_destroy_cdevpriv() at devfs_destroy_cdevpriv+0x8b/frame 0xfffffe02337f2980
[267] devfs_close_f() at devfs_close_f+0x65/frame 0xfffffe02337f29b0
[267] _fdrop() at _fdrop+0x1a/frame 0xfffffe02337f29d0
[267] closef() at closef+0x1e1/frame 0xfffffe02337f2a60
[267] closefp() at closefp+0x9f/frame 0xfffffe02337f2aa0
[267] amd64_syscall() at amd64_syscall+0x2c1/frame 0xfffffe02337f2bb0
[267] Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe02337f2bb0
[267] --- syscall (6, FreeBSD ELF64, sys_close), rip = 0xf590083b5a, rsp = 0x6b3d21120d08, rbp = 0x6b3d21120d70 ---
[267] KDB: enter: panic

Reading symbols from /boot/kernel/zfs.ko...done.
Loaded symbols for /boot/kernel/zfs.ko
Reading symbols from /boot/kernel/opensolaris.ko...done.
Loaded symbols for /boot/kernel/opensolaris.ko
Reading symbols from /boot/kernel/fdescfs.ko...done.
Loaded symbols for /boot/kernel/fdescfs.ko
Reading symbols from /boot/kernel/uhid.ko...done.
Loaded symbols for /boot/kernel/uhid.ko
Reading symbols from /boot/kernel/ipfw.ko...done.
Loaded symbols for /boot/kernel/ipfw.ko
#0  doadump (textdump=0) at pcpu.h:221
221		__asm("movq %%gs:%1,%0" : "=r" (td)
(kgdb)
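
To make the call chain in the backtrace above easier to read, here is a small Python sketch that extracts function names and offsets from KDB frame lines and prints them outermost caller first. The regular expression and the name parse_kdb_frames are illustrative assumptions fitted to the line format shown in this comment, not a general parser.

```python
import re

# Matches lines like:
# [267] em_init_locked() at em_init_locked+0xd62/frame 0xfffffe02337f28c0
FRAME_RE = re.compile(
    r"^\[\d+\]\s+(\w+)\(\) at \w+\+(0x[0-9a-f]+)/frame 0x[0-9a-f]+$"
)


def parse_kdb_frames(text):
    """Return a list of (function, offset) tuples, innermost frame first."""
    frames = []
    for line in text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            frames.append((match.group(1), int(match.group(2), 16)))
    return frames


sample = """\
[267] trash_ctor() at trash_ctor+0x48/frame 0xfffffe02337f2710
[267] m_getjcl() at m_getjcl+0x39/frame 0xfffffe02337f27f0
[267] em_init_locked() at em_init_locked+0xd62/frame 0xfffffe02337f28c0
[267] em_netmap_reg() at em_netmap_reg+0x1c8/frame 0xfffffe02337f2910
[267] netmap_do_unregif() at netmap_do_unregif+0x130/frame 0xfffffe02337f2940
[267] netmap_dtor() at netmap_dtor+0x64/frame 0xfffffe02337f2960
"""

frames = parse_kdb_frames(sample)
# Reverse so the chain reads from the outermost caller down to the panic site.
print(" -> ".join(name for name, _ in reversed(frames)))
```

Read this way, the trace shows the panic being raised while mbufs are allocated during the driver re-init that runs when the netmap file descriptor is closed.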

Comment 13 Shirkdog 2016-04-01 21:31:11 UTC

I will also grab a vanilla FreeBSD 11-Current and test this on the same hardware.

Comment 14 Shirkdog 2016-04-02 02:19:30 UTC

Test with FreeBSD 11

11.0-CURRENT FreeBSD 11.0-CURRENT #0 r294499: Thu Jan 21 15:46:19 UTC 2016     root@releng2.nyi.freebsd.org:/usr/obj/usr/src/sys/GENERIC  amd64

It seems to take longer on Vanilla FreeBSD than HardenedBSD.

If I start and stop tcpdump a few times it still panics in the same way for me.

ifconfig em1 up
(Run the following several times; with 6 runs it panics every time):
tcpdump -i netmap:em1 -nns 0
<Ctrl>+C

Also, I know this has already been reported as Bug 193075, but if you use a host interface with the vale switch you also get a kernel panic:

tcpdump -i vale0:em0
tcpdump -i vale0:em1

Comment 15 Jim Thompson 2016-04-03 21:11:01 UTC

I just ran 20 iterations of "tcpdump -ni netmap:em0" on the same machine / kernel versions as above.

No crashes.

"tcpdump -i vale0:em0" does crash the system, but, as reported, this bug is already known.

Comment 16 Shirkdog 2016-04-04 00:30:41 UTC

The following will reproduce the issue on FreeBSD 11-CURRENT with this dual-NIC Intel card:

em1@pci0:1:0:1: class=0x020000 card=0x115e8086 chip=0x105e8086 rev=0x06 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '82571EB Gigabit Ethernet Controller'
    class      = network
    subclass   = ethernet

1) Boot up the system without any network cable plugged in, then plug a cable into em1 and log in as root
2) run "ifconfig em1 up"
3) run "tcpdump -i em1 -nns 0" (this ensures there is traffic accessible to this interface. My traffic mix was IGMP and broadcast packets with some IPv6 for DHCP solicitation)
4) run "tcpdump -i netmap:em1 -ns 0". When I run this, I get no traffic, even though I should be seeing the same IGMP and broadcast packets that I know the interface has access to. I also get the following lock order reversal:

Apr  4 00:17:43 test login: ROOT LOGIN (root) ON ttyv0
Apr  4 00:17:49 test kernel: em1: link state changed to UP
Apr  4 00:17:51 test kernel: em1: promiscuous mode enabled
Apr  4 00:17:56 test kernel: em1: promiscuous mode disabled
Apr  4 00:18:02 test kernel: em1: link state changed to DOWN
Apr  4 00:18:02 test kernel: lock order reversal: (sleepable after non-sleepable)
Apr  4 00:18:02 test kernel: 1st 0xfffff8000cc4f800 vm object (vm object) @ /usr/src/sys/vm/vm_fault.c:360
Apr  4 00:18:02 test kernel: 2nd 0xffffffff817f2e58 (&nm_mem)->nm_mtx ((&nm_mem)->nm_mtx) @ /usr/src/sys/dev/netmap/netmap_mem2.c:490
Apr  4 00:18:02 test kernel: stack backtrace:
Apr  4 00:18:02 test kernel: #0 0xffffffff80a7b800 at witness_debugger+0x70
Apr  4 00:18:02 test kernel: #1 0xffffffff80a7b701 at witness_checkorder+0xe71
Apr  4 00:18:02 test kernel: #2 0xffffffff80a28ce2 at _sx_xlock+0x72
Apr  4 00:18:02 test kernel: #3 0xffffffff80698a9d at netmap_mem2_ofstophys+0x2d
Apr  4 00:18:02 test kernel: #4 0xffffffff806960fb at netmap_dev_pager_fault+0x3b
Apr  4 00:18:02 test kernel: #5 0xffffffff80ccf6c1 at dev_pager_getpages+0x61
Apr  4 00:18:02 test kernel: #6 0xffffffff80cf7e0a at vm_pager_get_pages+0x4a 
Apr  4 00:18:02 test kernel: #7 0xffffffff80cdbf00 at vm_fault_hold+0x760
Apr  4 00:18:02 test kernel: #8 0xffffffff80cdb758 at vm_fault+0x78
Apr  4 00:18:02 test kernel: #9 0xffffffff80e6cac5 at trap_pfault+0x115
Apr  4 00:18:02 test kernel: #10 0xffffffff80e6c37d at trap+0x57d
Apr  4 00:18:02 test kernel: #11 0xffffffff80e4c427 at calltrap+0x8

5) Wait until you see about 5 of these Null packets output from tcpdump:

2016-04-01 17:00:07.595078 00:00:00:00:00:00 > 00:00:00:00:00:00, 802.3, length 177: LLC, dsap Null (0x00) Individual, ssap Null (0x00) Command, ctrl 0x0000: Information, send seq 0, rcv seq 0, Flags [Command], length 163 

6) Hit Ctrl+C to get back to your prompt, then run tcpdump -i netmap:em1 -ns 0 again. You should see more of the null packet output from tcpdump; hitting Ctrl+C again will then lead to the panic.
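
The start/interrupt cycle in the steps above can be sketched as a small Python helper that launches a command, waits, and sends SIGINT (the signal Ctrl+C delivers). This is only an illustration of the manual procedure: the function names, the wait times, and the run count are assumptions, and the tcpdump arguments are the ones quoted in step 4. It should only be tried on a disposable test machine, since the report says this sequence can panic the kernel.

```python
import signal
import subprocess
import time


def run_and_interrupt(cmd, seconds, grace=5.0):
    """Start cmd, let it run for `seconds`, send SIGINT, and return its exit status."""
    proc = subprocess.Popen(cmd)
    time.sleep(seconds)
    proc.send_signal(signal.SIGINT)   # same signal as pressing Ctrl+C
    try:
        return proc.wait(timeout=grace)
    except subprocess.TimeoutExpired:
        proc.kill()                   # fall back if SIGINT was ignored
        return proc.wait()


def repeat_capture(iface, runs, seconds):
    """Run the netmap capture `runs` times, interrupting each one after `seconds`."""
    cmd = ["tcpdump", "-i", "netmap:" + iface, "-ns", "0"]
    return [run_and_interrupt(cmd, seconds) for _ in range(runs)]
```

For example, repeat_capture("em1", runs=6, seconds=10) would approximate six manual start/Ctrl+C cycles on em1; it must be run as root, after "ifconfig em1 up".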

Of note, I have the following onboard NIC on this motherboard:

Apr  4 00:16:45 test kernel: re0: <RealTek 8168/8111 B/C/CP/D/DP/E/F/G PCIe Gigabit Ethernet> port 0xd000-0xd0ff mem 0xf7c00000-0xf7c00fff,0xf0000000-0xf0003fff irq 18 at device 0.0 on pci3
Apr  4 00:16:45 test kernel: re0: Using 1 MSI-X message
Apr  4 00:16:45 test kernel: re0: turning off MSI enable bit.
Apr  4 00:16:45 test kernel: re0: Chip rev. 0x4c000000
Apr  4 00:16:45 test kernel: re0: MAC rev. 0x00000000

I ran through the same steps, and I could not get the box to panic. After testing with re0, I removed the cable and put it into em1, and ran through the test.

I got different results from when I started from a reboot and tested em1. When running tcpdump -i netmap:em1 after using the Realtek NIC, the first IP packet would claim that it was truncated, and it took about 4 runs of the test procedure before it panicked.

Comment 17 Shawn Webb 2016-04-07 17:01:03 UTC

Got the same panic with bce1. Here's the pciconf -lv info for it:

bce1@pci0:132:0:1:      class=0x020000 card=0x191714e4 chip=0x163914e4 rev=0x20 hdr=0x00
    vendor     = 'Broadcom Corporation'
    device     = 'NetXtreme II BCM5709 Gigabit Ethernet'
    class      = network
    subclass   = ethernet

Comment 18 Vincenzo Maffione freebsd_committer 2019-01-11 09:42:25 UTC

I think this was due to a difference in struct mbuf between FreeBSD 10 and 11+, when netmap is used in emulated mode.
It was fixed some time ago.
Is anyone still experiencing this on 11/12?

Comment 19 Vincenzo Maffione freebsd_committer 2019-02-05 10:46:50 UTC

Closing, as this was fixed a year ago, and it is not reproducible anymore.