Bug 217156 - Kernel panic using Netmap with selected NIC queue
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: CURRENT
Hardware: amd64 Any
Importance: --- Affects Many People
Assignee: Navdeep Parhar
Reported: 2017-02-16 22:48 UTC by Miłosz Kaniewski
Modified: 2018-01-23 21:37 UTC (History)

Attachments
Test program which ends with kernel panic. (312 bytes, text/plain)
2017-02-16 22:48 UTC, Miłosz Kaniewski

Description Miłosz Kaniewski 2017-02-16 22:48:41 UTC
Created attachment 180063
Test program which ends with kernel panic.

Hello,

When I try to use netmap with a specific NIC queue (i.e. with the NR_REG_ONE_NIC flag), I get a kernel panic:

panic: Assertion slot != NULL failed at /usr/src/sys/modules/cxgbe/if_cxgbe/../../../dev/cxgbe/t4_netmap.c:353
cpuid = 14
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0660f53000
vpanic() at vpanic+0x186/frame 0xfffffe0660f53080
kassert_panic() at kassert_panic+0x126/frame 0xfffffe0660f530f0
cxgbe_netmap_reg() at cxgbe_netmap_reg+0x8d8/frame 0xfffffe0660f531c0
netmap_hw_reg() at netmap_hw_reg+0x2c/frame 0xfffffe0660f531f0
netmap_do_regif() at netmap_do_regif+0x2cb/frame 0xfffffe0660f53230
netmap_ioctl() at netmap_ioctl+0xa57/frame 0xfffffe0660f53620
freebsd_netmap_ioctl() at freebsd_netmap_ioctl+0x3e/frame 0xfffffe0660f53650
devfs_ioctl() at devfs_ioctl+0xc3/frame 0xfffffe0660f536a0
VOP_IOCTL_APV() at VOP_IOCTL_APV+0xe0/frame 0xfffffe0660f536d0
vn_ioctl() at vn_ioctl+0x124/frame 0xfffffe0660f537d0
devfs_ioctl_f() at devfs_ioctl_f+0x1f/frame 0xfffffe0660f537f0
kern_ioctl() at kern_ioctl+0x2b0/frame 0xfffffe0660f53850
sys_ioctl() at sys_ioctl+0x13f/frame 0xfffffe0660f53930
amd64_syscall() at amd64_syscall+0x2f9/frame 0xfffffe0660f53ab0
Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe0660f53ab0
--- syscall (54, FreeBSD ELF64, sys_ioctl), rip = 0x80097c97a, rsp = 0x7fffffffea88, rbp = 0x7fffffffeb20 ---
KDB: enter: panic

If no queue is specified, everything works fine.

To reproduce this error, either:
1. Run 'pkt-gen -i vcxl0-1', or
2. Run the attached program netmap_test.c.

uname -a:
FreeBSD test0 12.0-CURRENT FreeBSD 12.0-CURRENT #0 r313561: Fri Feb 10 20:18:01 UTC 2017     root@releng3.nyi.freebsd.org:/usr/obj/usr/src/sys/GENERIC  amd64

network card:
Chelsio T540-CR

/boot/loader.conf content:
hw.cxgbe.num_vis=2

root@freebsd:~ # ifconfig vcxl0
vcxl0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=ec07bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,VLAN_HWTSO,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6>
        ether 00:07:43:31:cf:52
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        media: Ethernet 10Gbase-SR <full-duplex>
        status: active
Comment 1 Miłosz Kaniewski 2017-02-20 12:08:50 UTC
I read through some of the code and found what may be the source of the problem:
1. When a netmap device is initialized (via the NIOCREGIF ioctl), a pair of RX/TX krings is created for each queue. If I request only one queue, only that pair of krings gets nr_pending_mode set to NKR_NETMAP_ON. All other krings have nr_pending_mode set to NKR_NETMAP_OFF.
2. Next, cxgbe_netmap_on() is called. This function contains two loops (for_each_nm_{rt}xq) that iterate over every TX and RX queue, including the queues I chose not to use.
3. Each iteration calls netmap_reset() and asserts that it never returns NULL. However, netmap_reset() returns NULL for any kring whose nr_pending_mode is NKR_NETMAP_OFF. Such krings exist here, so the panic occurs.
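
The three steps above can be modeled in a few lines of user-space C. This is only a sketch of the failure mode, not the driver code: the model_* types and functions are hypothetical stand-ins, and model_reset() merely assumes that netmap_reset() returns NULL for a kring that is not pending NKR_NETMAP_ON, as described in step 3.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kring modes named in the report. */
enum kring_mode { NKR_NETMAP_OFF = 0, NKR_NETMAP_ON = 1 };

struct model_slot { int buf_idx; };

/* Hypothetical, heavily reduced model of a netmap kring. */
struct model_kring {
	enum kring_mode nr_pending_mode;
	struct model_slot slots[4];
};

/*
 * Assumed behaviour of netmap_reset() (per step 3): NULL unless the
 * kring is about to enter netmap mode.
 */
static struct model_slot *
model_reset(struct model_kring *kr)
{
	if (kr->nr_pending_mode != NKR_NETMAP_ON)
		return (NULL);
	return (&kr->slots[0]);
}

/* Step 1: a single-queue registration marks exactly one kring ON. */
static void
model_register_one(struct model_kring *krings, size_t n, size_t ringid)
{
	assert(ringid < n);
	for (size_t i = 0; i < n; i++)
		krings[i].nr_pending_mode =
		    (i == ringid) ? NKR_NETMAP_ON : NKR_NETMAP_OFF;
}

/*
 * Steps 2 and 3: walk every queue, as the for_each_nm_{rt}xq loops do,
 * and count how many times the reset yields NULL. In the driver each
 * of these would trip the "slot != NULL" assertion.
 */
static size_t
model_count_null_slots(struct model_kring *krings, size_t n)
{
	size_t nulls = 0;

	for (size_t i = 0; i < n; i++)
		if (model_reset(&krings[i]) == NULL)
			nulls++;
	return (nulls);
}
```

With 8 queues and only queue 1 registered, model_count_null_slots() reports 7 queues that would hit the assertion, which matches the observation that the panic occurs only when a single queue is selected.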

I was able to work around the problem by modifying the for_each_nm_{rt}xq loops so that they iterate only over queues whose nr_pending_mode is NKR_NETMAP_ON. However, I am not sure this is the proper fix (for example, it does not let me release resources properly in cxgbe_netmap_off()). Moreover, after my modifications t4_config_rss_range() started to fail randomly with EINVAL, and I have no idea why.
Comment 2 Navdeep Parhar freebsd_committer freebsd_triage 2017-02-21 20:09:55 UTC
MPASS(slot != NULL);    /* XXXNM: error check, not assert */

The comment for the assert indicates that the driver should treat this as a runtime error and not a catastrophe.  NULL slot means there's no memory available for rx buffers.  What should the driver do in this case?
Comment 3 Navdeep Parhar freebsd_committer freebsd_triage 2017-02-21 20:25:47 UTC
Vincenzo can probably tell us what the expected behavior is, but bugzilla wouldn't let me add him to the CC list.  What happens to the queues that do not have netmap enabled -- do they continue to work normally or not?  How does netmap ensure that it gets all the rx traffic (if it does that) and the non-netmap-enabled queues are quiesced?
Comment 4 commit-hook freebsd_committer freebsd_triage 2017-06-15 19:58:03 UTC
A commit references this bug:

Author: np
Date: Thu Jun 15 19:56:59 UTC 2017
New revision: 319986
URL: https://svnweb.freebsd.org/changeset/base/319986

Log:
  cxgbe(4):  Fix per-queue netmap operation.

  Do not attempt to initialize netmap queues that are already initialized
  or aren't supposed to be initialized.  Similarly, do not free queues
  that are not initialized or aren't supposed to be freed.

  PR:		217156
  Sponsored by:	Chelsio Communications

Changes:
  head/sys/dev/cxgbe/adapter.h
  head/sys/dev/cxgbe/t4_netmap.c
  head/sys/dev/cxgbe/t4_sge.c
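
The rule stated in the commit log can be sketched as a small user-space model. This is not the code from r319986: the struct, the Q_OFF/Q_ON values (standing in for NKR_NETMAP_OFF/NKR_NETMAP_ON) and all model_* helpers are hypothetical, and the idea of tracking a current mode next to the pending mode is an assumption made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for NKR_NETMAP_OFF / NKR_NETMAP_ON. */
enum qmode { Q_OFF = 0, Q_ON = 1 };

/* Hypothetical per-queue state: what it is now vs. what was requested. */
struct model_queue {
	enum qmode mode;     /* current state (assumed field) */
	enum qmode pending;  /* requested state */
};

/* Initialize only if requested ON and not already initialized. */
static bool
model_should_init(const struct model_queue *q)
{
	return (q->pending == Q_ON && q->mode == Q_OFF);
}

/* Free only if requested OFF and currently initialized. */
static bool
model_should_free(const struct model_queue *q)
{
	return (q->pending == Q_OFF && q->mode == Q_ON);
}

/* "on" path: skip queues that must not be touched; return count started. */
static size_t
model_netmap_on(struct model_queue *qs, size_t n)
{
	size_t started = 0;

	for (size_t i = 0; i < n; i++) {
		if (!model_should_init(&qs[i]))
			continue;
		qs[i].mode = Q_ON;
		started++;
	}
	return (started);
}

/* "off" path: release only queues that are initialized and pending OFF. */
static size_t
model_netmap_off(struct model_queue *qs, size_t n)
{
	size_t stopped = 0;

	for (size_t i = 0; i < n; i++) {
		if (!model_should_free(&qs[i]))
			continue;
		qs[i].mode = Q_OFF;
		stopped++;
	}
	return (stopped);
}
```

In this model, calling model_netmap_on() twice in a row starts the requested queue once and then does nothing, and model_netmap_off() leaves never-initialized queues alone, which is the behaviour the log message describes for both directions.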