Created attachment 180063: Test program which ends with kernel panic.

Hello,

when I try to use netmap with a specific NIC queue (i.e. when I use the NR_REG_ONE_NIC flag) I get a kernel panic:

panic: Assertion slot != NULL failed at /usr/src/sys/modules/cxgbe/if_cxgbe/../../../dev/cxgbe/t4_netmap.c:353
cpuid = 14
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0660f53000
vpanic() at vpanic+0x186/frame 0xfffffe0660f53080
kassert_panic() at kassert_panic+0x126/frame 0xfffffe0660f530f0
cxgbe_netmap_reg() at cxgbe_netmap_reg+0x8d8/frame 0xfffffe0660f531c0
netmap_hw_reg() at netmap_hw_reg+0x2c/frame 0xfffffe0660f531f0
netmap_do_regif() at netmap_do_regif+0x2cb/frame 0xfffffe0660f53230
netmap_ioctl() at netmap_ioctl+0xa57/frame 0xfffffe0660f53620
freebsd_netmap_ioctl() at freebsd_netmap_ioctl+0x3e/frame 0xfffffe0660f53650
devfs_ioctl() at devfs_ioctl+0xc3/frame 0xfffffe0660f536a0
VOP_IOCTL_APV() at VOP_IOCTL_APV+0xe0/frame 0xfffffe0660f536d0
vn_ioctl() at vn_ioctl+0x124/frame 0xfffffe0660f537d0
devfs_ioctl_f() at devfs_ioctl_f+0x1f/frame 0xfffffe0660f537f0
kern_ioctl() at kern_ioctl+0x2b0/frame 0xfffffe0660f53850
sys_ioctl() at sys_ioctl+0x13f/frame 0xfffffe0660f53930
amd64_syscall() at amd64_syscall+0x2f9/frame 0xfffffe0660f53ab0
Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe0660f53ab0
--- syscall (54, FreeBSD ELF64, sys_ioctl), rip = 0x80097c97a, rsp = 0x7fffffffea88, rbp = 0x7fffffffeb20 ---
KDB: enter: panic

If the queue is not specified, everything works fine.

To reproduce this error:
1. Run 'pkt-gen -i vcxl0-1', or
2. Run the attached program netmap_test.c.
uname -a:
FreeBSD test0 12.0-CURRENT FreeBSD 12.0-CURRENT #0 r313561: Fri Feb 10 20:18:01 UTC 2017 root@releng3.nyi.freebsd.org:/usr/obj/usr/src/sys/GENERIC amd64

network card: Chelsio T540-CR

/boot/loader.conf content:
hw.cxgbe.num_vis=2

root@freebsd:~ # ifconfig vcxl0
vcxl0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=ec07bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,VLAN_HWTSO,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6>
	ether 00:07:43:31:cf:52
	nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
	media: Ethernet 10Gbase-SR <full-duplex>
	status: active
I read some of the code and found what may be the source of the problem:

1. When a netmap device is initialized (using the NIOCREGIF ioctl), a pair of RX/TX krings is created for each queue. If I ask for only one queue, then only one pair of krings gets nr_pending_mode set to NKR_NETMAP_ON. In all other krings nr_pending_mode is set to NKR_NETMAP_OFF.
2. Next, cxgbe_netmap_on() is called. Inside this function there are two loops (for_each_nm_{rt}xq) that iterate through every TX and RX queue, including the queues I decided not to use.
3. On every iteration netmap_reset() is called, and there is an assertion that it never returns NULL. However, netmap_reset() returns NULL for krings whose nr_pending_mode is NKR_NETMAP_OFF. We have such krings, so the panic occurs.

I was able to work around the problem by modifying the for_each_nm_{rt}xq loops so that they iterate only through queues that have nr_pending_mode set to NKR_NETMAP_ON. But I am not sure this is the proper fix (for example, it does not let me release resources properly in cxgbe_netmap_off()). Moreover, after my modifications t4_config_rss_range() started to fail randomly (with EINVAL) and I have no idea why.
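The failure described in steps 1-3, and the workaround of skipping unselected queues, can be sketched in plain userland C. This is only a model under stated assumptions: every name below (mock_kring, mock_reset, and so on) is hypothetical and stands in for the real netmap and cxgbe structures, which are not reproduced here.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kring modes described in the report. */
enum mock_mode { MOCK_NETMAP_OFF = 0, MOCK_NETMAP_ON = 1 };

struct mock_slot { int buf_idx; };

struct mock_kring {
	enum mock_mode pending_mode;	/* models nr_pending_mode */
	struct mock_slot slots[4];
};

/* Models the reported behaviour: NULL for rings not switching to netmap mode. */
static struct mock_slot *
mock_reset(struct mock_kring *kr)
{
	if (kr->pending_mode != MOCK_NETMAP_ON)
		return NULL;
	return &kr->slots[0];
}

/*
 * Unfiltered loop: visits every queue. Returns the index of the first
 * queue for which mock_reset() yields NULL (where an assertion would
 * fire), or -1 if there is none.
 */
static int
first_failing_queue(struct mock_kring *kr, int n)
{
	for (int i = 0; i < n; i++)
		if (mock_reset(&kr[i]) == NULL)
			return i;
	return -1;
}

/* Filtered loop: skips unselected queues; returns how many were set up. */
static int
init_selected_queues(struct mock_kring *kr, int n)
{
	int done = 0;

	for (int i = 0; i < n; i++) {
		struct mock_slot *slot = mock_reset(&kr[i]);

		if (slot == NULL)
			continue;	/* queue was not requested */
		slot->buf_idx = i;	/* placeholder for real buffer setup */
		done++;
	}
	return done;
}
```

With four queues and only queue 1 selected, the unfiltered loop trips on queue 0 while the filtered loop sets up exactly one queue. The sketch says nothing about the t4_config_rss_range() failures mentioned above.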
MPASS(slot != NULL);	/* XXXNM: error check, not assert */

The comment on the assertion indicates that the driver should treat this as a runtime error and not a catastrophe. NULL slot means there's no memory available for rx buffers. What should the driver do in this case?
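For illustration only, here is a minimal sketch of what "error check, not assert" could look like: the setup step reports a NULL slot to its caller instead of panicking. The names demo_slot and demo_setup_queue are hypothetical, and the choice of ENOMEM is an assumption made for the example, not what the driver does.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical slot type; not the real netmap slot structure. */
struct demo_slot { unsigned int buf_idx; };

/*
 * Returns 0 on success or an errno value when no slot is available,
 * so the caller can unwind instead of the kernel panicking.
 */
static int
demo_setup_queue(struct demo_slot *slot)
{
	if (slot == NULL)
		return ENOMEM;	/* assumed error code, for illustration */
	slot->buf_idx = 0;	/* placeholder for real buffer setup */
	return 0;
}
```

Whether a caller should then fail the whole registration or simply skip the queue is exactly the open question raised here.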
Vincenzo can probably tell us what the expected behavior is, but bugzilla wouldn't let me add him to the CC list. What happens to the queues that do not have netmap enabled -- do they continue to work normally or not? How does netmap ensure that it gets all the rx traffic (if it does that) and the non-netmap-enabled queues are quiesced?
A commit references this bug:

Author: np
Date: Thu Jun 15 19:56:59 UTC 2017
New revision: 319986
URL: https://svnweb.freebsd.org/changeset/base/319986

Log:
  cxgbe(4): Fix per-queue netmap operation.

  Do not attempt to initialize netmap queues that are already initialized
  or aren't supposed to be initialized. Similarly, do not free queues
  that are not initialized or aren't supposed to be freed.

  PR:		217156
  Sponsored by:	Chelsio Communications

Changes:
  head/sys/dev/cxgbe/adapter.h
  head/sys/dev/cxgbe/t4_netmap.c
  head/sys/dev/cxgbe/t4_sge.c
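The rule stated in the commit log can be modelled roughly as follows. This is a userland sketch, not the code from r319986: sk_queue, its hw_ready flag, and both functions are hypothetical names used only to show the two guards (requested by netmap, and current driver state).

```c
#include <stdbool.h>

/* Hypothetical model of the mode netmap has requested for a queue. */
enum sk_mode { SK_OFF = 0, SK_ON = 1 };

struct sk_queue {
	enum sk_mode pending;	/* what netmap asked for */
	bool hw_ready;		/* assumed driver-side "initialized" flag */
};

/* Initialize only queues that were requested and are not yet set up. */
static int
sk_netmap_on(struct sk_queue *q, int n)
{
	int started = 0;

	for (int i = 0; i < n; i++) {
		if (q[i].pending != SK_ON || q[i].hw_ready)
			continue;
		q[i].hw_ready = true;
		started++;
	}
	return started;
}

/* Free only queues that are set up and are meant to be turned off. */
static int
sk_netmap_off(struct sk_queue *q, int n)
{
	int stopped = 0;

	for (int i = 0; i < n; i++) {
		if (q[i].pending != SK_OFF || !q[i].hw_ready)
			continue;
		q[i].hw_ready = false;
		stopped++;
	}
	return stopped;
}
```

Calling either function twice in a row is harmless in this model, because the second call finds nothing left to do.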