Summary: | if_sfxge unstable causes panic at ifconfig sfxge0 up | ||
---|---|---|---|
Product: | Base System | Reporter: | nonesuch |
Component: | kern | Assignee: | Andrew Rybchenko,St.Petersburg Russia <arybchik> |
Status: | Closed FIXED | ||
Severity: | Affects Many People | CC: | amd64, arybchik |
Priority: | --- | Flags: | arybchik: maintainer-feedback+, arybchik: mfc-stable10+, arybchik: mfc-stable9- |
Version: | 10.3-BETA2 | ||
Hardware: | amd64 | ||
OS: | Any |
Description
nonesuch
2016-03-24 19:41:43 UTC
Here is the backtrace:

savecore: reboot after panic: assertion failed at /usr/src/sys/modules/sfxge/../../dev/sfxge/common/hunt_rx.c:751
savecore: writing core to ./vmcore.0

Loaded symbols for /boot/kernel/fdescfs.ko.symbols
Reading symbols from /boot/kernel/sfxge.ko.symbols...done.
Loaded symbols for /boot/kernel/sfxge.ko.symbols
#0 doadump (textdump=<value optimized out>) at pcpu.h:219
219 pcpu.h: No such file or directory.
in pcpu.h
(kgdb) bt
#0 doadump (textdump=<value optimized out>) at pcpu.h:219
#1 0xffffffff80951152 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:486
#2 0xffffffff80951535 in vpanic (fmt=<value optimized out>, ap=<value optimized out>) at /usr/src/sys/kern/kern_shutdown.c:889
#3 0xffffffff809513c3 in panic (fmt=0x0) at /usr/src/sys/kern/kern_shutdown.c:818
#4 0xffffffff81e382c7 in ef10_rx_qcreate (enp=<value optimized out>, index=<value optimized out>, label=<value optimized out>, type=<value optimized out>, esmp=<value optimized out>, n=<value optimized out>, id=584, eep=<value optimized out>, erp=<value optimized out>) at /usr/src/sys/modules/sfxge/../../dev/sfxge/common/hunt_rx.c:751
#5 0xffffffff81e28a94 in efx_rx_qcreate (enp=0xfffff8024de45000, index=32, label=32, type=EFX_RXQ_TYPE_DEFAULT, esmp=0x0, n=0, id=<value optimized out>, eep=<value optimized out>, erpp=0x0) at /usr/src/sys/modules/sfxge/../../dev/sfxge/common/efx_rx.c:540
#6 0xffffffff81e1d1ff in sfxge_rx_start (sc=0xfffffe0026084000) at /usr/src/sys/modules/sfxge/../../dev/sfxge/sfxge_rx.c:1037
#7 0xffffffff81e189f8 in sfxge_start (sc=0xfffffe0026084000) at /usr/src/sys/modules/sfxge/../../dev/sfxge/sfxge.c:233
#8 0xffffffff81e185e1 in sfxge_if_ioctl (ifp=0xfffff8013312b800, command=<value optimized out>, data=<value optimized out>) at /usr/src/sys/modules/sfxge/../../dev/sfxge/sfxge.c:394
#9 0xffffffff80a1771f in ifioctl (so=<value optimized out>, cmd=<value optimized out>, data=0xfffffe201d7f28e0 "sfxge0", td=<value optimized out>) at /usr/src/sys/net/if.c:2403
#10 0xffffffff809a9005 in kern_ioctl (td=0xfffff8014844a4b0, fd=<value optimized out>, com=0) at file.h:321
#11 0xffffffff809a8d00 in sys_ioctl (td=0xfffff8014844a4b0, uap=0xfffffe201d7f2a40) at /usr/src/sys/kern/sys_generic.c:718
#12 0xffffffff80d56def in amd64_syscall (td=0xfffff8014844a4b0, traced=0) at subr_syscall.c:141
#13 0xffffffff80d3c05b in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:396
#14 0x00000008013e9fca in ?? ()

Looking at the code, I guess the system has more than 32 CPUs. Could you try to load the driver with a limited number of RSS channels (most likely 32 will work):

# kenv hw.sfxge.0.max_rss_channels=16
# kenv hw.sfxge.1.max_rss_channels=16

All,

Updating the loader to use

hw.sfxge.0.max_rss_channels=16
hw.sfxge.1.max_rss_channels=16

as well as disabling HT in the BIOS fixed the issue.

I propose the following "fix":

1. Update the man page to denote

hw.sfxge.N.max_rss_channels
The maximum number of allocated RSS channels for the Nth adapter. If set to 0 or unset, the number of channels is determined by the number of CPU cores. This does not scale beyond 32 cores. You need to manually set hw.sfxge.N.max_rss_channels <= 32 before loading the driver in a box with more than 32 cores.

Thanks a lot for the confirmation. I have a real fix which allows scaling to more than 32 CPUs. It is under review at Solarflare. I hope to publish it this week.

Sorry for the delay.
https://reviews.freebsd.org/D6121

A commit references this bug:

Author: arybchik
Date: Thu Apr 28 06:20:43 UTC 2016
New revision: 298735
URL: https://svnweb.freebsd.org/changeset/base/298735

Log:
sfxge(4): do not use RxQ index as label

Labels are limitted by 32 on EF10. It is not sufficient on powerful hosts. Since only one RxQ is running over each EvQ, zero label may be used.

Reviewed by: gnn
Sponsored by: Solarflare Communications, Inc.
MFC after: 2 days
PR: 208267
Differential Revision: https://reviews.freebsd.org/D6121

Changes:
head/sys/dev/sfxge/sfxge_ev.c
head/sys/dev/sfxge/sfxge_rx.c

A commit references this bug:

Author: arybchik
Date: Sat Apr 30 06:35:20 UTC 2016
New revision: 298836
URL: https://svnweb.freebsd.org/changeset/base/298836

Log:
MFC r298735

sfxge(4): do not use RxQ index as label

Labels are limitted by 32 on EF10. It is not sufficient on powerful hosts. Since only one RxQ is running over each EvQ, zero label may be used.

Reviewed by: gnn
Sponsored by: Solarflare Communications, Inc.
PR: 208267
Differential Revision: https://reviews.freebsd.org/D6121

Changes:
_U stable/10/
stable/10/sys/dev/sfxge/sfxge_ev.c
stable/10/sys/dev/sfxge/sfxge_rx.c

Fixed in head and stable/10.

Confirmed to be working, no issues noted.

sfxge0: Using MSI-X interrupts
sfxge0: Ethernet address: 00:0f:53:35:fa:90
sfxge0: Solarflare Flareon Ultra 7000 Series 10G Adapter
sfxge1: <Solarflare SFC9100 family> port 0x8000-0x80ff mem 0xc9000000-0xc97fffff,0xca000000-0xca003fff irq 68 at device 0.1 on pci131
sfxge1: Using MSI-X interrupts
sfxge1: Ethernet address: 00:0f:53:35:fa:91
sfxge1: Solarflare Flareon Ultra 7000 Series 10G Adapter
sfxge0: link state changed to UP
sfxge0: promiscuous mode enabled
sfxge0: promiscuous mode disabled

dev.sfxge.1.vpd.SN: 7501013053711520471XXXXX
dev.sfxge.1.vpd.EC: PCBR3:CCSA1
dev.sfxge.1.vpd.PN: SFN7x22F
dev.sfxge.1.txq.33.stats.tx_netdown_drops: 0
dev.sfxge.1.txq.33.stats.tx_put_overflow: 0
dev.sfxge.1.txq.33.stats.tx_get_non_tcp_overflow: 0
dev.sfxge.1.txq.33.stats.tx_get_overflow: 0
dev.sfxge.1.txq.33.stats.tx_drops: 0
dev.sfxge.1.txq.33.stats.tx_collapses: 0
dev.sfxge.1.txq.33.stats.tso_pdrop_no_rsrc: 0
dev.sfxge.1.txq.33.stats.tso_pdrop_too_many: 0
dev.sfxge.1.txq.33.stats.tso_long_headers: 0
dev.sfxge.1.txq.33.stats.tso_packets: 0
dev.sfxge.1.txq.33.stats.tso_bursts: 0
dev.sfxge.1.txq.33.dpl.put_hiwat: 0
dev.sfxge.1.txq.33.dpl.get_hiwat: 0
dev.sfxge.1.txq.33.dpl.get_non_tcp_count: 0
dev.sfxge.1.txq.33.dpl.get_count: 0

Many thanks for the verification and confirmation.
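For readers following the thread, here is a minimal sketch of the failure mode that the commit log for r298735 describes. It is not the driver's code. It assumes only what the thread states: EF10 accepts at most 32 RxQ labels, the old scheme passed the RxQ index as the label (the backtrace shows index=32, label=32), and the fix uses label zero because only one RxQ runs over each EvQ. All names prefixed with `sketch_` are hypothetical and chosen for illustration.

```c
#include <stdbool.h>

/* Assumption taken from the commit log: EF10 allows 32 labels, i.e. 0..31. */
#define SKETCH_RX_NLABELS 32u

/* Hypothetical stand-in for the range check that tripped at hunt_rx.c:751. */
static bool
sketch_label_ok(unsigned int label)
{
	return (label < SKETCH_RX_NLABELS);
}

/* Old scheme: the label mirrors the RxQ index, so queue 32 gets label 32. */
static unsigned int
sketch_label_old(unsigned int rxq_index)
{
	return (rxq_index);
}

/* New scheme: one RxQ per EvQ, so every queue can use label 0. */
static unsigned int
sketch_label_new(unsigned int rxq_index)
{
	(void)rxq_index;	/* the index no longer influences the label */
	return (0);
}

/* Count how many of the first nqueues RxQs would fail the label check. */
static unsigned int
sketch_count_rejected(unsigned int nqueues,
    unsigned int (*label_of)(unsigned int))
{
	unsigned int i, rejected = 0;

	for (i = 0; i < nqueues; i++) {
		if (!sketch_label_ok(label_of(i)))
			rejected++;
	}
	return (rejected);
}
```

Calling `sketch_count_rejected(40, sketch_label_old)` counts the queues with index 32 and above, which matches the point at which the reported panic occurred; with `sketch_label_new` no queue is rejected regardless of the queue count.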