Bug 215613 - [panic] in if_ixl due to NULL pointer dereference
Summary: [panic] in if_ixl due to NULL pointer dereference
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: CURRENT
Hardware: Any Any
Importance: --- Affects Only Me
Assignee: freebsd-net (Nobody)
URL:
Keywords: IntelNetworking
Depends on:
Blocks:
 
Reported: 2016-12-27 16:42 UTC by Andrey V. Elsukov
Modified: 2017-01-04 15:53 UTC

Description Andrey V. Elsukov freebsd_committer freebsd_triage 2016-12-27 16:42:14 UTC
Sometimes the system panics just after reboot when it starts network activity.

# grep ixl /var/run/dmesg.boot
ixl0: <Intel(R) Ethernet Connection XL710/X722 Driver, Version - 1.6.6-k> mem 0xdc000000-0xdc7fffff,0xdd000000-0xdd007fff irq 42 at device 0.0 numa-domain 0 on pci7
ixl0: Using MSIX interrupts with 9 vectors
ixl0: fw 4.22.26225 api 1.2 nvm 4.24 etid 800013fd oem 0.0.0
ixl0: The driver for the device detected an older version of the NVM image than expected.
ixl0: PF-ID[0]: VFs 128, MSIX 129, VF MSIX 5, QPs 1536, I2C
ixl0: Allocating 8 queues for PF LAN VSI; 8 queues active
ixl0: Ethernet address: 68:05:ca:30:45:30
ixl0: PCI Express Bus: Speed 8.0GT/s Width x8
ixl0: SR-IOV ready
ixl0: netmap queues/slots: TX 8/1024, RX 8/1024
ixl0: link state changed to UP

----

Fatal trap 12: page fault while in kernel mode
cpuid = 21; apic id = 25
fault virtual address	= 0x64
fault code		= supervisor read data, page not present
instruction pointer	= 0x20:0xffffffff80b44d79
stack pointer	        = 0x28:0xfffffe1048a133b0
frame pointer	        = 0x28:0xfffffe1048a133d0
code segment		= base rx0, limit 0xfffff, type 0x1b
			= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags	= interrupt enabled, resume, IOPL = 0
current process		= 1159 (bird)

(kgdb) bt
#0  doadump (textdump=1218522560) at pcpu.h:222
#1  0xffffffff8038c596 in db_fncall (dummy1=<value optimized out>, dummy2=<value optimized out>, dummy3=<value optimized out>, dummy4=<value optimized out>)
    at /usr/src/sys/ddb/db_command.c:581
#2  0xffffffff8038c0f9 in db_command (cmd_table=<value optimized out>) at /usr/src/sys/ddb/db_command.c:453
#3  0xffffffff8038be54 in db_command_loop () at /usr/src/sys/ddb/db_command.c:506
#4  0xffffffff8038efbf in db_trap (type=<value optimized out>, code=<value optimized out>) at /usr/src/sys/ddb/db_main.c:248
#5  0xffffffff80b32f33 in kdb_trap (type=<value optimized out>, code=<value optimized out>, tf=<value optimized out>) at /usr/src/sys/kern/subr_kdb.c:654
#6  0xffffffff80fa25b1 in trap_fatal (frame=0xfffffe1048a132f0, eva=100) at /usr/src/sys/amd64/amd64/trap.c:796
#7  0xffffffff80fa27e3 in trap_pfault (frame=0xfffffe1048a132f0, usermode=0) at /usr/src/sys/amd64/amd64/trap.c:658
#8  0xffffffff80fa1de3 in trap (frame=0xfffffe1048a132f0) at /usr/src/sys/amd64/amd64/trap.c:421
#9  0xffffffff80f84191 in calltrap () at /usr/src/sys/amd64/amd64/exception.S:236
#10 0xffffffff80b44d79 in taskqueue_enqueue (queue=0x0, task=0xfffffe0001a0e0b0) at pcpu.h:222
#11 0xffffffff8103f1ef in ixl_mq_start (ifp=<value optimized out>, m=<value optimized out>) at /usr/src/sys/dev/ixl/ixl_txrx.c:135
#12 0xffffffff80c06894 in vlan_transmit (ifp=<value optimized out>, m=<value optimized out>) at /usr/src/sys/net/if_vlan.c:1116
#13 0xffffffff80bfc5fe in ether_output (ifp=<value optimized out>, m=<value optimized out>, dst=0xfffffe1048a13610, ro=<value optimized out>)
    at /usr/src/sys/net/if_ethersubr.c:424
#14 0xffffffff80c80a3f in ip_output (m=0xfffffe0001a0e0b0, opt=<value optimized out>, ro=<value optimized out>, flags=<value optimized out>, imo=0x0, 
    inp=<value optimized out>) at /usr/src/sys/netinet/ip_output.c:660
#15 0xffffffff80c84423 in rip_output (m=0xfffff803405eab00, so=<value optimized out>) at /usr/src/sys/netinet/raw_ip.c:538
#16 0xffffffff80b86757 in sosend_generic (so=<value optimized out>, addr=<value optimized out>, uio=<value optimized out>, top=<value optimized out>, 
    control=<value optimized out>, flags=<value optimized out>, td=<value optimized out>) at /usr/src/sys/kern/uipc_socket.c:1359
#17 0xffffffff80b8e4c3 in kern_sendit (td=<value optimized out>, s=<value optimized out>, mp=<value optimized out>, flags=0, control=<value optimized out>, 
    segflg=UIO_USERSPACE) at /usr/src/sys/kern/uipc_syscalls.c:811
#18 0xffffffff80b8e8cf in sendit (td=0xfffff802e58a8000, s=<value optimized out>, mp=0xfffffe1048a138d8, flags=<value optimized out>)
    at /usr/src/sys/kern/uipc_syscalls.c:736
#19 0xffffffff80b8e981 in sys_sendmsg (td=0xfffff802e58a8000, uap=0xfffffe1048a139d0) at /usr/src/sys/kern/uipc_syscalls.c:912
#20 0xffffffff80fa2f9e in amd64_syscall (td=<value optimized out>, traced=0) at subr_syscall.c:135
#21 0xffffffff80f8447b in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:396
#22 0x0000000800c2386a in ?? ()
Previous frame inner to this frame (corrupt stack?)

(kgdb) f 11
#11 0xffffffff8103f1ef in ixl_mq_start (ifp=<value optimized out>, m=<value optimized out>) at /usr/src/sys/dev/ixl/ixl_txrx.c:135
warning: Source file is more recent than executable.

135			taskqueue_enqueue(que->tq, &que->tx_task);
(kgdb) i lo
vsi = <value optimized out>
txr = (struct tx_ring *) 0xfffffe0001a0de68
(kgdb) p *txr
$1 = {que = 0xfffffe0001a0de38, mtx = {lock_object = {lo_name = 0xfffffe0001a0df10 "ixl0:tx(5)", lo_flags = 16973824, lo_data = 0, lo_witness = 0x0}, mtx_lock = 4}, 
  tail = 1081364, base = 0xfffffe1045c49000, dma = {va = 0xfffffe1045c49000, pa = 214208512, tag = 0xfffff8000ca4d900, map = 0x0, seg = {ds_addr = 0, ds_len = 0}, 
    size = 16512, nseg = 1, flags = 0}, next_avail = 13, next_to_clean = 0, atr_rate = 0, atr_count = 0, itr = 122, latency = 1, buffers = 0xfffffe0001abf000, 
  avail = 1011, cmd = 0, tx_tag = 0xfffff8000ca4d800, tso_tag = 0xfffff8000ca4d700, mtx_name = 0xfffffe0001a0df10 "ixl0:tx(5)", br = 0xfffffe0001ac7000, packets = 0, 
  bytes = 0, tx_bytes = 0, no_desc = 0, total_packets = 8}
(kgdb) p *txr->que
$3 = {vsi = 0xfffffe000168e730, me = 5, msix = 0, eims = 0, res = 0x0, tag = 0x0, num_desc = 1024, busy = 1, txr = {que = 0xfffffe0001a0de38, mtx = {lock_object = {
        lo_name = 0xfffffe0001a0df10 "ixl0:tx(5)", lo_flags = 16973824, lo_data = 0, lo_witness = 0x0}, mtx_lock = 4}, tail = 1081364, base = 0xfffffe1045c49000, dma = {
      va = 0xfffffe1045c49000, pa = 214208512, tag = 0xfffff8000ca4d900, map = 0x0, seg = {ds_addr = 0, ds_len = 0}, size = 16512, nseg = 1, flags = 0}, next_avail = 13, 
    next_to_clean = 0, atr_rate = 0, atr_count = 0, itr = 122, latency = 1, buffers = 0xfffffe0001abf000, avail = 1011, cmd = 0, tx_tag = 0xfffff8000ca4d800, 
    tso_tag = 0xfffff8000ca4d700, mtx_name = 0xfffffe0001a0df10 "ixl0:tx(5)", br = 0xfffffe0001ac7000, packets = 0, bytes = 0, tx_bytes = 0, no_desc = 0, 
    total_packets = 8}, rxr = {que = 0xfffffe0001a0de38, mtx = {lock_object = {lo_name = 0xfffffe0001a0e02c "ixl0:rx(5)", lo_flags = 16973824, lo_data = 0, 
        lo_witness = 0x0}, mtx_lock = 4}, base = 0xfffffe1045c4e000, dma = {va = 0xfffffe1045c4e000, pa = 214228992, tag = 0xfffff8000ca4d600, map = 0x0, seg = {
        ds_addr = 0, ds_len = 0}, size = 32768, nseg = 1, flags = 0}, lro = {ifp = 0xfffff8000c7ad800, lro_mbuf_data = 0xfffff801d814f000, lro_queued = 0, 
      lro_flushed = 0, lro_bad_csum = 0, lro_cnt = 8, lro_mbuf_count = 0, lro_mbuf_max = 0, lro_ackcnt_lim = 65535, lro_length_lim = 65535, lro_hashsz = 1, 
      lro_hash = 0xfffff8020981bf00, lro_active = {lh_first = 0x0}, lro_free = {lh_first = 0xfffff801d814f3f0}}, lro_enabled = false, hdr_split = false, discard = false, 
    next_refresh = 0, next_check = 0, itr = 62, latency = 1, mtx_name = 0xfffffe0001a0e02c "ixl0:rx(5)", buffers = 0xfffffe0001ad7000, mbuf_sz = 4096, tail = 1212436, 
    htag = 0xfffff8000ca4d500, ptag = 0xfffff8000ca4d400, packets = 0, bytes = 0, split = 0, rx_packets = 0, rx_bytes = 0, desc_errs = 0, not_done = 0}, task = {
    ta_link = {stqe_next = 0x0}, ta_pending = 0, ta_priority = 0, ta_func = 0, ta_context = 0x0}, tx_task = {ta_link = {stqe_next = 0x0}, ta_pending = 0, 
    ta_priority = 0, ta_func = 0, ta_context = 0x0}, tq = 0x0, irqs = 0, tso = 0, mbuf_defrag_failed = 0, mbuf_hdr_failed = 0, mbuf_pkt_failed = 0, tx_dmamap_failed = 0, 
  dropped_pkts = 0}
(kgdb) p txr->que->tq
$4 = (struct taskqueue *) 0x0
(kgdb) p &txr->que->tq->tq_spin
$5 = (int *) 0x64


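It looks like ixl_mq_start() was somehow called while the queues were not yet initialized (or had already been freed): que->tq is NULL, so taskqueue_enqueue() faults at virtual address 0x64, which is the offset of tq_spin from a NULL base.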
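
For illustration only, here is a small userland model of the failure shown in the kgdb output above. It is a sketch, not the driver or kernel code: struct taskqueue_model, struct queue_model, tq_spin_address() and model_enqueue_tx() are hypothetical names, and the 0x64 padding is chosen only so that the offset of tq_spin matches the value kgdb printed. It shows why a NULL queue pointer produces a fault at 0x64, and what a defensive NULL check before the enqueue might look like.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the kernel's struct taskqueue. Only the offset
 * of tq_spin matters here; the padding size is an assumption taken from the
 * kgdb output (&tq->tq_spin == 0x64 when tq == NULL), not from the real
 * structure definition.
 */
struct taskqueue_model {
	char	pad[0x64];	/* placeholder for the members before tq_spin */
	int	tq_spin;
};

struct task_model {
	int	ta_pending;
};

struct queue_model {
	struct taskqueue_model	*tq;	/* NULL until the queue is set up */
	struct task_model	 tx_task;
};

/* Address that would be touched when reaching tq_spin through pointer tq. */
static uintptr_t
tq_spin_address(uintptr_t tq)
{
	return (tq + offsetof(struct taskqueue_model, tq_spin));
}

/* Defensive enqueue: return an error instead of dereferencing a NULL tq. */
static int
model_enqueue_tx(struct queue_model *que)
{
	assert(que != NULL);
	if (que->tq == NULL)
		return (ENXIO);
	que->tq->tq_spin = 1;		/* stands in for taking the queue lock */
	que->tx_task.ta_pending++;
	return (0);
}
```

With a NULL base, tq_spin_address(0) evaluates to 0x64, the same value as the "fault virtual address" in the trap message. A NULL check like the one in model_enqueue_tx() would only hide the symptom; the underlying question remains why the transmit path can run while que->tq is unset.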
Comment 1 Andrey V. Elsukov freebsd_committer freebsd_triage 2016-12-27 16:44:16 UTC
Another one:

Unread portion of the kernel message buffer:
frame pointer	        = 0x28:0xfffffe1048520130
code segment		= base rx0, limit 0xfffff, type 0x1b
			= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags	= interrupt enabled, resume, IOPL = 0
current process		= 926 (bird6)

(kgdb) bt
#0  doadump (textdump=1213332256) at pcpu.h:222
#1  0xffffffff8038c596 in db_fncall (dummy1=<value optimized out>, dummy2=<value optimized out>, dummy3=<value optimized out>, dummy4=<value optimized out>)
    at /usr/src/sys/ddb/db_command.c:581
#2  0xffffffff8038c0f9 in db_command (cmd_table=<value optimized out>) at /usr/src/sys/ddb/db_command.c:453
#3  0xffffffff8038be54 in db_command_loop () at /usr/src/sys/ddb/db_command.c:506
#4  0xffffffff8038efbf in db_trap (type=<value optimized out>, code=<value optimized out>) at /usr/src/sys/ddb/db_main.c:248
#5  0xffffffff80b32f33 in kdb_trap (type=<value optimized out>, code=<value optimized out>, tf=<value optimized out>) at /usr/src/sys/kern/subr_kdb.c:654
#6  0xffffffff80fa25b1 in trap_fatal (frame=0xfffffe1048520050, eva=100) at /usr/src/sys/amd64/amd64/trap.c:796
#7  0xffffffff80fa27e3 in trap_pfault (frame=0xfffffe1048520050, usermode=0) at /usr/src/sys/amd64/amd64/trap.c:658
#8  0xffffffff80fa1de3 in trap (frame=0xfffffe1048520050) at /usr/src/sys/amd64/amd64/trap.c:421
#9  0xffffffff80f84191 in calltrap () at /usr/src/sys/amd64/amd64/exception.S:236
#10 0xffffffff80b44d79 in taskqueue_enqueue (queue=0x0, task=0xfffffe0001a0e660) at pcpu.h:222
#11 0xffffffff8103f1ef in ixl_mq_start (ifp=<value optimized out>, m=<value optimized out>) at /usr/src/sys/dev/ixl/ixl_txrx.c:135
#12 0xffffffff80c06894 in vlan_transmit (ifp=<value optimized out>, m=<value optimized out>) at /usr/src/sys/net/if_vlan.c:1116
#13 0xffffffff80bfc5fe in ether_output (ifp=<value optimized out>, m=<value optimized out>, dst=0xfffffe1048520420, ro=<value optimized out>)
    at /usr/src/sys/net/if_ethersubr.c:424
#14 0xffffffff80d49b51 in ip6_output (m0=<value optimized out>, opt=<value optimized out>, ro=0xfffffe1048520408, flags=<value optimized out>, im6o=0xfffff801d8a9e100, 
    ifpp=0xfffffe1048520590, inp=<value optimized out>) at /usr/src/sys/netinet6/ip6_output.c:946
#15 0xffffffff80d5e7cf in rip6_output (m=<value optimized out>, so=<value optimized out>) at /usr/src/sys/netinet6/raw_ip6.c:536
#16 0xffffffff80d5fa49 in rip6_send (so=0xfffff801e8cf2000, flags=<value optimized out>, m=0xfffff802c4cea600, nam=<value optimized out>, control=0xfffff802c483e700, 
    td=0xf) at /usr/src/sys/netinet6/raw_ip6.c:888
#17 0xffffffff80b86757 in sosend_generic (so=<value optimized out>, addr=<value optimized out>, uio=<value optimized out>, top=<value optimized out>, 
    control=<value optimized out>, flags=<value optimized out>, td=<value optimized out>) at /usr/src/sys/kern/uipc_socket.c:1359
#18 0xffffffff80b8e4c3 in kern_sendit (td=<value optimized out>, s=<value optimized out>, mp=<value optimized out>, flags=0, control=<value optimized out>, 
    segflg=UIO_USERSPACE) at /usr/src/sys/kern/uipc_syscalls.c:811
#19 0xffffffff80b8e8cf in sendit (td=0xfffff801d8a75000, s=<value optimized out>, mp=0xfffffe10485208d8, flags=<value optimized out>)
    at /usr/src/sys/kern/uipc_syscalls.c:736
#20 0xffffffff80b8e981 in sys_sendmsg (td=0xfffff801d8a75000, uap=0xfffffe10485209d0) at /usr/src/sys/kern/uipc_syscalls.c:912
#21 0xffffffff80fa2f9e in amd64_syscall (td=<value optimized out>, traced=0) at subr_syscall.c:135
#22 0xffffffff80f8447b in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:396
#23 0x0000000800c3286a in ?? ()
Previous frame inner to this frame (corrupt stack?)

(kgdb) f 11
#11 0xffffffff8103f1ef in ixl_mq_start (ifp=<value optimized out>, m=<value optimized out>) at /usr/src/sys/dev/ixl/ixl_txrx.c:135
warning: Source file is more recent than executable.

135			taskqueue_enqueue(que->tq, &que->tx_task);
(kgdb) i lo
vsi = <value optimized out>
txr = (struct tx_ring *) 0xfffffe0001a0e418
(kgdb) p *txr
$1 = {que = 0xfffffe0001a0e3e8, mtx = {lock_object = {lo_name = 0xfffffe0001a0e4c0 "ixl0:tx(7)", lo_flags = 16973824, lo_data = 0, lo_witness = 0x0}, mtx_lock = 4}, 
  tail = 1081372, base = 0xfffffe1045c5b000, dma = {va = 0xfffffe1045c5b000, pa = 214282240, tag = 0xfffff8000ca4cc00, map = 0x0, seg = {ds_addr = 0, ds_len = 0}, 
    size = 16512, nseg = 1, flags = 0}, next_avail = 38, next_to_clean = 0, atr_rate = 0, atr_count = 0, itr = 122, latency = 1, buffers = 0xfffffe0001b0f000, 
  avail = 986, cmd = 0, tx_tag = 0xfffff8000ca4cb00, tso_tag = 0xfffff8000ca4ca00, mtx_name = 0xfffffe0001a0e4c0 "ixl0:tx(7)", br = 0xfffffe0001b17000, packets = 0, 
  bytes = 0, tx_bytes = 0, no_desc = 0, total_packets = 38}
(kgdb) p *txr->que
$2 = {vsi = 0xfffffe000168e730, me = 7, msix = 0, eims = 0, res = 0x0, tag = 0x0, num_desc = 1024, busy = 1, txr = {que = 0xfffffe0001a0e3e8, mtx = {lock_object = {
        lo_name = 0xfffffe0001a0e4c0 "ixl0:tx(7)", lo_flags = 16973824, lo_data = 0, lo_witness = 0x0}, mtx_lock = 4}, tail = 1081372, base = 0xfffffe1045c5b000, dma = {
      va = 0xfffffe1045c5b000, pa = 214282240, tag = 0xfffff8000ca4cc00, map = 0x0, seg = {ds_addr = 0, ds_len = 0}, size = 16512, nseg = 1, flags = 0}, next_avail = 38, 
    next_to_clean = 0, atr_rate = 0, atr_count = 0, itr = 122, latency = 1, buffers = 0xfffffe0001b0f000, avail = 986, cmd = 0, tx_tag = 0xfffff8000ca4cb00, 
    tso_tag = 0xfffff8000ca4ca00, mtx_name = 0xfffffe0001a0e4c0 "ixl0:tx(7)", br = 0xfffffe0001b17000, packets = 0, bytes = 0, tx_bytes = 0, no_desc = 0, 
    total_packets = 38}, rxr = {que = 0xfffffe0001a0e3e8, mtx = {lock_object = {lo_name = 0xfffffe0001a0e5dc "ixl0:rx(7)", lo_flags = 16973824, lo_data = 0, 
        lo_witness = 0x0}, mtx_lock = 4}, base = 0xfffffe1045c60000, dma = {va = 0xfffffe1045c60000, pa = 214302720, tag = 0xfffff8000ca4c900, map = 0x0, seg = {
        ds_addr = 0, ds_len = 0}, size = 32768, nseg = 1, flags = 0}, lro = {ifp = 0xfffff8000c7ad800, lro_mbuf_data = 0xfffff801d818a800, lro_queued = 0, 
      lro_flushed = 0, lro_bad_csum = 0, lro_cnt = 8, lro_mbuf_count = 0, lro_mbuf_max = 0, lro_ackcnt_lim = 65535, lro_length_lim = 65535, lro_hashsz = 1, 
      lro_hash = 0xfffff801d8cdf240, lro_active = {lh_first = 0x0}, lro_free = {lh_first = 0xfffff801d818abf0}}, lro_enabled = false, hdr_split = false, discard = false, 
    next_refresh = 0, next_check = 0, itr = 62, latency = 1, mtx_name = 0xfffffe0001a0e5dc "ixl0:rx(7)", buffers = 0xfffffe0001b27000, mbuf_sz = 4096, tail = 1212444, 
    htag = 0xfffff8000ca4c800, ptag = 0xfffff8000ca4c700, packets = 0, bytes = 0, split = 0, rx_packets = 0, rx_bytes = 0, desc_errs = 0, not_done = 0}, task = {
    ta_link = {stqe_next = 0x0}, ta_pending = 0, ta_priority = 0, ta_func = 0, ta_context = 0x0}, tx_task = {ta_link = {stqe_next = 0x0}, ta_pending = 0, 
    ta_priority = 0, ta_func = 0, ta_context = 0x0}, tq = 0x0, irqs = 0, tso = 0, mbuf_defrag_failed = 0, mbuf_hdr_failed = 0, mbuf_pkt_failed = 0, tx_dmamap_failed = 0, 
  dropped_pkts = 0}
Comment 2 Andrey V. Elsukov freebsd_committer freebsd_triage 2016-12-29 16:09:09 UTC
I looked at the latest driver code (1.6.10), and I think this problem was fixed by moving the code that initializes/frees the queues from ixl_init()/ixl_stop() into ixl_attach()/ixl_detach().
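
The lifecycle change described above can be sketched as a userland model. This is not the 1.6.10 driver code, which is not quoted in this report; struct softc_model and all model_* functions are hypothetical. The point it illustrates is that when the queue is allocated at attach time and released only at detach time, a transmit that races with init/stop still finds a valid queue pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the taskqueue and the per-device state. */
struct tq_model {
	int	enqueued;
};

struct softc_model {
	struct tq_model	*tq;
	bool		 running;
};

/* Attach: allocate the queue once, for the whole lifetime of the device. */
static int
model_attach(struct softc_model *sc)
{
	sc->tq = calloc(1, sizeof(*sc->tq));
	sc->running = false;
	return (sc->tq == NULL ? -1 : 0);
}

/* Init and stop only toggle the running state; they never touch sc->tq. */
static void
model_init(struct softc_model *sc)
{
	sc->running = true;
}

static void
model_stop(struct softc_model *sc)
{
	sc->running = false;
}

/* Detach: the only place where the queue is released. */
static void
model_detach(struct softc_model *sc)
{
	free(sc->tq);
	sc->tq = NULL;
}

/* Transmit path: the queue pointer is valid between attach and detach. */
static void
model_transmit(struct softc_model *sc)
{
	assert(sc->tq != NULL);
	sc->tq->enqueued++;
}
```

In this model, calling model_transmit() before model_init() or after model_stop() is harmless, whereas an arrangement that allocates the queue in init and frees it in stop would leave a window in which the pointer is NULL.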
Comment 3 Sean Bruno freebsd_committer freebsd_triage 2017-01-04 15:53:12 UTC
Andrey:

Let's close this for now if you don't mind.  Reopen if this comes up again.