Kernel page fault due to null pointer passed from tcp_lro_flush to ether_input:

(kgdb) where
#0  doadump () at pcpu.h:195
#1  0xffffffff80281888 in boot (howto=260)
    at /usr/src/sys/kern/kern_shutdown.c:418
#2  0xffffffff80281cec in panic (fmt=Variable "fmt" is not available.
) at /usr/src/sys/kern/kern_shutdown.c:574
#3  0xffffffff803c91c3 in trap_fatal (frame=0xc, eva=Variable "eva" is not available.
) at /usr/src/sys/amd64/amd64/trap.c:764
#4  0xffffffff803c95a4 in trap_pfault (frame=0xfffffffface6f9f0, usermode=0)
    at /usr/src/sys/amd64/amd64/trap.c:680
#5  0xffffffff803c9efa in trap (frame=0xfffffffface6f9f0)
    at /usr/src/sys/amd64/amd64/trap.c:449
#6  0xffffffff803aee3e in calltrap ()
    at /usr/src/sys/amd64/amd64/exception.S:209
#7  0xffffffff8031eadf in ether_input (ifp=0xffffff00010bb800, m=0x0)
    at /usr/src/sys/net/if_ethersubr.c:531
#8  0xffffffff8034779b in tcp_lro_flush (cntl=0xffffff000120a258,
    lro=0xffffff00036e4000) at /usr/src/sys/netinet/tcp_lro.c:168
#9  0xffffffff801c4d07 in igb_rxeof (rxr=0xffffff000120a258, count=73)
    at /usr/src/sys/dev/e1000/if_igb.c:4018
#10 0xffffffff801c4ffb in igb_handle_rx (context=0xffffff000120a200,
    pending=Variable "pending" is not available.
) at /usr/src/sys/dev/e1000/if_igb.c:1337
#11 0xffffffff802b796d in taskqueue_run (queue=0xffffff0002400800)
    at /usr/src/sys/kern/subr_taskqueue.c:282
#12 0xffffffff802b7c32 in taskqueue_thread_loop (arg=Variable "arg" is not available.
) at /usr/src/sys/kern/subr_taskqueue.c:401
#13 0xffffffff8025ec2f in fork_exit (
    callout=0xffffffff802b7bc0 <taskqueue_thread_loop>,
    arg=0xffffff00011584d8, frame=0xfffffffface6fc80)
    at /usr/src/sys/kern/kern_fork.c:804
#14 0xffffffff803af20e in fork_trampoline ()
    at /usr/src/sys/amd64/amd64/exception.S:455
[...]
#8  0xffffffff8034779b in tcp_lro_flush (cntl=0xffffff000120a258,
    lro=0xffffff00036e4000) at /usr/src/sys/netinet/tcp_lro.c:168
168             (*ifp->if_input)(cntl->ifp, lro->m_head);
(kgdb) print lro
$1 = (struct lro_entry *) 0xffffff00036e4000
(kgdb) print *lro
$2 = {next = {sle_next = 0xffffff00036e3c80}, m_head = 0x0,
  m_tail = 0xffffff0003a96900, timestamp = 0, ip = 0xffffff0003aac810,
  tsval = 87166632, tsecr = 1844070041, source_ip = 4124597842,
  dest_ip = 4107820626, next_seq = 2241419788, ack_seq = 1871884633,
  len = 122, data_csum = 53193, window = 22336, source_port = 24564,
  dest_port = 14357, append_cnt = 0, mss = 56}

Note that m_head == NULL, hence the crash.

How-To-Repeat:
I got this from running PostgreSQL's "pgbench" benchmark with 100
concurrent connections from a remote host (over a gigE network). This is
a request/response workload with relatively small requests and
responses; the crash occurred after several minutes of load. The server
side was the one that crashed.

Repeating the same workload (and heavier versions of it) with
hw.igb.enable_lro=0 did not produce any crashes.
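To make the failure mode at tcp_lro.c:168 concrete, here is a minimal userland sketch of a flush routine that checks m_head before handing it to the input routine. It is not the kernel code and not a proposed fix: struct mbuf and struct lro_entry are cut-down hypothetical stand-ins, and fake_if_input, lro_flush_checked and the delivered counter are invented for illustration. A check like this would only hide the symptom; it does not explain how an entry ends up with m_head == NULL while m_tail is still set.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for the kernel types; only the fields used here. */
struct mbuf {
    int len;
};

struct lro_entry {
    struct mbuf *m_head;
    struct mbuf *m_tail;
};

/* Counts packets handed to the stand-in input routine. */
static int delivered;

/*
 * Stand-in for ifp->if_input. Like ether_input in the backtrace, it
 * dereferences m unconditionally, so a NULL argument would fault.
 */
static void
fake_if_input(struct mbuf *m)
{
    if (m->len > 0)
        delivered++;
}

/*
 * Flush one LRO entry. Returns 0 if the chain was passed up the stack,
 * -1 if the entry was inconsistent (m_head == NULL) and was skipped.
 */
static int
lro_flush_checked(struct lro_entry *lro)
{
    if (lro->m_head == NULL) {
        fprintf(stderr, "lro entry %p has NULL m_head, skipping\n",
            (void *)lro);
        return (-1);
    }
    fake_if_input(lro->m_head);
    lro->m_head = NULL;
    lro->m_tail = NULL;
    return (0);
}
```

Calling lro_flush_checked on an entry shaped like the one printed above (m_head NULL, m_tail non-NULL) returns -1 instead of dereferencing a null pointer; a well-formed entry is delivered and then cleared.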
Responsible Changed From-To: freebsd-bugs->freebsd-net Over to maintainer(s).
Responsible Changed From-To: freebsd-net->jfv Over to maintainer.
(In reply to andrew from comment #0)
From the date stamps, it appears that there were LRO updates here that may not have been tested. Can you retest with 10.2r?

r182416 | jfv | 2008-08-28 15:28:28 -0700 (Thu, 28 Aug 2008) | 19 lines

Update to igb driver:
  - changes in support of the VLAN filter fix to 126850
  - removal of a bunch of legacy code that was cruft, if not
    possibly harmful.
  - removal of POLLING from this driver, with multiqueue and
    MSIX it just makes no sense here.
  - Fix an LRO bug that I've been working on internally,
    intermittent panics under stress, the problem was releasing
    the RX ring lock before the LRO flushing.
  - Following the above fix I now enable LRO by default
  - For performance reasons increase the default number of
    RX queues to 4.
  - Add AIM - "Adaptive Interrupt Moderation", a fancy way of
    saying that the EITR value is dynamically changed based on
    the size of packets in the last interrupt interval.
  - Much goodness to try, enjoy!!
Reassign to erj@ for triage. To submitter: is this issue still relevant?
Just noticed this was still open. Sorry, I no longer have access to this hardware; the issue is no longer relevant to me.
Submitter no longer has this hardware.