Bug 128840 - [igb] page fault under load with igb/LRO
Summary: [igb] page fault under load with igb/LRO
Status: Closed Overcome By Events
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 7.1-PRERELEASE
Hardware: Any Any
Importance: Normal
Assignee: Eric Joyner
URL:
Keywords: IntelNetworking
Depends on:
Blocks:
 
Reported: 2008-11-13 15:10 UTC by andrew
Modified: 2017-02-28 11:13 UTC
CC List: 2 users

See Also:



Description andrew 2008-11-13 15:10:01 UTC
Kernel page fault caused by a NULL mbuf pointer passed from tcp_lro_flush() to ether_input():

(kgdb) where
#0  doadump () at pcpu.h:195
#1  0xffffffff80281888 in boot (howto=260)
    at /usr/src/sys/kern/kern_shutdown.c:418
#2  0xffffffff80281cec in panic (fmt=Variable "fmt" is not available.
) at /usr/src/sys/kern/kern_shutdown.c:574
#3  0xffffffff803c91c3 in trap_fatal (frame=0xc, eva=Variable "eva" is not available.
)
    at /usr/src/sys/amd64/amd64/trap.c:764
#4  0xffffffff803c95a4 in trap_pfault (frame=0xfffffffface6f9f0, usermode=0)
    at /usr/src/sys/amd64/amd64/trap.c:680
#5  0xffffffff803c9efa in trap (frame=0xfffffffface6f9f0)
    at /usr/src/sys/amd64/amd64/trap.c:449
#6  0xffffffff803aee3e in calltrap ()
    at /usr/src/sys/amd64/amd64/exception.S:209
#7  0xffffffff8031eadf in ether_input (ifp=0xffffff00010bb800, m=0x0)
    at /usr/src/sys/net/if_ethersubr.c:531
#8  0xffffffff8034779b in tcp_lro_flush (cntl=0xffffff000120a258, 
    lro=0xffffff00036e4000) at /usr/src/sys/netinet/tcp_lro.c:168
#9  0xffffffff801c4d07 in igb_rxeof (rxr=0xffffff000120a258, count=73)
    at /usr/src/sys/dev/e1000/if_igb.c:4018
#10 0xffffffff801c4ffb in igb_handle_rx (context=0xffffff000120a200, pending=Variable "pending" is not available.
)
    at /usr/src/sys/dev/e1000/if_igb.c:1337
#11 0xffffffff802b796d in taskqueue_run (queue=0xffffff0002400800)
    at /usr/src/sys/kern/subr_taskqueue.c:282
#12 0xffffffff802b7c32 in taskqueue_thread_loop (arg=Variable "arg" is not available.
)
    at /usr/src/sys/kern/subr_taskqueue.c:401
#13 0xffffffff8025ec2f in fork_exit (
    callout=0xffffffff802b7bc0 <taskqueue_thread_loop>, 
    arg=0xffffff00011584d8, frame=0xfffffffface6fc80)
    at /usr/src/sys/kern/kern_fork.c:804
#14 0xffffffff803af20e in fork_trampoline ()
    at /usr/src/sys/amd64/amd64/exception.S:455

[...]

#8  0xffffffff8034779b in tcp_lro_flush (cntl=0xffffff000120a258, 
    lro=0xffffff00036e4000) at /usr/src/sys/netinet/tcp_lro.c:168
168             (*ifp->if_input)(cntl->ifp, lro->m_head);
(kgdb) print lro
$1 = (struct lro_entry *) 0xffffff00036e4000
(kgdb) print *lro
$2 = {next = {sle_next = 0xffffff00036e3c80}, m_head = 0x0, 
  m_tail = 0xffffff0003a96900, timestamp = 0, ip = 0xffffff0003aac810, 
  tsval = 87166632, tsecr = 1844070041, source_ip = 4124597842, 
  dest_ip = 4107820626, next_seq = 2241419788, ack_seq = 1871884633, 
  len = 122, data_csum = 53193, window = 22336, source_port = 24564, 
  dest_port = 14357, append_cnt = 0, mss = 56}

Note that m_head == NULL, hence the crash.

How-To-Repeat: I got this from running PostgreSQL's "pgbench" benchmark with 100
concurrent connections from a remote host (over a gigE network). This is
a request/response workload with relatively small requests and responses;
the crash occurred after several minutes of load. The server side was the
one that crashed.

Repeating the same workload (and heavier versions of it) with
hw.igb.enable_lro=0 did not produce any crashes.
Comment 1 Mark Linimon freebsd_committer freebsd_triage 2008-11-15 11:46:24 UTC
Responsible Changed
From-To: freebsd-bugs->freebsd-net

Over to maintainer(s).
Comment 2 Andre Oppermann freebsd_committer 2010-08-23 18:58:01 UTC
Responsible Changed
From-To: freebsd-net->jfv

Over to maintainer.
Comment 3 Sean Bruno freebsd_committer 2015-08-04 15:21:38 UTC
(In reply to andrew from comment #0)
Judging from the date stamps, there were LRO updates here that may not have been tested. Can you retest with 10.2-RELEASE?

r182416 | jfv | 2008-08-28 15:28:28 -0700 (Thu, 28 Aug 2008) | 19 lines

Update to igb driver:

        - changes in support of the VLAN filter fix to 126850
        - removal of a bunch of legacy code that was cruft, if not
          possibly harmful.
        - removal of POLLING from this driver, with multiqueue and
           MSIX it just makes no sense here.
        - Fix an LRO bug that I've been working on internally, intermittent
          panics under stress, the problem was releasing the RX ring lock
          before the LRO flushing.
        - Following the above fix I now enable LRO by default
        - For performance reasons increase the default number of RX queues
          to 4.
        - Add AIM - "Adaptive Interrupt Moderation", a fancy way of saying
          that the EITR value is dynamically changed based on the size of
          packets in the last interrupt interval.

        - Much goodness to try, enjoy!!
Comment 4 Mark Linimon freebsd_committer freebsd_triage 2015-11-12 07:43:00 UTC
Reassign to erj@ for triage.  To submitter: is this issue still relevant?
Comment 5 andrew 2017-02-28 11:09:03 UTC
Just noticed this was still open. Sorry, I no longer have access to this hardware; the issue is no longer relevant to me.
Comment 6 Mark Linimon freebsd_committer freebsd_triage 2017-02-28 11:13:50 UTC
Submitter no longer has this hardware.