It's GENERIC w/:

options KDB
options DDB
options BREAK_TO_DEBUGGER
options IPFIREWALL
options IPFIREWALL_FORWARD
options IPFIREWALL_VERBOSE
options IPFIREWALL_VERBOSE_LIMIT=500
options IPFIREWALL_DEFAULT_TO_ACCEPT
options LIBALIAS
options IPFIREWALL_NAT
options QUOTA
device carp

There are 3 crashes seen between February and March on different boxes but identical h/w. Backtrace of crashed proc is the same.

db> bt
Tracing pid 12 tid 100039 td 0xffffff00029a23e0
propagate_priority() at propagate_priority+0x72
turnstile_wait() at turnstile_wait+0x1aa
_rw_wlock_hard() at _rw_wlock_hard+0xfa
in_lltable_lookup() at in_lltable_lookup+0x12b
arpintr() at arpintr+0x9d6
netisr_dispatch_src() at netisr_dispatch_src+0x7e
ether_demux() at ether_demux+0x14d
ether_input() at ether_input+0x17b
ether_demux() at ether_demux+0x6f
ether_input() at ether_input+0x17b
bce_intr() at bce_intr+0x3b0
intr_event_execute_handlers() at intr_event_execute_handlers+0xfd
ithread_loop() at ithread_loop+0x8e
fork_exit() at fork_exit+0x118
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xffffff82b155dd30, rbp = 0 ---

Other ddb data available per each crash (attached in shar).

Fix:
Patch attached with submission follows:

How-To-Repeat:
Happens by accident on a moderately loaded box. Usual numbers:

2799 processes: 12 running, 2750 sleeping, 37 zombie

   packets  errs idrops      bytes    packets  errs      bytes colls
      1723     0      0     198578       2166     0    2415553     0

interrupt                          total       rate
irq4: uart0                       309128          6
irq15: ata1                           35          0
irq17: aac0                     13404675        293
cpu0: timer                     90914103       1993
irq256: bce0                     2914454         63
irq257: bce1                   109935240       2410
cpu1: timer                     90974578       1994
cpu3: timer                     91076473       1997
cpu2: timer                     91059038       1996
cpu4: timer                     90930409       1993
cpu5: timer                     91033366       1996
cpu6: timer                     91085723       1997
cpu7: timer                     91111107       1997
Total                          854748329      18743
Responsible Changed From-To: freebsd-bugs->freebsd-net Over to maintainer(s).
State Changed From-To: open->feedback I think this is not a bug in bce(4) but in arp(4), and I guess it was fixed in 8.2-RELEASE. Can you reproduce the issue on 8.2-RELEASE or the latest stable/8?
Responsible Changed From-To: freebsd-net->yongari Grab.
I'm sorry, I don't have many 8.2 boxes in production use (yet). The crash is occasional. I cannot reproduce it reliably: I have had 3 crashes in a month among ~100 boxes. -- wbr, pluknet
On Mon, Apr 11, 2011 at 08:00:23AM +0000, Sergey Kandaurov wrote:
> The following reply was made to PR kern/156026; it has been noted by GNATS.
>
> From: Sergey Kandaurov <pluknet@gmail.com>
> To: bug-followup@FreeBSD.org, pluknet@gmail.com
> Cc:
> Subject: Re: kern/156026: [bce] [panic] arpintr()->in_lltable_lookup()
>  8.1 bce(4) crash
> Date: Mon, 11 Apr 2011 11:57:55 +0400
>
> I'm sorry, I don't have many 8.2 boxes in production use (yet).
> The crash is occasional. I cannot reproduce it reliably:
> I have had 3 crashes in a month among ~100 boxes.

Ok, please let me know when you encounter the issue on 8.2-RELEASE. I think there was a large set of changes to arp(4) in stable/8 after the release of 8.1-RELEASE. Qing Li (CCed) may know better whether one of them really addresses this. Qing, could you look over the issue?
This bug definitely still exists in 8.2-RELEASE -- it's currently the #1 panic on FreeBSD/EC2. -- Colin Percival Security Officer, FreeBSD | freebsd.org | The power to serve Founder / author, Tarsnap | tarsnap.com | Online backups for the truly paranoid
On Tue, May 10, 2011 at 05:40:10AM +0000, Colin Percival wrote:
> The following reply was made to PR kern/156026; it has been noted by GNATS.
>
> From: Colin Percival <cperciva@freebsd.org>
> To: bug-followup@FreeBSD.org, pluknet@gmail.com
> Cc:
> Subject: Re: kern/156026: [bce] [panic] arpintr()->in_lltable_lookup()
>  8.1 bce(4) crash
> Date: Mon, 09 May 2011 22:37:44 -0700
>
> This bug definitely still exists in 8.2-RELEASE -- it's currently the #1 panic
> on FreeBSD/EC2.

Is there an easy way to reproduce the issue? I have a quad-port bce(4) controller but I haven't seen any issues. Would you show me the dmesg and pciconf -lcbv output so I can get more details on the controller/firmware revision?
On 05/10/11 10:06, YongHyeon PYUN wrote:
> On Tue, May 10, 2011 at 05:40:10AM +0000, Colin Percival wrote:
>> This bug definitely still exists in 8.2-RELEASE -- it's currently the #1 panic
>> on FreeBSD/EC2.
>
> Is there an easy way to reproduce the issue? I have a quad-port bce(4)
> controller but I haven't seen any issues.
> Would you show me the dmesg and pciconf -lcbv output so I can get more
> details on the controller/firmware revision?

I don't have any easy way to reproduce this, but I can say that it does not require bce, since EC2 doesn't have bce -- the hardware there is the Xen xn virtual interface. So I think it's safe to say that this is a bug in the arp/lltable/etc code, not in any network driver.

-- Colin Percival
State Changed From-To: feedback->open Ok, thanks for confirming this. I think it would be better to give it to our arp guru, Qing Li. Qing, would you take a look at this issue? It seems it's not a device-driver-related issue. If you have no time or interest to fix it in the near future, please assign it back to freebsd-net@.
Responsible Changed From-To: yongari->qingli Ok, thanks for confirming this. I think it would be better to give it to our arp guru, Qing Li. Qing, would you take a look at this issue? It seems it's not a device-driver-related issue. If you have no time or interest to fix it in the near future, please assign it back to freebsd-net@. Thanks.
I am told that this was fixed by r214675 -- this certainly seems plausible to me. John, can you confirm this so that we can close this PR? -- Colin Percival
On 5/14/11 3:00 PM, Colin Percival wrote:
> I am told that this was fixed by r214675 -- this certainly seems
> plausible to me. John, can you confirm this so that we can close
> this PR?

Yes.

-- John Baldwin
State Changed From-To: open->closed jhb confirms that this was fixed in r214675 (Nov 2010, MFCed to stable/8 as r217848 in Jan 2011).
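For anyone checking whether a given source checkout already carries the fix, the two revision numbers in the closing note can be compared against the checkout's revision. The following is only a minimal sketch: the helper name `has_fix`, the `FIX_REVISIONS` table, and the branch labels `head` and `stable/8` are illustrative assumptions, and it relies on Subversion revision numbers being global to the repository. Branches not mentioned in the closing note (release branches, for example) are deliberately rejected rather than guessed at.

```python
# Revisions quoted in the PR closing note: r214675 (original commit) and
# r217848 (MFC to stable/8). The branch labels used as keys are assumptions.
FIX_REVISIONS = {
    "head": 214675,
    "stable/8": 217848,
}


def has_fix(branch: str, revision: int) -> bool:
    """Return True if a checkout of `branch` at `revision` includes the fix.

    Hypothetical helper: it only knows the branches named in the PR and
    raises ValueError for anything else instead of guessing.
    """
    try:
        fix_revision = FIX_REVISIONS[branch]
    except KeyError:
        raise ValueError(f"no fix revision recorded for branch {branch!r}")
    # A checkout at or past the fixing commit on the same branch contains it.
    return revision >= fix_revision
```

For example, `has_fix("stable/8", 217847)` is false while `has_fix("stable/8", 217848)` is true; how to obtain the revision of a running kernel is left out here, since that depends on how the kernel was built.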