Ever since upgrading from 5-STABLE to 6-STABLE, xl0 gets watchdog timeouts and the NIC resets:

Sep 12 10:20:52 agra kernel: xl0: watchdog timeout
Sep 12 10:20:52 agra kernel: xl0: link state changed to DOWN
Sep 12 10:20:54 agra kernel: xl0: link state changed to UP

This happens 2-3 times a day.

Fix: xl_txeof() and xl_txeof_90xB() restart the timer if some packets had been sent. See patch file.

Patch attached with submission follows:

How-To-Repeat: Dell Inspiron 8200 with builtin NIC

xl0: <3Com 3c905C-TX Fast Etherlink XL> port 0xec80-0xecff mem 0xf8fffc00-0xf8fffc7f irq 11 at device 0.0 on pci2

Running 6-STABLE
Responsible Changed From-To: freebsd-bugs->freebsd-net Over to maintainer(s).
After updating my firewall box to 7.2-STABLE I started getting watchdog timeouts on the xl interface, with the link state going down and coming back up after a couple of seconds. A search of GNATS turned up this bug report. I applied the patch successfully on: src/sys/pci/if_xl.c,v 1.210.2.2 2008/04/23 21:28:29 and will run it for some days to see whether it breaks something or whether the watchdog timeouts keep occurring. So far, rebooting into the fresh kernel seems to be ok.
After running the patch from this PR for some days I still got some watchdog timeouts. As another approach, I'm trying the driver revision 1.4 from /src/sys/dev/xl/ (8.x sourcetree) which compiled clean on my system with sources updated as of today.

My uname -a:
FreeBSD xxx.org 7.2-STABLE FreeBSD 7.2-STABLE #10: Sun Aug 9 14:13:47 EEST 2009 root@xxx.org:/usr/obj/usr/src/sys/MORIA i386

The reason for trying 1.4 was the commit message:

SVN rev 191345 on 2009-04-21 00:42:11Z by yongari
To make it easy whether xl(4) missed Tx completion interrupt check number of queued packets in watchdog timeout handler. If there are no queued packets just print a informational message and return without resetting controller. Also fix to invoke correct Tx completion handler as 3C905B needs different handler.

I will send a followup after testing the driver for a while - if it seems to work, is there any chance of backporting it for 7.x?

-Reko
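The watchdog behaviour described in that commit message can be sketched roughly as follows. This is a minimal user-space model, not the actual if_xl.c code: the struct, every `sketch_*` name, the `hw_done` field and the printed messages are hypothetical and exist only to illustrate the control flow (reclaim with the chip-appropriate Tx handler, then reset only if frames are still queued).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical, simplified stand-in for driver state; NOT struct xl_softc. */
struct sketch_softc {
    int  tx_queued;        /* frames handed to the controller, not yet reclaimed */
    int  hw_done;          /* frames the hardware finished without the driver noticing */
    int  resets;           /* number of controller reinitializations */
    bool is_90xb;          /* 3C905B-class chips need a different Tx handler */
    int  txeof_calls;      /* bookkeeping for the sketch only */
    int  txeof_90xb_calls; /* bookkeeping for the sketch only */
};

/* Reclaim every frame the (modelled) hardware has already completed. */
static void sketch_reclaim(struct sketch_softc *sc)
{
    assert(sc->hw_done <= sc->tx_queued);
    sc->tx_queued -= sc->hw_done;
    sc->hw_done = 0;
}

static void sketch_txeof(struct sketch_softc *sc)
{
    sc->txeof_calls++;
    sketch_reclaim(sc);
}

static void sketch_txeof_90xb(struct sketch_softc *sc)
{
    sc->txeof_90xb_calls++;
    sketch_reclaim(sc);
}

/*
 * Called when the watchdog timer expires.
 * Returns true if the controller had to be reset.
 */
static bool sketch_watchdog(struct sketch_softc *sc)
{
    assert(sc != NULL);

    /* Pick the Tx completion handler that matches the chip. */
    if (sc->is_90xb)
        sketch_txeof_90xb(sc);
    else
        sketch_txeof(sc);

    /* Nothing left queued: only the completion interrupt was missed. */
    if (sc->tx_queued == 0) {
        printf("sketch: watchdog fired, Tx interrupt was missed -- no reset\n");
        return false;
    }

    /* Frames are genuinely stuck: reinitialize (modelled as a counter). */
    printf("sketch: watchdog fired, %d frame(s) stuck -- resetting\n",
        sc->tx_queued);
    sc->resets++;
    sc->tx_queued = 0;
    sc->hw_done = 0;
    return true;
}
```

In this model, a timeout with every queued frame already completed returns without a reset, which would avoid the link DOWN/UP cycle seen in the logs; a timeout with frames still outstanding resets as before.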
Responsible Changed From-To: freebsd-net->yongari Over to expert.
State Changed From-To: open->feedback
Is it still an issue on FreeBSD 8.1-RELEASE or 7.3-RELEASE? To original submitter: I'm under the impression that the patch just disables the watchdog timeout detection logic of the driver. What we need to know here is why it triggers watchdog timeouts (e.g. driver bug, silicon bug, etc).
On Wed, Sep 22, 2010 at 04:29:20PM -0700, Ping Mai wrote:
> i'm traveling in south america at the moment and do not have easy access.
> just by looking at the code snippet in the PR, and from what i can remember, the
> problem was that the xl0 would reset itself voluntarily frequently.
>
> the few lines that i've added, sets the watchdog timeout to 5 if any packet had
> been sent.

The watchdog keeps track of the time of the latest packet's transmission attempt. So when you send more packets while the watchdog timer is active, the timer is updated whenever transmission is attempted. The watchdog timeout should be set in xl_start, not in a reclaiming routine like xl_txeof, because xl_start is the only place that kicks the controller to send a queued frame. If you see watchdog timeouts, this means the frame queued in xl_start was not sent within the timeout period, so adjusting the timeout in xl_txeof (except unarming it when there are no pending frames) is a bug.

> this does not disable the watchdog but merely restarts the count down. i
> wouldn't disable that watchdog because some xl0 is prone to freeze up.
>
> on my Dell laptop, this patch did reduce the frequency of the resets, which were
> very annoying in that it made the system lose all its connections.
>
> i believe the real problem and fix lies elsewhere. i would look at the
> interrupt handling logic introduced around that time, and the peculiarity
> of the xl.
>

To write a real fix I need to know why and when it happens. Recent FreeBSD releases include code that checks whether a watchdog timeout of xl(4) was caused by missing Tx completion interrupts. If this was the case, xl(4) just shows an informational message but does not reinitialize the controller. So it would be better if you can test more recent FreeBSD releases (8.1-RELEASE or 7.3-RELEASE) and let me know whether it makes any difference. Another thing I'd like to know is your "pciconf -lcbv" output, to narrow down the exact controller revision.
If you can easily trigger the issue, please let me know how you triggered it.

Thanks.

> ----- Original Message ----
> From: "yongari@FreeBSD.org" <yongari@FreeBSD.org>
> To: pingmai@yahoo.com; yongari@FreeBSD.org; yongari@FreeBSD.org
> Sent: Tue, September 21, 2010 3:44:55 PM
> Subject: Re: kern/129352: [xl] [patch] xl0 watchdog timeout
>
> Synopsis: [xl] [patch] xl0 watchdog timeout
>
> State-Changed-From-To: open->feedback
> State-Changed-By: yongari
> State-Changed-When: Tue Sep 21 18:43:33 UTC 2010
> State-Changed-Why:
> Is it still issue on FreeBSD 8.1-RELEASE or 7.3-RELEASE?
> To original submitter:
> I'm under the impression that the patch just disables watchdog
> timeout detection logic of driver. What we need to know here is
> why it triggers watchdog timeouts(e.g. driver bug, silicon bug etc).
>
> http://www.freebsd.org/cgi/query-pr.cgi?pr=129352
On Sat, Sep 25, 2010 at 05:22:00AM -0700, Ping Mai wrote:
> I remember the same hardware did not have the xl reset problem until i upgraded
> to that particular release. that's why i thought it was related to the interrupt
> handling layer.

I also vaguely remember xl(4) watchdog timeout issues in the 6.x days. That's the reason why I asked whether you still see the issue on recent FreeBSD releases.

> at the time i've heard others having this reset problem with the xl and it was
> not limited to xl chip. i knew it was not the correct fix but it did reduce those
> annoying resets by 95%. i will be down in south america until december or
> january. but i will certainly take a look whenever access permits. thanks.
>

Ok, if you find some spare time in future let me know. Let's fix it.
State Changed From-To: feedback->open
Feedback received.
For bugs matching the following criteria: Status: In Progress Changed: (is less than) 2014-06-01 Reset to default assignee and clear in-progress tags. Mail being skipped
Keyword: patch or patch-ready – in lieu of summary line prefix: [patch] * bulk change for the keyword * summary lines may be edited manually (not in bulk). Keyword descriptions and search interface: <https://bugs.freebsd.org/bugzilla/describekeywords.cgi>