Ever since upgrading from 5-STABLE to 6-STABLE, xl0 gets watchdog timeouts and the NIC resets:
Sep 12 10:20:52 agra kernel: xl0: watchdog timeout
Sep 12 10:20:52 agra kernel: xl0: link state changed to DOWN
Sep 12 10:20:54 agra kernel: xl0: link state changed to UP
This happens 2-3 times a day.
Fix: xl_txeof() and xl_txeof_90xB() restart the watchdog timer if any packets have been sent.
See patch file.
Patch attached with submission follows:
How-To-Repeat: Dell Inspiron 8200 with builtin NIC
xl0: <3Com 3c905C-TX Fast Etherlink XL> port 0xec80-0xecff mem 0xf8fffc00-0xf8fffc7f irq 11 at device 0.0 on pci2
Over to maintainer(s).
After updating my firewall box to 7.2-STABLE I started getting
watchdog timeouts on the xl interface, with the link state going down and
coming back up after a couple of seconds. Searching GNATS
turned up this bug report.
I applied the patch successfully on:
src/sys/pci/if_xl.c,v 18.104.22.168 2008/04/23 21:28:29
and will give it a shot for some days to see whether it breaks
something or whether the watchdog timeouts keep occurring. So far,
rebooting into the fresh kernel seems to be OK.
After running the patch from this PR for some days I still got some
watchdog timeouts. As another approach, I'm trying the driver revision 1.4 from
/src/sys/dev/xl/ (8.x sourcetree) which compiled clean on my system with
sources updated as of today.
My uname -a:
FreeBSD xxx.org 7.2-STABLE FreeBSD 7.2-STABLE #10: Sun Aug 9 14:13:47
EEST 2009 email@example.com:/usr/obj/usr/src/sys/MORIA i386
The reason for trying 1.4 was the commit message:
SVN rev 191345 on 2009-04-21 00:42:11Z by yongari
To make it easy whether xl(4) missed Tx completion interrupt check number
of queued packets in watchdog timeout handler. If there are no queued
packets just print a informational message and return without resetting
controller. Also fix to invoke correct Tx completion handler as 3C905B
needs different handler.
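For reference, the behaviour that commit message describes can be sketched roughly as below. This is a simplified standalone model, not the actual if_xl.c code: the struct, its fields, the function names and the message strings are all made up for illustration, and the real driver works on DMA descriptor rings rather than plain counters.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical model of the Tx side of a NIC; NOT the real xl(4) softc. */
struct model_nic {
    int tx_queued;   /* frames handed to the chip and not yet reclaimed */
    int tx_done_hw;  /* frames the chip finished whose interrupt was lost */
    int resets;      /* how many times the controller was reinitialized */
};

/* Stand-in for the Tx completion handler: reclaim finished frames. */
static void model_txeof(struct model_nic *nic)
{
    nic->tx_queued -= nic->tx_done_hw;
    nic->tx_done_hw = 0;
}

/*
 * Stand-in for the watchdog handler. Reclaims first, then looks at the
 * number of frames still queued. Returns true if the controller was reset.
 */
static bool model_watchdog(struct model_nic *nic)
{
    model_txeof(nic);
    if (nic->tx_queued == 0) {
        /* Everything was sent; only the completion interrupt went missing. */
        printf("watchdog: missed Tx completion interrupt, not resetting\n");
        return false;
    }
    /* Frames are really stuck: fall back to a full reinitialization. */
    printf("watchdog: timeout, resetting controller\n");
    nic->resets++;
    nic->tx_queued = 0;
    return true;
}
```

In this model a timeout with all frames already completed by the hardware only prints a message, while a timeout with frames still pending resets the controller, which is the case that takes the link down and up again.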
I will send a followup after testing the driver for a while - if it seems
to work, is there any chance of backporting it for 7.x?
Over to expert.
Is it still an issue on FreeBSD 8.1-RELEASE or 7.3-RELEASE?
To original submitter:
I'm under the impression that the patch just disables the watchdog
timeout detection logic of the driver. What we need to know here is
why it triggers watchdog timeouts (e.g. driver bug, silicon bug, etc.).
On Wed, Sep 22, 2010 at 04:29:20PM -0700, Ping Mai wrote:
> i'm traveling in south america at the moment and do not have easy access.
> just by looking at the code snippet in the PR, and from what i can remember, the
> problem was that the xl0 would reset itself voluntarily frequently.
> the few lines that i've added, sets the watchdog timeout to 5 if any packet had
> been sent.
The watchdog keeps track of the time of the latest packet's
transmission attempt. So when you send more packets while watchdog
time is active, the watchdog time is updated whenever transmission
Watchdog timeout should be set in xl_start, not in a reclaiming
routine like xl_txeof, because xl_start is the only place that kicks
the controller to send queued frames. If you see watchdog timeouts, this
means a frame queued in xl_start was not sent within the timeout
period, so adjusting the timeout (except unarming it when there are no
pending frames) in xl_txeof is a bug.
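The arming rule described above can be illustrated with a small standalone model. This is only a sketch under assumptions: the names (wd_nic, wd_start, wd_txeof, wd_tick) are hypothetical, the one-tick-per-second countdown is assumed, and the timeout value of 5 is taken from the submitter's description of the patch rather than from the driver source.

```c
#include <stdbool.h>

#define WD_TX_TIMEOUT 5  /* ticks; value assumed from the PR discussion */

/* Hypothetical model state; NOT the real xl(4) softc. */
struct wd_nic {
    int tx_queued;   /* frames queued to the chip, not yet reclaimed */
    int wdog_timer;  /* 0 = unarmed, otherwise ticks left until timeout */
};

/* Models the start routine: the only place that (re)arms the watchdog. */
static void wd_start(struct wd_nic *nic, int nframes)
{
    if (nframes <= 0)
        return;
    nic->tx_queued += nframes;
    nic->wdog_timer = WD_TX_TIMEOUT;
}

/* Models the reclaim routine: may unarm the watchdog, never re-arms it. */
static void wd_txeof(struct wd_nic *nic, int ncompleted)
{
    if (ncompleted > nic->tx_queued)
        ncompleted = nic->tx_queued;
    nic->tx_queued -= ncompleted;
    if (nic->tx_queued == 0)
        nic->wdog_timer = 0;  /* nothing pending: unarm */
    /* Frames still pending: leave the countdown untouched. */
}

/* Called once per tick; returns true when the watchdog fires. */
static bool wd_tick(struct wd_nic *nic)
{
    if (nic->wdog_timer == 0)
        return false;
    if (--nic->wdog_timer > 0)
        return false;
    return true;
}
```

With this arrangement a frame that stays stuck on the chip still trips the watchdog even if other frames complete in the meantime, whereas re-arming in the reclaim path would keep pushing the deadline out.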
> this does not disable the watchdog but merely restarts the count down. i
> wouldn't disable that
> watchdog because some xl0 is prone to freeze up.
> on my Dell laptop, this patch did reduce the frequency of the resets, which were
> very annoying
> in that it made the system lose all its connections.
> i believe the real problem and fix lies elsewhere. i would look at the
> interrupt handling logic
> introduced around that time, and the peculiarity of the xl.
To write a real fix I need to know why and when it happens. Recent
FreeBSD releases include code that checks whether a watchdog
timeout of xl(4) was caused by missing Tx completion interrupts. If
this was the case, xl(4) just shows an informational message but
does not reinitialize the controller. So it would be better if you could
test more recent FreeBSD releases (8.1-RELEASE or 7.3-RELEASE) and
let me know whether it makes any difference.
Another thing I'd like to know is your "pciconf -lcbv" output, to
narrow down the exact controller revision. If you can easily trigger
the issue, please let me know how you triggered it.
> ----- Original Message ----
> From: "yongari@FreeBSD.org" <yongari@FreeBSD.org>
> To: firstname.lastname@example.org; yongari@FreeBSD.org; yongari@FreeBSD.org
> Sent: Tue, September 21, 2010 3:44:55 PM
> Subject: Re: kern/129352: [xl] [patch] xl0 watchdog timeout
> Synopsis: [xl] [patch] xl0 watchdog timeout
> State-Changed-From-To: open->feedback
> State-Changed-By: yongari
> State-Changed-When: Tue Sep 21 18:43:33 UTC 2010
> Is it still issue on FreeBSD 8.1-RELEASE or 7.3-RELEASE?
> To original submitter:
> I'm under the impression that the patch just disables watchdog
> timeout detection logic of driver. What we need to know here is
> why it triggers watchdog timeouts(e.g. driver bug, silicon bug etc).
On Sat, Sep 25, 2010 at 05:22:00AM -0700, Ping Mai wrote:
> I remember the same hardware did not have the xl reset problem until i upgraded
> to that
> particular release. that's why i thought it was related to the interrupt
> handling layer.
I also vaguely remember xl(4) watchdog timeout issues in 6.x days.
That's the reason why I asked whether you still see the issue on recent
> at the time i've heard others having this reset problem with the xl and it was
> limited to xl chip. i knew it was not the correct fix but it did reduce those
> resets by 95%. i will be down in south america until december or january. but
> i will certainly take a look whenever access permits. thanks.
Ok, if you find some spare time in future let me know. Let's fix
For bugs matching the following criteria:
Status: In Progress Changed: (is less than) 2014-06-01
Reset to default assignee and clear in-progress tags.
Mail being skipped