Bug 129352 - [xl] [patch] xl0 watchdog timeout
Summary: [xl] [patch] xl0 watchdog timeout
Status: Open
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: unspecified
Hardware: Any Any
Importance: Normal Affects Only Me
Assignee: freebsd-bugs mailing list
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2008-12-01 22:30 UTC by Ping Mai
Modified: 2018-01-03 05:16 UTC
0 users

See Also:


Attachments
file.diff (1.18 KB, patch)
2008-12-01 22:30 UTC, Ping Mai

Description Ping Mai 2008-12-01 22:30:00 UTC
Ever since upgrading from 5-STABLE to 6-STABLE, xl0 gets watchdog timeouts and the NIC resets:
Sep 12 10:20:52 agra kernel: xl0: watchdog timeout
Sep 12 10:20:52 agra kernel: xl0: link state changed to DOWN
Sep 12 10:20:54 agra kernel: xl0: link state changed to UP
This happens 2-3 times a day.

Fix: make xl_txeof() and xl_txeof_90xB() restart the watchdog timer if any packets have been sent.
See patch file.


Patch attached with submission follows:
How-To-Repeat: Dell Inspiron 8200 with builtin NIC
xl0: <3Com 3c905C-TX Fast Etherlink XL> port 0xec80-0xecff mem 0xf8fffc00-0xf8fffc7f irq 11 at device 0.0 on pci2

Running 6-STABLE
Comment 1 Mark Linimon freebsd_committer freebsd_triage 2008-12-01 22:53:23 UTC
Responsible Changed
From-To: freebsd-bugs->freebsd-net

Over to maintainer(s).
Comment 2 reko.turja 2009-08-04 11:10:07 UTC
After updating my firewall box to 7.2-STABLE I started getting the
watchdog timeouts with the link state going down and returning back up
after a couple of seconds on the xl interface. Some searching in GNATS
returned this bug report.

I applied the patch successfully on:

src/sys/pci/if_xl.c,v 1.210.2.2 2008/04/23 21:28:29

and will give it a shot for some days in order to see if it breaks
something or if the watchdog timeouts still keep occurring.  So far,
rebooting with the fresh kernel seems to be ok.
Comment 3 Reko Turja 2009-08-09 12:56:44 UTC
After running the patch from this PR for some days I still got some 
watchdog timeouts. As another approach, I'm trying the driver revision 1.4 from 
/src/sys/dev/xl/ (8.x sourcetree) which compiled clean on my system with 
sources updated as of today.

My uname -a:

FreeBSD xxx.org 7.2-STABLE FreeBSD 7.2-STABLE #10: Sun Aug  9 14:13:47 
EEST 2009     root@xxx.org:/usr/obj/usr/src/sys/MORIA  i386

The reason for trying 1.4 was the commit message:

SVN rev 191345 on 2009-04-21 00:42:11Z by yongari

To make it easy whether xl(4) missed Tx completion interrupt check number 
of queued packets in watchdog timeout handler. If there are no queued 
packets just print a informational message and return without resetting 
controller. Also fix to invoke correct Tx completion handler as 3C905B 
needs different handler.

I will send a followup after testing the driver for a while - if it seems 
to work, is there any chance of backporting it for 7.x?

-Reko
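
The behavior described in the quoted commit message can be modeled roughly as below. This is only a sketch and not the code from if_xl.c: the struct, the function names, and the message texts are hypothetical, and reclaiming completed frames before counting what is still queued is an assumption about how the handler tells a missed interrupt from a real stall.

```c
#include <stdio.h>

/* Hypothetical, simplified stand-in for the driver's per-device state. */
struct xl_model {
    int tx_queued;  /* frames handed to the controller, not yet reclaimed */
    int resets;     /* how many times the controller was reinitialized */
};

/*
 * Stand-in for the Tx completion handler: reclaim `done` frames that the
 * hardware finished, even if their completion interrupt never arrived.
 */
void model_txeof(struct xl_model *sc, int done)
{
    if (done < 0)
        done = 0;
    if (done > sc->tx_queued)
        done = sc->tx_queued;
    sc->tx_queued -= done;
}

/*
 * Stand-in for the watchdog timeout handler.
 * Returns 1 if the controller was reset, 0 if the reset was skipped.
 */
int model_watchdog(struct xl_model *sc, int completed_by_hw)
{
    /* Assumption: reclaim whatever the hardware already finished. */
    model_txeof(sc, completed_by_hw);

    if (sc->tx_queued == 0) {
        /* Nothing is stuck: only the interrupt was missed, so no reset. */
        printf("model: no queued frames, skipping reset\n");
        return 0;
    }

    /* Frames are really stuck: fall back to reinitializing. */
    printf("model: frames still queued, resetting controller\n");
    sc->resets++;
    sc->tx_queued = 0;
    return 1;
}
```

In this model a timeout with three queued frames that the hardware had in fact completed returns 0 and leaves the link alone, while the same timeout with only one of them completed returns 1 and counts a reset.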
Comment 4 Andre Oppermann freebsd_committer 2010-08-23 18:57:17 UTC
Responsible Changed
From-To: freebsd-net->yongari

Over to expert.
Comment 5 Pyun YongHyeon freebsd_committer 2010-09-21 19:43:33 UTC
State Changed
From-To: open->feedback

Is it still an issue on FreeBSD 8.1-RELEASE or 7.3-RELEASE?
To original submitter:
I'm under the impression that the patch just disables the watchdog
timeout detection logic of the driver. What we need to know here is
why it triggers watchdog timeouts (e.g. driver bug, silicon bug, etc.).
Comment 6 pyunyh 2010-09-23 01:26:04 UTC
On Wed, Sep 22, 2010 at 04:29:20PM -0700, Ping Mai wrote:
> i'm traveling in south america at the moment and do not have easy access.
> just by looking at the code snippet in the PR, and from what i can remember, the
> problem was that the xl0 would reset itself voluntarily frequently.
> 
> the few lines that i've added, sets the watchdog timeout to 5 if any packet had 
> been sent.

The watchdog keeps track of the time of the latest packet's
transmission attempt. So when you send more packets while the watchdog
timer is active, the timer is updated whenever a transmission
is attempted.

The watchdog timeout should be set in xl_start, not in a reclaiming
routine like xl_txeof, because xl_start is the only place that kicks
the controller to send a queued frame. If you see watchdog timeouts, this
means the frame queued in xl_start was not sent within the timeout
period, so adjusting the timeout (except unarming it when there are no
pending frames) in xl_txeof is a bug.
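
The arming rule above can be sketched as a small model. It is not driver code: the struct, the wd_* names, and the once-per-second tick are hypothetical, and the value 5 is taken from the quoted message with seconds assumed as the unit.

```c
#include <stdbool.h>

#define MODEL_TX_TIMEOUT 5  /* assumed seconds, value from the quoted text */

/* Hypothetical, simplified stand-in for the driver's Tx state. */
struct wd_model {
    int pending;  /* frames queued to the controller, not yet completed */
    int timer;    /* ticks until the watchdog fires; 0 means unarmed */
};

/* Transmit path: the only place that kicks the controller, so the only
 * place that arms (or re-arms) the watchdog. */
void wd_start(struct wd_model *sc, int frames)
{
    if (frames <= 0)
        return;
    sc->pending += frames;
    sc->timer = MODEL_TX_TIMEOUT;
}

/* Reclaim path: never extends the timer; it only unarms the watchdog
 * once nothing is pending any more. */
void wd_txeof(struct wd_model *sc, int completed)
{
    if (completed < 0)
        completed = 0;
    if (completed > sc->pending)
        completed = sc->pending;
    sc->pending -= completed;
    if (sc->pending == 0)
        sc->timer = 0;
}

/* Periodic tick. Returns true when an armed watchdog expires. */
bool wd_tick(struct wd_model *sc)
{
    if (sc->timer == 0)
        return false;   /* unarmed, nothing to do */
    if (--sc->timer > 0)
        return false;   /* still counting down */
    return true;        /* a queued frame was not sent in time */
}
```

With this rule, a partial completion leaves the countdown untouched, so a frame that stays stuck is still reported MODEL_TX_TIMEOUT ticks after the last call to wd_start. Re-arming inside wd_txeof would push that report back every time some other frame completed.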

> this does not disable the watchdog but merely restarts the countdown. i
> wouldn't disable that
> watchdog because some xl0 is prone to freeze up.
> 
> on my Dell laptop, this patch did reduce the frequency of the resets, which were
> very annoying
> in that it made the system lose all its connections.
> 
> i believe the real problem and fix lies elsewhere. i would look at the
> interrupt handling logic
> introduced around that time, and the peculiarity of the xl.
> 

To write a real fix I need to know why and when it happens. Recent
FreeBSD releases include code that checks whether a watchdog
timeout of xl(4) was caused by missing Tx completion interrupts. If
this was the case, xl(4) just shows an informational message but
does not reinitialize the controller. So it would be better if you could
test more recent FreeBSD releases (8.1-RELEASE or 7.3-RELEASE) and
let me know whether it makes any difference.
Another thing I'd like to know is your "pciconf -lcbv" output to
narrow down the exact controller revision. If you can easily trigger
the issue, please let me know how you triggered it.

Thanks.

Comment 7 pyunyh 2010-09-25 23:45:55 UTC
On Sat, Sep 25, 2010 at 05:22:00AM -0700, Ping Mai wrote:
> I remember the same hardware did not have the xl reset problem until i upgraded
> to that
> particular release. that's why i thought it was related to the interrupt

I also vaguely remember xl(4) watchdog timeout issues in 6.x days.
That's the reason why I asked whether you still see the issue on recent
FreeBSD releases.

> handling layer.
> at the time i've heard others having this reset problem with the xl and it was
> not
> limited to xl chip. i knew it was not the correct fix but it did reduce those
> annoying
> resets by 95%. i will be down in south america until december or january. but
> i will certainly take a look whenever access permits. thanks.
> 

Ok, if you find some spare time in future let me know. Let's fix
it.
Comment 8 Pyun YongHyeon freebsd_committer 2010-09-25 23:50:42 UTC
State Changed
From-To: feedback->open

Feedback received.
Comment 9 Eitan Adler freebsd_committer freebsd_triage 2017-12-31 08:00:42 UTC
For bugs matching the following criteria:

Status: In Progress Changed: (is less than) 2014-06-01

Reset to default assignee and clear in-progress tags.

Mail being skipped