Bug 140326 - [em] em0: watchdog timeout when communicating to windows using 9K MTU
Status: Closed Overcome By Events
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: Unspecified
Hardware: Any Any
Importance: Normal Affects Only Me
Assignee: Eric Joyner
URL:
Keywords: IntelNetworking
Depends on:
Blocks:
 
Reported: 2009-11-06 01:20 UTC by Maxim Sobolev
Modified: 2018-12-04 16:48 UTC

See Also:


Description Maxim Sobolev freebsd_committer freebsd_triage 2009-11-06 01:20:01 UTC
My em0 interface repeatedly hangs with a watchdog timeout when communicating with a Windows host at a 9K MTU.

[sobomax@pioneer ~]$ grep em0 /var/run/dmesg.boot
em0: <Intel(R) PRO/1000 Network Connection 6.9.6> port 0xecc0-0xecdf mem 0xfe6e0000-0xfe6fffff,0xfe6d9000-0xfe6d9fff irq 21 at device 25.0 on pci0
em0: Using MSI interrupt
em0: [FILTER]
em0: Ethernet address: 00:22:19:32:87:2f
[sobomax@pioneer ~]$ uname -a
FreeBSD pioneer.sippysoft.com 7.2-RELEASE-p4 FreeBSD 7.2-RELEASE-p4 #0: Sun Oct  4 03:08:04 PDT 2009     root@pioneer.sippysoft.com:/usr/obj/usr/src/sys/PIONEER  amd64
[sobomax@pioneer ~]$ ifconfig em0
em0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 9000
        options=98<VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM>
        ether 00:22:19:32:87:2f
        inet 192.168.1.1 netmask 0xffffff00 broadcast 192.168.1.255
        inet 192.168.2.1 netmask 0xffffff00 broadcast 192.168.2.255
        inet6 fec0::1 prefixlen 64
        media: Ethernet autoselect (1000baseTX <full-duplex>)
        status: active
[sobomax@pioneer ~]$ dmesg | grep watchd
em0: watchdog timeout -- resetting
em0: watchdog timeout -- resetting
em0: watchdog timeout -- resetting
em0: watchdog timeout -- resetting
em0: watchdog timeout -- resetting

I managed to take a packet capture right at the moment the hang happens. It appears that either a "MAC Pause" frame or a "TCP Segment of reassembled PDU" packet is the last one to get through before the interface hangs.

Here is a screenshot; if somebody wants to take a closer look at the actual packets, please let me know.

http://sobomax.sippysoft.com/~sobomax/ScreenShot527.png
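
A capture like the one described above could be taken along these lines. This is only a sketch: the report does not say which tool produced the capture, and the function name and output file name are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: record traffic on the interface until interrupted.
# Assumes tcpdump is run as root on the FreeBSD box.
capture_em_hang() {
    ifc="${1:-em0}"
    pcap="${2:-em0-hang.pcap}"
    # -s 0 keeps whole frames, which matters with 9000-byte jumbo packets;
    # -w writes raw packets to a file that can be opened later in Wireshark.
    tcpdump -i "$ifc" -s 0 -w "$pcap"
}
```

Calling `capture_em_hang em0` and stopping it with Ctrl-C once the watchdog message shows up should leave the last frames before the hang at the end of the file.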

Turning off TSO and TXCSUM/RXCSUM has not helped. Bringing the MTU down to 1500 resolved the issue.
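
For reference, the two steps above map roughly onto the following ifconfig invocations. This is a sketch only: the exact commands used were not recorded in this report, and the function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers; both need root. "em0" is the interface from this report.

# Step 1 (did not help here): disable TCP segmentation offload and
# TX/RX checksum offload on the interface.
em_disable_offload() {
    ifconfig "${1:-em0}" -tso -txcsum -rxcsum
}

# Step 2 (made the hangs go away here): set the MTU, by default falling back
# from the 9000-byte jumbo MTU to the standard Ethernet value of 1500.
em_set_mtu() {
    ifconfig "${1:-em0}" mtu "${2:-1500}"
}
```

The ifconfig output earlier in this report (options=98 with no TSO or checksum flags) is consistent with step 1 already having been applied.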

I have hit the same problem several times in the past (although I initially attributed it to a bad cable or something similar), so it is definitely not a one-off issue.

Given the popularity of Intel PRO chips in today's computers, this looks like quite a serious issue to me. Any help is greatly appreciated.
Comment 1 jfvogel 2009-11-06 01:28:50 UTC
Can't do much unless you adequately identify the hardware on BOTH sides; believe it or not, "windows" is not a sufficient description :)

I need to know what the E1000 hardware is, using pciconf -l, and I also need to know what is on the Windows side before having a clue on how to repro or help you.
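
As a rough sketch of what is being asked for, something like the following would pull just the adapter's entry out of the pciconf listing. The function name is hypothetical, and the -v flag and the 4-line context are assumptions added here for readability.

```shell
#!/bin/sh
# Hypothetical helper: print the pciconf entry for one device (default em0).
nic_pci_info() {
    dev="${1:-em0}"
    # -l lists devices, -v adds the vendor/class description lines;
    # grep -A 4 keeps the matching line plus the description that follows it.
    pciconf -lv | grep -A 4 "^${dev}@"
}
```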

Cheers,

Jack


On Thu, Nov 5, 2009 at 5:18 PM, Maksym Sobolyev <sobomax@freebsd.org> wrote:

>
> >Number:         140326
> >Category:       kern
> >Synopsis:       em0: watchdog timeout when communicating to windows using
> 9K MTU
> >Confidential:   no
> >Severity:       serious
> >Priority:       high
> >Responsible:    freebsd-bugs
> >State:          open
> >Quarter:
> >Keywords:
> >Date-Required:
> >Class:          sw-bug
> >Submitter-Id:   current-users
> >Arrival-Date:   Fri Nov 06 01:20:01 UTC 2009
> >Closed-Date:
> >Last-Modified:
> >Originator:     Maksym Sobolyev
> >Release:        7.2-p4
> >Organization:
> Sippy Software, Inc.
Comment 2 Maxim Sobolev freebsd_committer freebsd_triage 2009-11-06 02:28:13 UTC
Jack Vogel wrote:
> Can't do much unless you adequately identify hardware, on BOTH sides, believe
> it or not "windows" is not a sufficient description :)
>
> I need to know what the E1000 hardware is, using pciconf -l, and I also need to
> know what is on the Windows side before having a clue on how to repro or help
> you.

Jack,

Thank you for the amazingly fast reply.

Sure, the FreeBSD side is this:

em0@pci0:0:25:0:        class=0x020000 card=0x02761028 chip=0x10de8086 rev=0x02 hdr=0x00
     vendor     = 'Intel Corporation'
     class      = network
     subclass   = ethernet

On the Windows side it's a Realtek GigE card. The system itself is Windows 7 Ultimate 64-bit edition:

PCI\VEN_10EC&DEV_8168&SUBSYS_02C01028&REV_03
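
To make the two identifiers easier to compare: pciconf packs the device ID into the upper 16 bits of chip= (and card=) and the vendor ID into the lower 16 bits, while the Windows string spells them out as VEN_ and DEV_. A small sketch for splitting the pciconf form (the function name is hypothetical):

```shell
#!/bin/sh
# Hypothetical helper: split a pciconf "chip=" or "card=" value of the form
# 0xDDDDVVVV into its 16-bit device (upper) and vendor (lower) halves.
pci_split_id() {
    id=$(( $1 ))
    printf 'vendor=0x%04x device=0x%04x\n' \
        "$(( id & 0xffff ))" "$(( (id >> 16) & 0xffff ))"
}

pci_split_id 0x10de8086   # chip= value of em0 above
pci_split_id 0x02761028   # card= (subsystem) value of em0 above
```

For em0 this gives vendor 0x8086 (Intel) with device 0x10de, and subsystem vendor 0x1028 (Dell). The Windows string reads directly as vendor 0x10EC (Realtek), device 0x8168.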

Please let me know if any other information is necessary.

Regards,
-- 
Maksym Sobolyev
Sippy Software, Inc.
Internet Telephony (VoIP) Experts
T/F: +1-646-651-1110
Web: http://www.sippysoft.com
MSN: sales@sippysoft.com
Skype: SippySoft
Comment 3 Mark Linimon freebsd_committer freebsd_triage 2009-11-06 04:35:06 UTC
Responsible Changed
From-To: freebsd-bugs->freebsd-net

Over to maintainer(s).
Comment 4 jfvogel 2009-11-06 05:05:11 UTC
Good, that's a start. Now, is there a switch of some sort involved, or are you going back to back? Some switches have problems with jumbo frames, and there are also some vendors' interfaces (including ours) that do not support jumbo frames, so you need to check on that also (I mean the RT).
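
One possible way to run that check from the FreeBSD side is a don't-fragment ping sized to fill a whole 9000-byte packet. This is a sketch under assumptions: the function names are hypothetical, the peer address has to be supplied by the user (the Windows host's address is not given in this report), and the -D flag means "set DF" on FreeBSD's ping but not on every other system.

```shell
#!/bin/sh
# ICMP payload size that exactly fills one IPv4 packet of the given MTU:
# MTU minus 20 bytes of IPv4 header minus 8 bytes of ICMP header.
icmp_payload_for_mtu() {
    echo $(( $1 - 20 - 8 ))
}

# Hypothetical helper: send three don't-fragment pings sized for the MTU
# (default 9000). FreeBSD ping: -D sets the DF bit, -s sets the payload size.
jumbo_ping() {
    host="$1"
    mtu="${2:-9000}"
    ping -D -c 3 -s "$(icmp_payload_for_mtu "$mtu")" "$host"
}
```

For a 9000-byte MTU the payload works out to 8972 bytes. If such pings get replies, every hop in between (switch and Realtek card included) is passing full-size jumbo frames; if they time out while ordinary pings work, something on the path is dropping them.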

I will check on the Intel adapter tomorrow.

Jack


Comment 5 Maxim Sobolev freebsd_committer freebsd_triage 2009-11-06 08:30:33 UTC
Jack Vogel wrote:
> Good, that's a start. Now, is there a switch of some sort involved or are you
> going back to back? Some switches have problems with jumbo frames, there are
> also some vendors (including our's) interfaces that do not support jumbo
> frames, so you need to check on that also (I mean the RT).
> 
> I will check on the Intel adapter tomorrow.

Yes, there is a switch involved (Cisco/Linksys EG008W ver. 3), but I don't think it's related. The problem really escalated when I installed Windows 7 on this machine yesterday. Before that, the same machine with the Realtek card was running Vista, and the problem had happened only once or twice in two weeks with the same MTU on both ends. And from the capture it seems that a very specific condition causes it. Unfortunately this box is a gateway for a network, so I cannot replace the switch and try to reproduce the issue.

-Maxim
Comment 6 Maxim Sobolev freebsd_committer freebsd_triage 2009-11-06 09:58:52 UTC
Jack,

Here is some additional info you might find useful: I have replaced the Linksys switch with a more "professional" rack-mountable 3Com Baseline 2816 switch and reproduced the issue just as easily by copying a large file via SMB from FreeBSD to Windows 7. To me this pretty much rules out any problem with the switch.

Hope it helps.

-Maxim
Comment 7 Andre Oppermann freebsd_committer freebsd_triage 2010-08-23 18:45:09 UTC
Responsible Changed
From-To: freebsd-net->jfv

Over to maintainer.
Comment 8 Mark Linimon freebsd_committer freebsd_triage 2015-11-12 07:44:18 UTC
Reassign to erj@ for triage.  To submitter: is this issue still relevant?
Comment 9 Eitan Adler freebsd_committer freebsd_triage 2018-05-28 19:44:53 UTC
batch change:

For bugs that match the following
-  Status Is In progress 
AND
- Untouched since 2018-01-01.
AND
- Affects Base System OR Documentation

DO:

Reset to open status.


Note:
I did a quick pass, but if you are getting this email it might be worthwhile to double-check whether this bug ought to be closed.
Comment 11 Maxim Sobolev freebsd_committer freebsd_triage 2018-12-04 16:48:01 UTC
Close for now.