Summary: | [msk] msk driver keeps erroring | | |
---|---|---|---|
Product: | Base System | Reporter: | Jack <xxjack12xx> |
Component: | kern | Assignee: | Pyun YongHyeon <yongari> |
Status: | Open | | |
Severity: | Affects Only Me | CC: | alnis.morics, kuzvesov, pi |
Priority: | Normal | | |
Version: | 9.0-RELEASE | | |
Hardware: | Any | | |
OS: | Any | | |
Description
Jack
2012-04-07 15:30:01 UTC
Responsible Changed From-To: freebsd-bugs->freebsd-net
Over to maintainer(s).

State Changed From-To: open->feedback
Would you try the diff at the following URL?
http://svnweb.freebsd.org/base/stable/9/sys/dev/msk/if_msk.c?r1=229524&r2=229874&view=patch
Also make sure to cold boot your box after applying the patch. Warm reboot may not address the issue.

Responsible Changed From-To: freebsd-net->yongari
Grab.

Same problem with FreeBSD 8.3 Stable and (output of pciconf -vlc)

mskc0@pci0:4:0:0: class=0x020000 card=0x34528086 chip=0x436111ab rev=0x18 hdr=0x00
    vendor = 'Marvell Semiconductor (Was: Galileo Technology Ltd)'
    device = 'Yukon 88E8050 PCI-E ASF Gigabit Ethernet Controller'
    class = network
    subclass = ethernet
    cap 01[48] = powerspec 2 supports D0 D1 D2 D3 current D0
    cap 03[50] = VPD
    cap 05[5c] = MSI supports 2 messages, 64 bit
    cap 10[e0] = PCI-Express 1 legacy endpoint max data 128(128) link x1(x1)
    ecap 0001[100] = AER 1 0 fatal 0 non-fatal 1 corrected

Cold boot helps, but it looks like every boot has to be a cold boot. Therefore I use ifconfig msk0 -rxcsum and the problem has gone away. Further I use hw.msk.msi_disable="1" to get rid of the watchdog timeout problem with link state changed to DOWN/UP.

Running the GENERIC kernel I see on every boot the message mskc0: Uncorrectable PCI Express error. With the msk.pcierr.patch given in kern/119613 the message is mskc0: PCI Express error(0x00100000). Running my own kernel (only network devices miibus,fxp,rl,msk) the PCI Error message disappears.

-- Andreas Longwitz

I have the same issue:

msk0: watchdog timeout
msk0: prefetch unit stuck?
msk0: initialization failed: no memory for Rx buffers

After adding to /boot/loader.conf:

net.inet.tcp.tso=0
hw.pci.enable_msix=0
hw.pci.enable_msi=0

and to rc.conf:

ifconfig_msk0="inet x.x.x.x netmask 255.255.255.0 -tso -txcsum -rxcsum -vlanhwtag"

left only:

msk0: watchdog timeout

And "watchdog timeout" no longer occurs as quickly as before, but it always occurs after 1-2 minutes of high network load (scp, for example). With low load it has already worked for 15 days without a "watchdog timeout".

If anyone from the FreeBSD developer team wants, I can give root access to this computer to help fix this bug.

Other info:

FreeBSD gnat.xxx.local 9.0-STABLE FreeBSD 9.0-STABLE #0: Mon Jul 2 05:52:45 MSK 2012 root@gnat.xxx.local:/usr/obj/usr/src/sys/GNAT amd64 (RELENG_9 from Jun 29 2012)

88E8053 PCI-E Gigabit Ethernet Controller integrated on motherboard P5GD2-Deluxe.

hostb0@pci0:0:0:0: class=0x060000 card=0x25808086 chip=0x25808086 rev=0x04 hdr=0x00
    vendor = 'Intel Corporation'
    device = '82915G/P/GV/GL/PL/910GL Memory Controller Hub'
    class = bridge
    subclass = HOST-PCI
    cap 09[e0] = vendor (length 9) Intel cap 2 version 1
pcib1@pci0:0:1:0: class=0x060400 card=0x00008086 chip=0x25818086 rev=0x04 hdr=0x01
    vendor = 'Intel Corporation'
    device = '82915G/P/GV/GL/PL/910GL PCI Express Root Port'
    class = bridge
    subclass = PCI-PCI
    cap 0d[88] = PCI Bridge card=0x00008086
    cap 01[80] = powerspec 2 supports D0 D3 current D0
    cap 05[90] = MSI supports 1 message
    cap 10[a0] = PCI-Express 1 root port max data 128(128) link x16(x16)
    ecap 0002[100] = VC 1 max VC1
    ecap 0005[140] = unknown 1
hdac0@pci0:0:27:0: class=0x040300 card=0x813d1043 chip=0x26688086 rev=0x03 hdr=0x00
    vendor = 'Intel Corporation'
    device = '82801FB/FBM/FR/FW/FRW (ICH6 Family) High Definition Audio Controller'
    class = multimedia
    subclass = HDA
    cap 01[50] = powerspec 2 supports D0 D3 current D0
    cap 05[60] = MSI supports 1 message, 64 bit
    cap 10[70] = PCI-Express 1 root endpoint max data 128(128) link x0(x0)
    ecap 0002[100] = VC 1 max VC1
    ecap 0005[130] = unknown 1
pcib2@pci0:0:28:0: class=0x060400 card=0x00000000 chip=0x26608086 rev=0x03 hdr=0x01
    vendor = 'Intel Corporation'
    device = '82801FB/FBM/FR/FW/FRW (ICH6 Family) PCI Express Port 1'
    class = bridge
    subclass = PCI-PCI
    cap 10[40] = PCI-Express 1 root port max data 128(128) link x0(x1)
    cap 05[80] = MSI supports 1 message
    cap 0d[90] = PCI Bridge card=0x00000000
    cap 01[a0] = powerspec 2 supports D0 D3 current D0
    ecap 0002[100] = VC 1 max VC1
    ecap 0005[180] = unknown 1
pcib3@pci0:0:28:1: class=0x060400 card=0x00000000 chip=0x26628086 rev=0x03 hdr=0x01
    vendor = 'Intel Corporation'
    device = '82801FB/FBM/FR/FW/FRW (ICH6 Family) PCI Express Port 2'
    class = bridge
    subclass = PCI-PCI
    cap 10[40] = PCI-Express 1 root port max data 128(128) link x1(x1)
    cap 05[80] = MSI supports 1 message
    cap 0d[90] = PCI Bridge card=0x00000000
    cap 01[a0] = powerspec 2 supports D0 D3 current D0
    ecap 0002[100] = VC 1 max VC1
    ecap 0005[180] = unknown 1
uhci0@pci0:0:29:0: class=0x0c0300 card=0x80a61043 chip=0x26588086 rev=0x03 hdr=0x00
    vendor = 'Intel Corporation'
    device = '82801FB/FBM/FR/FW/FRW (ICH6 Family) USB UHCI'
    class = serial bus
    subclass = USB
uhci1@pci0:0:29:1: class=0x0c0300 card=0x80a61043 chip=0x26598086 rev=0x03 hdr=0x00
    vendor = 'Intel Corporation'
    device = '82801FB/FBM/FR/FW/FRW (ICH6 Family) USB UHCI'
    class = serial bus
    subclass = USB
uhci2@pci0:0:29:2: class=0x0c0300 card=0x80a61043 chip=0x265a8086 rev=0x03 hdr=0x00
    vendor = 'Intel Corporation'
    device = '82801FB/FBM/FR/FW/FRW (ICH6 Family) USB UHCI'
    class = serial bus
    subclass = USB
uhci3@pci0:0:29:3: class=0x0c0300 card=0x80a61043 chip=0x265b8086 rev=0x03 hdr=0x00
    vendor = 'Intel Corporation'
    device = '82801FB/FBM/FR/FW/FRW (ICH6 Family) USB UHCI'
    class = serial bus
    subclass = USB
ehci0@pci0:0:29:7: class=0x0c0320 card=0x80a61043 chip=0x265c8086 rev=0x03 hdr=0x00
    vendor = 'Intel Corporation'
    device = '82801FB/FBM/FR/FW/FRW (ICH6 Family) USB2 EHCI Controller'
    class = serial bus
    subclass = USB
    cap 01[50] = powerspec 2 supports D0 D3 current D0
    cap 0a[58] = EHCI Debug Port at offset 0xa0 in map 0x14
pcib4@pci0:0:30:0: class=0x060401 card=0x00000000 chip=0x244e8086 rev=0xd3 hdr=0x01
    vendor = 'Intel Corporation'
    device = '82801 PCI Bridge'
    class = bridge
    subclass = PCI-PCI
    cap 0d[50] = PCI Bridge card=0x00000000
isab0@pci0:0:31:0: class=0x060100 card=0x00000000 chip=0x26408086 rev=0x03 hdr=0x00
    vendor = 'Intel Corporation'
    device = '82801FB/FR (ICH6/ICH6R) LPC Interface Bridge'
    class = bridge
    subclass = PCI-ISA
atapci1@pci0:0:31:1: class=0x01018a card=0x80a61043 chip=0x266f8086 rev=0x03 hdr=0x00
    vendor = 'Intel Corporation'
    device = '82801FB/FBM/FR/FW/FRW (ICH6 Family) IDE Controller'
    class = mass storage
    subclass = ATA
ahci0@pci0:0:31:2: class=0x010601 card=0x26061043 chip=0x26528086 rev=0x03 hdr=0x00
    vendor = 'Intel Corporation'
    device = '82801FR/FRW (ICH6R/ICH6RW) SATA Controller'
    class = mass storage
    subclass = SATA
    cap 01[70] = powerspec 2 supports D0 D3 current D0
ichsmb0@pci0:0:31:3: class=0x0c0500 card=0x80a61043 chip=0x266a8086 rev=0x03 hdr=0x00
    vendor = 'Intel Corporation'
    device = '82801FB/FBM/FR/FW/FRW (ICH6 Family) SMBus Controller'
    class = serial bus
    subclass = SMBus
vgapci0@pci0:4:0:0: class=0x030000 card=0x005c1043 chip=0x5b601002 rev=0x00 hdr=0x00
    vendor = 'ATI Technologies Inc'
    device = 'RV370 5B60 [Radeon X300 (PCIE)]'
    class = display
    subclass = VGA
    cap 01[50] = powerspec 2 supports D0 D1 D2 D3 current D0
    cap 10[58] = PCI-Express 1 endpoint max data 128(128) link x16(x16)
    cap 05[80] = MSI supports 1 message, 64 bit
    ecap 0001[100] = AER 1 0 fatal 0 non-fatal 0 corrected
vgapci1@pci0:4:0:1: class=0x038000 card=0x005d1043 chip=0x5b701002 rev=0x00 hdr=0x00
    vendor = 'ATI Technologies Inc'
    device = 'RV370 [Radeon X300SE]'
    class = display
    cap 01[50] = powerspec 2 supports D0 D1 D2 D3 current D0
    cap 10[58] = PCI-Express 1 endpoint max data 128(128) link x16(x16)
mskc0@pci0:2:0:0: class=0x020000 card=0x81421043 chip=0x436211ab rev=0x15 hdr=0x00
    vendor = 'Marvell Technology Group Ltd.'
    device = '88E8053 PCI-E Gigabit Ethernet Controller'
    class = network
    subclass = ethernet
    cap 01[48] = powerspec 2 supports D0 D1 D2 D3 current D0
    cap 03[50] = VPD
    cap 05[5c] = MSI supports 2 messages, 64 bit
    cap 10[e0] = PCI-Express 1 legacy endpoint max data 128(128) link x1(x1)
    ecap 0001[100] = AER 1 0 fatal 0 non-fatal 1 corrected
none0@pci0:1:0:0: class=0x020000 card=0x138f1043 chip=0x1fa711ab rev=0x07 hdr=0x00
    vendor = 'Marvell Technology Group Ltd.'
    device = '88W8310 and 88W8000G [Libertas] 802.11g client chipset'
    class = network
    subclass = ethernet
    cap 01[40] = powerspec 2 supports D0 D3 current D0
fwohci0@pci0:1:3:0: class=0x0c0010 card=0x808b1043 chip=0x8023104c rev=0x00 hdr=0x00
    vendor = 'Texas Instruments'
    device = 'TSB43AB22A IEEE-1394a-2000 Controller (PHY/Link) [iOHCI-Lynx]'
    class = serial bus
    subclass = FireWire
    cap 01[44] = powerspec 2 supports D0 D1 D2 D3 current D0
atapci0@pci0:1:5:0: class=0x010400 card=0x81361043 chip=0x31141095 rev=0x02 hdr=0x00
    vendor = 'Silicon Image, Inc.'
    device = 'SiI 3114 [SATALink/SATARaid] Serial ATA Controller'
    class = mass storage
    subclass = RAID
    cap 01[60] = powerspec 2 supports D0 D1 D2 D3 current D0

# grep msk /var/run/dmesg.boot
mskc0: <Marvell Yukon 88E8053 Gigabit Ethernet> port 0xc800-0xc8ff mem 0xd7efc000-0xd7efffff irq 17 at device 0.0 on pci2
msk0: <Marvell Technology Group Ltd. Yukon EC Id 0xb6 Rev 0x01> on mskc0
msk0: Ethernet address: 00:11:d8:4a:cd:c4
miibus0: <MII bus> on msk0

This issue is still occurring on my 8-STABLE box. The patch suggested in the feedback state change is already present, and the machine still doesn't last very long under any kind of load.

Maybe it's just time to get that Marvell part out of my machine? Guess I overestimated the quality of the Marvell support.

What more debugging help can I offer? It's the exact same message (and same bit of code), but I don't have the datasheet.
--
Steve Watt KD6GGD PP-ASEL-IA ICBM: 121W 56' 57.5" / 37N 20' 15.3"
Internet: steve @ Watt.COM Whois: SW32-ARIN
Free time? There's no such thing. It just comes in varying prices...

On Tue, Oct 16, 2012 at 04:10:01PM +0000, Steve Watt wrote:
> The following reply was made to PR kern/166727; it has been noted by GNATS.
>
> From: "Steve Watt" <steve@Watt.COM>
> To: <bug-followup@FreeBSD.org>, <xxjack12xx@gmail.com>
> Cc:
> Subject: Re: kern/166727: [msk] msk driver keeps erroring
> Date: Tue, 16 Oct 2012 08:37:27 -0700
>
> [...]
> This issue is still occurring on my 8-STABLE box. The patch suggested in
> the feedback state change is already present, and the machine still doesn't
> last very long under any kind of load.

Ok.

> Maybe it's just time to get that Marvell part out of my machine? Guess I
> overestimated the quality of the Marvell support.

Marvell never released a publicly available data sheet and seems to have no interest in supporting FreeBSD at this moment (no data sheet, no engineering samples, no replies to technical questions, etc.). msk(4) is the result of a joint effort between open source developers and users. Because it's not rare to see several silicon bugs on specific chip sets, it is the driver's responsibility to implement workarounds or disable some offloading features to get stable operation. All these workarounds come from user feedback and trial and error. So, without vendor support, it will take time to get a stable driver, but I'll try to improve the current situation.

> What more debugging help can I offer? It's the exact same message (and same
> bit of code), but I don't have the datasheet.

Unfortunately I have no clue about the issue at this moment. I'll disable RX checksum offloading in the near future since it seems to trigger more problems. But the watchdog timeouts look like a completely different issue to me. The only thing I can think of is a cold boot (remove the power cord, wait more than 30 seconds, and boot).
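To confirm whether RX checksum offloading is actually off on a given interface (for example after the `ifconfig msk0 -rxcsum` workaround mentioned earlier in this report), the capability list printed by ifconfig can be checked. This is a minimal sketch, assuming ifconfig prints an `options=...<...>` line that lists the enabled capabilities; `rxcsum_enabled` is a hypothetical helper name, not part of FreeBSD.

```sh
#!/bin/sh
# rxcsum_enabled: hypothetical helper (not a FreeBSD tool).
# Reads `ifconfig <interface>` output on stdin and succeeds (exit 0) only if
# the options=...<...> capability list contains the exact token RXCSUM.
# Tokens such as RXCSUM_IPV6 are deliberately not counted as RXCSUM.
rxcsum_enabled() {
    grep -qE 'options=[0-9a-f]+<([A-Z0-9_]+,)*RXCSUM[,>]'
}
```

A possible use is `ifconfig msk0 | rxcsum_enabled && echo "RXCSUM still enabled"`, run once before and once after applying the workaround.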
State Changed From-To: feedback->open
Feedback received.

State Changed From-To: open->patched
I've disabled RX checksum offloading for the Yukon 88E8053 controller, so I believe it won't trigger the issue again. It was already merged to both stable/9 and stable/8.

Steve, are you still seeing the issue on latest stable/9?

I'm still getting this on FreeBSD 10.0

The same problem here. I had it on 10-RELEASE amd64 and now on 10.1-RC2.

uname -a:
FreeBSD myhost.mydomain.tld 10.1-RC2 FreeBSD 10.1-RC2 #0 r272876: Fri Oct 10 01:12:21 UTC 2014 root@releng1.nyi.freebsd.org:/usr/obj/usr/src/sys/GENERIC amd64

pciconf -lv:
[..]
mskc0@pci0:9:0:0: class=0x020000 card=0xc072144d chip=0x435411ab rev=0x00 hdr=0x00
    vendor = 'Marvell Technology Group Ltd.'
    device = '88E8040 PCI-E Fast Ethernet Controller'
    class = network
    subclass = ethernet

I think the issue mentioned in Bug 166727 has no relation to Alnis's issue, and it was fixed a long time ago. I know there are a couple of msk(4) instability reports on more recent FreeBSD releases. Given that there has been almost no functional msk(4) change for a very long time, the issue may be a different one. Alnis, I guess it would be better to show more details on your issue rather than referring to "the same problem". I think you wouldn't see the "prefetch unit stuck?" or "initialization failed: no memory for Rx buffers" messages on your hardware. If it is a different issue, filing a different report would be better, IMO.

Well, there are already several PRs about "watchdog timeout" with msk0, so I thought it wouldn't be a good idea to add one more. At the moment, I have both 9.3-RELEASE and 10.1-RC2 on the same machine. When I boot from 9.3-RELEASE, I don't experience any problems with msk0; for example, I can scp a 200 MB file to it at 9.5 MB/s.
When I boot from 10.1-RC2 and try to scp a 5 MB file to it:
- only ~300-400 packets are sent (e.g., 374, as shown on "slurm -i msk0")
- speed is ~500 KB/s
- transmission continues for less than a second
- within 10 seconds, a "msk0: watchdog timeout" message is printed and repeated every 10 seconds on ttyv0 until I reboot the machine.

No, I don't have "prefetch unit stuck" and "initialization failed: no memory for Rx buffers". Does it mean my problem is unrelated? Was it solved long ago? As I read on the Forums, some people have had it solved and then seen it reappear (https://forums.freebsd.org/threads/msk0-watchdog-timeout.10183/).

Pyun, did you make any changes? For me, the problem seems to have gone away in RC3 (10.1-RC3 FreeBSD 10.1-RC3 #0 r273437: Tue Oct 21 23:55:15 UTC 2014 root@releng1.nyi.freebsd.org:/usr/obj/usr/src/sys/GENERIC amd64).

Glad to hear your issue has gone away on RC3. No, I didn't touch msk(4) at all. I guess some other change in the kernel may have fixed it.

Sorry to say, for me the issue has reappeared in 10.1-RELEASE. I tried to replace miibus with that from RC3 but the result was exactly the same.

See https://lists.freebsd.org/pipermail/freebsd-stable/2015-April/082226.html for a suggested solution.

https://lists.freebsd.org/pipermail/freebsd-stable/2015-April/082230.html says:

Problem still persists, with:

mskc0 at pci0:9:0:0: class=0x020000 card=0xc072144d chip=0x435411ab rev=0x00 hdr=0x00
    vendor = 'Marvell Technology Group Ltd.'
    device = '88E8040 PCI-E Fast Ethernet Controller'
    class = network
    subclass = ethernet

on 10.1p9-amd64.

batch change:

For bugs that match the following
- Status Is In progress
AND
- Untouched since 2018-01-01.
AND
- Affects Base System OR Documentation

DO:

Reset to open status.

Note:
I did a quick pass but if you are getting this email it might be worthwhile to double check to see if this bug ought to be closed.

The problem is still there in 11.2 release.
My configuration:
- motherboard Gigabyte GA-965P-S3
- onboard NIC Marvell 8056 gigabit LAN
- amd64
- 8 GB RAM

Symptoms are all the same: any significant traffic via the NIC causes data transfers to hang and connections to be lost.
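Because the maintainer asked reporters which msk(4) messages they actually see, a rough tally of the driver's messages in a saved log can help separate the "watchdog timeout" reports from the "prefetch unit stuck?" ones. The following is only a sketch: `msk_tally` is a hypothetical helper name, and it assumes the kernel messages have been saved to a plain-text log file such as /var/log/messages.

```sh
#!/bin/sh
# msk_tally: hypothetical helper (not part of FreeBSD).
# Counts msk(4) kernel messages in the log file given as $1, grouped by
# message text and sorted with the most frequent message first.
msk_tally() {
    # Keep only the "mskN: ..." or "mskcN: ..." part of each matching line,
    # dropping any timestamp and hostname prefix added by syslogd.
    grep -oE 'mskc?[0-9]+: .*' "$1" | sort | uniq -c | sort -rn
}
```

For example, `msk_tally /var/log/messages` prints each distinct msk0 or mskc0 message preceded by the number of times it was logged, which makes it easier to say in a report whether only watchdog timeouts are present.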