After upgrading from FreeBSD 7.2-RELEASE to 8.0-RELEASE, I ran into problems with USB. Attempting to mount a flash drive triggers an error in the USB EHCI driver. Here are the kernel messages:

ehci_interrupt: unrecoverable error, controller halted
cmd=0x00010020 EHCI_CMD_ITC_1 EHCI_CMD_ASE
sts=0x0000b000 EHCI_STS_ASS EHCI_STS_REC EHCI_STS_HCH
ien=0x00000037
frindex=0x0000169e ctrdsegm=0x00000000 periodic=0xfee30000 async=0xfee35600
port 1 status=0x00001000
port 2 status=0x00001000
port 3 status=0x00001005
port 4 status=0x00003400
ehci_dump_isoc: isochronous dump from frame 0x053:
ITD(0xfffff800813a3900) at 0xff185900
next=0xff384204
status[0]=0x00000000; <>
status[1]=0x00000000; <>
status[2]=0x00000000; <>
status[3]=0x00000000; <>
status[4]=0x00000000; <>
status[5]=0x00000000; <>
status[6]=0x00000000; <>
status[7]=0x00000000; <>
bp[0]=0x00000000
addr=0x00; endpt=0x0
bp[1]=0x00000000
dir=out; mpl=0x00
bp[2..6]=0x00000000,0x00000000,0x00000000,0x00000000,0x00000000
bp_hi=0x00000000,0x00000000,0x00000000,0x00000000,
0x00000000,0x00000000,0x00000000
SITD(0xfffff800813b4200) at 0xff384200
next=0xfef85502
portaddr=0x00000000 dir=out addr=0 endpt=0x0 port=0x0 huba=0x0
mask=0x00000000
status=0x00000000 <> len=0x0
back=0x00000001, bp=0x00000000,0x00000000,0x00000000,0x00000000
ehci_interrupt: blocking interrupts 0x10

Full dmesg: http://orel.ru/~bel/usb/dmesg.txt
Kernel config: http://orel.ru/~bel/usb/SUNC3D.txt

How-To-Repeat: Insert a USB flash drive and try to mount it read-write.
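The names printed after cmd= and sts= in the dump above are the bits set in the EHCI USBCMD and USBSTS registers. The following C sketch decodes a subset of them so the dump can be read mechanically. The bit values follow the EHCI 1.0 register layout; the macro names copy the ones printed in the log but are defined locally here, and the helper functions are hypothetical, not part of FreeBSD's ehci(4). Note that cmd=0x00010020 has Run/Stop (bit 0) clear, i.e. the controller is no longer executing its schedules, and an Interrupt Threshold Control value of 1 micro-frame, which is what EHCI_CMD_ITC_1 denotes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define EHCI_CMD_RS	0x00000001u	/* Run/Stop */
#define EHCI_CMD_ITC_M	0x00ff0000u	/* Interrupt Threshold Control, in micro-frames */

#define EHCI_STS_PCD	0x00000004u	/* Port Change Detect */
#define EHCI_STS_HSE	0x00000010u	/* Host System Error */
#define EHCI_STS_HCH	0x00001000u	/* HCHalted */
#define EHCI_STS_REC	0x00002000u	/* Reclamation */
#define EHCI_STS_PSS	0x00004000u	/* Periodic Schedule Status */
#define EHCI_STS_ASS	0x00008000u	/* Asynchronous Schedule Status */

struct ehci_bitname {
	uint32_t bit;
	const char *name;
};

/* Highest bit first, matching the order used in the dumps in this PR. */
static const struct ehci_bitname ehci_sts_names[] = {
	{ EHCI_STS_ASS, "EHCI_STS_ASS" },
	{ EHCI_STS_PSS, "EHCI_STS_PSS" },
	{ EHCI_STS_REC, "EHCI_STS_REC" },
	{ EHCI_STS_HCH, "EHCI_STS_HCH" },
	{ EHCI_STS_HSE, "EHCI_STS_HSE" },
	{ EHCI_STS_PCD, "EHCI_STS_PCD" },
};

/* 1 if the Run/Stop bit is set, 0 if the controller has been stopped. */
int ehci_cmd_running(uint32_t cmd)
{
	return (cmd & EHCI_CMD_RS) != 0;
}

/* Interrupt Threshold Control field (bits 23:16), in micro-frames. */
unsigned ehci_cmd_itc(uint32_t cmd)
{
	return (unsigned)((cmd & EHCI_CMD_ITC_M) >> 16);
}

/* Write the names of the known USBSTS bits set in sts, space separated. */
void ehci_sts_decode(uint32_t sts, char *buf, size_t len)
{
	size_t used = 0;

	assert(buf != NULL && len > 0);
	buf[0] = '\0';
	for (size_t i = 0; i < sizeof(ehci_sts_names) / sizeof(ehci_sts_names[0]); i++) {
		if ((sts & ehci_sts_names[i].bit) == 0)
			continue;
		int n = snprintf(buf + used, len - used, "%s%s",
		    used > 0 ? " " : "", ehci_sts_names[i].name);
		if (n < 0 || (size_t)n >= len - used)
			break;	/* output truncated */
		used += (size_t)n;
	}
}

/* 1 if decoding sts yields exactly the bit names printed in a dump line. */
int ehci_sts_matches(uint32_t sts, const char *expected)
{
	char buf[128];

	ehci_sts_decode(sts, buf, sizeof(buf));
	return strcmp(buf, expected) == 0;
}
```

Decoding sts=0x0000b000 this way gives the same three names as the dump (ASS, REC, HCH); only the bits listed in the table are recognized, so unknown bits are silently skipped.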
Hi, My guess for this issue is that the cache invalidate and cache flush instructions are not properly implemented by busdma on your platform. Please check that first. --HPS
Responsible Changed From-To: freebsd-usb->freebsd-sparc64 hps claims that this may be sparc64-specific.
As outlined here, it's unlikely that this is a problem with the sparc64 bus_dmamap_sync(9): http://lists.freebsd.org/pipermail/freebsd-sparc64/2009-December/006866.html There are, however, known problems with usb(4) in this regard; see for example: http://svn.freebsd.org/viewvc/base?view=revision&revision=203080 Marius
I can't reproduce the problem using the exact same hardware (U60 and VIA VT6202). Could you please check with stable/8 (preferably with r212621/sys/dev/usb/usb_busdma.c 1.13.2.5 in place) whether the problem still persists? There were some changes since 8.0-RELEASE, including the addition of a workaround for a bug in exactly that controller, which might have fixed this. Marius
The problem still persists on stable/8 with sys/dev/usb/usb_busdma.c 1.13.2.5. Note: my U60 has two CPUs. With Best Regards, Andrew Belashov
I've also tested with an MP machine, and I doubt that this has any impact on the problem. Could you please try whether the following patches make any difference for you?
http://people.freebsd.org/~kan/usb_rspro.diff
http://people.freebsd.org/~marius/usb_busdma.c_sparc64_no_hack.diff
Please also test how the machine behaves if you leave ehci(4) out of the kernel so that uhci(4) is used instead. Marius
Dear all,

has there been any progress on this one? I'm seeing basically the same thing with FreeBSD 9, but in my case the problem turns up when I try to mount one of my ZFS filesystems, built in a RAIDZ configuration on real hard disks connected over USB. The disks are fine, and the ZFS filesystem was exported cleanly on a different machine. You can find the kernel messages from dmesg below.

I'm willing to test patches, but it might take a while, since my machine (a Netra T1 AC200 with a 500 MHz CPU) is not exactly fast when recompiling kernels.

Cheers,
Manuel

This is what I get in dmesg:

da10 at umass-sim7 bus 7 scbus12 target 0 lun 0
da10: <WD Ext HDD 1021 2021> Fixed Direct Access SCSI-4 device
da10: 40.000MB/s transfers
da10: 1907727MB (3907024896 512 byte sectors: 255H 255S/T 60084C)
### start mounting the ZFS filesystem on USB disks here, then wait;
### there is a period of intense I/O and a variable amount of time you
### have to wait before the following messages appear:
ehci_interrupt: unrecoverable error, controller halted
cmd=0x00010030 EHCI_CMD_ITC_1 EHCI_CMD_ASE EHCI_CMD_PSE
sts=0x0000e004 EHCI_STS_ASS EHCI_STS_PSS EHCI_STS_REC EHCI_STS_PCD
ien=0x00000037
frindex=0x00002746 ctrdsegm=0x00000000 periodic=0xc38cc000 async=0xc2623300
port 1 status=0x0000180b
port 2 status=0x00001000
port 3 status=0x00001000
port 4 status=0x00001000
ehci_dump_isoc: isochronous dump from frame 0x068:
ITD(0xfffff80021621c00) at 0xc3c79c00
next=0xc3e78b04
status[0]=0x00000000; <>
status[1]=0x00000000; <>
status[2]=0x00000000; <>
status[3]=0x00000000; <>
status[4]=0x00000000; <>
status[5]=0x00000000; <>
status[6]=0x00000000; <>
status[7]=0x00000000; <>
bp[0]=0x00000000
addr=0x00; endpt=0x0
bp[1]=0x00000000
dir=out; mpl=0x00
bp[2..6]=0x00000000,0x00000000,0x00000000,0x00000000,0x00000000
bp_hi=0x00000000,0x00000000,0x00000000,0x00000000,
0x00000000,0x00000000,0x00000000
SITD(0xfffff80021634b00) at 0xc3e78b00
next=0xc3a78302
portaddr=0x00000000 dir=out addr=0 endpt=0x0 port=0x0 huba=0x0
mask=0x00000000
status=0x00000000 <> len=0x0
back=0x00000001, bp=0x00000000,0x00000000,0x00000000,0x00000000
ehci_interrupt: blocking interrupts 0x10
ugen4.2: <vendor 0x1a40> at usbus4 (disconnected)
uhub5: at uhub4, port 1, addr 2 (disconnected)
ugen4.3: <vendor 0x1a40> at usbus4 (disconnected)
uhub6: at uhub5, port 1, addr 3 (disconnected)
ugen4.11: <Western Digital> at usbus4 (disconnected)
umass7: at uhub6, port 3, addr 11 (disconnected)
(da10:umass-sim7:7:0:0): lost device - 1 outstanding, 1 refs
(da10:umass-sim7:7:0:0): oustanding 0

I get the following line when I run "uname -a":

FreeBSD router.hinter.bergen2 9.0-STABLE FreeBSD 9.0-STABLE #0: Wed Mar 14 17:09:45 CET 2012 root@router.hinter.bergen2:/usr/obj/usr/src/sys/GENERIC sparc64

-- 
Homepage: http://www.hinterbergen.de/mala
OpenPGP: 0xA330353E (DSA) or 0xD87D188C (RSA)
Have you given the two patches mentioned earlier in the audit-trail of this PR a try? It's probably a good idea to additionally put "options USB_HOST_ALIGN=64" into the kernel configuration file when testing the one for usb_transfer.c. Marius
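One plausible reason to raise USB_HOST_ALIGN (an assumption, the thread does not spell it out) is to keep separate DMA buffers from sharing a cache line: if a line is invalidated for one buffer, CPU writes to the other buffer in that line can be lost. Whether that applies to sparc64 at all is doubtful given the earlier remark that a sparc64 bus_dmamap_sync(9) problem is unlikely. The C sketch below only illustrates the arithmetic with the 64-byte value suggested above; HOST_ALIGN, align_up() and share_line() are hypothetical names, not code from sys/dev/usb.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for "options USB_HOST_ALIGN=64". */
#define HOST_ALIGN 64u

/* Round x up to the next multiple of align; align must be a power of two. */
size_t align_up(size_t x, size_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (x + align - 1) & ~(align - 1);
}

/*
 * Returns 1 if [a, a + alen) and [b, b + blen) touch at least one common
 * aligned line of 'line' bytes, i.e. a cache operation on one buffer
 * could also affect the other.
 */
int share_line(uintptr_t a, size_t alen, uintptr_t b, size_t blen, size_t line)
{
	assert(alen > 0 && blen > 0 && line > 0);
	uintptr_t a_first = a / line, a_last = (a + alen - 1) / line;
	uintptr_t b_first = b / line, b_last = (b + blen - 1) / line;

	return a_first <= b_last && b_first <= a_last;
}
```

For example, a 100-byte buffer at 0x1000 followed directly by a second buffer at 0x1064 shares the 64-byte line starting at 0x1040; placing the second buffer at 0x1000 + align_up(100, HOST_ALIGN) = 0x1080 avoids that, at the cost of 28 bytes of padding.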
Hi Marius, ok, so you did not get feedback on those patches. I'll give them a try over the weekend (in between my LHCb analysis work), and I'll let you know what comes out. Manuel
No; so far I also couldn't reproduce this problem using the on-board EHCI controllers in sun4u machines or the add-on cards I have. What controller is this? If I'm not mistaken, the T1-AC200 doesn't have an on-board EHCI controller. Marius
Correct. These are "no-name" VIA PCI USB 2.0 controllers. In the past, I've swapped several of these (bought more than one year apart; this is what you got at the time when buying a USB controller in the area where I used to live) between the two Netra machines I administer, and the problem does not seem to be specific to the machine or to a particular controller card (these PCI cards run fine and without hiccups on Linux/ppc and Linux/x86).

These are the kernel messages:

uhci0: <VIA 83C572 USB controller> port 0xc00200-0xc0021f at device 5.0 on pci2
usbus2: <VIA 83C572 USB controller> on uhci0
uhci1: <VIA 83C572 USB controller> port 0xc00220-0xc0023f at device 5.1 on pci2
usbus3: <VIA 83C572 USB controller> on uhci1
ehci0: <VIA VT6202 USB 2.0 controller> mem 0xa000-0xa0ff at device 5.2 on pci2
ehci0: VIA-quirk applied
usbus4: EHCI version 1.0
usbus4: <VIA VT6202 USB 2.0 controller> on ehci0

The relevant part of "pciconf -l" is:

uhci0@pci0:2:5:0: class=0x0c0300 card=0x30381106 chip=0x30381106 rev=0x62 hdr=0x00
uhci1@pci0:2:5:1: class=0x0c0300 card=0x30381106 chip=0x30381106 rev=0x62 hdr=0x00
ehci0@pci0:2:5:2: class=0x0c0320 card=0x31041106 chip=0x31041106 rev=0x65 hdr=0x00

(World and kernel builds for the latest 9-STABLE are on their way; first a reference kernel without patches, so I know what I am comparing against...)

Manuel
Hi Marius, it seems that the first patch alone (http://people.freebsd.org/~kan/usb_rspro.diff) does not solve the issue. I'm currently compiling a kernel with the second patch on its own (with "options USB_HOST_ALIGN=64" in the kernel options as you suggested), and I'll let you know what comes out. Should I also build a kernel with both patches (assuming the one I'm currently building does not work either)? Manuel
Well, the individual patches shouldn't make things worse, except that the second one causes more memory to be used, so I'd suggest combining them. If things work in the end, we can still check which changes are actually needed. Looking at the Linux USB code, the FreeBSD one doesn't seem to honor some DMA constraints, and at least for the alignment it's actually hard to follow which value is eventually used. One thing that stands out is that for EHCI, the boundary is 4096. This is most easily fixed by defining USB_PAGE_SIZE to 4096 in sys/dev/usb/usb_busdma.h. Marius
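To make the 4096-byte boundary concrete: an EHCI qTD describes its data buffer through up to five page pointers, each covering one 4 KiB page, so a DMA segment handed to the controller must not cross a 4 KiB boundary. On sparc64, whose base page size is 8 KiB, even a single page-sized buffer therefore has to be split in two. The C sketch below shows only that splitting; EHCI_BOUNDARY, struct ehci_seg and ehci_split() are hypothetical names, and this is not how usb_busdma.c is actually structured.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EHCI_BOUNDARY 4096u	/* EHCI page size / DMA boundary */

/* Hypothetical description of one DMA segment. */
struct ehci_seg {
	uint64_t addr;
	size_t len;
};

/*
 * Split the physical range [addr, addr + len) into segments that never
 * cross a 4 KiB boundary. Returns the number of segments written; if
 * 'max' is too small, the remainder of the range is not described.
 */
size_t ehci_split(uint64_t addr, size_t len, struct ehci_seg *segs, size_t max)
{
	size_t n = 0;

	assert(segs != NULL || max == 0);
	while (len > 0 && n < max) {
		size_t room = EHCI_BOUNDARY - (size_t)(addr % EHCI_BOUNDARY);
		size_t chunk = len < room ? len : room;

		segs[n].addr = addr;
		segs[n].len = chunk;
		addr += chunk;
		len -= chunk;
		n++;
	}
	return n;
}
```

Splitting an 8 KiB-aligned, 8192-byte buffer yields two 4096-byte segments, while a 512-byte buffer starting at 0x1f00 is cut into 256 bytes before and 256 bytes after the boundary at 0x2000.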
Ok, the second patch on its own doesn't appear to work either, so I'm trying the combination of patches now. By the way: defining USB_PAGE_SIZE to 4096 in sys/dev/usb/usb_busdma.h is a bad idea - the kernel panics with a backtrace pointing into the MMU-related code. This probably has to do with the sparc64 MMU only supporting 8k pages, so I'm not terribly surprised... Ok, I'm waiting for the next make buildkernel to finish, and I'll let you know what comes out. Manuel
Ok, I also tested a kernel with both patches, and the issue persists. Do you have something else to try? Manuel
Hi Marius, I did a bit of code reading (/usr/src/sys/dev/usb/controller/ehci.c near line 1494), and I realised that the "unrecoverable error" message should only be triggered if the EHCI status register has the EHCI_STS_HCH bit set - according to the status word dump in my log, it is not set (just after the "unrecoverable error" message). The register dump re-reads the status register from the hardware. Could it be that some controllers have a glitch or something on that particular bit, and we had better re-read the status register before we conclude that the controller "really wanted to set that bit"? I can also see that the bit is set in the original bug report. I don't know if that machine is just faster (and the bit has not had the time to clear yet), or if we're talking about two different problems here... (This observation might also indicate that a small delay loop has to be put in before I re-read the status register; we'll have to see...) I'm building a kernel with that modification, but I'd be interested in a second opinion nevertheless... Cheers, Manuel
> Could it be that some controllers have
> a glitch or something on that particular bit, and we better re-read the
> status register before we conclude that the controller "really wanted to
> set that bit"?

You mean EHCI_STS_HSE? This is expected: ehci_interrupt() clears the pending interrupt status bits before dumping the register content:

EOWRITE4(sc, EHCI_USBSTS, status); /* acknowledge */

> I can also see that the bit is set in the original bug report. I don't
> know if that machine is just faster (and the bit has not had the time to
> clear yet), or if we're talking about two different problems here...

Probably, the other controller just sets it again after the bit is cleared.

Marius
Okay, could you please give the following patch a try? http://people.freebsd.org/~marius/usb_busdma.diff Marius
Okay, I tried both my idea (which naturally did not work ;) and your patch (without my patch, so I don't screw up the results). Unfortunately, your patch does not seem to work either. From what I can tell from here at work, the machine is stuck in a reboot loop (I guess after trying to access the USB disks), but I'd like to be sure and watch the disks' LEDs for a bit when I get home tonight (to make sure that the reboot loop is really related to USB disk access). Manuel
Hrm, okay, would be interesting to know what the machine actually does. Looking at the code I found another bug; the VIA-workaround currently doesn't do anything: http://people.freebsd.org/~marius/ehci_pci_fix_via_quirk.diff This might apply for the insane I/O you've reported but I'm unsure whether it makes a difference for the HSE interrupt. Marius
On Wed, 4 Apr 2012 14:59:46 +0200 Marius Strobl <marius@alchemy.franken.de> wrote:
> [...]

From the looks of it (with your patch at http://people.freebsd.org/~marius/usb_busdma.diff), the machine starts booting, then tries to mount the filesystems residing on the USB disks, apparently does some I/O (while still processing interrupts), and after less than a minute locks up solid without any indication on the serial console as to what went wrong...

I've started another build with your "VIA quirk fix" but without the busdma patch mentioned above (the machine locking up is a lot worse than just USB not working after some heavy I/O, so I left it out for now). However, since I started the build without being properly awake this morning, I typed "make buildworld" where I wanted to type "make buildkernel", so it's going to take some time. Also, I'll be leaving CERN over Easter, so I won't be running tests on that machine from tomorrow morning until Monday evening (I can still compile kernels, though). Anyhow, I'll let you know what comes out.

Cheers, thanks a lot for your effort, and, of course, a Happy Easter!

Manuel
Hi, the "VIA quirk fix" on its own gives the familiar message in dmesg (unrecoverable error, controller halted), so I'm compiling a kernel which combines this fix with your latest busdma fix to try them both together; as I said in my last e-mail, I'll probably not be testing this until Monday night... Manuel
On Fri, Apr 06, 2012 at 09:58:42AM +0200, Manuel Tobias Schiller wrote:
> [...]
> the "VIA quirk fix" on its own gives the familiar message in dmesg
> (unrecoverable error, controller halted), so I'm compiling a kernel which

Oof, this likely means there's a more basic problem with this device. Have you already tried to re-seat the card in case there's an electrical problem? Please also provide the output of `pciconf -rb ehci0@pci0:2:5:2 0:255' from a booting kernel.

FYI, after some digging I've found the following card

ehci0@pci0:2:5:2: class=0x0c0320 card=0x31041106 chip=0x31041106 rev=0x6h0

which is a newer revision of your device and works just fine in a T1-200, including with the usb(4) fixes. The publicly available datasheets for the VIA USB controllers are minimal and exclude errata, and Linux also doesn't seem to use any additional workarounds, so I'm starting to run out of ideas as to what could be wrong with your revision. The only remaining thing I can currently think of trying is to test whether it chokes on the generic initialization done by the sparc64 PCI code, using the attached patch.

> combines this fix with your latest busdma fix to try them both together;

This combination is unlikely to make a difference.

Marius
On Fri, 6 Apr 2012 20:37:26 +0200 Marius Strobl <marius@alchemy.franken.de> wrote:
> [...]
> The only remaining thing to give a try I currently can think of is to
> test whether it chokes on the generic initialization done by the
> sparc64 PCI code using the attached patch.
> [...]

Hi Marius,

I've tried your new patch, both on its own and in conjunction with the latest busdma and VIA quirk fixes, and I still get the same error message...
Here's the output of pciconf you requested:

mala@router:~> sudo pciconf -rb ehci0@pci0:2:5:2 0:255
Password:
06 11 04 31 06 00 10 22 65 20 03 0c 00 16 80 00
00 a0 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 06 11 04 31
00 00 00 00 80 00 00 00 00 00 00 00 14 03 00 00
00 00 0b 00 00 00 00 00 a0 20 00 29 00 00 ff ff
00 5a 04 80 00 00 00 00 04 0b 88 88 33 00 00 00
20 20 01 00 00 00 00 00 01 00 00 00 00 00 00 c0
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
01 00 0a 7e 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 03 00 00 00 00 00 00 00 00

This was taken after the controller stopped, on a kernel with your latest patch, but I'd guess that doesn't matter - the EHCI driver should not be playing with the PCI settings after initialisation...

I've also opened the machine, and the PCI card is seated properly. I even removed it and tried an even older VIA EHCI controller and one of the first USB 2.0 controllers by NEC - no luck: the VIA one had trouble recognizing devices, and the NEC one did not recognize a single one I plugged in.

Is there anything else I can try?

Manuel
On Wed, Apr 11, 2012 at 12:59:54PM +0200, Manuel Tobias Schiller wrote:
> [...]
> Here's the output of pciconf you requested:
>
> mala@router:~> sudo pciconf -rb ehci0@pci0:2:5:2 0:255
> Password:
> 06 11 04 31 06 00 10 22 65 20 03 0c 00 16 80 00
> 00 a0 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 00 00 00 00 00 00 00 00 00 00 00 00 06 11 04 31
> 00 00 00 00 80 00 00 00 00 00 00 00 14 03 00 00
> 00 00 0b 00 00 00 00 00 a0 20 00 29 00 00 ff ff

This is rather confusing; the 0x29 in the above line means that the VIA workaround is applied. Didn't you say that with the fix to actually apply it, the kernel panics as soon as it attaches the device? Apart from this, the configuration space differs from mine in 3 undocumented bytes. I'm not sure whether it's worth testing whether these make a difference...

> [...]
> I've also opened the machine, and the PCI card is seated properly. I even
> removed it and tried an even older VIA EHCI controller and one of the
> first USB 2.0 controllers by NEC - no luck, the VIA one had trouble
> recognizing devices, the NEC one did not recognize a single one I plugged
> in.

This also is rather strange. Have you ever used any other type of card in that slot, e.g. a NIC, so you can rule out that the slot is broken somehow? How does using the on-board USB controller work out?

Marius
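The byte in question sits at configuration-space offset 0x4B (the twelfth byte of the row starting at 0x40). The C sketch below parses pciconf -rb style hex output and checks it. The test of bit 0x20 at offset 0x4B, limited to VIA device 0x3104 with a revision of the form 0x6x, follows the VIA quirk as I understand it from Linux's ehci-pci.c, where setting that bit selects the conventional 10 microsecond EHCI sleep time instead of 1 microsecond; treat that interpretation, and all function names here, as assumptions rather than FreeBSD's ehci_pci.c code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Parse whitespace-separated hex bytes, as printed by `pciconf -rb`, into
 * cfg. Stops at the first token that is not a hex number or after max
 * bytes; returns the number of bytes stored, or -1 if a value exceeds 0xff.
 */
int parse_cfg_dump(const char *text, uint8_t *cfg, size_t max)
{
	const char *p = text;
	size_t n = 0;

	assert(text != NULL && cfg != NULL);
	while (n < max) {
		char *end;
		unsigned long v = strtoul(p, &end, 16);

		if (end == p)
			break;	/* no further hex token */
		if (v > 0xff)
			return -1;
		cfg[n++] = (uint8_t)v;
		p = end;
	}
	return (int)n;
}

/* PCI configuration space is little-endian. */
uint16_t cfg_read16(const uint8_t *cfg, size_t off)
{
	return (uint16_t)(cfg[off] | (cfg[off + 1] << 8));
}

/* Class code (class, subclass, programming interface), e.g. 0x0c0320 for EHCI. */
uint32_t cfg_class(const uint8_t *cfg)
{
	return ((uint32_t)cfg[0x0b] << 16) | ((uint32_t)cfg[0x0a] << 8) | cfg[0x09];
}

/* Assumed quirk check: VIA 0x1106/0x3104, revision 0x6x, bit 0x20 set at 0x4b. */
int via_quirk_applied(const uint8_t *cfg)
{
	return cfg_read16(cfg, 0x00) == 0x1106 &&
	    cfg_read16(cfg, 0x02) == 0x3104 &&
	    (cfg[0x08] & 0xf0) == 0x60 &&
	    (cfg[0x4b] & 0x20) != 0;
}
```

Fed the first five rows of the dump above, this reports vendor 0x1106, device 0x3104, revision 0x65 and class 0x0c0320 (matching the pciconf -l line), and 0x29 at offset 0x4B, whose bit 0x20 is set.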
Hi Marius,

I'm rather busy with work at the moment, so I'm not working quite as much on troubleshooting this issue right now... (See below for answers to your questions.)

On Sun, 15 Apr 2012 14:51:05 +0200 Marius Strobl <marius@alchemy.franken.de> wrote:
> [...]
> > 00 00 0b 00 00 00 00 00 a0 20 00 29 00 00 ff ff
>
> This is rather confusing; the 0x29 in the above line means that the
> VIA workaround is applied. Didn't you say that with the fix to
> actually apply it, the kernel panics as soon as attaching the
> device?
> Apart from this, the configuration space differs in 3 undocumented
> bytes from mine. I'm not sure whether it's worth trying whether
> these make a difference ...

Yes, this was from a kernel with your patch and the VIA workaround applied; the kernel usually stops when I start using these devices heavily (i.e. the automatic checks done during a ZFS mount operation).

> [...]
> This also is rather strange. Have you ever used any other type of
> card in the slot, f.e. an NIC, so you can rule out it's broken
> somehow?

Some four or five years ago, the slot held a quad Fast Ethernet NIC, and that seemed to work fine... But a lot can happen in that time, so I ordered a new USB controller to test with, just in case...

> How does using the on-board USB controller work out?

As far as I know, the on-board controller is USB 1.1, so I have not really tried it, because it's a no-go option for disks (I'd get similar speed fetching data from some server here at CERN over my DSL connection, and I probably wouldn't even have to administer the server myself - if I could get them to host my data ;)... I can give the on-board USB 1.1 controller a try, though...

I noticed something else when reconnecting everything to the server: the USB ground seems to be at quite a high (voltage) potential with respect to the chassis of the server (and the protective ground of the wall outlet), about 80 volts. I've tried to locate a single faulty power supply among those of the hard disks (since the server chassis is at ground level), but when tested individually, none of them shows this behaviour. It only happens when I connect all eight USB disks to the USB hub, which in turn connects to the server. Apparently, this is some collective effect. Obviously, when the USB cable from the hub is plugged into the server, this potential difference is no longer there, and the disks are recognised. I'm not sure what this observation means (except that I'd really prefer linear over switching-mode power supplies because of the galvanic separation between primary and secondary sides), but I thought I'd mention it anyway.

Manuel
What's the status here?
batch change:

For bugs that match the following
- Status Is In progress
AND
- Untouched since 2018-01-01.
AND
- Affects Base System OR Documentation

DO:
Reset to open status.

Note:
I did a quick pass but if you are getting this email it might be worthwhile to double check to see if this bug ought to be closed.
I'm sorry that this PR was never addressed. In the meantime, FreeBSD support for sparc64 was dropped.