FreeBSD wfid78-172 10.1-RELEASE-p23 FreeBSD 10.1-RELEASE-p23 #0: Thu May 14 13:35:13 UTC 2015 root@amd64-builder.pcbsd.org:/usr/obj/usr/src/sys/GENERIC amd64 May 30 11:43:29 wfid78-172 kernel: ahcich3: Timeout on slot 24 port 0 May 30 11:43:29 wfid78-172 kernel: ahcich3: is 00000008 cs 00000000 ss 00000000 rs 01000000 tfd 40 serr 00000000 cmd 00207817 May 30 11:43:29 wfid78-172 kernel: (ada2:ahcich3:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 03 bc f5 1e 40 00 00 00 00 00 00 May 30 11:43:29 wfid78-172 kernel: (ada2:ahcich3:0:0:0): CAM status: Command timeout May 30 11:43:29 wfid78-172 kernel: (ada2:ahcich3:0:0:0): Retrying command May 30 11:43:29 wfid78-172 kernel: ahcich1: Timeout on slot 15 port 0 May 30 11:43:29 wfid78-172 kernel: ahcich1: is 00000008 cs 00000000 ss 00000000 rs 00008000 tfd 40 serr 00000000 cmd 00206f17 May 30 11:43:29 wfid78-172 kernel: (ada1:ahcich1:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 02 bd f5 1e 40 00 00 00 00 00 00 May 30 11:43:29 wfid78-172 kernel: (ada1:ahcich1:0:0:0): CAM status: Command timeout May 30 11:43:29 wfid78-172 kernel: (ada1:ahcich1:0:0:0): Retrying command I set hint.ahci.0.msi=0 As long as it works.
Affects me on FreeBSD Prelease 10.2 too. (AMD A10-5750 Lenovo E545 Laptop) Same Error: ahcich1: Timeouton slot 17 port 0 ahcich1: is 00000002 cs 00000000 ss 00000000 rs 00020000 tfd 50 seried 00007117 (aprobe0:ahcich :0 :0 :0): ATAPI_IDENTIFY. ACB: a1 00 00 00 00 40 00 (aprobe:ahcich1 :0 :0 :0): CAM status: command timeout (aprobe:ahcich1 :0 :0 :0): Retrying command
I've got a similar problem on FreeBSD 10.1-RELEASE, but using Intel chipset: ahci0: <Intel Cougar Point AHCI SATA controller> port 0xf070-0xf077,0xf060-0xf063,0xf050-0xf057,0xf040-0xf043,0xf000-0xf01f mem 0xdfa22000-0xdfa227ff irq 19 at device 31.2 on pci0 ahci0: AHCI v1.30 with 6 3Gbps ports, Port Multiplier not supported ahcich0: <AHCI channel> at channel 0 on ahci0 ahcich1: <AHCI channel> at channel 1 on ahci0 ahcich2: <AHCI channel> at channel 2 on ahci0 ahcich3: <AHCI channel> at channel 3 on ahci0 ahcich4: <AHCI channel> at channel 4 on ahci0 ahcich5: <AHCI channel> at channel 5 on ahci0 ahciem0: <AHCI enclosure management bridge> on ahci0 Problem first appeared 28 minutes after the binary upgrade from 9.3-RELEASE (the server ran flawlessly for 4 years before that). At boottime : ahcich5: Timeout on slot 0 port 0 (and so on) During normal operation : ahcich1: Timeout on slot 25 port 0 ahcich1: is 00000000 cs 04000000 ss 06000000 rs 06000000 tfd 40 serr 00000000 cmd 0004d917 (ada1:ahcich1:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 00 28 ee 3c 40 2e 00 00 01 00 00 (ada1:ahcich1:0:0:0): CAM status: Command timeout (ada1:ahcich1:0:0:0): Retrying command ahcich1: stopping AHCI engine failed ahcich0: Timeout on slot 23 port 0 ahcich0: is 00000000 cs 01000000 ss 01800000 rs 01800000 tfd 40 serr 00000000 cmd 0004d717 (ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 00 28 ee 3c 40 2e 00 00 01 00 00 (ada0:ahcich0:0:0:0): CAM status: Command timeout (ada0:ahcich0:0:0:0): Retrying command ahcich0: stopping AHCI engine failed ahcich1: AHCI reset: device not ready after 31000ms (tfd = 00000080) ahcich1: executing CLO failed ahcich0: AHCI reset: device not ready after 31000ms (tfd = 00000080) ahcich0: executing CLO failed ahcich1: Timeout on slot 26 port 0 and the box freezes. Happened twice in one month. I'm going to give hint.ahci.0.msi=0 a try later...
(In reply to patpro from comment #2) apparently hint.ahci.0.msi=0 did not help, so I removed this setting. And about 4 weeks later, my box had a new freeze. Currently it runs 10.1-RELEASE-p19, on a Supermicro X9SCL-F motherboard. I'd be happy to help, let me know if I can provide further info or run some tests.
Same issue for me. No problem on 10.0-RELEASE -- it all began after a freebsd-update to 10.2-RELEASE. It's happening across two ahci interfaces: ahci0: <ASMedia ASM1061 AHCI SATA controller> port 0xd800-0xd807,0xd400-0xd403,0xd000-0xd007,0xc800-0xc803,0xc400-0xc41f mem 0xfdfffc00-0xfdfffdff irq 18 at device 0.0 on pci1 ahci0: AHCI v1.20 with 2 6Gbps ports, Port Multiplier supported ahci1: <AMD SB7x0/SB8x0/SB9x0 AHCI SATA controller> port 0xb000-0xb007,0xa000-0xa003,0x9000-0x9007,0x8000-0x8003,0x7000-0x700f mem 0xfdeffc00-0xfdefffff irq 19 at device 17.0 on pci0 ahci1: AHCI v1.20 with 6 3Gbps ports, Port Multiplier supported ahci1: quirks=0x22000<ATI_PMP_BUG,1MSI> I've tried disabling MSI on the AHCI interfaces via hints in /boot/loader.conf but the problem persists.
(In reply to Chris Platts from comment #4) This hint.ahci.0.msi=0 place to /boot/device.hints But now my system have defaults settings and I comment this: ## ## ahci ## #hint.ahci.0.msi=0 #hint.ahci.1.msi=0 All works normal. > uname -a FreeBSD wfid78-172 10.2-RELEASE-p4 FreeBSD 10.2-RELEASE-p4 #0: Tue Aug 18 15:15:36 UTC 2015 > uptime 10:34 up 18 days, 14:21, 6 users, load averages: 0,42 0,40 0,43 Strange... But... the problem is still there, as you can see. It was not (as far as I could understand) changes in ahci.c.
(In reply to Vladimir Noskov from comment #5) Ok -- I tried moving the hits to /boot/device.hints rather than /boot/loader.conf. Unfortunately the errors persist: Sep 30 13:25:15 trillian kernel: ahcich1: Timeout on slot 6 port 0 Sep 30 13:25:15 trillian kernel: ahcich1: is 00000000 cs 00000040 ss 00000000 rs 00000040 tfd 50 serr 00000000 cmd 0004c617 Sep 30 13:25:15 trillian kernel: (ada1:ahcich1:0:0:0): READ_DMA. ACB: c8 00 00 00 00 40 00 00 00 00 01 00 Sep 30 13:25:15 trillian kernel: (ada1:ahcich1:0:0:0): CAM status: Command timeout Sep 30 13:25:15 trillian kernel: (ada1:ahcich1:0:0:0): Retrying command Sep 30 13:25:36 trillian kernel: ahcich0: Timeout on slot 21 port 0 Sep 30 13:25:36 trillian kernel: ahcich0: is 00000000 cs 00200000 ss 00000000 rs 00200000 tfd 451 serr 00000000 cmd 0004d517 Sep 30 13:26:08 trillian kernel: ahcich0: Timeout on slot 26 port 0 Sep 30 13:26:08 trillian kernel: ahcich0: is 00000000 cs 04000000 ss 00000000 rs 04000000 tfd 451 serr 00000000 cmd 0004da17 Sep 30 13:26:08 trillian kernel: (ada0:ahcich0:0:0:0): SETFEATURES ENABLE WCACHE. ACB: ef 02 00 00 00 40 00 00 00 00 00 00 Sep 30 13:26:08 trillian kernel: (ada0:ahcich0:0:0:0): CAM status: Command timeout Sep 30 13:26:08 trillian kernel: (ada0:ahcich0:0:0:0): Retrying command Sep 30 13:26:29 trillian kernel: ahcich1: Timeout on slot 18 port 0 Sep 30 13:26:29 trillian kernel: ahcich1: is 00000000 cs 00000000 ss 00000000 rs 00040000 tfd 451 serr 00000000 cmd 0004d117 Sep 30 13:26:49 trillian kernel: ahcich1: Timeout on slot 29 port 0 Sep 30 13:26:49 trillian kernel: ahcich1: is 00000000 cs 00000000 ss 00000000 rs 20000000 tfd 451 serr 00000000 cmd 0004dc17 They occur in pairs -- if one channel goes, the other follows next. When the channel with the boot disk on times-out, the system freezes for 10-20 seconds before resuming. This is really hurting :(
(In reply to Chris Platts from comment #6) You place this hint.ahci.0.msi=0 hint.ahci.1.msi=0 ?? You see kernel settings: # kenv | grep ahci
(In reply to Vladimir Noskov from comment #7) Hi, Here's my output: chris@trillian:~ % kenv | grep ahci hint.ahci.0.msi="0" hint.ahci.1.msi="0" Those double-quotes aren't present in device.hints, but all the values as displayed by kenv appear to have them - so that's probably not an issue.
Yes, I have no quotes, but it works without them. Hmm ... As far as I understand, the double quotes are needed for multi tsschkvy parameters. But the right to put "" for all. :) When will appear and then check the connecting on the hdd and SATA and Power cables.
"tsschkvy" - read as "words"
(In reply to Vladimir Noskov from comment #10) No problem :) These are eSATA interfaces. I've replaced the cables and enclosures (and in one instance, the drive itself). Still the problem remains. I'm going to install 10.0-RELEASE tonight on another disk and verify that the problem disappears. That's a shame because I was really enjoying the improved NFS performance on 10.2!
I suspect that this has something to do with the hot-plug drives connections. My suspicions are supported by you. Developers are !!! Pay attention!
(In reply to Vladimir Noskov from comment #12) Ok - I think I've been stupid. I was understanding 'ahcichX' to be where 'X' is the controller ID. But it isn't, is it? It's the channel number - the physical connector. So, ahcich0 and ahcich1 are the two connectors on my ASMedia controller card (ahci0). Nothing to do with the built-in controller (ahci1). Sorry :) Now, that card is an eSATA card and as such, supports hot-plugging. You might well be on to something!
(In reply to Chris Platts from comment #13) Well, don't forget I've got the exact same problem on a native Intel chipset (comment 2) : ahci0@pci0:0:31:2: class=0x010601 card=0x062415d9 chip=0x1c028086 rev=0x05 hdr=0x00 vendor = 'Intel Corporation' device = '6 Series/C200 Series Chipset Family SATA AHCI Controller' class = mass storage subclass = SATA And I would be surprised if biotin's laptop (comment 1) would come with hotplug support. I think it could help to report only verified facts here and keep the speculation on forums, so that people taking over this (very nasty) bug have solid info to work with.
(In reply to patpro from comment #14) Fair point! I'll sort out my system so I have two boot disks available -- one being the installation upgraded from 9.1 to 10.0 and finally to 10.2 which is experiencing the problem. I'll also have another fresh 10.2 available for comparison and as a test environment for any further queries from the developers. Will await any requests for further info from the devs.
(In reply to Chris Platts from comment #15) It's not stupidity are just some do not care. Sometimes. But the fact that there is an error related to the behavior it is hot plug devices, very bad. Conventional Sata drives, if they put in the BIOS settings, can also be hot-plug. Developers !!! Where are you?!! :)
Ok, I found the cause of these timeouts for me. I rebuilt the system from scratch on another disk (connected to the same controller as my present boot disk). All the way through the build I had no timeouts. Then I installed smartmontools... It seems that when smartd queried either of the two disks on the problematic controller, it caused command timeouts. Odd, considering the Option ROM for the ASMedia card specifically mentions SMART support. Anyway, everything's fine now. Will take this up with the smartmontools project to see if there's a resolution.
(In reply to Chris Platts from comment #17) Interesting! I use smartmontools too, but I can see timeouts during boot, so I guess this particular problem is not related to smartmontools.
Also talks about this: https://forums.freebsd.org/threads/kernel-ahcich-timeout-in-slot.51868/
I have this issue on two new HP Microserver Gen8, so the probability of a hardware failure is low. I have solved the issue disabling the drives stand by.
Now... bug repets :( FreeBSD wcsn.wocson.ru 10.2-RELEASE-p11 FreeBSD 10.2-RELEASE-p11 #0: Thu Jan 14 15:48:17 UTC 2016 root@amd64-builder.pcbsd.org:/usr/obj/usr/src/sys/GENERIC amd64 Now after update to -p11. First reboot and get error ahci.
I upgrade to 10.3-RELEASE. Error sata is present :(
I'm fixed this problem to me using this simple patch: Index: sys/dev/ahci/ahci_pci.c =================================================================== --- sys/dev/ahci/ahci_pci.c (revision 326818) +++ sys/dev/ahci/ahci_pci.c (working copy) @@ -59,7 +59,7 @@ {0x43901002, 0x00, "AMD SB7x0/SB8x0/SB9x0", AHCI_Q_ATI_PMP_BUG | AHCI_Q_1MSI}, {0x43911002, 0x00, "AMD SB7x0/SB8x0/SB9x0", - AHCI_Q_ATI_PMP_BUG | AHCI_Q_1MSI}, + AHCI_Q_NOMSI | AHCI_Q_NOPMP | AHCI_Q_ATI_PMP_BUG }, {0x43921002, 0x00, "AMD SB7x0/SB8x0/SB9x0", AHCI_Q_ATI_PMP_BUG | AHCI_Q_1MSI}, {0x43931002, 0x00, "AMD SB7x0/SB8x0/SB9x0",
(In reply to Vladimir Noskov from comment #22) Вы решили проблему? (In reply to Dmitry Afanasiev from comment #23) Подскажите, как этот патч применить? Вылезла точно такая же ерунда непонятно с чего.
Have same issue after reinstalling FreeBSD 12.1 from ext to ZFS. here is DMESG log: ahcich1: Timeout on slot 28 port 0 ahcich1: is 00000000 cs 00000000 ss 10000000 rs 10000000 tfd 50 serr 00000000 cmd 0000c317 (ada0:ahcich1:0:0:0): SEND_FPDMA_QUEUED DATA SET MANAGEMENT. ACB: 64 01 00 00 00 40 00 00 00 00 00 00 (ada0:ahcich1:0:0:0): CAM status: Command timeout (ada0:ahcich1:0:0:0): Retrying command, 3 more tries remain ahcich1: Timeout on slot 10 port 0 ahcich1: is 00000000 cs 00000000 ss 00000400 rs 00000400 tfd 40 serr 00000000 cmd 0000ca17 (ada0:ahcich1:0:0:0): SEND_FPDMA_QUEUED DATA SET MANAGEMENT. ACB: 64 01 00 00 00 40 00 00 00 00 00 00 (ada0:ahcich1:0:0:0): CAM status: Command timeout (ada0:ahcich1:0:0:0): Retrying command, 2 more tries remain ahcich1: Timeout on slot 17 port 0 ahcich1: is 00000000 cs 00000000 ss 00020000 rs 00020000 tfd 40 serr 00000000 cmd 0000d117 (ada0:ahcich1:0:0:0): SEND_FPDMA_QUEUED DATA SET MANAGEMENT. ACB: 64 01 00 00 00 40 00 00 00 00 00 00 (ada0:ahcich1:0:0:0): CAM status: Command timeout (ada0:ahcich1:0:0:0): Retrying command, 1 more tries remain ahcich1: Timeout on slot 24 port 0 ahcich1: is 00000000 cs 00000000 ss 01000000 rs 01000000 tfd 40 serr 00000000 cmd 0000d817 (ada0:ahcich1:0:0:0): SEND_FPDMA_QUEUED DATA SET MANAGEMENT. ACB: 64 01 00 00 00 40 00 00 00 00 00 00 (ada0:ahcich1:0:0:0): CAM status: Command timeout (ada0:ahcich1:0:0:0): Retrying command, 0 more tries remain ahcich1: Timeout on slot 31 port 0 ahcich1: is 00000000 cs 00000000 ss 80000000 rs 80000000 tfd 40 serr 00000000 cmd 0000df17 (ada0:ahcich1:0:0:0): SEND_FPDMA_QUEUED DATA SET MANAGEMENT. ACB: 64 01 00 00 00 40 00 00 00 00 00 00 (ada0:ahcich1:0:0:0): CAM status: Command timeout (ada0:ahcich1:0:0:0): Error 5, Retries exhausted edit /sys/dev/ahci/ahci_pci.c file do not help at all. Have any solutions? Regards
^Triage: I'm sorry that this PR did not get addressed in a timely fashion. By now, the version that it was created against is long out of support. Please re-open if it is still a problem on a supported version.