Bug 202712

Summary: [cam] [ata] System doesn't recognize older hdd after boot
Product: Base System
Reporter: Domagoj Hranjec <cab902>
Component: kern
Assignee: freebsd-scsi mailing list <scsi>
Status: Open ---
Severity: Affects Some People
CC: mi, sasamotikomi, scottl, smh
Priority: ---
Keywords: patch, regression
Version: 11.2-RELEASE   
Hardware: i386   
OS: Any   
Attachments:
Description                               Flags
possible patch, not tested                none
dmesg log with disk errors after patch    none
dmesg log with 8.4 livefs disk            none
dmesg log with 8.4 livefs disk #2         none

Description Domagoj Hranjec 2015-08-28 12:40:57 UTC
After upgrading from FreeBSD 8.4 to 9.3, one of my HDDs (/dev/ad3) is no longer recognized. It is an old WD 400 MiB hard disk (WD AC2420). During boot, the following error is reported:

(aprobe0:ata1:0:1:0): SETFEATURES SET TRANSFER MODE. ACB: ef 03 00 00 00 40 00 00 00 00 21 00
(aprobe0:ata1:0:1:0): CAM status: ATA Status Error
(aprobe0:ata1:0:1:0): ATA status: 51 (DRDY SERV ERR), error: 04 (ABRT )
(aprobe0:ata1:0:1:0): RES: 51 04 00 00 00 00 00 00 00 21 00
(aprobe0:ata1:0:1:0): Retrying command
(aprobe0:ata1:0:1:0): SETFEATURES SET TRANSFER MODE. ACB: ef 03 00 00 00 40 00 00 00 00 21 00
(aprobe0:ata1:0:1:0): CAM status: ATA Status Error
(aprobe0:ata1:0:1:0): ATA status: 51 (DRDY SERV ERR), error: 04 (ABRT )
(aprobe0:ata1:0:1:0): RES: 51 04 00 00 00 00 00 00 00 21 00
(aprobe0:ata1:0:1:0): Error 5, Retries exhausted

After that, the device is not added to the dev list. 
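
For readers decoding the log above: the "ATA status" and "error" values are bit fields. The following is a minimal sketch, assuming the conventional ATA register bit assignments and reusing the mnemonics printed in the log itself. The tables are deliberately partial, and the names STATUS_BITS, ERROR_BITS and decode are illustrative, not taken from the FreeBSD source.

```python
# Partial bit tables (assumption: conventional ATA register layout).
STATUS_BITS = [
    (0x80, "BSY"),
    (0x40, "DRDY"),
    (0x10, "SERV"),
    (0x08, "DRQ"),
    (0x01, "ERR"),
]
ERROR_BITS = [
    (0x80, "ICRC"),
    (0x40, "UNC"),
    (0x10, "IDNF"),
    (0x04, "ABRT"),
]

def decode(value, table):
    """Return the names of the bits set in value; unlisted bits appear in hex."""
    names = [name for mask, name in table if value & mask]
    known = 0
    for mask, _ in table:
        known |= mask
    leftover = value & ~known & 0xFF
    if leftover:
        names.append(hex(leftover))
    return names

print(decode(0x51, STATUS_BITS))  # ['DRDY', 'SERV', 'ERR']
print(decode(0x04, ERROR_BITS))   # ['ABRT']
```

Read this way, status 51 with error 04 means the drive was ready, flagged an error, and aborted the command.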

It seems that after some changes to the kernel in version 9, the kernel has problems starting the HDD. (Maybe the spin-up time is too long for the new code, or something similar?)
Comment 1 Domagoj Hranjec 2015-09-30 12:39:14 UTC
This model probably doesn't support mode setting, so it should probably be exempted from it during probing.
Comment 2 Domagoj Hranjec 2015-10-22 15:16:33 UTC
Tested disk on Linux kernel 3.9.6.

[    2.050806] ata2.01: FORCE: horkage modified (noncq)
[    2.050825] ata2.01: ATA-0: WDC AC2420F, 06.16K25, max MWDMA1
[    2.050838] ata2.01: 830760 sectors, multi 16, CHS 989/15/56
[    2.070346] ata2.01: configured for MWDMA1 (device error ignored)
[    2.092510] scsi 1:0:1:0: Direct-Access     ATA      WDC AC2420F      06.1 PQ: 0 ANSI: 5
[    2.095005] sd 1:0:1:0: [sdb] 830760 512-byte logical blocks: (425 MB/405 MiB)
[    2.095337] sd 1:0:1:0: [sdb] Write Protect is off
[    2.095355] sd 1:0:1:0: [sdb] Mode Sense: 00 3a 00 00
[    2.095497] sd 1:0:1:0: [sdb] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[    2.098403] sd 1:0:1:0: Attached scsi generic sg2 type 0
[    2.112629]  sdb: sdb1
[    2.112629]  sdb1: <bsd: >
[    2.114854] sd 1:0:1:0: [sdb] Attached SCSI disk

After mounting, the disk works: it reads and writes without any problem.

Linux ignores the error during configuration and adds the drive, as FreeBSD 8.4 did. FreeBSD 9, by contrast, treats the error as fatal and simply drops the disk from the configuration.
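
The geometry and size figures in the Linux log are self-consistent, which can be checked with a little arithmetic. A minimal sketch using only numbers from the log above (the variable names are illustrative):

```python
# Geometry as reported by Linux: "CHS 989/15/56", 512-byte sectors.
cylinders, heads, sectors_per_track = 989, 15, 56
sector_size = 512

total_sectors = cylinders * heads * sectors_per_track
total_bytes = total_sectors * sector_size

print(total_sectors)          # 830760
print(total_bytes)            # 425349120
print(total_bytes // 10**6)   # 425 (decimal MB)
print(total_bytes // 2**20)   # 405 (MiB)
```

The results match the "830760 sectors" and "(425 MB/405 MiB)" lines that Linux printed.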
Comment 3 Domagoj Hranjec 2015-12-17 15:08:56 UTC
The problem is that the new ATA_CAM implementation doesn't handle the drive correctly. The current workaround is to disable ATA_CAM and enable the old atadisk code in a custom kernel.

--
Example of working MYKERNEL file:

include GENERIC
ident MYKERNEL

nooptions ATA_CAM
device atadisk
device atapicd

--
system message buffer content:

ad3: FAILURE - SETFEATURES SET TRANSFER MODE status=51<READY,DSC,ERROR> error=4<ABORTED>
ad3: 405MB <WDC AC2420F 06.16K25> at ata1-slave WDMA1

--
The disk functions normally with this kernel configuration.
Comment 4 sasamotikomi 2016-01-13 14:10:01 UTC
(In reply to cab902 from comment #3)
Fix for new ATA_CAM implementation is here:
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=199495
Comment 5 Domagoj Hranjec 2017-10-10 11:58:53 UTC
The system is now updated to FreeBSD 10.4 and the problem still isn't solved.

During boot, the following error is thrown:

(aprobe1:ata1:0:1:0): SETFEATURES SET TRANSFER MODE. ACB: ef 03 00 00 00 40 00 00 00 00 21 00
(aprobe1:ata1:0:1:0): CAM status: ATA Status Error
(aprobe1:ata1:0:1:0): ATA status: 51 (DRDY SERV ERR), error: 04 (ABRT )
(aprobe1:ata1:0:1:0): RES: 51 04 00 00 00 00 00 00 00 21 00
(aprobe1:ata1:0:1:0): Retrying command
(aprobe1:ata1:0:1:0): SETFEATURES SET TRANSFER MODE. ACB: ef 03 00 00 00 40 00 00 00 00 21 00
(aprobe1:ata1:0:1:0): CAM status: ATA Status Error
(aprobe1:ata1:0:1:0): ATA status: 51 (DRDY SERV ERR), error: 04 (ABRT )
(aprobe1:ata1:0:1:0): RES: 51 04 00 00 00 00 00 00 00 21 00
(aprobe1:ata1:0:1:0): Error 5, Retries exhausted

After that, the device is not added to the dev list.
Comment 6 Domagoj Hranjec 2017-10-17 13:51:10 UTC
Worse yet, the atadisk code was removed from the source in version 10, so there is currently no workaround. The CAM implementation must be fixed!
Comment 7 Domagoj Hranjec 2019-03-25 20:03:48 UTC
The system is now updated to FreeBSD 11.2 and the problem still isn't solved.

During boot, the following error is thrown:

(aprobe1:ata1:0:1:0): SETFEATURES SET TRANSFER MODE. ACB: ef 03 00 00 00 40 00 00 00 00 21 00
(aprobe1:ata1:0:1:0): CAM status: ATA Status Error
(aprobe1:ata1:0:1:0): ATA status: 51 (DRDY SERV ERR), error: 04 (ABRT )
(aprobe1:ata1:0:1:0): RES: 51 04 00 00 00 00 00 00 00 21 00
(aprobe1:ata1:0:1:0): Retrying command
(aprobe1:ata1:0:1:0): SETFEATURES SET TRANSFER MODE. ACB: ef 03 00 00 00 40 00 00 00 00 21 00
(aprobe1:ata1:0:1:0): CAM status: ATA Status Error
(aprobe1:ata1:0:1:0): ATA status: 51 (DRDY SERV ERR), error: 04 (ABRT )
(aprobe1:ata1:0:1:0): RES: 51 04 00 00 00 00 00 00 00 21 00
(aprobe1:ata1:0:1:0): Error 5, Retries exhausted

After that, the device is not added to the dev list.
Comment 8 Andriy Gapon freebsd_committer 2019-04-09 11:02:51 UTC
In Linux, the ata_dev_set_mode() function has several conditions for ignoring errors from ata_dev_set_xfermode(). Examples:
- Don't fail an MWDMA0 set IFF the device indicates it is in MWDMA0
- if the device is actually configured correctly, ignore dev err
Maybe we'd want such a quirk in FreeBSD as well.
It would belong in the probedone() function in ata_xpt.c.
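
To make the two quoted conditions concrete, here is a toy model of the decision. It is neither the Linux code nor the attached patch. The function name, its parameters, and the mode strings are hypothetical, and treating "configured correctly" as "the mode derived from IDENTIFY data equals the requested mode" is an assumption made for illustration only.

```python
def ignore_set_mode_error(requested_mode, mwdma0_selected, mode_from_identify):
    """Decide whether a device error on SET TRANSFER MODE may be ignored.

    requested_mode:     mode the host asked for, e.g. "MWDMA1" (hypothetical encoding)
    mwdma0_selected:    True if the device reports MWDMA0 as its selected mode
    mode_from_identify: mode derived from the device's IDENTIFY data after the
                        failed command, or None if unknown (assumption)
    """
    # Condition 1: a failed MWDMA0 set is harmless if the device
    # says it is already in MWDMA0.
    if requested_mode == "MWDMA0" and mwdma0_selected:
        return True
    # Condition 2: the device already looks correctly configured,
    # so the error tells us nothing new.
    if mode_from_identify is not None and mode_from_identify == requested_mode:
        return True
    # Otherwise treat the error as real.
    return False

print(ignore_set_mode_error("MWDMA1", False, "MWDMA1"))  # True
print(ignore_set_mode_error("MWDMA1", False, "PIO3"))    # False
```

The point of such a quirk is that the probe would continue and attach the disk instead of giving up after the retries are exhausted.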
Comment 9 Andriy Gapon freebsd_committer 2019-04-09 12:27:45 UTC
Created attachment 203540 [details]
possible patch, not tested
Comment 10 Mikhail Teterin freebsd_committer 2019-04-13 13:37:29 UTC
(In reply to Andriy Gapon from comment #9)
Andriy, maybe the "right" mode can be set via a hint, allowing the unpatched generic kernel to be used? What would be the right knob and the safe value in the OP's example? hint.ata.1.mode= ????
Comment 11 Andriy Gapon freebsd_committer 2019-04-14 09:00:16 UTC
(In reply to Mikhail Teterin from comment #10)
The code already chooses the right mode.
The problem is that the disk fails the SETFEATURES SET TRANSFER MODE command even for the right mode.
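
The mode being requested can be read out of the logged command block. A minimal sketch, under two assumptions: that the twelve ACB bytes are printed in the order command, features, LBA low/mid/high, device, three extended LBA bytes, extended features, sector count, extended sector count; and that the sector count byte uses the usual SET FEATURES transfer-mode encoding (upper bits select the mode class, the low three bits the mode number). The function name is illustrative.

```python
def decode_transfer_mode(acb_hex):
    """Decode the requested transfer mode from a SETFEATURES ACB string."""
    acb = [int(byte, 16) for byte in acb_hex.split()]
    command, features, count = acb[0], acb[1], acb[10]
    # 0xEF is SET FEATURES; subcommand 0x03 is "set transfer mode".
    if command != 0xEF or features != 0x03:
        raise ValueError("not a SET TRANSFER MODE command")
    number = count & 0x07
    if count & 0x40:
        return f"UDMA{number}"
    if count & 0x20:
        return f"MWDMA{number}"
    if count & 0x10:
        return f"SWDMA{number}"
    if count & 0x08:
        return f"PIO{number}"
    return "PIO default"

print(decode_transfer_mode("ef 03 00 00 00 40 00 00 00 00 21 00"))  # MWDMA1
```

Under those assumptions the request is for MWDMA1, the same mode that Linux reports ("configured for MWDMA1") and that FreeBSD 8.4 ends up using ("WDMA1").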
Comment 12 Domagoj Hranjec 2019-04-17 17:00:51 UTC
Created attachment 203744 [details]
dmesg log with disk errors after patch

After applying the patch, the result is as follows:

The disk is now recognized and the /dev/ada1 device is created. However, after further errors, the disk is not correctly added and cannot be accessed. The dmesg log is attached.
Comment 13 Andriy Gapon freebsd_committer 2019-04-18 11:14:31 UTC
(In reply to Domagoj Hranjec from comment #12)
Is the disk size determined correctly? It seems that the errors are from reading sectors at the end of reported disk size.
Maybe forcing a PIO mode might help?
Perhaps, the disk has finally died?
Comment 14 Domagoj Hranjec 2019-04-18 14:29:44 UTC
Disk size is correctly deduced. Disk worked perfectly with the old atadisk driver.
Comment 15 Andriy Gapon freebsd_committer 2019-04-18 20:03:50 UTC
(In reply to Domagoj Hranjec from comment #14)
I guess it was a while ago.
Another thing I would try is completely disabling geom tasting (if the system would still come up after that) or all unneeded geom label classes (see kern.geom.label) and see if the disk would stay around.  Then you can try reading from various parts of it.  Those old disks usually start failing from the end.
Comment 16 Mikhail Teterin freebsd_committer 2019-04-18 20:46:13 UTC
(In reply to Andriy Gapon from comment #15)
BTW, Andriy, I suspect, the problem I describe in Bug #237261 is related... The endless loop is the same -- although the failing operation is different.

In my case, the drive is not the original laptop HD, but a replacement ATA SSD. I'm quite certain, there is nothing wrong with the hardware -- and it, too, works fine with FreeBSD-8.x
Comment 17 Andriy Gapon freebsd_committer 2019-04-19 05:59:36 UTC
(In reply to Mikhail Teterin from comment #16)
The commands are too different. ATAPI_IDENTIFY is a special command to request the disk description, parameters, etc. READ_DMA is a normal "read data from media" command.
Comment 18 Domagoj Hranjec 2019-04-21 12:32:54 UTC
Created attachment 203861 [details]
dmesg log with 8.4 livefs disk

I've booted the FreeBSD 8.4 livefs disk.

Disk is correctly configured:
ad3: FAILURE - SETFEATURES SET TRANSFER MODE status=51<READY,DSC,ERROR> error=4<ABORTED>
ad3: 405MB <WDC AC2420F 06.16K25> at ata1-slave WDMA1

I've successfully mounted the partition:
Filesystem  1K-blocks   Used Avail Capacity  Mounted on
/dev/md0         4175   3169  1006    76%    /
devfs               1      1     0   100%    /dev
/dev/acd0      276142 276142     0   100%    /dist
/dev/ad3s1d    402150 367694  2284    99%    /mnt

I've successfully read all the files on the partition:
find /mnt -type f -exec cat '{}' > /dev/zero ';'

There were no READ_DMA errors, nor errors of any other kind.
Comment 19 Andriy Gapon freebsd_committer 2019-04-21 21:31:02 UTC
(In reply to Domagoj Hranjec from comment #18)
You can use dd to read the whole disk or ranges of it.
Comment 20 Domagoj Hranjec 2019-04-22 18:31:00 UTC
The point is that the disk and the data on it are good and functional. The issue seems to be that the ATA_CAM implementation does not address the data on the disk correctly.
Comment 21 Andriy Gapon freebsd_committer 2019-04-22 20:47:02 UTC
(In reply to Domagoj Hranjec from comment #20)
And my point is that I am not sure if you have actually tested reading from the end of the disk with the older FreeBSD.  I am not sure that the older FreeBSD reads sectors at the end for tasting and that any of the files you accessed have blocks sufficiently close to the end.
But I do not insist that my hypothesis is correct.
Bruce has suggested another one, for instance.
Comment 22 Domagoj Hranjec 2019-04-23 17:37:45 UTC
Created attachment 203938 [details]
dmesg log with 8.4 livefs disk #2

Fixit# dd if=/dev/ad3 of=/dev/zero
830760+0 records in
830760+0 records out
425349120 bytes transferred in 565.629644 secs (751992 bytes/sec)

I've now read the whole ad3 disk. The read was successful and no errors of any kind were noticed.
Comment 23 Steven Hartland freebsd_committer 2019-04-23 22:45:30 UTC
As a workaround have you tried setting the tunable:
hw.ata.ata_dma=0

This should force PIO mode. However, that disk should support Multiword DMA mode 1 according to the specs I found.
Comment 24 Steven Hartland freebsd_committer 2019-04-23 23:52:16 UTC
Another sysctl you can try to see if you can get any further is:
kern.geom.notaste=1

This should prevent geom from tasting the disk, which is more than likely what is responsible for the read requests you're seeing fail.

If that does get further you could try manually reading the first and last sectors from the disk with dd.
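
A rough Python equivalent of that manual check is sketched below. The 512-byte sector size and the sector count of 830760 come from the logs in this report; the device path in the usage note and the helper names are assumptions made for illustration.

```python
import os

SECTOR_SIZE = 512  # assumption: 512-byte sectors, as reported in the logs

def read_sector(path, lba, sector_size=SECTOR_SIZE):
    """Read exactly one sector at the given LBA from a device node or image file."""
    fd = os.open(path, os.O_RDONLY)
    try:
        data = os.pread(fd, sector_size, lba * sector_size)
    finally:
        os.close(fd)
    if len(data) != sector_size:
        raise OSError(f"short read at LBA {lba}: got {len(data)} bytes")
    return data

def check_ends(path, total_sectors):
    """Read the first and the last sector; return the LBAs that were read."""
    lbas = [0, total_sectors - 1]
    for lba in lbas:
        read_sector(path, lba)
    return lbas
```

On the affected machine this would be called as check_ends("/dev/ada1", 830760) with root privileges; an OSError from either read corresponds to the "Input/output error" that dd reports.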
Comment 25 Andriy Gapon freebsd_committer 2019-04-24 06:16:09 UTC
(In reply to Domagoj Hranjec from comment #22)
Thank you!
Comment 26 Andriy Gapon freebsd_committer 2019-04-24 06:17:19 UTC
(In reply to Steven Hartland from comment #24)
Yeah, I suggested this in comment #15.
Comment 27 Domagoj Hranjec 2019-04-24 17:46:20 UTC
(In reply to Steven Hartland from comment #23)

Put hw.ata.ata_dma=0 in /etc/sysctl.conf.
Nothing changed.

Also, it seems that this parameter does not exist:

root@spitfire:/home/hark # sysctl hw.ata.ada_dma
sysctl: unknown oid 'hw.ata.ada_dma'
Comment 28 Domagoj Hranjec 2019-04-24 17:59:23 UTC
(In reply to Steven Hartland from comment #24)
Added kern.geom.notaste=1 to /etc/sysctl.conf.

Parameter is changed:
root@spitfire:/home/hark # sysctl kern.geom.notaste
kern.geom.notaste: 1

However, same issues:
...
(ada1:ata1:0:1:0): READ_DMA. ACB: c8 00 00 00 00 40 00 00 00 00 04 00
(ada1:ata1:0:1:0): CAM status: ATA Status Error
(ada1:ata1:0:1:0): ATA status: 59 (DRDY SERV DRQ ERR), error: 10 (IDNF )
(ada1:ata1:0:1:0): RES: 59 10 00 00 00 00 00 00 00 04 00
(ada1:ata1:0:1:0): Retrying command
(ada1:ata1:0:1:0): READ_DMA. ACB: c8 00 00 00 00 40 00 00 00 00 04 00
(ada1:ata1:0:1:0): CAM status: ATA Status Error
(ada1:ata1:0:1:0): ATA status: 59 (DRDY SERV DRQ ERR), error: 10 (IDNF )
(ada1:ata1:0:1:0): RES: 59 10 00 00 00 00 00 00 00 04 00
(ada1:ata1:0:1:0): Retrying command
...

Disk cannot be accessed.

root@spitfire:/home/hark # dd if=/dev/ada1 of=/dev/zero
dd: /dev/ada1: Input/output error
0+0 records in
0+0 records out
0 bytes transferred in 0.011820 secs (0 bytes/sec)

root@spitfire:/home/hark # fdisk /dev/ada1 
fdisk: could not detect sector size
Comment 29 Andriy Gapon freebsd_committer 2019-04-24 22:02:27 UTC
(In reply to Domagoj Hranjec from comment #22)
I think that Bruce Evans is right: the older stack seems to use CHS addressing in the ATA commands that it sends down to the disk. Linux reports the disk as being "ATA-0", where zero is the major ATA version.
And the new stack is missing this code: http://fxr.watson.org/fxr/source/dev/ata/ata-disk.c?v=FREEBSD-8-STABLE#L442
and any support for CHS addressing.

I am not sure if it is worth resurrecting that old functionality to support very old disks.
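
For readers unfamiliar with CHS addressing, the conversion that a CHS-only code path has to perform is the textbook LBA-to-CHS mapping. A minimal sketch, assuming the 989/15/56 geometry that Linux reported for this disk in comment 2. This is the standard formula, not a copy of the linked ata-disk.c code, and the function name is illustrative.

```python
def lba_to_chs(lba, heads, sectors_per_track):
    """Convert a linear block address to (cylinder, head, sector).

    Cylinders and heads are numbered from 0, sectors from 1.
    """
    cylinder, rest = divmod(lba, heads * sectors_per_track)
    head, sector_index = divmod(rest, sectors_per_track)
    return cylinder, head, sector_index + 1

HEADS, SECTORS_PER_TRACK = 15, 56   # from "CHS 989/15/56"

print(lba_to_chs(0, HEADS, SECTORS_PER_TRACK))       # (0, 0, 1)
print(lba_to_chs(830759, HEADS, SECTORS_PER_TRACK))  # (988, 14, 56)
```

The last sector of the disk maps to the last cylinder, head and sector of the reported geometry. In CHS addressing sector numbers start at 1, which is why the final step adds one.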
Comment 30 Andriy Gapon freebsd_committer 2019-04-24 22:05:08 UTC
(In reply to Domagoj Hranjec from comment #27)
Actually, that parameter should have gone into /boot/loader.conf, as it is a pure tunable: it is not a sysctl and has no accompanying sysctl.
But I doubt that it would help. Still worth trying.
Comment 31 Steven Hartland freebsd_committer 2019-04-24 23:36:29 UTC
(In reply to Domagoj Hranjec from comment #28)
The setting would need to be done as a tunable, i.e. in loader.conf, to be active early enough in the boot process to avoid the geom taste.
Comment 32 Domagoj Hranjec 2019-04-27 16:03:29 UTC
(In reply to Andriy Gapon from comment #30)

Put hw.ata.ata_dma=0 in /boot/loader.conf.
Disks are now in PIO mode:

ada0 at ata0 bus 0 scbus0 target 0 lun 0
ada0: <ST340014A 8.01> ATA-6 device
ada0: Serial Number 5JXDHA76
ada0: 16.700MB/s transfers (PIO4, PIO 8192bytes)
ada0: 38166MB (78165360 512 byte sectors)
ada1 at ata1 bus 0 scbus1 target 1 lun 0
ada1: <WDC AC2420F 06.16K25> ATA device
ada1: Serial Number WD-WM2680354231
ada1: 11.100MB/s transfers (PIO3, PIO 8192bytes)
ada1: 405MB (830760 512 byte sectors)

However, the same errors appear and ada1 stays non-functional:

(ada1:ata1:0:1:0): READ_MUL. ACB: c4 00 00 00 00 40 00 00 00 00 01 00
(ada1:ata1:0:1:0): CAM status: ATA Status Error
(ada1:ata1:0:1:0): ATA status: 59 (DRDY SERV DRQ ERR), error: 10 (IDNF )
(ada1:ata1:0:1:0): RES: 59 10 00 00 00 00 00 00 00 01 00
(ada1:ata1:0:1:0): Retrying command
(ada1:ata1:0:1:0): READ_MUL. ACB: c4 00 00 00 00 40 00 00 00 00 01 00
(ada1:ata1:0:1:0): CAM status: ATA Status Error
(ada1:ata1:0:1:0): ATA status: 59 (DRDY SERV DRQ ERR), error: 10 (IDNF )
(ada1:ata1:0:1:0): RES: 59 10 00 00 00 00 00 00 00 01 00
(ada1:ata1:0:1:0): Error 5, Retries exhausted

root@spitfire:/home/hark # fdisk /dev/ada1
fdisk: could not detect sector size
Comment 33 Domagoj Hranjec 2019-04-27 16:14:08 UTC
(In reply to Steven Hartland from comment #31)

Put kern.geom.notaste=1 in /boot/loader.conf.

Same issues, the device stays non-functional:
(ada1:ata1:0:1:0): READ_DMA. ACB: c8 00 00 00 00 40 00 00 00 00 01 00
(ada1:ata1:0:1:0): CAM status: ATA Status Error
(ada1:ata1:0:1:0): ATA status: 59 (DRDY SERV DRQ ERR), error: 10 (IDNF )
(ada1:ata1:0:1:0): RES: 59 10 00 00 00 00 00 00 00 01 00
(ada1:ata1:0:1:0): Retrying command
(ada1:ata1:0:1:0): READ_DMA. ACB: c8 00 00 00 00 40 00 00 00 00 01 00
(ada1:ata1:0:1:0): CAM status: ATA Status Error
(ada1:ata1:0:1:0): ATA status: 59 (DRDY SERV DRQ ERR), error: 10 (IDNF )
(ada1:ata1:0:1:0): RES: 59 10 00 00 00 00 00 00 00 01 00
(ada1:ata1:0:1:0): Error 5, Retries exhausted

root@spitfire:/home/hark # fdisk /dev/ada1
fdisk: could not detect sector size
Comment 34 Domagoj Hranjec 2019-04-27 16:15:45 UTC
(In reply to Andriy Gapon from comment #29)

It seems to me that it is the right way to go. CHS addressing needs to be re-implemented in the ATA_CAM.
Comment 35 Steven Hartland freebsd_committer 2019-04-27 20:20:50 UTC
Does your BIOS have the option to put the disk into LBA mode? If so, that may help.

If the disk truly does support LBA then the only option may be to stay with an old version, as I don’t think re-adding CHS support would be worth it.
Comment 36 Steven Hartland freebsd_committer 2019-04-27 20:42:27 UTC
That should have said doesn’t support LBA
Comment 37 Domagoj Hranjec 2019-04-28 22:28:25 UTC
(In reply to Steven Hartland from comment #35)

BIOS supports LBA, however, the disk does not. I've tried to force it through BIOS but it doesn't work. 

Regarding CHS, Linux supports it to this day and FreeBSD supported it for 20 years. It doesn't seem unreasonable to support the old disks with the modern kernel. And judging by the old implementation it doesn't seem like a big feature in terms of lines of code.
Comment 38 Domagoj Hranjec 2019-05-22 21:57:37 UTC
Maybe someone can clarify: is the code in the sys/dev/ata directory currently used for ATA access, or is it only dead remnants of the old implementation, with all the ATA code now in the sys/cam/ata directory?
Comment 39 Scott Long freebsd_committer 2019-05-23 06:34:23 UTC
The code in sys/cam/ata is generic protocol and transport support for all devices.  The code in sys/dev/ata is controller-specific drivers.  In simple terms, adding CHS support would happen in sys/cam/ata.

I have mixed feelings on adding CHS support.  As others have mentioned, it's ancient, and it's nearly impossible for people to test.  It would exist as a poorly tested codepath that would be prone to accidental breakage.  The cost of keeping it working, in terms of equipment procurement and operation, would likely outweigh the benefit.

It looks like Amazon has a PCIe add-in card for ATA/IDE, but I haven't owned a working ATA drive in almost 10 years, and I probably haven't owned a functional CHS-only drive in at least 20 years.   I have no idea where I'd get one, other than to buy batches of them off of Ebay and hope to find some that work.  20+ years is a long time for a hard drive, even in the best of circumstances.  Moisture will invade the platter cavity through the breather hole.  Lubrication will slowly evaporate off of spindle and armature joints and redeposit itself onto the platters and heads.  Capacitors on the circuit board will slowly leak, and copper and aluminum connectors and traces will corrode.  I'm impressed that you have a working 400MiB drive, that's a 25 year old drive at this point.  I'd worry that it would stop working in the near future.

If I were to build a rig to operate a CHS-era IDE drive (or any ATA/IDE drive for that matter), it would be solely to recover and archive the drive data to modern storage.  For that, I'd use software that supported the use-case.  If that means using an older version of FreeBSD, or using Linux, I'd do that.  It's such a niche use case that I'd spend considerably more time resurrecting and testing CHS code than I'd spend actually recovering the data, and that's just not an interesting use of my time.

If there's community interest in supporting CHS long-term in FreeBSD, my recommendation is to create an IDE-CHS-specific transport in CAM that lives alongside the ATA/SATA support, but does not rely on it.  This probably means copying sys/cam/ata/ata_xpt.c to sys/cam/ata/ide_xpt.c, removing the SATA-specific logic in it, and adding in the IDE- and CHS-specific logic.  Nice, clean, and isolated so that it's less likely to be accidentally broken, and people working on SATA aren't likely to trip on it.  This would probably be a week of work at most, assuming that test hardware is available.