Bug 184154 - [cam] QUIRK: SYNC_CACHE not supported on IBM ServeRAID 8k
Status: Closed Feedback Timeout
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: Unspecified
Hardware: Any Any
Importance: Normal Affects Only Me
Assignee: freebsd-scsi (Nobody)
URL:
Keywords: patch
Depends on:
Blocks:
 
Reported: 2013-11-21 22:20 UTC by Petr Cibulka
Modified: 2019-01-18 05:16 UTC

See Also:


Attachments
file.diff (572 bytes, patch)
2013-11-21 22:20 UTC, Petr Cibulka

Description Petr Cibulka 2013-11-21 22:20:00 UTC
FreeNAS 9.1.1 based on FreeBSD 9.1 shows in dmesg:
da1 at mpt1 bus 0 scbus2 target 0 lun 0
da1: <ServeRA ZFSdisk0 V1.0> Fixed Direct Access SCSI-2 device 
da1: 300.000MB/s transfers
da1: Command Queueing enabled
da1: 204800MB (419430400 512 byte sectors: 255H 63S/T 26108C)

As soon as a mount of the ZFS filesystem on the RDM disks is attempted, a flood of error messages is written to dmesg for each RDM disk:
(da1:mpt1:0:0:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00 
(da1:mpt1:0:0:0): CAM status: SCSI Status Error
(da1:mpt1:0:0:0): SCSI status: Check Condition
(da1:mpt1:0:0:0): SCSI sense: ILLEGAL REQUEST asc:20,0 (Invalid command operation code)
(da1:mpt1:0:0:0): Error 22, Unretryable error
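
For readers unfamiliar with the CDB bytes in the log above, here is a minimal sketch in C of how a SYNCHRONIZE CACHE(10) CDB can be decoded. The struct and function names are hypothetical and are not taken from the FreeBSD sources; the field layout (opcode 0x35, IMMED in bit 1 of byte 1, big-endian LBA in bytes 2-5, big-endian block count in bytes 7-8) is my understanding of the SCSI block command set. The logged CDB decodes to LBA 0 with a block count of 0, which requests a flush from the start of the medium to its end. The "Error 22" in the log corresponds to EINVAL.

```c
#include <stdint.h>

/* Decoded fields of a SYNCHRONIZE CACHE(10) CDB (hypothetical helper type). */
struct sync_cache10 {
    uint8_t  opcode;   /* byte 0, expected to be 0x35 */
    int      immed;    /* byte 1, bit 1: return before the flush completes */
    uint32_t lba;      /* bytes 2-5, big-endian starting block */
    uint16_t nblocks;  /* bytes 7-8, big-endian; 0 means "to end of medium" */
};

/* Returns 0 on success, -1 if the CDB is not SYNCHRONIZE CACHE(10). */
int decode_sync_cache10(const uint8_t cdb[10], struct sync_cache10 *out)
{
    if (cdb[0] != 0x35)
        return -1;

    out->opcode  = cdb[0];
    out->immed   = (cdb[1] >> 1) & 1;
    out->lba     = ((uint32_t)cdb[2] << 24) | ((uint32_t)cdb[3] << 16) |
                   ((uint32_t)cdb[4] << 8)  |  (uint32_t)cdb[5];
    out->nblocks = (uint16_t)(((unsigned)cdb[7] << 8) | cdb[8]);
    return 0;
}
```

Feeding the ten logged bytes (35 00 00 00 00 00 00 00 00 00) to decode_sync_cache10() yields opcode 0x35 with every other field zero.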

Moreover, the following line is displayed on the console for each RDM disk during shutdown:
(da1:mpt1:0:0:0): Synchronize cache failed

This incompatibility effectively prevents using a ZFS filesystem on RDM disks.

Fix: The DA_Q_NO_SYNC_CACHE quirk fixes the problem.

The RDM disks on the ServeRAID adapter can be identified by these inquiry strings (vendor, product, revision):
"ServeRA", "*", "*"
The second string reports the name of the array / logical disk. It is user-selectable information entered during ServeRAID setup, so it cannot be used for identification.
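
To illustrate how such a wildcard entry selects devices, here is a simplified, self-contained model in C of a quirk-table lookup. It is a sketch only and is not the code from cam/scsi/scsi_da.c or the attached file.diff: the type, table, and function names are hypothetical, and the matcher treats only a lone "*" as a wildcard, whereas the kernel's matcher is richer. The value 0x01 for the quirk flag mirrors the "quirks=0x1<NO_SYNC_CACHE>" line that dmesg prints once the quirk is active.

```c
#include <stddef.h>
#include <string.h>

#define Q_NONE          0x00u
#define Q_NO_SYNC_CACHE 0x01u  /* assumed to mirror quirks=0x1<NO_SYNC_CACHE> */

/* One row of a simplified quirk table (hypothetical type). */
struct quirk_entry {
    const char *vendor;
    const char *product;
    const char *revision;
    unsigned    quirks;
};

/* Simplification: "*" matches anything, any other pattern must match exactly. */
static int field_matches(const char *pattern, const char *value)
{
    return strcmp(pattern, "*") == 0 || strcmp(pattern, value) == 0;
}

/* Only the vendor is fixed, because the product field is user-selectable. */
static const struct quirk_entry quirk_table[] = {
    { "ServeRA", "*", "*", Q_NO_SYNC_CACHE },
};

/* Returns the quirk flags of the first matching entry, or Q_NONE. */
unsigned lookup_quirks(const char *vendor, const char *product,
                       const char *revision)
{
    size_t i;

    for (i = 0; i < sizeof(quirk_table) / sizeof(quirk_table[0]); i++) {
        const struct quirk_entry *q = &quirk_table[i];

        if (field_matches(q->vendor, vendor) &&
            field_matches(q->product, product) &&
            field_matches(q->revision, revision))
            return q->quirks;
    }
    return Q_NONE;
}
```

With this table, lookup_quirks("ServeRA", "ZFSdisk0", "V1.0") returns Q_NO_SYNC_CACHE whatever the logical disk is called, which also shows why the pattern is broad: every device reporting the vendor string "ServeRA" gets cache flushing suppressed.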

The fix has been successfully tested.

With the patch applied, this identification is shown in dmesg:
da1 at mpt1 bus 0 scbus2 target 0 lun 0
da1: <ServeRA ZFSdisk0 V1.0> Fixed Direct Access SCSI-2 device 
da1: 320.000MB/s transfers (160.000MHz, offset 127, 16bit)
da1: Command Queueing enabled
da1: 204800MB (419430400 512 byte sectors: 255H 63S/T 26108C)
da1: quirks=0x1<NO_SYNC_CACHE>

No SCSI error messages are written to dmesg, ZFS is successfully mounted and used, and the shutdown is error-free as well.

It is not a clean solution (it only suppresses cache flushing), but it is the best available. I believe the caches are flushed during VMware shutdown.


Patch attached with submission follows:
How-To-Repeat: It is probably a bug in the ServeRAID firmware, which does not implement the mandatory SCSI command SYNCHRONIZE CACHE(10).
Firmware version 5.2-0 (Build 17005) is the latest; no corrected version has been released.
The virtualization layer should not affect it, because for RDM disks in Physical Compatibility Mode, SCSI commands are simply tunneled from the VM to the RAID controller.

The same error may also occur on FreeBSD installed directly on the hardware, but that configuration has not been tested.
Comment 1 Petr Cibulka 2013-11-21 23:04:31 UTC
Thank you for the PR confirmation.

Please reassign it to "njl".
He is the author of the cam/scsi/scsi_da.c driver.

The severity is "Serious" from my perspective.

It would be nice to see the patch in the upcoming FreeBSD 10.0.

Thanks for your help,
Petr Cibulka

On 21/11/2013 23:20, FreeBSD-gnats-submit@FreeBSD.org wrote:
> Thank you very much for your problem report.
> It has the internal identification `kern/184154'.
> The individual assigned to look at your
> report is: freebsd-bugs.
>
> You can access the state of your problem report at any time
> via this link:
>
> http://www.freebsd.org/cgi/query-pr.cgi?pr=184154
>
>> Category:       kern
>> Responsible:    freebsd-bugs
>> Synopsis:       [cam] QUIRK: SYNC_CACHE not supported on IBM ServeRAID 8k
>> Arrival-Date:   Thu Nov 21 22:20:00 UTC 2013
Comment 2 Matt Jacob freebsd_committer freebsd_triage 2013-11-27 22:09:41 UTC
Why do you think this is a serious error? All that the lack of this 
command support does is cause the driver to be noisy. The device still 
functions correctly, doesn't it?

10.X is frozen and won't be changed until after release.
Comment 3 Petr Cibulka 2013-11-27 23:29:26 UTC
Merely attempting to mount a ZFS pool on six RDM disks caused 13 
error messages to be written for each disk (78 error messages in total, 
five lines in dmesg each).
The mount WAS NOT successful.

ZFS issues SYNC_CACHE quite often (by design), so the failures can lead 
to significant performance degradation (if the pool mounts at all).

I cannot tolerate such behaviour in a production environment handling 
terabytes of archive data.

I need to upgrade to 10.0 as soon as the STABLE version is released.
It seems I will have to patch and recompile the kernel first. :-(

On 27/11/2013 23:09, Matthew Jacob wrote:
> Why do you think this is a serious error? All that the lack of this
> command support does is cause the driver to be noisy. The device still
> functions correctly, doesn't it?
>
> 10.X is frozen and won't be changed until after release.
Comment 4 Steven Hartland freebsd_committer freebsd_triage 2015-03-15 11:58:23 UTC
Could you provide the full output from camcontrol identify (if it works) and camcontrol inquiry?

The match string you've used is very broad, and disabling cache sync is not ideal.

Also, which firmware version are you using, and is there an update? This really should be fixed there.
Comment 5 Eitan Adler freebsd_committer freebsd_triage 2018-05-23 10:27:10 UTC
Batch change of PRs untouched in 2018 marked "in progress" back to open.
Comment 6 Oleksandr Tymoshenko freebsd_committer freebsd_triage 2019-01-18 05:16:20 UTC
No additional information provided, closing as "Feedback timeout".

Thanks for reporting, and sorry that this PR slipped through the cracks.