I'm using VMware ESXi with raw device mappings. During heavy disk I/O, my logs fill (*very* rapidly) with messages like the following:
(da14:mpt0:0:15:0): WRITE(10). CDB: 2a 00 19 6c b0 d8 00 00 28 00
(da14:mpt0:0:15:0): CAM status: SCSI Status Error
(da14:mpt0:0:15:0): SCSI status: Busy
(da14:mpt0:0:15:0): Retrying command
(da12:mpt0:0:13:0): WRITE(10). CDB: 2a 00 19 85 46 e0 00 00 20 00
(da12:mpt0:0:13:0): CAM status: SCSI Status Error
(da12:mpt0:0:13:0): SCSI status: Busy
(da12:mpt0:0:13:0): Retrying command
(da13:mpt0:0:14:0): WRITE(10). CDB: 2a 00 19 85 46 e0 00 00 20 00
(da13:mpt0:0:14:0): CAM status: SCSI Status Error
(da13:mpt0:0:14:0): SCSI status: Busy
(da13:mpt0:0:14:0): Retrying command
I see these simultaneously from every disk in the attached array, at roughly 1-minute intervals. I assume VMware is using this status as a signaling mechanism rather than an error, but FreeBSD treats it as an error. If the retry count is exceeded, I/O to the disks stops.
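For anyone who wants to see which blocks the retried writes target, here is a minimal sketch that decodes the CDB bytes printed in the messages above. It assumes the standard 10-byte WRITE(10) layout (opcode 0x2A, big-endian 32-bit LBA in bytes 2-5, big-endian 16-bit transfer length in bytes 7-8); the function name `decode_write10` is made up for illustration.

```python
import struct


def decode_write10(cdb_hex: str) -> dict:
    """Decode a WRITE(10) CDB given as space-separated hex bytes."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10:
        raise ValueError("WRITE(10) CDB must be exactly 10 bytes")
    # Layout: opcode, flags, LBA (4 bytes), group number,
    # transfer length (2 bytes), control. All fields are big-endian.
    opcode, flags, lba, group, length, control = struct.unpack(">BBIBHB", cdb)
    if opcode != 0x2A:
        raise ValueError("not a WRITE(10) command")
    return {"lba": lba, "transfer_blocks": length}


if __name__ == "__main__":
    # CDB copied from the first da14 message above.
    info = decode_write10("2a 00 19 6c b0 d8 00 00 28 00")
    print(hex(info["lba"]), info["transfer_blocks"])
```

For the first da14 message this gives a transfer length of 40 blocks starting at LBA 0x196cb0d8, so these are small writes rather than unusually large requests.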
You're not alone with this problem:
Same here: VMware 5.5 ("vSphere"). It happens with FreeBSD 9.3-STABLE and 10.1-STABLE.
Commit r278111 may improve handling of those errors, making them non-fatal. But I believe the real issue is somewhere on the VMware/storage side, not in FreeBSD.
I ran into this same issue and, after some troubleshooting, resolved it by specifying a lower value for my storage system's NFS read transfer size (the storage system hosted the FreeBSD 10.1 VMware virtual machine over NFS).
Thanks to Alex for the tip to look back at the storage!
Those errors do not indicate a bug on the FreeBSD side; they are caused by storage delays.
That's not true. These "errors" cause resets, which hose I/O performance and pollute the system logs. Ignoring them significantly improves I/O on affected systems (by factors of two). There should be a sysctl or something to do this.
As you can see above, r278111 should fix the major issues caused by those errors. I am not sure whether to hide those errors completely: they are indicators of storage overload.
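If the messages are to be used as an overload indicator, one option is to tally them per device instead of reading them line by line. The following is only a sketch: it assumes log lines shaped like the ones quoted at the top of this thread, and the name `count_busy` is made up for illustration.

```python
import re
from collections import Counter

# Matches e.g. "(da14:mpt0:0:15:0): SCSI status: Busy"; a syslog prefix
# in front of the parenthesis is tolerated because search() is used.
BUSY_RE = re.compile(r"\((?P<dev>\w+):[^)]*\): SCSI status: Busy")


def count_busy(lines):
    """Return a Counter mapping device name to number of Busy statuses."""
    counts = Counter()
    for line in lines:
        match = BUSY_RE.search(line)
        if match:
            counts[match.group("dev")] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "(da14:mpt0:0:15:0): CAM status: SCSI Status Error",
        "(da14:mpt0:0:15:0): SCSI status: Busy",
        "(da12:mpt0:0:13:0): SCSI status: Busy",
        "(da13:mpt0:0:14:0): SCSI status: Busy",
    ]
    for dev, n in sorted(count_busy(sample).items()):
        print(dev, n)
```

Feeding it the system log and comparing counts across devices would show whether every disk reports Busy together, which points at the shared storage path rather than at a single disk.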