Bug 195479 - mpt(4) on VMware ESXi causes enormous number of SCSI busy errors
Status: Open
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 10.1-RELEASE
Hardware: amd64 Any
Importance: --- Affects Some People
Assignee: freebsd-virtualization (Nobody)
Reported: 2014-11-28 17:53 UTC by Nathan Whitehorn
Modified: 2021-05-14 12:02 UTC
Description Nathan Whitehorn freebsd_committer freebsd_triage 2014-11-28 17:53:26 UTC
I'm using VMware ESXi with raw device mappings. During heavy disk I/O, my logs fill (*very* rapidly) with messages like the following:

(da14:mpt0:0:15:0): WRITE(10). CDB: 2a 00 19 6c b0 d8 00 00 28 00 
(da14:mpt0:0:15:0): CAM status: SCSI Status Error
(da14:mpt0:0:15:0): SCSI status: Busy
(da14:mpt0:0:15:0): Retrying command
(da12:mpt0:0:13:0): WRITE(10). CDB: 2a 00 19 85 46 e0 00 00 20 00 
(da12:mpt0:0:13:0): CAM status: SCSI Status Error
(da12:mpt0:0:13:0): SCSI status: Busy
(da12:mpt0:0:13:0): Retrying command
(da13:mpt0:0:14:0): WRITE(10). CDB: 2a 00 19 85 46 e0 00 00 20 00 
(da13:mpt0:0:14:0): CAM status: SCSI Status Error
(da13:mpt0:0:14:0): SCSI status: Busy
(da13:mpt0:0:14:0): Retrying command

I see these simultaneously from every disk in the attached array, at roughly one-minute intervals. I assume VMware uses this status as a signaling mechanism rather than an error, but FreeBSD treats it as an error. If the retry count is exceeded, I/O to the disks stops.
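
For anyone reading the log excerpt, the CDB bytes can be decoded to see what each failing write was. The following is a minimal sketch, assuming the standard 10-byte WRITE(10) layout (opcode 0x2a, a big-endian logical block address in bytes 2-5, and a big-endian transfer length in bytes 7-8). The function name `decode_write10` is made up for illustration and is not part of FreeBSD or CAM.

```python
def decode_write10(cdb_hex):
    """Decode a WRITE(10) CDB given as space-separated hex bytes.

    Returns (logical block address, transfer length in blocks).
    """
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x2A:
        raise ValueError("not a WRITE(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")     # bytes 2-5: logical block address
    blocks = int.from_bytes(cdb[7:9], "big")  # bytes 7-8: transfer length
    return lba, blocks


# CDBs copied from the log messages above.
for cdb in ("2a 00 19 6c b0 d8 00 00 28 00",
            "2a 00 19 85 46 e0 00 00 20 00"):
    lba, blocks = decode_write10(cdb)
    print(f"LBA {lba:#x}, {blocks} blocks")
```

Decoded this way, the retried commands are ordinary small writes of 40 and 32 blocks, and da12 and da13 were writing the same LBA when they reported Busy.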
Comment 1 a 2014-12-15 11:24:24 UTC
Hi,

You're not alone with this problem:

Same here: VMware 5.5 ("vSphere"). It happens with FreeBSD 9.3-STABLE and 10.1-STABLE.
Comment 2 Alexander Motin freebsd_committer freebsd_triage 2015-02-02 20:49:46 UTC
Commit r278111 may improve handling of those errors by making them non-fatal. But I believe the real issue is somewhere on the VMware/storage side, not in FreeBSD.
Comment 3 dburkland 2015-04-27 05:39:43 UTC
Hi All,

I ran into this same issue and, after some troubleshooting, resolved it by lowering the NFS read transfer size on my storage system (which hosts the FreeBSD 10.1 VMware virtual machine over NFS).

Thanks to Alex for the tip to look back at the storage!

Dan
Comment 4 Alexander Motin freebsd_committer freebsd_triage 2015-04-27 13:51:38 UTC
Those errors do not indicate a bug on the FreeBSD side; they are caused by storage delays.
Comment 5 Nathan Whitehorn freebsd_committer freebsd_triage 2015-04-27 17:08:44 UTC
That's not true. These "errors" cause resets, which hose I/O performance and pollute the system logs. Ignoring them significantly improves I/O on affected systems (by factors of two). There should be a sysctl or something to do this.
Comment 6 Alexander Motin freebsd_committer freebsd_triage 2015-04-27 20:22:04 UTC
As you can see above, r278111 should fix the major issues caused by those errors. As for hiding the errors completely, I am not sure: they are indicators of storage overload.
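
If the Busy messages are to be read as an overload indicator, it may help to count them per device rather than scan the raw log. The sketch below is one possible way to do that. It assumes the message format shown in the description; the regular expression and the name `count_busy` are illustrative only and do not come from any FreeBSD tool.

```python
import re
from collections import Counter

# Matches e.g. "(da14:mpt0:0:15:0): SCSI status: Busy" anywhere in a line,
# so a syslog timestamp/host prefix in front of it does not matter.
BUSY = re.compile(r"\((da\d+):mpt\d+:\d+:\d+:\d+\): SCSI status: Busy")


def count_busy(lines):
    """Return a Counter mapping device name to number of Busy reports."""
    counts = Counter()
    for line in lines:
        match = BUSY.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Sample lines taken from the description of this bug.
sample = """\
(da14:mpt0:0:15:0): CAM status: SCSI Status Error
(da14:mpt0:0:15:0): SCSI status: Busy
(da12:mpt0:0:13:0): SCSI status: Busy
(da12:mpt0:0:13:0): Retrying command
(da13:mpt0:0:14:0): SCSI status: Busy
"""

for device, n in sorted(count_busy(sample.splitlines()).items()):
    print(device, n)
```

Run against a real log, roughly equal counts across every disk behind mpt0 would be consistent with the bursts coming from the storage side as a whole rather than from one device.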
Comment 7 Mark Linimon freebsd_committer freebsd_triage 2021-05-14 12:02:14 UTC
^Triage: correct assignment.  Discussed with: koobs@.