195479 – mpt(4) on VMware ESXi causes enormous number of SCSI busy errors

Status:	Open

Alias:	None

Product:	Base System
Classification:	Unclassified
Component:	kern
Version:	10.1-RELEASE
Hardware:	amd64 Any

Importance:	--- Affects Some People
Assignee:	freebsd-virtualization (Nobody)

URL:
Keywords:

Depends on:
Blocks:

Reported:	2014-11-28 17:53 UTC by Nathan Whitehorn
Modified:	2021-05-14 12:02 UTC
CC List:	5 users

See Also:

Description Nathan Whitehorn freebsd_committer

2014-11-28 17:53:26 UTC

I'm using VMware ESXi with raw device mappings. During heavy disk I/O, my logs fill (*very* rapidly) with messages like the following:

(da14:mpt0:0:15:0): WRITE(10). CDB: 2a 00 19 6c b0 d8 00 00 28 00 
(da14:mpt0:0:15:0): CAM status: SCSI Status Error
(da14:mpt0:0:15:0): SCSI status: Busy
(da14:mpt0:0:15:0): Retrying command
(da12:mpt0:0:13:0): WRITE(10). CDB: 2a 00 19 85 46 e0 00 00 20 00 
(da12:mpt0:0:13:0): CAM status: SCSI Status Error
(da12:mpt0:0:13:0): SCSI status: Busy
(da12:mpt0:0:13:0): Retrying command
(da13:mpt0:0:14:0): WRITE(10). CDB: 2a 00 19 85 46 e0 00 00 20 00 
(da13:mpt0:0:14:0): CAM status: SCSI Status Error
(da13:mpt0:0:14:0): SCSI status: Busy
(da13:mpt0:0:14:0): Retrying command
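
For anyone reading the CDB bytes in these messages: a WRITE(10) CDB is ten bytes long, with opcode 0x2a in byte 0, a big-endian logical block address in bytes 2-5, and a big-endian transfer length (in blocks) in bytes 7-8. A minimal Python sketch of that decoding follows; `decode_write10` is a hypothetical helper name, not part of any FreeBSD tool.

```python
def decode_write10(cdb_hex: str) -> dict:
    """Decode a WRITE(10) CDB given as space-separated hex bytes.

    Hypothetical helper for illustration only.
    """
    cdb = bytes.fromhex(cdb_hex)  # ASCII whitespace between bytes is skipped
    if len(cdb) != 10 or cdb[0] != 0x2A:
        raise ValueError("not a WRITE(10) CDB")
    return {
        "lba": int.from_bytes(cdb[2:6], "big"),     # bytes 2-5: logical block address
        "blocks": int.from_bytes(cdb[7:9], "big"),  # bytes 7-8: transfer length
    }


# The three CDBs from the log excerpt above.
for cdb in (
    "2a 00 19 6c b0 d8 00 00 28 00",
    "2a 00 19 85 46 e0 00 00 20 00",
    "2a 00 19 85 46 e0 00 00 20 00",
):
    print(decode_write10(cdb))
```

Decoded this way, the logged commands are writes of 40, 32 and 32 blocks, so the Busy status is being returned for small transfers.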

I see these simultaneously from every disk in the attached array, at roughly one-minute intervals. I assume VMware uses this status as a signaling mechanism rather than as an error, but FreeBSD treats it as an error. If the retry count is exceeded, I/O to the disks stops.
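
One way to check whether the Busy statuses really hit every disk together is to tally them per device from the kernel log. A minimal sketch, assuming the message format shown in the description; `count_busy` and the sample text are illustrative, and a real log file would carry a syslog prefix before each message, which the pattern tolerates because it does not anchor at the start of the line.

```python
import re
from collections import Counter

# Matches e.g. "(da14:mpt0:0:15:0): SCSI status: Busy" anywhere in a line.
BUSY_RE = re.compile(r"\((da\d+):mpt\d+:\d+:\d+:\d+\): SCSI status: Busy")


def count_busy(log_text: str) -> Counter:
    """Count 'SCSI status: Busy' messages per da(4) device (hypothetical helper)."""
    return Counter(m.group(1) for m in BUSY_RE.finditer(log_text))


SAMPLE = """\
(da14:mpt0:0:15:0): CAM status: SCSI Status Error
(da14:mpt0:0:15:0): SCSI status: Busy
(da12:mpt0:0:13:0): SCSI status: Busy
(da12:mpt0:0:13:0): Retrying command
(da13:mpt0:0:14:0): SCSI status: Busy
"""

for device, n in sorted(count_busy(SAMPLE).items()):
    print(device, n)
```

Similar counts across all devices would be consistent with the status coming from the hypervisor or shared storage rather than from one misbehaving disk.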

Comment 1 a 2014-12-15 11:24:24 UTC

Hi,

You're not alone with this problem:

Same here with VMware 5.5 (vSphere); it happens on both FreeBSD 9.3-STABLE and 10.1-STABLE.

Comment 2 Alexander Motin freebsd_committer

2015-02-02 20:49:46 UTC

Commit r278111 may improve handling of those errors, making them non-fatal. But I believe the real issue is somewhere on the VMware/storage side, not in FreeBSD.

Comment 3 dburkland 2015-04-27 05:39:43 UTC

Hi All,

I ran into this same issue and, after some troubleshooting, resolved it by lowering the NFS read transfer size on my storage system (which hosted the FreeBSD 10.1 VMware virtual machine over NFS).

Thanks to Alex for the tip to look back at the storage!

Dan

Comment 4 Alexander Motin freebsd_committer

2015-04-27 13:51:38 UTC

Those errors do not indicate a bug on the FreeBSD side; they come from storage delays.

Comment 5 Nathan Whitehorn freebsd_committer

2015-04-27 17:08:44 UTC

That's not true. These "errors" cause resets, which hose IO performance, as well as polluting system logs. Ignoring them significantly improves IO on affected systems (factors of two). There should be a sysctl or something to do this.

Comment 6 Alexander Motin freebsd_committer

2015-04-27 20:22:04 UTC

As you can see above, r278111 should fix the major issues caused by those errors. Whether to hide them completely, I am not sure: they are indicators of storage overload.

Comment 7 Mark Linimon freebsd_committer

2021-05-14 12:02:14 UTC

^Triage: correct assignment.  Discussed with: koobs@.