Bug 253373 - mps(4) drive status request during spin-up timeout results in zpool degradation
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 12.2-RELEASE
Hardware: amd64 Any
Importance: --- Affects Only Me
Assignee: freebsd-bugs (Nobody)
 
Reported: 2021-02-09 14:14 UTC by Michael Gmelin
Modified: 2021-02-09 14:14 UTC

Description Michael Gmelin 2021-02-09 14:14:48 UTC
Short description:
------------------

When a drive spins up while `camcontrol epc -c status' is run on it, a command timeout occurs that leaves the drive temporarily unavailable and marked as FAULTED in the zpool.

Increasing MPS_ATA_ID_TIMEOUT (in code, not a tunable) might be a workaround.

Long version:
-------------

I've started seeing this after updating from 11.3 to 12.1 and, subsequently, 12.2. It's not necessarily caused by the update itself, but it might be related to switching to `camcontrol epc' after the update - I used `camcontrol cmd -a "E5 00..."' beforehand.

I have an LSI (Avago/Broadcom) SAS2008-based controller with two spindles (Seagate IronWolf) connected to it, storing data only (the system drives are separate).

As this is a home NAS, the drives are configured to spin down after being idle for more than 30 minutes, using camcontrol:

  /sbin/camcontrol standby /dev/da2 -v -t 1800
  /sbin/camcontrol standby /dev/da3 -v -t 1800

There's a cron job running every ten minutes that logs the spin state of the drives using camcontrol:

  camcontrol epc da2 -c status -P
  camcontrol epc da3 -c status -P
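
For reference, the crontab entries behind this look roughly like this (the schedule is as described above; the log destination is just a placeholder):

  */10 * * * * /sbin/camcontrol epc da2 -c status -P >> /var/log/spinstate.log 2>&1
  */10 * * * * /sbin/camcontrol epc da3 -c status -P >> /var/log/spinstate.log 2>&1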

All of this is working just fine, but once in a while, the zpool goes to a DEGRADED state, because one of the drives is in a FAULTED state (too many errors).

Based on the logs and the timestamps therein, this seems to happen when a drive spins up at exactly the same time as an EPC status check runs.

So the order of events is like this (see the reproduction sketch below the list):

  1. System is idle, drives are on standby
  2. Cronjob checks drive spin state using camcontrol epc status
  3. At the same time, a disk access happens, causing the drives in the zpool to spin up
  4. Spinning up of the drives delays reporting the spin state
  5. The timeout causes the mps driver to abort the command; subsequent requests to the drive fail
  6. The failed requests cause the drive to go to a FAULTED state and degrade the zpool

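In theory, the race could be provoked deliberately along these lines (da2 picked arbitrarily; a single run won't necessarily hit the timing window):

  camcontrol standby da2 -v                      # spin the drive down immediately
  sleep 60                                       # let it settle
  dd if=/dev/da2 of=/dev/null bs=1m count=1 &    # raw read forces a spin-up
  camcontrol epc da2 -c status -P                # status inquiry during spin-up
  wait
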
This happens every few weeks, purely by chance. Today, for example, I accessed a file at exactly the right moment; the access was recorded in the local webserver's error log, so everything matched up.

This is the log output. First, the drive status inquiry (which was started at 10:40:00) times out:

  Feb  9 10:40:05 nas kernel:       (pass2:mps0:0:2:0): INQUIRY. CDB: 12 01 00 00 ff 00 length 255 SMID 803 Command timeout on target 2(0x0009) 5000 set, 5.388682389 elapsed

Then, the command is aborted:

  Feb  9 10:40:05 nas kernel: mps0: Sending abort to target 2 for SMID 803
  Feb  9 10:40:05 nas kernel:       (pass2:mps0:0:2:0): INQUIRY. CDB: 12 01 00 00 ff 00 length 255 SMID 803 Aborting command 0xfffffe00026c5708
  Feb  9 10:40:08 nas kernel: mps0: Controller reported scsi ioc terminated tgt 2 SMID 369 loginfo 31140000
  Feb  9 10:40:08 nas kernel: mps0: Finished abort recovery for target 2
  Feb  9 10:40:08 nas kernel: mps0: Unfreezing devq for target ID 2

At 10:40:09 the webserver's log shows a 502 error (the request for the file had hung for a couple of seconds beforehand, so the two events clearly overlapped - I can't say whether the spin-up/webserver request started shortly before the status request or the other way round).

Following this, syslog shows a ton of messages about the drive not being ready:

  Feb  9 10:40:08 nas kernel: (da2:mps0:0:2:0): READ(10). CDB: 28 00 e4 00 4a 40 00 00 20 00 
  Feb  9 10:40:08 nas kernel: (da2:mps0:0:2:0): CAM status: CCB request completed with an error
  Feb  9 10:40:08 nas kernel: (da2:mps0:0:2:0): Retrying command, 9 more tries remain
  Feb  9 10:40:08 nas kernel: (da2:mps0:0:2:0): READ(10). CDB: 28 00 e4 00 4a 40 00 00 20 00 
  Feb  9 10:40:08 nas kernel: (da2:mps0:0:2:0): CAM status: SCSI Status Error
  Feb  9 10:40:08 nas kernel: (da2:mps0:0:2:0): SCSI status: Check Condition
  Feb  9 10:40:08 nas kernel: (da2:mps0:0:2:0): SCSI sense: NOT READY asc:4,0 (Logical unit not ready, cause not reportable)
  ...
  Feb  9 10:40:09 nas kernel: (da2:mps0:0:2:0): Error 5, Retries exhausted
  ...

Until ZFS realizes the vdev has issues:

  Feb  9 10:40:09 nas ZFS[38943]: vdev state changed, pool_guid=$xxxx vdev_guid=$xxxx

After that there are more NOT READY errors, and another timeout/abort:

  Feb  9 10:40:14 nas kernel:       (pass2:mps0:0:2:0): INQUIRY. CDB: 12 01 00 00 ff 00 length 255 SMID 724 Command timeout on target 2(0x0009) 5000 set, 5.353551387 elapsed
  Feb  9 10:40:14 nas kernel: mps0: Sending abort to target 2 for SMID 724
  Feb  9 10:40:14 nas kernel:       (pass2:mps0:0:2:0): INQUIRY. CDB: 12 01 00 00 ff 00 length 255 SMID 724 Aborting command 0xfffffe00026bece0
  Feb  9 10:40:14 nas kernel: mps0: Finished abort recovery for target 2
  Feb  9 10:40:14 nas kernel: mps0: Unfreezing devq for target ID 2

Afterwards, the drive is just fine, and running "zpool clear" plus a scrub shows that all is good again (it's also not always the same drive - it just depends on timing).
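
For completeness, recovery is just the usual (pool name assumed to be "tank" here):

  zpool clear tank       # clear the FAULTED state on the drive
  zpool scrub tank       # verify the pool; comes back clean
  zpool status tank      # all vdevs ONLINE again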

Now, the obvious workaround is to stop logging the drive spin state for the time being, which should fix that specific issue, but I wonder if there is a real solution for this (either mps should handle the timeout scenario gracefully, or it should be possible to avoid it by increasing a timeout).

I haven't increased kern.cam.scsi_delay yet (it's also 5 seconds, but seems unrelated otherwise), but looking at the mps code, it seems like increasing MPS_ATA_ID_TIMEOUT in /usr/src/sys/dev/mps/mpsvar.h might help to avoid this specific timeout:

  #define MPS_ATA_ID_TIMEOUT     5       /* 5 second timeout for SATA ID cmd */
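
Should that turn out to be the right knob, the change itself would just be bumping the constant (say from 5 to 15 seconds - the value is an arbitrary guess) and rebuilding the kernel, roughly:

  # after editing MPS_ATA_ID_TIMEOUT in /usr/src/sys/dev/mps/mpsvar.h:
  cd /usr/src
  make buildkernel && make installkernel
  shutdown -r now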

I would prefer to hear from someone familiar with the driver before starting to poke around in there though.