Bug 223950

Summary: lower default kern.cam.da.X.delete_max to avoid ZFS TRIM timeouts
Product: Base System
Component: kern
Version: 11.1-RELEASE
Hardware: amd64
OS: Any
Status: New
Severity: Affects Some People
Priority: ---
Reporter: Jim Phillips <jim>
Assignee: Warner Losh <imp>
CC: jim, ncrogers, phk

Description Jim Phillips 2017-11-28 22:38:07 UTC
We are seeing the ZFS issue described at
https://lists.freebsd.org/pipermail/freebsd-scsi/2015-July/006777.html
where SATA SSDs behind a SAS controller time out on TRIM commands.

This was the original state; TRIM failures are counted in kstat.zfs.misc.zio_trim.failed:

kern.geom.dev.delete_max_sectors: 262144
kern.cam.da.4.delete_max: 17179607040
kern.cam.da.4.delete_method: ATA_TRIM
kern.cam.da.3.delete_max: 17179607040
kern.cam.da.3.delete_method: ATA_TRIM
kern.cam.da.2.delete_max: 17179607040
kern.cam.da.2.delete_method: ATA_TRIM
kern.cam.da.5.delete_max: 17179607040
kern.cam.da.5.delete_method: ATA_TRIM
kern.cam.da.1.delete_max: 17179607040
kern.cam.da.1.delete_method: ATA_TRIM
kern.cam.da.0.delete_max: 17179607040
kern.cam.da.0.delete_method: ATA_TRIM
kern.cam.ada.0.delete_method: DSM_TRIM
vfs.zfs.trim.max_interval: 1
vfs.zfs.trim.timeout: 30
vfs.zfs.trim.txg_delay: 32
vfs.zfs.trim.enabled: 1
vfs.zfs.vdev.trim_max_pending: 10000
vfs.zfs.vdev.bio_delete_disable: 0
vfs.zfs.vdev.trim_max_active: 64
vfs.zfs.vdev.trim_min_active: 1
vfs.zfs.vdev.trim_on_init: 1
hw.nvd.delete_max: 1073741824
kstat.zfs.misc.arcstats.deleted: 138850
kstat.zfs.misc.zio_trim.failed: 9399097
kstat.zfs.misc.zio_trim.unsupported: 0
kstat.zfs.misc.zio_trim.success: 792148930
kstat.zfs.misc.zio_trim.bytes: 31513520795648

Our current workaround is to lower delete_max on every da drive (da0 through da5 above); for da0:
sysctl kern.cam.da.0.delete_max=536870912

I suggest lowering the default from the current 17179607040 bytes (about 16 GiB).

Comment 1 Poul-Henning Kamp freebsd_committer 2019-02-07 07:51:23 UTC
Seconded.

I just triggered a driver panic because zpool create issued BIO_DELETE over entire 2TB SSD drives.

There is no relevant performance loss in clamping delete_max to something in the MB-GB range.