Bug 234705 - zpool initialize does not BIO_DELETE
Summary: zpool initialize does not BIO_DELETE
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: kern (show other bugs)
Version: CURRENT
Hardware: Any Any
: --- Affects Only Me
Assignee: Steven Hartland
Depends on:
Reported: 2019-01-07 20:38 UTC by Poul-Henning Kamp
Modified: 2019-04-10 09:26 UTC (History)
2 users (show)

See Also:

bio_delete dtrace script (331 bytes, text/plain)
2019-01-08 10:39 UTC, Steven Hartland
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Poul-Henning Kamp freebsd_committer 2019-01-07 20:38:16 UTC
On devices which support BIO_DELETE, zpool initialize should use it.

Not obvious to me if the BIO_DELETE should replace the write or supplement it.
Comment 1 Conrad Meyer freebsd_committer 2019-01-07 21:22:30 UTC
This could probably be implemented in vdev_initialize_ranges or vdev_initialize_thread using something like zio_trim (or otherwise invoking ZIO_TYPE_FREE); the 0xdeadbeef fill is likely useful for debugging, but probably not useful for end users.  It could default differently based on INVARIANTS.
Comment 2 Warner Losh freebsd_committer 2019-01-08 03:36:42 UTC
These days, BIO_DELETE is almost always a win. Writing zeros is a big lose often (though if the lower-layers implement a special WRITE ZEROS command, there might be some cases where that's useful: we currently do not). I doubt BOTH are ever anything but extra work. I would argue, however, that we should think carefully here. The difference between DSM Deallocate and WRITE ZEROS + Deallocate flag (for those nvme drives that support it) is very small (trimmed blocks can return a variety of things depending: 0, ff or last written data)...
Comment 3 Warner Losh freebsd_committer 2019-01-08 03:48:44 UTC
(In reply to Warner Losh from comment #2)
Which is a long way of saying BIO_DELETE is our deallocation abstraction, and we should use it, but it is (in FreeBSD) non-deterministic. We don't have a way to 'widen' the abstraction to one of the three possible values (0, f's or last) today. If ZFS cares what the data is, it should write the data. If it doesn't care, it should just use TRIM on devices that support it. I'm not sure how useful widening the abstraction would be, though, given the hair involved to make it work and the current lack of a real use case.
Comment 4 Steven Hartland freebsd_committer 2019-01-08 07:43:54 UTC
This has been the case for a very long time check to see if you have trim on init sysctl set to zero.
Comment 5 Steven Hartland freebsd_committer 2019-01-08 08:19:20 UTC
Back at a PC now so just to confirm the specific sysctl is:
Comment 6 Steven Hartland freebsd_committer 2019-01-08 10:39:40 UTC
Created attachment 200907 [details]
bio_delete dtrace script

Just checked on my 12.0-RELEASE box by creating an md backed pool while running the attached dtrace script and checking the output of:
sysctl kstat.zfs.misc.zio_trim

These show both BIO_DELETE's being processed and no unsupported increment so as far as I can tell vdev's are still being erased on init.

So I guess the question is what lead to come to the conclusion they aren't?
Comment 7 Poul-Henning Kamp freebsd_committer 2019-04-09 19:23:17 UTC
Sorry for not noticing your reponse.

I was watching gstat() during zpool init, and it was doing bio_write rather than bio_delete.

Underlying media was SATA (or possibly SAS) disks.

I will try to reproduce when I have a chance.
Comment 8 Steven Hartland freebsd_committer 2019-04-10 09:26:20 UTC
For sata you can check the sysctl's to see what delete method is enabled for each disk:
`sysctl kern.cam.ada |grep delete`