Bug 193124 - GELI data integrity verification should consider sparse zero pass-through
Summary: GELI data integrity verification should consider sparse zero pass-through
Status: Closed Not Accepted
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 10.0-RELEASE
Hardware: Any Any
Importance: --- Affects Some People
Assignee: freebsd-bugs (Nobody)
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-08-29 17:17 UTC by Mike
Modified: 2024-04-15 22:01 UTC
CC List: 1 user

See Also:


Description Mike 2014-08-29 17:17:55 UTC
GEOM geli data authentication (option -a) verifies data integrity of the blocks on a device.  The current implementation requires that the entire device be written before it can be read from.  There's a note on this in the geli(8) man page.

I'd like to consider the use case where the device is sparse: a sparse-file-backed block device, an SSD with TRIM, or even just a very large hard drive that hasn't been written to yet. In this case, writing to the whole device in order to compute checksums would take a long time on a big device, and would also consume resources to store that data, especially if the granularity at which the underlying layer detects sparse sectors doesn't align with the integrity information.

When the underlying data in a sector is all zeroes (or is presumed to be all zeroes because it is unwritten or sparse), perhaps the verification layer should let the checksum be artificially produced as all zeroes too. In terms of hashing, there shouldn't be any risk in reserving one output value (all zeroes) out of a space of 2^256 or more, since it's impractical to find any real input that hashes to it. The GEOM integrity layer is most likely used only on FreeBSD, so the checksum algorithm may not need to match its standard definition exactly. This could be a sysctl parameter or a geli option.

I'd propose that any time a particular sector is all zeroes, the checksum written for it (and verified against) should also be all zeroes. This would keep the intended benefits of integrity verification, and it would make a difference even with a traditional hard drive: instead of taking hours or even days to write out the checksums on a new drive, the device would be available instantly. For a drive (or sparse image) that is not all zeroes, there's really no way around writing those sectors.
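A minimal sketch of the proposed rule, assuming plain HMAC-SHA256 keyed with a single hypothetical per-provider key over the sector number and the sector data. GELI's real on-disk authentication format and key derivation are not reproduced here; the names SECTOR_SIZE, TAG_SIZE, compute_tag and verify are hypothetical. An all-zero sector gets an all-zero tag, and any other sector gets a real HMAC:

```python
import hashlib
import hmac

SECTOR_SIZE = 4096  # hypothetical sector size for illustration
TAG_SIZE = 32       # HMAC-SHA256 output length in bytes
ZERO_SECTOR = bytes(SECTOR_SIZE)
ZERO_TAG = bytes(TAG_SIZE)


def compute_tag(key: bytes, sector_no: int, data: bytes) -> bytes:
    """Tag stored on write: all zeroes for an all-zero sector, else HMAC-SHA256."""
    if data == ZERO_SECTOR:
        # The proposed special case: unwritten/sparse/zero sectors need no
        # stored checksum, because an all-zero tag is implied.
        return ZERO_TAG
    # Bind the tag to the sector number so valid sectors cannot be swapped.
    msg = sector_no.to_bytes(8, "little") + data
    return hmac.new(key, msg, hashlib.sha256).digest()


def verify(key: bytes, sector_no: int, data: bytes, tag: bytes) -> bool:
    """Accept a sector only if its stored tag equals the tag recomputed from its data."""
    return hmac.compare_digest(compute_tag(key, sector_no, data), tag)
```

Because verify recomputes the tag from the data, an all-zero sector verifies only against an all-zero tag, and non-zero data verifies only against its real HMAC. One consequence of the rule as sketched: anyone able to write to the underlying device can replace a sector and its tag with zeroes and still pass verification.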
Comment 1 Xin LI freebsd_committer freebsd_triage 2024-04-15 22:01:59 UTC
Although I think this is a useful feature request to some extent, it's not trivial to implement: to support it, GELI has to keep track of which blocks are presumed to be zeroes, and that bookkeeping information has to be stored somewhere. There are also legitimate reasons to always perform full initialization of the provider; for example, if that's not done, the amount of data stored on the encrypted provider is revealed.
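A minimal sketch of that bookkeeping, assuming one bit per sector in a hypothetical bitmap; where GELI would store it, and how it would be kept consistent with data writes across a crash, is left open. Never-written sectors read as zeroes without authentication, and written ones go through the normal check, represented here by a caller-supplied read_and_verify function (InitBitmap and read_sector are likewise hypothetical names):

```python
from typing import Callable

SECTOR_SIZE = 4096  # hypothetical sector size for illustration


class InitBitmap:
    """One bit per sector: set once the sector has been written with a real tag."""

    def __init__(self, n_sectors: int) -> None:
        self.n_sectors = n_sectors
        self.bits = bytearray((n_sectors + 7) // 8)

    def mark_written(self, sector_no: int) -> None:
        self.bits[sector_no // 8] |= 1 << (sector_no % 8)

    def is_written(self, sector_no: int) -> bool:
        return bool(self.bits[sector_no // 8] & (1 << (sector_no % 8)))

    def written_count(self) -> int:
        # Whoever can read this bitmap learns how many sectors hold data,
        # which is the kind of size leak full initialization avoids.
        return sum(bin(b).count("1") for b in self.bits)


def read_sector(bitmap: InitBitmap, sector_no: int,
                read_and_verify: Callable[[int], bytes]) -> bytes:
    """Return zeroes for never-written sectors, else read and authenticate."""
    if not bitmap.is_written(sector_no):
        return bytes(SECTOR_SIZE)
    return read_and_verify(sector_no)
```

For scale: a 1 TiB provider with 4 KiB sectors has 2^40 / 2^12 = 2^28 sectors, so the bitmap alone is 2^28 / 8 bytes = 32 MiB. In this model, a sector's bit must also reach stable storage before that sector's first write is acknowledged; otherwise a crash would make the written data read back as zeroes.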

A more generic solution would probably be to make GELI init perform the initialization in the background: it would mark the provider as "need initialization", and upon attach the provider would initialize itself in the background. This, however, would complicate the provider quite a bit, because it needs to be able to recover from a power outage, etc., without damaging data. (This may be a good candidate for a Summer of Code project.)
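The background-initialization idea can be sketched as a resumable loop driven by a cursor kept in the provider metadata. This is a hypothetical single-threaded simulation, not GELI code: Disk stands in for persistent state, needs_init and init_cursor for the "need initialization" mark, and None for a sector whose tag would not verify yet. A real provider would have to read each sector to make that check and serialize it against concurrent writes.

```python
from dataclasses import dataclass, field
from typing import List, Optional

SECTOR_SIZE = 512  # hypothetical sector size for the simulation
ZERO = bytes(SECTOR_SIZE)


@dataclass
class Disk:
    """Simulated persistent state: sector contents plus provider metadata."""
    n_sectors: int
    # None models "never written, tag would not verify".
    sectors: List[Optional[bytes]] = field(default_factory=list)
    needs_init: bool = True  # hypothetical "need initialization" flag
    init_cursor: int = 0     # sectors [0, init_cursor) are known initialized

    def __post_init__(self) -> None:
        if not self.sectors:
            self.sectors = [None] * self.n_sectors


class Provider:
    """Attached provider that initializes itself in the background."""

    def __init__(self, disk: Disk) -> None:
        # On attach, everything resumes from the persisted metadata.
        self.disk = disk

    def write(self, s: int, data: bytes) -> None:
        self.disk.sectors[s] = data  # user write (would carry a real tag)

    def read(self, s: int) -> bytes:
        data = self.disk.sectors[s]
        if data is None:
            # Only possible ahead of the cursor: this sector will become zeroes.
            assert s >= self.disk.init_cursor
            return ZERO
        return data

    def init_step(self, batch: int) -> bool:
        """Initialize the next batch; return True once the provider is done."""
        d = self.disk
        end = min(d.init_cursor + batch, d.n_sectors)
        for s in range(d.init_cursor, end):
            # Never clobber a sector the user already wrote ahead of the cursor.
            if d.sectors[s] is None:
                d.sectors[s] = ZERO
        # Advance the cursor only after the batch is written: a crash before
        # this point just repeats the batch, which the check above makes safe.
        d.init_cursor = end
        if end == d.n_sectors:
            d.needs_init = False
        return not d.needs_init
```

Re-attaching after a simulated power loss is just constructing a new Provider over the same Disk. Advancing the cursor only after a batch completes means a crash can at most repeat one batch, and the "is None" check keeps that repetition from overwriting data written ahead of the cursor.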