Bug 193124

Summary: GELI data integrity verification should consider sparse zero pass-through
Product: Base System
Reporter: Mike <5ukk2nmn43>
Component: kern
Assignee: freebsd-bugs (Nobody) <bugs>
Status: New ---    
Severity: Affects Some People    
Priority: ---    
Version: 10.0-RELEASE   
Hardware: Any   
OS: Any   

Description Mike 2014-08-29 17:17:55 UTC
GEOM geli data authentication (option -a) verifies the integrity of the sectors on a device.  The current implementation requires that the entire device be written before any of it can be read back without integrity errors.  There's a note on this in the geli(8) man page.

I'd like to consider the use case where the device is sparse, whether it's a block device backed by a sparse file, an SSD with TRIM, or even just a really big hard drive that hasn't been written to yet.  In this case, writing to the whole device just to compute checksums would take quite a bit of time on a big device.  It would also consume storage for data that is otherwise empty, especially if the granularity at which the underlying layer detects sparse regions doesn't align with the layout of the integrity information.
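To put a rough number on the initial fill, here is a back-of-the-envelope sketch.  The drive size and write rate are made-up illustrative figures, not measurements, and the function name is hypothetical.

```python
def full_write_hours(device_bytes: int, write_bytes_per_sec: float) -> float:
    """Hours needed to write every byte of the device once at a steady rate."""
    return device_bytes / write_bytes_per_sec / 3600


# Hypothetical example: a 4 TB drive written at a sustained 150 MB/s.
hours = full_write_hours(4 * 10**12, 150 * 10**6)
print(f"{hours:.1f} hours")  # prints "7.4 hours"
```

Slower media or larger devices push this well past a day, and that cost is paid before the first read can succeed.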

When the underlying data in a sector is all zeroes (or is presumed to be all zeroes because it's unwritten or sparse), perhaps the verification layer should treat an all-zero checksum as valid for it.  In terms of hashing, there shouldn't be any risk in reserving one output value (all zeroes) from a pool of 2^256 or more, since it's impractical to find any real data that hashes to it.  The GEOM integrity layer is most likely used only on FreeBSD, so there may not be a need for the checksum algorithm to match any other implementation exactly.  Maybe this can be a sysctl parameter or an option to geli.

I'd propose that any time a particular sector is all zeroes, the checksum written for it (and verified against) should also be all zeroes.  This would preserve the intended benefits of integrity verification, and it would help even in the case of a traditional hard drive: instead of taking hours or even days to write out the checksums to a new drive, the drive would be available instantly.  For a drive (or sparse image) that is not all zeroes, there's really no way around needing to write those sectors.
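A minimal sketch of the rule I have in mind, written in Python for brevity.  This is not geli code: it ignores encryption and geli's on-disk layout, it assumes HMAC-SHA256 as the checksum, and the names compute_tag, verify_sector and TAG_LEN are made up for illustration.

```python
import hashlib
import hmac

TAG_LEN = 32  # length in bytes of an HMAC-SHA256 tag (assumed algorithm)


def compute_tag(key: bytes, sector: bytes) -> bytes:
    """Return the tag to store for a sector under the proposed rule."""
    # Proposed special case: an all-zero sector gets an all-zero tag.
    if sector.count(0) == len(sector):
        return bytes(TAG_LEN)
    # Every other sector gets an ordinary keyed hash.
    return hmac.new(key, sector, hashlib.sha256).digest()


def verify_sector(key: bytes, sector: bytes, stored_tag: bytes) -> bool:
    """Check a sector read from disk against its stored tag."""
    return hmac.compare_digest(compute_tag(key, sector), stored_tag)


key = b"k" * 32                       # placeholder key
unwritten = bytes(4096)               # sector that was never written
print(verify_sector(key, unwritten, bytes(TAG_LEN)))   # True
written = b"\x01" + bytes(4095)       # sector holding real data
print(verify_sector(key, written, bytes(TAG_LEN)))     # False
```

With this rule a never-written region (zero data, zero tag) verifies immediately, while a written sector whose tag has been zeroed out still fails verification.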