On Thu, 2008-12-11 at 10:05 +0000, Oliver Mattos wrote:
> Hi,
>
> I've noticed many files have blocks of plain nulls up to a few kb long,
> even files you wouldn't normally expect to, like ELF executables. I
> know that with compression enabled these will compress very small, but
> that will have a reasonable hit on performance. How much of an overhead
> would it be to check all checksummed file extents to see if they match
> the checksum for a blank (null filled) extent, and if it does then don't
> save that data? You may not even want to do it with checksums - just
> by reading the first few bytes of data and checking for "nullness" would
> let you know if the block is null or not. (if the first 4 bytes are
> null, then the whole block is likely to be nulls, so it's worth the
> overhead of checking the whole block)
>
> This would seem like a particularly low overhead space and performance
> tweak. (performance since read/write speed will be increased for
> "average" files that contain a few null blocks)
>
> Any thoughts?

The first comment is that it won't be as fast as you expect ;) Most
disks read 64k of data about as fast as they read 4k of data, so if you
have a file with zeros sprinkled around, the disk will end up reading
the zeros and just not sending them back to you.

Jim is definitely right about the cost of metadata for smaller extents.
Putting pointers to the zero extent into the file will greatly increase
the number of extents needed to describe a single file.

Traditional filesystems usually don't detect zeros and skip them because
userland will often write zeros to preallocate the file. Unless btrfs
is in nodatacow mode, that preallocation step doesn't really impact
layout, and we could map zeros to a virtual extent that was never
written or read.

But at the end of the day, the main place that zeros come from is
benchmarking programs.
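For reference, the check Oliver proposes (probe the first few bytes, and only if they are zero pay for a scan of the whole block) could look roughly like the C sketch below. It is a minimal illustration under stated assumptions, not btrfs code: `block_is_zero` is a hypothetical name, and the 4-byte probe is only a cheap early reject. A block whose first bytes are zero can still hold data further in, so the full scan is what actually decides. Note that this only saves CPU work and write space; it does not avoid the disk reads discussed above, and each skipped zero block still splits the surrounding extent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not a btrfs function): returns true only if every
 * byte in buf[0..len) is zero. */
static bool block_is_zero(const unsigned char *buf, size_t len)
{
    size_t probe = len < 4 ? len : 4;
    size_t i;

    assert(buf != NULL || len == 0);

    /* Cheap early reject (the heuristic from the proposal): if any of the
     * first 4 bytes is non-zero, the block cannot be all zeros. */
    for (i = 0; i < probe; i++)
        if (buf[i] != 0)
            return false;

    /* The probe only says the block *might* be zero; confirm by scanning
     * the remainder, since data can start later in the block. */
    for (i = probe; i < len; i++)
        if (buf[i] != 0)
            return false;

    return true;
}
```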
I would prefer to use compression or dedup and get larger benefits than
to optimize away 4k at a time here and there.

-chris