On Mon, May 19, 2014 at 01:59:01PM -0400, Austin S Hemmelgarn wrote:
> On 2014-05-19 13:12, Konstantinos Skarlatos wrote:
> > I have been testing duperemove and it seems to work just fine, in
> > contrast with bedup, which I have been unable to install/compile/sort out
> > the mess with python versions. I have 2 questions about duperemove:
> > 1) can it use existing filesystem csums instead of calculating its own?
> While this might seem like a great idea at first, it really isn't.
> BTRFS currently uses CRC32c as its checksum algorithm, and while
> that is relatively good at detecting small differences (i.e. a single
> bit flipped out of every 64 or so bytes), it is known to have issues
> with hash collisions. Normally, the data on disk won't change enough,
> even from a media error, to cause a hash collision, but when you start
> using it to compare extents that aren't known to be the same to begin
> with, and then try to merge those extents, you run the risk of serious
> file corruption. Also, AFAIK, BTRFS doesn't expose the block checksums
> to userspace directly (although I may be wrong about this, in which case
> I retract the following statement), so this would require some
> kernel-space support.

I'm pretty sure you could get the checksums via ioctl. The thing about
dedupe, though, is that the kernel always does a byte-by-byte comparison
of the file data before merging it, so we should never corrupt data just
because userspace gave us a bad range to dedupe.

That said, I'm inclined to agree that it might not be as good an idea as
it sounds.
	--Mark

--
Mark Fasheh
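To make the collision argument concrete, here is a minimal Python sketch. It is not duperemove or btrfs code. It assumes a pure-Python, table-driven CRC32c using the Castagnoli polynomial in its reflected form (0x82F63B78), and `find_collision` and `safe_to_merge` are hypothetical helpers written only for illustration. Because CRC is affine over GF(2), a second buffer with the same CRC32c can be built directly by Gaussian elimination, so a matching checksum alone cannot justify a merge. The final full comparison in `safe_to_merge` stands in for the byte-by-byte check Mark describes the kernel doing.

```python
# Sketch: equal CRC32c values do not prove equal data.
# Hypothetical helpers for illustration; not duperemove or btrfs code.

POLY = 0x82F63B78  # CRC32c (Castagnoli) polynomial, reflected (LSB-first) form


def _make_table() -> list[int]:
    """Precompute the 256-entry lookup table for byte-at-a-time CRC32c."""
    table = []
    for i in range(256):
        crc = i
        for _ in range(8):
            crc = (crc >> 1) ^ POLY if crc & 1 else crc >> 1
        table.append(crc)
    return table


TABLE = _make_table()


def crc32c(data: bytes) -> int:
    """Standard CRC32c: init 0xFFFFFFFF, reflected, final XOR 0xFFFFFFFF."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc = TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def find_collision(base: bytes) -> bytes:
    """Return a different buffer of the same length with the same CRC32c.

    For equal-length inputs, crc(a ^ b) == crc(a) ^ crc(b) ^ crc(zeros),
    so each single-bit flip changes the CRC by a fixed 32-bit delta.
    More than 32 flippable bits must contain a dependent subset whose
    deltas XOR to zero; Gaussian elimination over GF(2) finds one.
    """
    n = len(base)
    zero = crc32c(bytes(n))
    pivots: dict[int, tuple[int, int]] = {}  # top bit -> (delta, bit combo)
    for bit in range(8 * n):
        flip = bytearray(n)
        flip[bit // 8] = 1 << (bit % 8)
        delta = crc32c(bytes(flip)) ^ zero  # CRC change caused by this bit
        combo = 1 << bit                    # which bits are flipped together
        while delta:
            top = delta.bit_length() - 1
            if top not in pivots:
                pivots[top] = (delta, combo)
                break
            pivot_delta, pivot_combo = pivots[top]
            delta ^= pivot_delta
            combo ^= pivot_combo
        else:
            # Net CRC change is zero, and combo always includes `bit`.
            diff = combo.to_bytes(n, "little")
            return bytes(a ^ d for a, d in zip(base, diff))
    raise ValueError("need more than 4 bytes to force a collision")


def safe_to_merge(a: bytes, b: bytes) -> bool:
    """Treat the checksum as a cheap filter, then confirm byte-by-byte."""
    return crc32c(a) == crc32c(b) and a == b


if __name__ == "__main__":
    base = b"hello, btrfs extent data"
    other = find_collision(base)
    print(hex(crc32c(base)), hex(crc32c(other)))  # same value twice
    print(base == other)                           # False
    print(safe_to_merge(base, other))              # False
```

Any buffer longer than four bytes has such a partner, since a 32-bit checksum cannot distinguish more than 2^32 distinct inputs. The elimination simply finds one quickly instead of searching at random.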
