Re: Data Deduplication with the help of an online filesystem check

Thomas Glanzmann wrote:
> no, I just used the MD5 checksum. And even if I get a hash collision,
> which is highly unlikely, it still gives a good ballpark figure.

I'd start with a CRC32 and/or MD5 to find candidate blocks, then do a bytewise comparison before actually merging them. The risk of a purely accidental collision is already too high, and given the well-known collision attacks against MD5, it would not be extraordinarily difficult to deliberately construct a block that collides with, e.g., a system library.
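To make that concrete, here is a minimal sketch of the screen-then-verify flow in Python (the flat in-memory block list and the function name are illustrative assumptions on my part, not anyone's actual deduplication code):

    import hashlib
    import zlib
    from collections import defaultdict

    def find_duplicates(blocks):
        """Return (i, j) pairs of blocks verified byte-for-byte identical."""
        # Pass 1: cheap digests serve only as a screen for candidate groups.
        candidates = defaultdict(list)
        for i, block in enumerate(blocks):
            key = (zlib.crc32(block), hashlib.md5(block).digest())
            candidates[key].append(i)

        # Pass 2: a digest match is not proof; confirm each candidate pair
        # bytewise before it would actually be merged.
        duplicates = []
        for indices in candidates.values():
            for a in range(len(indices)):
                for b in range(a + 1, len(indices)):
                    i, j = indices[a], indices[b]
                    if blocks[i] == blocks[j]:
                        duplicates.append((i, j))
        return duplicates

    blocks = [b"A" * 4096, b"B" * 4096, b"A" * 4096]
    print(find_duplicates(blocks))  # -> [(0, 2)]

A real implementation would stream extents off disk rather than hold every block in memory, but the two-pass structure is the point.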

Keep in mind that although digests do a fairly good job of producing unique identifiers for larger chunks of data, a digest can only hold so many unique combinations: an n-bit digest distinguishes at most 2^n values, so by the pigeonhole principle distinct blocks must eventually share one. Considering you're comparing blocks of only a few kibibytes, it's best to just do a foolproof bytewise comparison. There's nothing wrong with using a checksum/digest as a screening mechanism, though.
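For a sense of scale, the standard birthday approximation p ≈ 1 - exp(-n^2 / 2^(d+1)) estimates the chance that any two of n random blocks share the same d-bit digest. A quick check in Python (the figure of 2^32 blocks, i.e. a 16 TiB filesystem of 4 KiB blocks, is purely an illustrative assumption):

    import math

    def collision_probability(n_blocks, digest_bits):
        # Birthday bound; -expm1(-x) equals 1 - exp(-x), accurate for tiny x.
        return -math.expm1(-n_blocks**2 / 2.0 ** (digest_bits + 1))

    print(collision_probability(2.0**32, 128))  # MD5:   ~2.7e-20, tiny but nonzero
    print(collision_probability(2.0**32, 32))   # CRC32: ~1.0, collisions certain

So a purely accidental MD5 collision is vanishingly unlikely; the stronger reason for the bytewise pass is deliberately constructed collisions, as noted above.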

-- m. tharp
