Thomas Glanzmann wrote:
no, I just used the md5 checksum. And even if I have a hash escalation which is highly unlikely it still gives a good house number.
I'd start with a crc32 and/or MD5 to find candidate blocks, then do a bytewise comparison before actually merging them. Even the risk of an accidental collision is too high, and considering there are plenty of birthday-style MD5 attacks it would not be extraordinarily difficult to construct a block that collides with e.g. a system library.
Keep in mind that although digests do a fairly good job of making unique identifiers for larger chunks of data, they can only hold so many unique combinations. Considering you're comparing blocks of a few kibibytes in size it's best to just do a foolproof comparison. There's nothing wrong with using a checksum/digest as a screening mechanism though.
-- m. tharp -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html
