On Thu, 2008-08-14 at 11:49 -0700, Zach Brown wrote:
> > File granularity is not well suited to dedup when files differ by only a
> > few blocks, but I'd want to see some numbers on how often that happens
> > before carrying around the disk format needed to do block level dedup.
>
> I was imagining that one could easily make a flag to debug-tree which
> caused it to just dump the file block checksums from the extent items,
> maybe restricted to a given subvol.  Pipe that through sort and uniq -c
> and you have a pretty easy path to a rough histogram of checksum values.
>
> But I sort of wonder if the point isn't to dedup systems that were
> deployed on previous-generation file systems.  If people knew that dedup
> worked, they might be able to more easily deploy simpler systems that
> didn't have to be so careful at, say, maintaining hard link farms.
>
> I dunno, just a thought.

The backup and virtualization use cases are why I've still got it on the
table for consideration at least.  Especially virtualization because
there you'll tend to have large disk image files that have tiny changes
between each other.

-chris
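
The histogram step described in the quoted text (dump per-block checksums,
then sort and uniq -c) could be sketched in Python roughly as follows. This
is only a sketch under assumptions: the debug-tree flag that dumps checksums
is imagined in the thread and is not known to exist, so the input format
used here (one checksum per line, as text) is hypothetical, and the function
names are illustrative. Equal checksums do not prove equal data, so the
result is an upper bound on what block-level dedup could save.

```python
from collections import Counter


def checksum_histogram(lines):
    """Count how often each checksum appears (the `sort | uniq -c` step).

    `lines` is any iterable of strings, one checksum per line. This input
    format is an assumption; no real tool output is being parsed here.
    """
    counts = Counter()
    for line in lines:
        csum = line.strip()
        if csum:  # skip blank lines
            counts[csum] += 1
    return counts


def duplicate_summary(counts):
    """Return (total_blocks, distinct_checksums, possibly_redundant_blocks)."""
    total = sum(counts.values())
    distinct = len(counts)
    # Every occurrence beyond the first of a checksum is a dedup candidate.
    return total, distinct, total - distinct


def format_histogram(counts, limit=10):
    """Render the most common checksums in a `uniq -c`-like layout."""
    return [f"{n:8d} {csum}" for csum, n in counts.most_common(limit)]
```

To use it on a real dump, pass an open file or sys.stdin to
checksum_histogram() and print the lines from format_histogram(). The
summary tuple gives a quick feel for how many blocks share a checksum
within the subvol being examined.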
