On Wed, May 13, 2015 at 09:24:17AM -0700, Learner Study wrote:
> Hello,
>
> I have been reading on de-duplication and how algorithms such as Bloom
> and Cuckoo filters are used for this purpose.
>
> Does BTRFS dedup use any of these, or are there plans to incorporate
> these in future?

   There was a long discussion on IRC about different approaches that
could be taken. I think Mark Fasheh captured most of that somewhere --
I thought he'd put it on the duperemove GitHub site, but I can't see
it right now.

   One outcome of the discussion is that a probabilistic set
implementation is potentially useful, but there's still a lot of work
that needs to be done around those core algorithms to make a useful
deduplicator.

   A related outcome was that there are a lot of different approaches
that are possible, which can optimise for RAM, storage space (both at
runtime, and in the resulting deduplicated FS), or execution time in
different ways. I think we ended up with about 7 or 8 different
possible algorithms by the end of it, even before looking at the
implementation details of which probabilistic set algorithm to pick.

   If Mark doesn't have those notes any more, I can try to dig out the
original IRC discussion.

   Hugo.

-- 
Hugo Mills             | My code is never released, it escapes from the git
hugo@... carfax.org.uk | repo and kills a few beta testers on the way out.
http://carfax.org.uk/  |
PGP: E2AB1DE4          |                                          Diablo-D3
