Re: de-duplication algos

On Wed, May 13, 2015 at 09:24:17AM -0700, Learner Study wrote:
> Hello,
> 
> I have been reading on de-duplication and how algorithms such as Bloom
> and Cuckoo filters are used for this purpose.
> 
> Does BTRFS dedup use any of these, or are there plans to incorporate
> these in future?

   There was a long discussion on IRC about different approaches that
could be taken. I think Mark Fasheh captured most of that -- I
thought he'd put it somewhere on the duperemove GitHub site, but I
can't find it right now.

   One outcome of the discussion is that a probabilistic set
implementation is potentially useful, but there's still a lot of work
that needs to be done around those core algorithms to make a useful
deduplicator. A related outcome was that there are many different
possible approaches, which can optimise for RAM, storage
space (both at runtime, and in the resulting deduplicated FS), or
execution time in different ways. I think we ended up with about 7 or
8 different possible algorithms by the end of it, even before looking
at the implementation details of which probabilistic set algorithm to
pick.
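
   To make "probabilistic set" a bit more concrete, here is a minimal
sketch of a Bloom filter used as a first pass over block hashes. This
is purely illustrative and is not how btrfs or duperemove work; the
names BloomFilter and find_duplicate_candidates, the SHA-256 block
key, and the sizing parameters are all assumptions of mine. The filter
can give false positives, so it only produces candidates that would
still need an exact comparison, which is part of the extra work
around the core algorithm.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter over byte strings (illustrative only)."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive all bit positions from one SHA-256 digest (double hashing).
        digest = hashlib.sha256(item).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # True means "probably seen"; False means "definitely not seen".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def find_duplicate_candidates(blocks, num_bits=1 << 20, num_hashes=4):
    """Return indices of blocks whose hash has probably been seen before.

    Every true duplicate is reported, but so are occasional false
    positives, so each candidate still needs an exact check.
    """
    seen = BloomFilter(num_bits, num_hashes)
    candidates = []
    for index, block in enumerate(blocks):
        key = hashlib.sha256(block).digest()
        if key in seen:
            candidates.append(index)
        else:
            seen.add(key)
    return candidates
```

   The trade-off shows up in num_bits and num_hashes: a smaller filter
uses less RAM but yields more false candidates to verify later.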

   If Mark doesn't have those notes any more, I can try to dig out the
original IRC discussion.

   Hugo.

-- 
Hugo Mills             | My code is never released, it escapes from the git
hugo@... carfax.org.uk | repo and kills a few beta testers on the way out.
http://carfax.org.uk/  |
PGP: E2AB1DE4          |                                             Diablo-D3


