2015-05-14 4:08 GMT+03:00 Qu Wenruo <quwenruo@xxxxxxxxxxxxxx>:
>
>
> -------- Original Message --------
> Subject: Re: de-duplication algos
> From: David Sterba <dsterba@xxxxxxx>
> To: Hugo Mills <hugo@xxxxxxxxxxxxx>, Learner Study
> <learner.study@xxxxxxxxx>, linux-btrfs <linux-btrfs@xxxxxxxxxxxxxxx>, Mark
> Fasheh <mfasheh@xxxxxxx>
> Date: 2015-05-14 00:48
>
>> On Wed, May 13, 2015 at 04:35:53PM +0000, Hugo Mills wrote:
>>>
>>> On Wed, May 13, 2015 at 09:24:17AM -0700, Learner Study wrote:
>>>>
>>>> Hello,
>>>>
>>>> I have been reading on de-duplication and how algorithms such as Bloom
>>>> and Cuckoo filters are used for this purpose.
>>>>
>>>> Does BTRFS dedup use any of these, or are there plans to incorporate
>>>> these in future?
>>>
>>>
>>> There was a long discussion on IRC about different approaches that
>>> could be taken. I think Mark Fasheh captured most of that somewhere --
>>> I thought he'd put it on the duperemove github site somewhere, but I
>>> can't see it right now.
>>
>>
>> The bloom filter for duperemove has been implemented (as of commit
>> b7c03422ea9fd11f915804df2b6598a6ed10dfce) and works fine, the memory
>> footprint is much lower than before.
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>
>
> Zhao Lei and I are also trying to implement *in-band* de-duplication.
> Our idea is to implement a memory pool to keep a csum<->extent map to do
> *PARTIAL* dedup.
> We consider that de-duplication doesn't need to catch 100% of duplicates;
> it's a nice addition, not a fundamental function.
>
> The memory pool behaves as least-recently-used (LRU), and the user can
> adjust how big the memory pool is. (Yeah, put the dirty work on the user)
>
> Bloom filters seem quite interesting, but it also seems hard to remove items
> from them, and therefore hard to limit memory usage in the kernel.
> Since I'm not familiar with algorithms like Bloom filters, any advice on
> such algorithms is welcome.

Maybe a cuckoo filter? Unlike a Bloom filter, it supports deleting items:
https://github.com/efficient/cuckoofilter

> Thanks,
> Qu

-- 
Have a nice day,
Timofey.
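
To make the suggestion concrete, here is a rough user-space sketch of the cuckoo filter idea in Python. It is only an illustration of why deletion is possible: each item is stored as a short fingerprint in one of two candidate buckets, so removing an item means removing one copy of its fingerprint. This is not the API of efficient/cuckoofilter and not btrfs or duperemove code; the class name `CuckooFilter`, its parameters (`num_buckets`, `bucket_size`, `max_kicks`, `seed`), the 16-bit fingerprint, and the use of SHA-256 are all assumptions made for the example.

```python
import hashlib
import random


class CuckooFilter:
    """Toy cuckoo filter (hypothetical names): approximate set with deletion."""

    def __init__(self, num_buckets=1024, bucket_size=4, max_kicks=500, seed=0):
        # A power-of-two bucket count keeps the XOR-based alternate index
        # in range and makes it its own inverse.
        assert num_buckets > 0 and num_buckets & (num_buckets - 1) == 0
        self.num_buckets = num_buckets
        self.bucket_size = bucket_size
        self.max_kicks = max_kicks
        self.buckets = [[] for _ in range(num_buckets)]
        self.rng = random.Random(seed)

    @staticmethod
    def _hash(data):
        # SHA-256 is used only for convenience; any good hash would do.
        return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")

    def _fingerprint(self, item):
        # 16-bit fingerprint, forced to be non-zero.
        return (self._hash(b"fp" + item) & 0xFFFF) or 1

    def _index(self, item):
        return self._hash(item) % self.num_buckets

    def _alt_index(self, index, fp):
        # Partial-key cuckoo hashing: the other bucket depends only on the
        # current bucket and the fingerprint, not on the original item.
        return (index ^ self._hash(fp.to_bytes(2, "big"))) % self.num_buckets

    def insert(self, item):
        fp = self._fingerprint(item)
        i1 = self._index(item)
        i2 = self._alt_index(i1, fp)
        for i in (i1, i2):
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(fp)
                return True
        # Both buckets are full: evict a resident fingerprint and move it
        # to its alternate bucket, repeating up to max_kicks times.
        i = self.rng.choice((i1, i2))
        for _ in range(self.max_kicks):
            slot = self.rng.randrange(self.bucket_size)
            fp, self.buckets[i][slot] = self.buckets[i][slot], fp
            i = self._alt_index(i, fp)
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(fp)
                return True
        # Filter considered full. The fingerprint still held in `fp` is
        # dropped here; a real implementation would keep it or resize.
        return False

    def contains(self, item):
        fp = self._fingerprint(item)
        i1 = self._index(item)
        i2 = self._alt_index(i1, fp)
        return fp in self.buckets[i1] or fp in self.buckets[i2]

    def delete(self, item):
        # Only delete items that were really inserted; deleting something
        # else may remove another item's matching fingerprint.
        fp = self._fingerprint(item)
        i1 = self._index(item)
        i2 = self._alt_index(i1, fp)
        for i in (i1, i2):
            if fp in self.buckets[i]:
                self.buckets[i].remove(fp)
                return True
        return False
```

Used with block checksums as keys (passed as `bytes`), this would look like `f = CuckooFilter(); f.insert(csum); f.contains(csum); f.delete(csum)`. Lookups can give false positives but memory stays bounded by `num_buckets * bucket_size` fingerprints, which is the property that matters for limiting memory usage.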
