Re: de-duplication algos

2015-05-14 4:08 GMT+03:00 Qu Wenruo <quwenruo@xxxxxxxxxxxxxx>:
>
>
> -------- Original Message  --------
> Subject: Re: de-duplication algos
> From: David Sterba <dsterba@xxxxxxx>
> To: Hugo Mills <hugo@xxxxxxxxxxxxx>, Learner Study
> <learner.study@xxxxxxxxx>, linux-btrfs <linux-btrfs@xxxxxxxxxxxxxxx>, Mark
> Fasheh <mfasheh@xxxxxxx>
> Date: 2015-05-14 00:48
>
>> On Wed, May 13, 2015 at 04:35:53PM +0000, Hugo Mills wrote:
>>>
>>> On Wed, May 13, 2015 at 09:24:17AM -0700, Learner Study wrote:
>>>>
>>>> Hello,
>>>>
>>>> I have been reading on de-duplication and how algorithms such as Bloom
>>>> and Cuckoo filters are used for this purpose.
>>>>
>>>> Does BTRFS dedup use any of these, or are there plans to incorporate
>>>> these in future?
>>>
>>>
>>>     There was a long discussion on IRC about different approaches that
>>> could be taken. I think Mark Fasheh captured most of that somewhere --
>>> I thought he'd put it on the duperemove github site somewhere, but I
>>> can't see it right now.
>>
>>
>> The bloom filter for duperemove has been implemented (as of commit
>> b7c03422ea9fd11f915804df2b6598a6ed10dfce) and works fine, the memory
>> footprint is much lower than before.
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>
> Zhao Lei and I are also trying to implement *in-band* de-duplication.
> Our idea is to implement a memory pool to keep a csum<->extent map to do
> *PARTIAL* dedup.
> In our view, de-duplication does not need to catch 100% of duplicates;
> it is a nice addition, not a fundamental function.
>
> The memory pool behaves as least-recently-used (LRU), and the user can
> adjust how big the pool is. (Yes, this pushes the dirty work onto the user.)
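
To make the idea above concrete, here is a minimal userspace sketch of a bounded csum -> extent map with LRU eviction. It is only an illustration, not the in-band dedup code being discussed: the class name DedupPool, the method lookup_or_insert, the max_entries knob, the use of SHA-256 as the checksum, and plain integers as "extents" are all assumptions made for the example.

```python
from collections import OrderedDict
import hashlib


class DedupPool:
    """Hypothetical bounded csum -> extent map with LRU eviction."""

    def __init__(self, max_entries):
        if max_entries < 1:
            raise ValueError("max_entries must be at least 1")
        self.max_entries = max_entries  # user-tunable pool size (assumed knob)
        self._map = OrderedDict()       # oldest entry first, newest last

    def lookup_or_insert(self, data, extent):
        """Return the extent of an earlier identical block, or None if new."""
        csum = hashlib.sha256(data).digest()  # stand-in for the real csum
        found = self._map.get(csum)
        if found is not None:
            # Hit: mark as most recently used and report the duplicate.
            self._map.move_to_end(csum)
            return found
        # Miss: remember this block, evicting the least recently used entry
        # if the pool is over its limit. Evicted duplicates are simply missed,
        # which is what makes the dedup partial.
        self._map[csum] = extent
        if len(self._map) > self.max_entries:
            self._map.popitem(last=False)
        return None
```

A write path would call lookup_or_insert(block, new_extent) and share the returned extent on a hit. Memory use is bounded by max_entries, at the cost of missing duplicates whose entries have already been evicted.
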
>
> Bloom filters seem quite interesting, but it also seems hard to remove
> items from them, which makes it hard to limit memory usage in the kernel.
> Since I'm not familiar with algorithms like Bloom filters, any advice on
> such algorithms is welcome.
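
On the removal problem: a plain Bloom filter cannot delete because several items may share the same bit. One well-known workaround is a counting Bloom filter, which keeps a small counter per slot instead of a bit. The sketch below is a toy illustration under assumed names and parameters (CountingBloomFilter, num_counters, num_hashes, SHA-256 based hashing); it is not what duperemove or btrfs implement.

```python
import hashlib


class CountingBloomFilter:
    """Toy counting Bloom filter: counters instead of bits allow removal."""

    def __init__(self, num_counters=1024, num_hashes=4):
        self.num_counters = num_counters
        self.num_hashes = num_hashes
        self.counters = [0] * num_counters

    def _positions(self, item):
        # Derive num_hashes slot indices by salting one hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_counters

    def add(self, item):
        for pos in self._positions(item):
            self.counters[pos] += 1

    def remove(self, item):
        # Only meaningful for items that were really added; removing a
        # never-added item can introduce false negatives for other items.
        for pos in self._positions(item):
            if self.counters[pos] > 0:
                self.counters[pos] -= 1

    def might_contain(self, item):
        # False means definitely absent; True means possibly present.
        return all(self.counters[pos] > 0 for pos in self._positions(item))
```

The trade-off is memory: each slot now needs several bits instead of one, and the table size is still fixed up front.
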

Maybe a cuckoo filter? Unlike a plain Bloom filter, it supports deleting items:
https://github.com/efficient/cuckoofilter

> Thanks,
> Qu



-- 
Have a nice day,
Timofey.