Re: Data Deduplication with the help of an online filesystem check

On Tuesday, 28 April 2009 22:16:19, Thomas Glanzmann wrote:
> Hello Heinz,
>
> > It's not only CPU time, it's also memory. You need 32 bytes for each 4k
> > block.  It needs to be in RAM for performance reasons.
>
> Exactly, and that is not going to scale.
>
>         Thomas


Hi Thomas,

I wrote a backup tool which uses dedup, so I know a little bit about the 
problem and about the performance impact when the checksums are not kept in 
memory (which is optional in that tool).
http://savannah.gnu.org/projects/storebackup

Dedup really helps a lot - I think more than I could have imagined before I 
got involved in this kind of backup. To give a simple example, you would not 
believe how many identical files there are in a typical filesystem.

EMC builds very big boxes for this, with lots of RAM in them.
I think the first problem that has to be solved is the memory problem. 
Perhaps something asynchronous that finds identical blocks and stores the 
checksums on disk?
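
Just to put a number on the memory problem: with 32 bytes of checksum per 
4 KiB block, as quoted above, 1 TiB of data is 2^28 blocks, so the checksum 
table alone needs about 8 GiB of RAM per TiB indexed. Below is a very rough 
sketch of the asynchronous idea - not how storebackup or btrfs actually work, 
just an illustration: walk the data once, write one 32-byte SHA-256 checksum 
per 4 KiB block into an on-disk SQLite index, then look for repeated 
checksums afterwards. The script, file layout, and index name are made up 
for the example.

#!/usr/bin/env python3
# Sketch of an offline dedup scan that keeps its checksum index on disk,
# so RAM usage stays small no matter how much data is scanned.
import hashlib
import os
import sqlite3
import sys

BLOCK_SIZE = 4096  # 4 KiB blocks, one 32-byte SHA-256 checksum each

def scan(root, db_path="block-index.db"):
    """Walk a directory tree and record a checksum for every 4 KiB block."""
    db = sqlite3.connect(db_path)
    db.execute("CREATE TABLE IF NOT EXISTS blocks ("
               " digest BLOB, path TEXT, offset INTEGER)")
    db.execute("CREATE INDEX IF NOT EXISTS digest_idx ON blocks(digest)")
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as f:
                    offset = 0
                    while True:
                        block = f.read(BLOCK_SIZE)
                        if not block:
                            break
                        digest = hashlib.sha256(block).digest()  # 32 bytes
                        db.execute("INSERT INTO blocks VALUES (?, ?, ?)",
                                   (digest, path, offset))
                        offset += len(block)
            except OSError:
                continue  # unreadable file, skip it
    db.commit()
    return db

def report_duplicates(db):
    """Print checksums that occur more than once, i.e. dedup candidates."""
    rows = db.execute("SELECT hex(digest), COUNT(*) FROM blocks"
                      " GROUP BY digest HAVING COUNT(*) > 1"
                      " ORDER BY COUNT(*) DESC")
    for digest, count in rows:
        print(f"{count:6d} copies of block {digest[:16]}...")

if __name__ == "__main__":
    db = scan(sys.argv[1] if len(sys.argv) > 1 else ".")
    report_duplicates(db)

A real implementation would of course still byte-compare blocks whose 
checksums match before merging them, and would batch the inserts, but the 
point is that the index lives on disk and memory usage stays constant.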

Heinz
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
