Deduplication Idea


 



I was wondering when in-band deduplication is likely to make it into
BTRFS as a standard feature, and whether it could also be used to make
network transfers more efficient (beyond deduplicating only within the
set of data being transferred)...

For example:
(For simplicity this example uses whole-file deduplication, not
block-level.)
I want to copy files A, B, C, and D from computer X to computer Y. At
the beginning of the backup, the hash of each file is sent from X to
Y. Computer Y already has a file whose hash matches file B, so it
generates a second hash of that file using a different method (in case
the first match was a false positive) and sends it, along with the
file name, back to computer X. If the returned hash also matches file
B, computer X sends only A, C, and D. Computer Y sees that file B,
which was in the initial list, is not in the new list, and knows to
reference the content of the existing match it found earlier.
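
A minimal sketch of the exchange described above, in Python, with both
computers simulated in memory. The names (Receiver, offer, receive,
transfer) and the choice of SHA-256 as the first hash and BLAKE2b as
the second are assumptions made for illustration only; nothing like
this exists in BTRFS itself.

```python
import hashlib


def h1(data: bytes) -> str:
    """First hash, sent for every file (SHA-256 is an assumed choice)."""
    return hashlib.sha256(data).hexdigest()


def h2(data: bytes) -> str:
    """Second, different hash used to rule out a false positive."""
    return hashlib.blake2b(data).hexdigest()


class Receiver:
    """Computer Y: holds existing files and answers hash offers."""

    def __init__(self, existing):
        self.files = dict(existing)  # name -> content
        self.index = {h1(c): n for n, c in self.files.items()}
        self.pending = {}  # offered name -> local name with matching h1

    def offer(self, manifest):
        """Take {name: h1}; return {name: h2} for files Y may already have."""
        self.pending = {}
        reply = {}
        for name, digest in manifest.items():
            local = self.index.get(digest)
            if local is not None:
                self.pending[name] = local
                reply[name] = h2(self.files[local])
        return reply

    def receive(self, manifest, sent):
        """Store sent files; names missing from `sent` reuse local content."""
        for name in manifest:
            if name in sent:
                self.files[name] = sent[name]
            else:
                self.files[name] = self.files[self.pending[name]]


def transfer(source, receiver):
    """Computer X: offer hashes, confirm matches, send only the rest."""
    manifest = {n: h1(c) for n, c in source.items()}
    reply = receiver.offer(manifest)
    confirmed = {n for n, d in reply.items() if h2(source[n]) == d}
    to_send = {n: c for n, c in source.items() if n not in confirmed}
    receiver.receive(manifest, to_send)
    return sorted(to_send)
```

With files A, B, C, and D on X and a copy of B's content already on Y,
transfer() would return only A, C, and D as the names actually sent,
and Y would end up with B pointing at the content it already held.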

Of course, deduplication should also happen within the scope of the
data that is being transferred itself; for example, if files A and C
are the same, their content should only be sent once. This would not
necessarily need to be calculated in advance (prior to all of the
files being sent), unless the user wants a more accurate estimate of
the time remaining.
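
As a rough illustration of deduplicating within the transfer without
calculating anything in advance, the Python sketch below hashes each
file as it comes up and sends either the content or a reference to an
earlier file. The function name and the ("data" / "ref") message shape
are hypothetical, and it trusts a single SHA-256 match, which a real
implementation might want to verify further.

```python
import hashlib


def stream_with_dedup(files):
    """Yield ("data", name, content) the first time content is seen,
    and ("ref", name, earlier_name) for any later identical file."""
    seen = {}  # digest -> name of the first file with that content
    for name, content in files.items():
        digest = hashlib.sha256(content).hexdigest()
        if digest in seen:
            yield ("ref", name, seen[digest])
        else:
            seen[digest] = name
            yield ("data", name, content)
```

If A and C have the same content, A goes out as data and C goes out
only as a reference to A. An accurate time-remaining estimate would
still need a pass over all the files up front, as noted above.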

Thanks,
Kris


