I was wondering when in-band deduplication is likely to make it into BTRFS as a standard feature, and whether it could also make network transfers more efficient (beyond deduplicating just within the set of data being transferred).

For example (for simplicity I am going to use whole-file deduplication here, not block-level): I want to copy files A, B, C, and D from computer X to computer Y. At the beginning of the backup, the hash of each file is sent from X to Y. Computer Y already has a file whose hash matches file B, so it generates a second hash of that file using a different method (in case the first match was a false positive) and sends it, along with the file name, back to computer X. If the returned hash also matches file B, X sends only A, C, and D. Computer Y notices that file B, which was in the initial list, is not in the new list, and knows to reference the file content from the existing match it found earlier.

Of course, deduplication should also happen within the data being sent itself: e.g. if files A and C are the same, their content should only be sent once. But this would not necessarily need to be calculated in advance (before all of the files are sent), unless the user wants a more accurate estimate of the time remaining.

Thanks, Kris
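
P.S. To make the exchange above a bit more concrete, here is a rough Python sketch of what I have in mind. It is only an illustration under my own assumptions, not a description of how BTRFS works: the names `Receiver`, `transfer`, `primary_hash` and `secondary_hash` are made up, SHA-256 and BLAKE2b are arbitrary stand-ins for the two hash methods, the network is replaced by direct method calls, and the content is keyed by hash rather than by file name so that identical files in the batch are only sent once.

```python
from __future__ import annotations

import hashlib


def primary_hash(data: bytes) -> str:
    # First hash, sent for every file at the start of the transfer.
    return hashlib.sha256(data).hexdigest()


def secondary_hash(data: bytes) -> str:
    # A different algorithm, used only to rule out a false positive.
    return hashlib.blake2b(data).hexdigest()


class Receiver:
    """Stands in for computer Y (hypothetical, in-memory only)."""

    def __init__(self, existing: dict[str, bytes]):
        self.store: dict[str, bytes] = {}  # primary hash -> content, kept once
        self.names: dict[str, str] = {}    # file name -> primary hash
        self.offered: dict[str, str] = {}
        for name, content in existing.items():
            h = primary_hash(content)
            self.store[h] = content
            self.names[name] = h

    def check(self, offered: dict[str, str]) -> dict[str, str]:
        """Take name -> primary hash; answer with primary -> secondary hash
        for every offered hash that already exists locally."""
        self.offered = offered
        return {h: secondary_hash(self.store[h])
                for h in set(offered.values()) if h in self.store}

    def receive(self, payload: dict[str, bytes]) -> None:
        """Store what was sent; names missing from the payload simply
        reference content that is already present."""
        # Note: a genuine primary-hash collision would overwrite the old
        # entry here; a real store would need to handle that case properly.
        self.store.update(payload)
        self.names.update(self.offered)

    def read(self, name: str) -> bytes:
        return self.store[self.names[name]]


def transfer(files: dict[str, bytes], receiver: Receiver) -> int:
    """Stands in for computer X. Returns the number of content bytes sent."""
    offered = {name: primary_hash(c) for name, c in files.items()}
    # One entry per unique hash, so duplicates inside the batch go out once.
    unique = {h: files[name] for name, h in offered.items()}
    claims = receiver.check(offered)
    # Skip content only if the receiver's second hash also matches ours.
    payload = {h: c for h, c in unique.items()
               if claims.get(h) != secondary_hash(c)}
    receiver.receive(payload)
    return sum(len(c) for c in payload.values())


y = Receiver({"old_copy_of_B": b"bbbb"})
sent = transfer({"A": b"aaaa", "B": b"bbbb", "C": b"aaaa", "D": b"dddd"}, y)
print(sent)  # 8: "aaaa" once for A and C, "dddd" for D, nothing for B
```

In this toy run B is never transmitted because Y already holds matching content, and A and C share a single copy on the wire.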
