Re: [markfasheh/duperemove] Why blocksize is limit to 1MB?

On 01/04/2017 12:12 AM, Peter Becker wrote:
> Good hint, this would be an option and I will try it.
> 
> Regardless of this, curiosity has gotten the better of me and I will try
> to figure out where the problem with the low transfer rate is.
> 
> 2017-01-04 0:07 GMT+01:00 Hans van Kranenburg <hans.van.kranenburg@xxxxxxxxxx>:
>> On 01/03/2017 08:24 PM, Peter Becker wrote:
>>> All objections are justified, but not relevant in (offline) backup
>>> and archive scenarios.
>>>
>>> For example, you have multiple versions of append-only log files or
>>> append-only DB files (each more than 100 GB in size), like this:
>>>
>>>> Snapshot_01_01_2017
>>> -> file1.log .. 201 GB
>>>
>>>> Snapshot_02_01_2017
>>> -> file1.log .. 205 GB
>>>
>>>> Snapshot_03_01_2017
>>> -> file1.log .. 221 GB
>>>
>>> The first 201 GB would be the same every time.
>>> Files are copied at night from Windows, Linux or BSD systems and
>>> snapshotted after the copy.
>>
>> XY problem?
>>
>> Why not use rsync --inplace in combination with btrfs snapshots? Even if
>> the remote does not support rsync and you need to pull the full file
>> first, you could again use rsync locally.

<annoyed>please don't toppost</annoyed>

Also, there is a rather huge difference between the two approaches, given
the way btrfs works internally.

Say I have a subvolume with thousands of directories and millions of
files full of random data, and I want to have a second, deduplicated copy
of it.

Approach 1:

Create a full copy of everything (compare: retrieving the remote file
again), so that 200% of the data storage is used, and after that run
deduplication, so that again only 100% of the data storage is used.
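
To make approach 1 concrete, a minimal sketch of such a copy-then-dedupe
run could look like this (the paths are made up, and I'm using duperemove
here since that's what started this thread; check the man page of your
version for the exact options):

cp -a original/ copy/    # full copy, now 200% of the data is stored
# -d actually submits the dedupe ioctls, -r recurses into both trees,
# --hashfile keeps the block hashes on disk instead of in RAM
duperemove -dr --hashfile=/tmp/dedupe.hash original/ copy/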

Approach 2:

cp -av --reflink original/ copy/

By doing this, you end up with the same result as approach 1 would give
you with the most ideal deduper in the world (assuming the files are so
random that they contain no duplicate blocks within themselves).
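
If you want to convince yourself that the reflinked copy really shares
its extents with the original, something like the following should show
it (paths made up again; btrfs filesystem du reports shared vs. exclusive
data, and filefrag -v prints the physical extent offsets, which should be
identical for a file and its reflinked twin):

btrfs filesystem du -s original/ copy/
filefrag -v original/somefile
filefrag -v copy/somefile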

Approach 3:

btrfs sub snap original copy

W00t, that was fast, and the only thing that happened was writing a few
16kB metadata pages again: one for the toplevel tree page that got cloned
into a new filesystem tree, and a few for the blocks one level lower to
add backreferences to the new root.
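
In a backup scenario like yours, that could be a read-only, dated
snapshot; the names below are just the ones from your example, and the
16kB mentioned above is the default nodesize, which dump-super will show
you:

btrfs subvolume snapshot -r current Snapshot_03_01_2017
btrfs inspect-internal dump-super /dev/sdX | grep nodesize   # /dev/sdX is a placeholder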

So:

The big difference in the end result between approaches 1 and 2 on the
one hand and approach 3 on the other is that with 1 and 2, while
deduplicating your data, you're actually duplicating all your metadata at
the same time.

In your situation, doing an rsync --inplace from the remote if possible,
so that only the changed/appended data gets stored, and then using native
btrfs snapshotting would seem the most effective.
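
A minimal sketch of that nightly cycle, assuming a writable subvolume
named 'current' and dated snapshots like in your example (host, paths and
naming are all placeholders):

# only the appended/changed blocks get rewritten; unchanged extents stay
# shared with the older snapshots because of --inplace
rsync -av --inplace backuphost:/data/file1.log current/file1.log
# freeze tonight's state as a read-only snapshot
btrfs subvolume snapshot -r current Snapshot_$(date +%d_%m_%Y)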

-- 
Hans van Kranenburg