Re: BackupPC, per-dir hard link limit, Debian packaging

On Wednesday 03 March 2010 00:22:31 jim owens wrote:
> Hubert Kario wrote:
> > On Tuesday 02 March 2010 03:29:05 Robert Collins wrote:
> >> As I say, I realise this is queued to get addressed anyway, but it seems
> >> like a realistic thing for people to do (use BackupPC on btrfs) - even
> >> if something better can still be written to replace the BackupPC store
> >> in the future. I will note, though, that simple snapshots won't achieve
> >> the deduplication level that BackupPC does, because the files don't start
> >> out the same: they are identified as identical post-backup.
> > 
> > Isn't the main idea behind deduplication to merge identical parts of
> > files together using cow? This way you could have many very similar
> > images of virtual machines, run the deduplication process and massively
> > reduce the space used while maintaining the differences between
> > images.
> > 
> > If memory serves me right, the plan is to do it in userland, after the
> > fact, not while the data is being saved. If such a daemon
> > or program were available, you would run it on the system after rsyncing
> > the workstations.
> > 
> > Though the question remains which approach would reduce space usage more in
> > your use case. From my experience, hardlinks take less space on disk; I
> > don't know whether it would be possible to optimise the btrfs cow system
> > for files that are exactly the same.
> 
> Space use is not the key difference between these methods.
> The btrfs COW makes data sharing safe.  The hard link method
> means changing a file invalidates the content of all linked files.
> 
> So a BackupPC output should be read-only.

I know that, but if you're using "dumb" tools (say, rsync) to replicate
systems, you don't want them to overwrite different versions of files, yet you
still want to reclaim the disk space used by essentially the same data.

My idea of using btrfs as backup storage, with cow rather than hardlinks for
duplicated files, comes from the need to keep archival copies (something not
really possible with hardlinks), in a way similar to rdiff-backup.

For the first backup I just rsync from all workstations to the backup server.
On subsequent backups I copy the last version to a .snapshot/todays-date
directory using cow, rsync from the workstations again, and then run the
deduplication daemon.

This way I get both reduced storage and old copies (handy for user home 
directories...).
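For concreteness, here is a minimal sketch of that rotation. The paths, the
host name and the "dedup-daemon" call are only placeholders (there is no such
tool yet), and cp --reflink needs coreutils 7.5 or newer:

#!/bin/sh
# Sketch of the cow-snapshot rotation described above.
# /backup/$HOST, the host name and "dedup-daemon" are placeholders.
HOST=workstation1
BACKUP=/backup/$HOST
TODAY=$(date +%Y-%m-%d)

mkdir -p "$BACKUP/.snapshot"

# Keep yesterday's state as a cheap cow copy (reflinks, not hardlinks),
# so the archived files stay intact when rsync updates the live tree.
cp -a --reflink=always "$BACKUP/current" "$BACKUP/.snapshot/$TODAY"

# Refresh the live copy; --inplace rewrites changed blocks in place,
# so unchanged blocks remain shared with the snapshot.
rsync -aH --delete --inplace "root@$HOST:/home/" "$BACKUP/current/"

# Offline deduplication pass (whatever tool eventually provides it).
# dedup-daemon "$BACKUP"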

With such a use case, the ability to use cow while needing a similar amount of
space as hardlinks would be at least useful, if not highly desirable.

That's why I asked whether it's possible to optimise the btrfs cow mechanism
for identical files.

From my testing (directory 584 MiB in size, 17395 files, Arch kernel 2.6.32.9,
coreutils 8.4, btrfs-progs 0.19, 10 GiB partition, default mkfs and mount
options), the decrease in free space was:

                          first run   second run
  cp -al                   6176 KiB     6064 KiB
  cp -a --reflink=always  23296 KiB    23324 KiB

that's nearly 4 times more!
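The measurement was simply the change in free space around each copy; a rough
way to reproduce it would be something like the following (the paths are
examples only, not the ones used above):

# Rough reproduction of the free-space measurement on a btrfs mount.
avail() { sync; df -Pk /mnt/btrfs | awk 'NR==2 {print $4}'; }

before=$(avail)
cp -al /mnt/btrfs/src /mnt/btrfs/copy-hardlinks
echo "cp -al: $((before - $(avail))) KiB"

before=$(avail)
cp -a --reflink=always /mnt/btrfs/src /mnt/btrfs/copy-reflinks
echo "cp -a --reflink=always: $((before - $(avail))) KiB"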
-- 
Hubert Kario
QBS - Quality Business Software
ul. Ksawerów 30/85
02-656 Warszawa
POLAND
tel. +48 (22) 646-61-51, 646-74-24
fax +48 (22) 646-61-50
