Re: Clone or Backup Filesystem with compression and dedup

dave posted on Sun, 08 Mar 2015 11:43:49 -0600 as excerpted:

> Greetings,
> 
> I've searched the wiki and the web [...]

Thanks for letting us know you already know about the wiki. =:^)  Just in 
case it's a different wiki... https://btrfs.wiki.kernel.org

> Use case:
> 
> I have a 5TB single drive btrfs filesystem.  The filesystem is using
> compression and deduplication (used bedup with size threshold of ~200kB
> resulting in 10s of thousands of deduplicated files).  The space savings
> is fantastic!
> 
> I have 2 blank 5TB drives to be used as a rotating offline backup,
> connecting to the computer once every few weeks and syncing the
> new/changed data.  The data is mostly static, slowly growing with
> occasional changes.
> 
> What is the best way to:
> 
> 1) make an initial clone of the filesystem, keeping the compression and
> deduplication of the source filesystem?
> 
> 2) make periodic synchronizations of the source filesystem to the backup
> filesystem, again maintaining the compression and deduplication of the
> filesystem?
> 
> I know that I can make a new empty btrfs filesystem then rsync and
> bedup, but the file-level copy and bedup processes are very resource
> heavy and time consuming.
> 
> I've also considered adding a blank drive to the source filesystem as a
> raid mirror then breaking the mirror, but this seems like a brutal,
> unintended use of btrfs raid/redundancy.

You didn't mention btrfs send/receive.  That's the btrfs specific answer 
to the general question, and it's covered to some extent in the wiki 
(which is why I wondered if you read a different one, since you didn't 
mention send/receive after saying you'd read it).

However, while send/receives after the initial reference one are
generally very resource efficient, I don't know to what extent they
maintain the deduplication.  I don't use either dedup or send/receive
for my own use-case, and I haven't seen this specific combination
discussed on the list, so I can't simply repeat the answer.  You'll need
to wait for a dev or someone with more direct knowledge to cover that
angle... or simply test it and report back, now that you have the more
general answer steering you in the right direction. =:^)

Send works from a read-only btrfs snapshot of the sending side (you
create the snapshot first; send itself won't accept a writable
subvolume) and turns it into a stream that receive uses to recreate the
snapshot on a different filesystem.  After the first one, you keep a
matching reference snapshot on each side, so that only the changes since
that reference need to be sent.  Actually, it gets a bit fancier than
that: send can be pointed at multiple reference snapshots, as long as
copies exist on both sides, so that where it makes sense it can
reference something other than the direct parent.
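
The sequence described above boils down to a handful of commands.  What
follows is only a sketch that builds the argument lists and prints them,
without running anything.  The helper names and the paths (/mnt/src,
/mnt/backup and the snapshot names) are made up for illustration.  The
subcommands and options themselves (`btrfs subvolume snapshot -r`,
`btrfs send -p` / `-c`, `btrfs receive`) are real, but check the man
pages for your btrfs-progs version before relying on them.

```python
import shlex


def snapshot_cmd(subvol, snap_path):
    """Create a read-only snapshot; send only accepts read-only ones."""
    return ["btrfs", "subvolume", "snapshot", "-r", subvol, snap_path]


def send_cmd(snap_path, parent=None, clone_sources=()):
    """Full send if parent is None, otherwise incremental against parent.

    clone_sources are extra snapshots (present on both sides) that send
    may reference in addition to the parent.
    """
    cmd = ["btrfs", "send"]
    if parent is not None:
        cmd += ["-p", parent]
    for src in clone_sources:
        cmd += ["-c", src]
    cmd.append(snap_path)
    return cmd


def receive_cmd(dest_dir):
    """Recreate the sent snapshot under dest_dir on the backup filesystem."""
    return ["btrfs", "receive", dest_dir]


def as_pipeline(sender, receiver):
    """Render 'send | receive' as a shell line, for display only."""
    quote = lambda argv: " ".join(shlex.quote(a) for a in argv)
    return quote(sender) + " | " + quote(receiver)


# Hypothetical paths: first a full send, later an incremental one.
print(" ".join(snapshot_cmd("/mnt/src/data", "/mnt/src/snaps/week1")))
print(as_pipeline(send_cmd("/mnt/src/snaps/week1"),
                  receive_cmd("/mnt/backup/snaps")))
print(as_pipeline(send_cmd("/mnt/src/snaps/week2",
                           parent="/mnt/src/snaps/week1"),
                  receive_cmd("/mnt/backup/snaps")))
```

The week1 snapshot has to stay around on both filesystems for the second
pipeline to work, since it is the shared reference.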

Other than the above possibility that it doesn't handle dedups as 
efficiently as it could/should, where I simply don't know, there are two 
potential disadvantages.

1) There are still possible bugs and corner-cases where send/receive
might get stuck and thus error out on one end or the other.  A simple
example, altho this particular case has actually been working for a
while now, is reversed nesting: subdir A contains subdir B on the
original/reference, but they're then switched so that B contains A on
the new send.  That's the kind of corner-case, only more complex now,
that can still occasionally tie send/receive in knots and trigger an
error.

However, it does reliably error out when there is a problem, so if both 
sides complete without errors, the result should in fact be 100% 
reliable.  Just be sure to check for errors, just in case. =:^)
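
Checking for errors means checking both ends of the pipe, since a plain
shell pipeline normally reports only the exit status of the last
command.  Here is a minimal sketch of one way to do that, with a
hypothetical helper named run_pipeline.  It isn't btrfs-specific: it
takes any two argument lists, and the commented usage line uses made-up
paths.

```python
import subprocess


def run_pipeline(sender_argv, receiver_argv):
    """Run 'sender | receiver' and insist that BOTH sides exit with 0."""
    sender = subprocess.Popen(sender_argv, stdout=subprocess.PIPE)
    receiver = subprocess.Popen(receiver_argv, stdin=sender.stdout)
    # Drop our copy of the pipe so the sender sees a broken pipe if the
    # receiver exits early, instead of blocking forever.
    sender.stdout.close()
    recv_rc = receiver.wait()
    send_rc = sender.wait()
    if send_rc != 0 or recv_rc != 0:
        raise RuntimeError(
            "pipeline failed: sender exit %d, receiver exit %d"
            % (send_rc, recv_rc))
    return send_rc, recv_rc


# Hypothetical usage (paths are made up):
# run_pipeline(["btrfs", "send", "-p", "/mnt/src/snaps/week1",
#               "/mnt/src/snaps/week2"],
#              ["btrfs", "receive", "/mnt/backup/snaps"])
```

Only treat the new snapshot as the next reference once this returns
without raising.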

As a result of this, it's always wise to have rsync or the like as a 
fallback method.  In the normal case you won't need the fallback, but if 
you do, better to have it ready to go, than to miss a backup and have 
that be the time you **NEED** it.  Of course, if you do find a
send/receive bug and can't use it for a time, be sure to keep the
reference snapshots available so that when the bug is fixed you can go
back to using send/receive. =:^)
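
The fallback idea can be sketched as a small wrapper: try the primary
method, and if it raises, run the fallback instead without touching the
reference snapshots.  The function names here are hypothetical, and the
rsync options (-a, -H, -A, -X, --delete) are just one common choice,
not something prescribed above.  Note that a plain rsync copy won't
carry the deduplication across.

```python
def rsync_cmd(src_dir, dest_dir):
    """Argument list for a file-level fallback copy.

    The trailing slash on the source makes rsync copy the directory's
    contents rather than the directory itself.
    """
    return ["rsync", "-aHAX", "--delete",
            src_dir.rstrip("/") + "/", dest_dir]


def backup_with_fallback(primary, fallback):
    """Call primary(); if it raises, report why and call fallback().

    Returns which method completed.  Neither path deletes anything, so
    the reference snapshots stay available for a later send/receive.
    """
    try:
        primary()
        return "primary"
    except Exception as err:
        print("primary backup failed (%s), using fallback" % err)
        fallback()
        return "fallback"
```

In practice primary would wrap the send/receive pipeline and fallback
would run the rsync command, each raising on a non-zero exit status.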

2) Btrfs send/receive /does/ use the btrfs snapshotting mechanism.  As 
such, the contraindications for btrfs snapshotting apply to send/receive 
as well.  Specifically, there are snapshotting complications regarding 
files set nocow, the most common use-case of which is on large (> half a 
GiB) active database and VM-image files, since the internal-rewrite-
pattern typical of such files tends to heavily fragment them on COW-based 
filesystems including btrfs.  There are, however, workarounds available, 
tho they might play havoc with dedup.

Basically, don't worry about this one if you don't regularly run VM
images or large databases on the filesystem/subvolume you're going to be
send/receiving.  If you do have files of this sort, further research is
necessary, but one possibility is to segregate them onto a dedicated
subvolume and use other backup methods for it.  Snapshots stop at
subvolume boundaries, so subvolumes let you wall off the areas you don't
want snapshotted and sent/received.
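
The walling-off approach might look like the sketch below, which only
builds the commands.  The helper name and the path are hypothetical.
Marking the new subvolume nocow with `chattr +C` is my assumption of how
you'd set it up, not something spelled out above.  The attribute only
takes effect for files created after it is set, so set it while the
subvolume is still empty.

```python
def nocow_subvolume_cmds(subvol_path):
    """Commands to create a dedicated subvolume for VM images/databases.

    Snapshots of the parent subvolume stop at this boundary, so its
    contents stay out of the snapshots that get sent/received.
    """
    return [
        ["btrfs", "subvolume", "create", subvol_path],
        # Assumption: new files created inside inherit the nocow flag.
        ["chattr", "+C", subvol_path],
    ]


for cmd in nocow_subvolume_cmds("/mnt/src/data/vm-images"):
    print(" ".join(cmd))
```

Whatever lives in that subvolume then needs its own backup method, since
it won't be part of the send stream.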

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman
