Okay, I will indeed test the setup thoroughly. I have considered running
a distributed filesystem such as Ceph but my concern is that it will be
way too slow, as disk I/O speed is important. Anyways, thank you for your help!
On 2019-02-19 04:54, Chris Murphy wrote:
On Mon, Feb 18, 2019 at 5:28 PM André Malm <admin@xxxxxxxxxx> wrote:
Rsync is probably a bad idea, yes. I could btrfs send -p the changed
"new" master subvolume, then delete the old master subvolume and
reference the new master subvolume when transferring it later on, I guess?
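Roughly something like this, I imagine (paths are just hypothetical):

  btrfs subvolume snapshot -r /data/master /data/master.new
  btrfs send -p /data/master.old /data/master.new | ssh remote 'btrfs receive /backup'
  btrfs subvolume delete /data/master.old
  # later transfers would then use master.new as the parent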
I'm not sure how your application reacts to snapshots or reflinks, or
how it updates its files. All of that needs to be tested to see what
the incremental send size is, and if the resulting received snapshot
contains files with the integrity your application expects, and so on.
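For example, one way to get a rough idea of the incremental stream size
without actually transferring anything (snapshot names made up):

  btrfs send -p /data/master.snap1 /data/master.snap2 | wc -c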
I'll explain the problem I'm trying to solve a bit better:
Say I have a program that will run in multiple instances. The program
requires a dataset of large files to run (say 20 GB). The dataset will be
updated over time, i.e. parts of it change. These changes should only
apply to new instances of the program. The program will also generate
new data (both new files and changes to the shared dataset) that is
unique to the instance's child subvolume. Finally, I need to transfer
the program together with its generated data to another remote machine
to continue its processing there. What I want to achieve is to avoid
transferring the entire dataset when only small parts of it are changed
by the program. I also want to avoid keeping duplicate copies of the
data on the remote machine.
Yep. Based on this description though, the only time I grok using
'btrfs send -p master.snap child.snap | btrfs receive /destination/'
is for the initial transfer of child. Master must be already fully
replicated. Now you can snapshot master and child on separate
schedules to account for their different use cases, and send their
increments independently of each other. Or in fact maybe you'll realize
you do have a use case for clone.
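In rough outline, with made-up paths and host, that could look like:

  # master has already been fully replicated once, e.g.
  btrfs send /data/master.snap | ssh remote 'btrfs receive /destination'
  # initial transfer of child, using the replicated master as parent
  btrfs send -p /data/master.snap /data/child.snap | ssh remote 'btrfs receive /destination'
  # after that, master and child send increments on their own schedules
  btrfs send -p /data/master.snap /data/master.snap2 | ssh remote 'btrfs receive /destination'
  btrfs send -p /data/child.snap /data/child.snap2 | ssh remote 'btrfs receive /destination'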
Have you looked at GlusterFS or Ceph for this use case? I kinda wonder
if just having a clustered file system would make all of the
send/receive stuff go away; your data would be replicated pretty much
immediately and always available to all computers. *shrug* That's off
topic, but I'm curious if there are ways to simplify this for your use
case.