On Mon, Feb 18, 2019 at 5:28 PM André Malm <admin@xxxxxxxxxx> wrote:
>
> Rsync is probably a bad idea, yes. I could btrfs send -p the changed
> "new" master subvolume and then delete the old master subvolume and then
> reference the new master subvolume when transferring it later on, I guess?

I'm not sure how your application reacts to snapshots or reflinks, or
how it updates its files. All of that needs to be tested to see what the
incremental send size is, whether the resulting received snapshot
contains files with the integrity your application expects, and so on.

>
> I'll explain the problem I'm trying to solve a bit better:
>
> Say I have a program that will run in multiple instances. The program
> requires a dataset of large files to run (say 20GB). The dataset will be
> updated over time, i.e. parts of it change. These changes should only
> apply to new instances of the program. The program will also generate
> new data (both new files and also changing data in the shared
> dataset) that is unique to the instance of the child subvolume. Finally
> I need to transfer the program together with its generated data to
> another remote machine to continue its processing there. What I want to
> achieve is to avoid having to transfer the entire dataset when only small
> parts of it are changed by the program. I also want to avoid having
> duplicate copies of the data on the remote machine.

Yep. Based on this description though, the only time I grok using
'btrfs send -p master.snap child.snap | btrfs receive /destination/'
is for the initial transfer of child. Master must already be fully
replicated on the destination, because -p only sends the difference
against a parent the receiver already has. After that you can snapshot
master and child on separate schedules to account for their different
use cases, and send their increments independently of each other. Or in
fact maybe you'll realize you do have a use case for clone.

Have you looked at GlusterFS or Ceph for this use case? I kinda wonder
whether a clustered file system would simplify things by making all of
the send/receive stuff go away: you can ensure your data is replicated
pretty much immediately and is always available to all computers.
*shrug* That's off topic, but I'm curious whether there are ways to
simplify this for your use case.

-- 
Chris Murphy
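
For reference, the send/receive sequence described above could look roughly
like the sketch below. It is a dry run that only prints the commands. The
/pool mount point and the child.snap.2 name are hypothetical placeholders,
and it assumes child was originally created as a snapshot of master, so
treat it as an illustration rather than a tested procedure.

```shell
#!/bin/sh
# Dry-run sketch: prints the btrfs commands instead of executing them.
POOL=/pool            # hypothetical local btrfs mount point (assumption)
DEST=/destination/    # receive directory, as in the command quoted above

plan() {
    # 1. One time: replicate master in full so it can act as a parent.
    #    send needs read-only snapshots, hence "snapshot -r".
    echo "btrfs subvolume snapshot -r $POOL/master $POOL/master.snap"
    echo "btrfs send $POOL/master.snap | btrfs receive $DEST"

    # 2. Initial transfer of child: only its difference from master.snap.
    echo "btrfs subvolume snapshot -r $POOL/child $POOL/child.snap"
    echo "btrfs send -p $POOL/master.snap $POOL/child.snap | btrfs receive $DEST"

    # 3. Later increments of child use its own previous snapshot as parent,
    #    independent of whatever schedule master is snapshotted on.
    echo "btrfs subvolume snapshot -r $POOL/child $POOL/child.snap.2"
    echo "btrfs send -p $POOL/child.snap $POOL/child.snap.2 | btrfs receive $DEST"
}

plan
```

To send to another machine, the receive side of each pipe would run there
instead, for example through ssh; that part is left out of the sketch.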
