On Tue, Jul 21, 2020 at 2:33 PM Goffredo Baroncelli <kreijack@xxxxxxxxx> wrote:
>
>
> Hi all,
>
> this is an RFC to discuss a my idea to allow a simple rollback of the
> root filesystem at boot time.
>
> The problem that I want to solve is the following: DPKG is very slow on
> a BTRFS filesystem. The reason is that DPKG massively uses
> sync()/fsync() to guarantee that the filesystem is always coherent even
> in case of sudden shutdown.
>
> The same can be useful even to the RPM Linux based distribution (which however
> suffer less than DPKG).
>
> A way to avoid the sync()/fsync() calls without loosing the DPKG
> guarantees, is:
> 1) perform a snapshot of the root filesystem (the rollback one)
> 2) upgrade the filesystem without using sync/fsync
> 3) final (global) sync
> 4) destroy the rollback snapshot
>
> If an unclean shutdown happens between 1) and 4), two subvolume exists:
> the 'main' one and the 'rollback' one (which is the snapshot before the
> update). In this case the system at boot time should mount the "rollback"
> subvolume instead of the "main" one. Otherwise in case of a "clean" boot, the
> "rollback" subvolume doesn't exist and only the "main" one can be
> mounted.
>
> In [1] I discussed a way to implement the steps 1 to 4. (ok, I missed
> the point 3) ).
>
> The part that was missed until now, is an automatic way to mount the rollback
> subvolume at boot time when it is present.
>
> My idea is to allow more 'subvol=' option. In this case BTRFS tries all the
> passed subvolumes until the first succeed. So invoking the kernel as:
>
>         linux root=UUID=xxxx rootflags=subvol=rollback,subvol=main ro
>
> First, the kernel tries to mount the 'rollback' subvolume. If the rollback
> subvolume doesn't exist then it mounts the 'main' subvolume.
>
> Of course after the mount, the system should perform a cleanup of the
> subvolumes: i.e.
> if a rollback subvolume exists, the system should destroy
> the "main" one (which contains garbage) and rename "rollback" to "main".
> To be more precise:
>
>         if test -d "rollback"; then
>                 if test -d "old"; then
>                         btrfs sub del "old"
>                 fi
>                 if test -d "main"; then
>                         mv "main" "old"
>                 fi
>                 mv "rollback" "main"
>                 btrfs sub del "old"
>         fi
>
> Comments are welcome
> BR
> G.Baroncelli
>
> [1] http://lore.kernel.org/linux-btrfs/69396573-b5b3-b349-06f5-f5b74eb9720d@xxxxxxxxx/
>
> P.S.
> I am guessing if an idea like this can be applied to a file. E.g. a sqlite
> database that instead of reling to sync/fsync, creates a reflink file as
> "rollback" if something goes wrong....

The ordering is preserved. Not the duration.

One way:

btrfs sub snap main rollback
change bootloader rootflags=subvol=rollback and /etc/fstab (or use
btrfs sub set-default)
do the update to main
- if it blows up at any time, rollback is what's used, delete main and
rename rollback to main
- if it succeeds, revert the bootloader changes so main boots, but
keep rollback in case booting main fails

Another way:

btrfs sub snap main update
lock /var, /etc and /boot of main against changes: no configuration
changes, no package changes, but the user can keep working on user
space things
use bwrap/nspawn/podman to load up and assemble the update tree and
perform the update out of band
- if the update blows up, just delete the update snapshot, and then
unlock the system from disallowed changes
- if the update succeeds, main can be renamed mainold and update can
be renamed main, update bootloader stuff; everything still stays
locked and the user can keep working on user space things until
they're ready to reboot; nice thing about containers is you can apply
cgroups v2 controls to make sure the update doesn't take resources
away from the user's current work

Personally I prefer the latter, doing the update out of band rather
than applying the update either on a running sysroot or having to do
an offline (reboot to a minimal
environment) update. I think locking the user out of system changes is
acceptable for such an out of band update. The alternative is
something like merging the /etc and /var changes made since the update
was initiated - I think it's not worth that complexity but if someone
wants to build that, OK.

--
Chris Murphy
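
For illustration only, the "delete main and rename rollback to main"
step of the first approach (the same cleanup as the quoted script)
could be sketched roughly as below. Assumptions that are not from this
thread: the top-level subvolume is mounted at the path passed as the
first argument, the subvolumes are named main, rollback and old, and
delete_subvol, cleanup_after_boot and DELETE_CMD are hypothetical names
made up for this sketch. The only behavioural difference from the
quoted script is that "old" is checked before the final delete.

```shell
#!/bin/sh
# Hypothetical helper: delete one subvolume.
# DELETE_CMD is an assumption of this sketch; it lets the delete command
# be swapped out (e.g. for a dry run on plain directories).
delete_subvol() {
    ${DELETE_CMD:-btrfs subvolume delete} "$1"
}

# Hypothetical helper: promote "rollback" to "main" after an unclean shutdown.
# $1 = directory where the top-level subvolume is mounted (assumed layout).
cleanup_after_boot() {
    top=$1

    # No rollback snapshot means the last update finished; nothing to do.
    [ -d "$top/rollback" ] || return 0

    # Remove a stale "old" left behind by an earlier, interrupted cleanup.
    if [ -d "$top/old" ]; then
        delete_subvol "$top/old" || return 1
    fi

    # Move the half-updated tree out of the way, if it is still there.
    if [ -d "$top/main" ]; then
        mv "$top/main" "$top/old" || return 1
    fi

    # The pre-update snapshot becomes the new "main".
    mv "$top/rollback" "$top/main" || return 1

    # Drop the half-updated tree; skipped when there was no "main" to move.
    if [ -d "$top/old" ]; then
        delete_subvol "$top/old" || return 1
    fi
}
```

It would be called once early at boot, e.g. cleanup_after_boot /mnt/top,
where /mnt/top is a made-up mount point for the top-level subvolume.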
