Re: Does GRUB btrfs support log tree?

On Mon, Nov 4, 2019 at 7:34 PM David Sterba <dsterba@xxxxxxx> wrote:
>
> On Sun, Oct 27, 2019 at 09:05:54PM +0100, Chris Murphy wrote:
> > > > Since log tree writes means a
> > > > full file system update hasn't happened, the old file system state
> > > > hasn't been dereferenced, so even in an SSD + discard case, the system
> > > > should still be bootable. And at that point Btrfs kernel code does log
> > > > replay, and catches the system up, and the next update will boot the
> > > > new state.
> > > >
> > > > Correct?
> > > >
> > >
> > > Yes. If we speak about grub here, it actually tries very hard to ensure
> > > writes have hit disk (it fsyncs files as it writes them and it flushes
> > > raw devices). But I guess that fsync on btrfs just goes into log and
> > > does not force transaction. Is it possible to force transaction on btrfs
> > > from user space?
>
> * sync/syncfs
> * the ioctl BTRFS_IOC_SYNC (calls syncfs)
> * ioctls BTRFS_IOC_START_SYNC + BTRFS_IOC_WAIT_SYNC
>
> > The only fsync I ever see Fedora's grub2-mkconfig do is for grubenv.
> > The grub.cfg is not fsync'd. When I do a strace of grub2-mkconfig,
> > it's so incredibly complicated. Using -ff -o options, I get over 1800
> > separate PID files exported. From what I can tell, it creates a brand
> > new file "grub.cfg.new" and writes to that. Then does a cat from
> > "grub.cfg.new" into "grub.cfg" - maybe it's file system specific
> > behavior, I'm not sure.
> >
> > I'm pretty sure "sync" will do what you want, it calls syncfs() and
> > best as I can tell it does a full file system sync, doesn't use the
> > log tree. I'd argue grub-mkconfig should write all of its files, and
> > then sync that file system, rather than doing any fsync at all.
>
> This would work in most cases. I'm not sure, but the update does not
> seem to be atomic. Ie. all old kernels match the old grub.cfg, or there
> are new kernels that match the new cfg.
>
> Even if there are no fsyncs and just the final sync, some other activity
> in the filesystem can trigger a sync between the updates of kernels and
> grub.cfg. Like this:
>
> start:
>
> - kernel1
> - grub.cfg (v1)
>
> update:
>
> - add kernel2
> - remove kernel1
> - <something calls sync>
> - update grub.cfg (v2)
> - grub calls sync
>
> If the crash happens after sync and before update, kernel1 won't be
> reachable and kernel2 won't be in the grub.cfg.

Right. It's probably bad practice to remove the fallback kernel (how
"fallback" is defined varies by distribution) unless the method of
updating the kernel is atomic by design and proven so by testing.
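
To make the crash window concrete, here is a toy model of that
sequence. It is only a sketch: ToyFs, bootable() and run_update() are
made-up names, "committed" stands in for whatever survives a crash, and
nothing here models the log tree or real Btrfs behavior.

```python
class ToyFs:
    """Toy model: 'live' is in-memory state, 'committed' survives a crash."""

    def __init__(self, files):
        self.live = dict(files)
        self.committed = dict(files)

    def write(self, name, content):
        self.live[name] = content

    def remove(self, name):
        self.live.pop(name, None)

    def sync(self):
        # A full sync makes the current live state the durable one.
        self.committed = dict(self.live)

    def crash(self):
        # After a crash only the committed state is visible.
        return dict(self.committed)


def bootable(state):
    # Assumption: grub.cfg's content is just the name of the kernel it boots.
    return state.get("grub.cfg") in state


def run_update(intervening_sync):
    fs = ToyFs({"kernel1": "image", "grub.cfg": "kernel1"})
    fs.write("kernel2", "image")   # add kernel2
    fs.remove("kernel1")           # remove kernel1
    if intervening_sync:
        fs.sync()                  # <something calls sync>
    # Crash here, before grub.cfg (v2) is written and synced.
    return fs.crash()


print(bootable(run_update(intervening_sync=True)))
print(bootable(run_update(intervening_sync=False)))
```

With the intervening sync, the committed state has kernel2 but a
grub.cfg that still points at kernel1. Without it, the old state is
still intact.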

In the single kernel case it could be done atomically with generic
filenames, i.e. vmlinuz and initramfs with no version in the filename,
and a static bootloader configuration that's never updated and only
ever looks for vmlinuz and initramfs. The update would write out
vmlinuz.new and initramfs.new, and then sync. Then it would rename()
vmlinuz.new to vmlinuz, and initramfs.new to initramfs. Since it's two
files, it's not strictly atomic; likely more than one sector changes.
But it might be good enough?
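
Roughly, the write/sync/rename sequence could look like the sketch
below. update_boot_files() and its arguments are hypothetical, and the
final sync after the renames is my own addition, not something any
existing tool is known to do. os.sync() is the Python wrapper for
sync(2), so it flushes every mounted file system, not just the one
holding /boot.

```python
import os


def update_boot_files(boot_dir, images):
    """Write each image as NAME.new, sync, then rename over NAME.

    images maps a final filename (e.g. "vmlinuz") to its new bytes.
    """
    # Step 1: write the new files next to the old ones.
    for name, data in images.items():
        with open(os.path.join(boot_dir, name + ".new"), "wb") as f:
            f.write(data)

    # Step 2: full sync, so the new contents are on disk before any
    # name points at them.
    os.sync()

    # Step 3: rename each file into place. Each rename() is atomic on
    # its own, but the pair of renames is not.
    for name in images:
        os.rename(os.path.join(boot_dir, name + ".new"),
                  os.path.join(boot_dir, name))

    # Step 4 (assumption): sync again so the renames are durable too.
    os.sync()
```

A crash between the two renames would still leave a new vmlinuz next
to an old initramfs, which is the "not strictly atomic" part.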

I'm not really sure what the best practice is though. I asked about
this in the context of UEFI and the EFI System Partition (and thus
FAT), and it seems there really aren't any atomicity guarantees
possible at all, which is a bit troubling. About the only way to do it
is like on Android, with A and B partitions holding the kernel and
initramfs as blobs rather than as files on a file system. A partition
attribute then tells the bootloader which of A or B has priority, with
the other being the fallback.
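
As a rough illustration of the A/B idea only: the Slot fields,
boot_order() and promote() below are invented for this sketch and are
not Android's actual attribute layout or any bootloader's API.

```python
from dataclasses import dataclass


@dataclass
class Slot:
    name: str        # "A" or "B"
    priority: int    # higher value is tried first (hypothetical attribute)
    bootable: bool   # False while the slot is being rewritten


def boot_order(slots):
    """Order in which a bootloader would try the slots."""
    usable = [s for s in slots if s.bootable]
    return [s.name for s in sorted(usable, key=lambda s: s.priority,
                                   reverse=True)]


def promote(slots, name):
    """Give one slot the highest priority; the others become fallback."""
    top = max(s.priority for s in slots)
    for s in slots:
        if s.name == name:
            s.priority = top + 1


slots = [Slot("A", priority=2, bootable=True),
         Slot("B", priority=1, bootable=True)]

slots[1].bootable = False   # start writing the new image into B
# ... write the kernel + initramfs blob to B and flush it here ...
slots[1].bootable = True    # blob is complete
promote(slots, "B")         # one small attribute change flips the order
print(boot_order(slots))
```

The point is that the switch is a single small attribute update made
after the new blob is fully written, and the old slot stays in place as
the fallback.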

Anyway, the lack of a generic (file system independent) way to handle
this use case is actually a bit concerning.

-- 
Chris Murphy


