Re: How do damaged root trees happen and how to protect against power cut?

On Tue, Mar 24, 2020 at 1:54 AM Carsten Behling
<carsten.behling@xxxxxxxxxxxxxx> wrote:

> Mon, Mar 23, 16:58 (15 hours ago)
> to Chris
> > Seed device?
> >
> > Create a Btrfs file system, use space_cache v2,
> > compress-force=zstd:16, and write the root image. Resize the file
> > system to minimum. Set the seed flag. That's the base image. Part of
> > the provisioning will be to 'btrfs device add' a 2nd partition, and
> > remount read-write. This means two Btrfs file systems exist, each with
> > their own UUID. You can reference the read-only seed by its UUID; and
> > you can reference the read-write volume by its own UUID. On-disk
> > metadata for this read-write volume points to both the read-only seed
> > devid1, and the writable 2nd device devid2.
> >
> > Make sure write cache on the physical media is disabled.
>
> Are these the correct steps in detail:

I can't sanity check every single step. But I'll comment on what I can.

>
> 1. Partition SD card with:
> - (write Bootloader ...)
> - first partition boot (FAT32 (0x0b), 50MB)
> - second partition (Linux Native (0x83), minimum possible size to fit rootfs)
> - third partition (Linux Native (0x83), rest)
> - (write boot files (kernel ...))

Seems like bootloader happens later, whether BIOS or UEFI.


>
> 2. Create seed device on development host:
>
> # mkfs.btrfs --rootdir ~/rootfs --shrink /dev/sda2 # sda is my SD card device
> # btrfstune -S 1 /dev/sda2
> # dd if=/dev/zero of=/dev/sda3 bs=1024
> # mount /dev/sda2 /mnt
> # btrfs device add /dev/sda3 /mnt
> # hdparm -W 0 /dev/sda3 # disable write cache

I haven't populated a Btrfs file system using the --rootdir option of
mkfs. I've only ever done it by using kernel code (a mounted file
system) and then shrinking the resulting file system to minimum size
and/or running fstrim so that it's a sparse file. That way I can also
take advantage of fs compression for the seed.
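A rough sketch of that mounted-file-system approach, for illustration only. The helper name seed_build_plan, the device and directory names, and the target size are all placeholders, and the helper only prints the commands so they can be reviewed before anything touches a disk. A sensible target size could be taken from 'btrfs inspect-internal min-dev-size' on the mounted file system.

```shell
#!/bin/sh
# Hypothetical helper: prints (does not run) one possible command
# sequence for building a compressed seed through a mounted file system.
# All arguments are placeholders chosen by the operator.
seed_build_plan() {
    dev=$1    # block device that becomes the seed, e.g. /dev/sdX2
    src=$2    # directory holding the root file system tree
    mnt=$3    # temporary mount point
    size=$4   # size to shrink to, e.g. taken from min-dev-size
    printf '%s\n' \
        "mkfs.btrfs $dev" \
        "mount -o space_cache=v2,compress-force=zstd $dev $mnt" \
        "cp -a $src/. $mnt/" \
        "btrfs filesystem resize $size $mnt" \
        "umount $mnt" \
        "btrfstune -S 1 $dev"
}

# Example: print the plan for review.
seed_build_plan /dev/sdX2 /path/to/rootfs /mnt 2g
```

The seed flag is set last, on the unmounted device, because the file system is read-only from then on.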

I'd substitute the dd command above with 'blkdiscard' and relocate it
to step 1 as a preparation step.

You do need 'mount -o remount,rw', but after adding the 2nd device,
not before: a seed always mounts read-only, and the volume only becomes
writable once it has a sprout.

The hdparm step is probably only important for production use.
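Putting the comments on step 2 together, here is a sketch of the sprout provisioning. It follows the order given in the Btrfs seeding documentation (mount the seed, add the device, then remount read-write). The helper name sprout_plan and the device names are placeholders, and the helper only prints the commands. Whether blkdiscard works depends on the card and reader supporting discard.

```shell
#!/bin/sh
# Hypothetical helper: prints (does not run) the commands for attaching
# a writable sprout to an existing seed device.
sprout_plan() {
    seed=$1     # seed device, already flagged with btrfstune -S 1
    sprout=$2   # partition that becomes the writable sprout
    mnt=$3      # mount point
    printf '%s\n' \
        "blkdiscard $sprout" \
        "mount $seed $mnt" \
        "btrfs device add $sprout $mnt" \
        "mount -o remount,rw $mnt"
}

# Example: print the plan for review.
sprout_plan /dev/sdX2 /dev/sdX3 /mnt
```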


>
> 3. Mount on embedded device
>
> - Kernel command line option: "root=/dev/mmcblk0p2 ro rootwait"
> - Later, 'systemd-remount-fs.service' remounts seed device 'rw' by
> applying mount options from fstab:
> ...
> # 'defaults' includes 'rw', 'ROOT' is /dev/mmcblk0p2 (seed device)
> LABEL=ROOT  /  btrfs  defaults,noatime,nodiratime,space_cache=v2,compress-force=zstd:16  1  1
> ...


The read-only seed device itself can't be mounted read-write. That's
the point of a seed-device. All changes go to the 2nd device. What you
really want to do during production is mount by the fs UUID of the
"sprout".

At mkfs time, devid 1 (first device, which becomes the read-only seed)
has an fs UUID.

When you 'btrfs dev add' a 2nd device to a seed, that 2nd device is
sometimes called a "sprout" device, let's call it devid 2. A new fs
UUID is generated, which is a Btrfs volume made of two devices, devid1
and devid2.

Therefore, if you use root=UUID=<fs UUID of the seed>, this mounts the
read-only seed, and could be used as a way to "reset" the system. If
you use root=UUID=<fs UUID of the sprout>, this references both devid1
and devid2, and will mount read-write by default.
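To make the two choices concrete, here is a small sketch that builds the kernel command line fragment for either case. The function name root_arg and the UUIDs in the example are made up. It assumes an initramfs that resolves root=UUID= and scans Btrfs devices, since the kernel alone does not resolve file system UUIDs. The real UUIDs can be read with 'blkid -s UUID -o value' on each device.

```shell
#!/bin/sh
# Hypothetical helper: prints the root= fragment for booting either the
# read-only seed or the read-write sprout, given the matching fs UUID.
root_arg() {
    mode=$1   # "seed" or "sprout"
    uuid=$2   # fs UUID of the chosen volume
    case $mode in
        seed)   printf 'root=UUID=%s ro rootwait\n' "$uuid" ;;
        sprout) printf 'root=UUID=%s rw rootwait\n' "$uuid" ;;
        *)      echo "usage: root_arg seed|sprout UUID" >&2; return 1 ;;
    esac
}

# Example with placeholder UUIDs.
root_arg seed   11111111-2222-3333-4444-555555555555
root_arg sprout 66666666-7777-8888-9999-000000000000
```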

It's superfluous detail for your use case, but for the sake of a
complete answer, a "sprout" isn't always 2 devices, even though it
starts that way. It is possible to delete devid1, which then causes
replication of the seed to the sprout. Once finished, devid1, the
seed, is removed. And now the seed and sprout are each single device
Btrfs volumes and totally independent.
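For completeness, a sketch of that detach step. The helper name detach_seed_plan and the device names are placeholders, the helper only prints the commands, and it assumes the sprout is already mounted read-write at the given mount point.

```shell
#!/bin/sh
# Hypothetical helper: prints (does not run) the commands that migrate
# the seed's data onto the sprout and drop the seed from the volume.
detach_seed_plan() {
    seed=$1   # seed device (devid 1)
    mnt=$2    # mount point of the read-write sprout volume
    printf '%s\n' \
        "btrfs device delete $seed $mnt" \
        "btrfs filesystem show $mnt"
}

# Example: print the plan for review.
detach_seed_plan /dev/sdX2 /mnt
```

The second command is only there to confirm that the volume ends up with a single device.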

Anyway, for your "reset" option, you probably need one of two things:
a) read-only rootfs support in the initramfs, so that you can boot the
read-only seed; or b) set up a ramdisk, such as a zram device, and use
it as a volatile "sprout". With b) you can remount sysroot read-write
and perform the reset, which would be something like doing a blkdiscard
on the "sprout" device you want to get rid of and then creating a new
persistent sprout. This double use of the seed is completely valid.

There is one possible gotcha you can run into, but again I don't think
it applies to your use case:

btrfs multiple devices confusion: automatically unmounted /home,
clobbered ssh session
https://github.com/systemd/systemd/issues/14674


-- 
Chris Murphy



