On Thu, Mar 19, 2020 at 9:14 AM Carsten Behling
<carsten.behling@xxxxxxxxxxxxxx> wrote:
>
> Hi,
>
> the investigation of damaged root trees are already discussed in the
> thread starting with
>
> https://www.spinics.net/lists/linux-btrfs/msg74019.html
>
> However, one point wasn't discussed at the end:
>
> > I thought so too. Is there a reason why they ended up being colocated?
> > I'm surprised with all the redundancies btrfs is capable of, this can
> > happen. Was it because the volume was starting to become full? (This
> > whole exercise of turning on mirroring was because we're migrating to
> > bigger disks)
>
> Because I have the same issue on an embedded system, after a power
> cut, where none of the root tree copies are usable anymore, I'd also
> like to know :
>
> - How can we end up in that recoverable state?
> - Why can't we protect the fs against the unrecoverable state?
> - Why is that error is so hard to recover?

I'm interested in this too. I'd also like to know which Btrfs debug or
consistency check options, if any, can catch these problems as close as
possible to the time they occur, whether they turn out to be Btrfs,
block layer, or device problems.

> Furthermore, I'd like to know what would be the best solution for an
> embedded system where power cuts are unavoidable (because of a missing
> circuit). I'm thinking of using a read-only rootfs with a separate
> data partition to ensure at least a booting system. But anyway, the
> data partition could end up in the same state.
>
> I'm not sure if it would be also a good option working with snapshots.
> My space on the embedded device is limited to 8GB. The OS already
> takes about 4GB.

Seed device? Create a Btrfs file system, mount it with space_cache=v2
and compress-force=zstd:16, and write the root image. Resize the file
system to its minimum size, then set the seed flag. That's the base
image. Part of the provisioning will be to 'btrfs device add' a 2nd
partition, and remount read-write.
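Roughly, the steps above could look like the sketch below. It is only
an outline under stated assumptions, not a tested recipe: the device
paths, mount point, image directory and shrink size are hypothetical
placeholders, and the helper names (run, build_seed, provision) are made
up for illustration. It uses zstd:15 because that is, as far as I know,
the highest level the btrfs documentation lists. By default it only
prints the commands; set DRY_RUN=0 to actually run them.

```shell
#!/bin/sh
# Sketch of the seed-device workflow. All paths and sizes below are
# hypothetical placeholders; adjust them for the real target.
set -eu

: "${DRY_RUN:=1}"            # 1 = only print the commands (default)
: "${SEED_DEV:=/dev/sdX1}"   # partition that holds the read-only base image
: "${DATA_DEV:=/dev/sdX2}"   # partition added later as the writable device
: "${MNT:=/mnt/root}"        # mount point used during build and provisioning
: "${ROOTFS_DIR:=./rootfs}"  # directory tree to copy into the base image
: "${SEED_SIZE:=4G}"         # shrink target; has to be determined per image

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Build the base image and mark it as a seed.
build_seed() {
    run mkfs.btrfs "$SEED_DEV"
    run mount -o space_cache=v2,compress-force=zstd:15 "$SEED_DEV" "$MNT"
    run cp -a "$ROOTFS_DIR/." "$MNT/"
    run btrfs filesystem resize "$SEED_SIZE" "$MNT"
    run umount "$MNT"
    run btrfstune -S 1 "$SEED_DEV"   # set the seed flag (fs must be unmounted)
}

# Provisioning: add the writable partition on top of the seed.
provision() {
    run mount "$SEED_DEV" "$MNT"     # a seed device mounts read-only
    run btrfs device add "$DATA_DEV" "$MNT"
    run mount -o remount,rw "$MNT"
}

build_seed
provision
```

Finding the real minimum size usually takes some trial and error, so
treat SEED_SIZE as something to work out for each image rather than a
fixed value.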
This means two Btrfs file systems exist, each with its own UUID. You
can reference the read-only seed by its UUID, and you can reference the
read-write volume by its own UUID. On-disk metadata for the read-write
volume points to both the read-only seed (devid 1) and the writable 2nd
device (devid 2).

Make sure write cache on the physical media is disabled. It might be
true that 'flushoncommit' and 'notreelog' reduce complexity for
recovery following a crash, at the expense of losing some data in the
latter case. (It's been suggested before in the archives, but I have no
good way to test whether it results in fewer problems with
crash/powerfail recovery, because I personally haven't hit any problems
with the default mount options, despite hundreds of intentional forced
power-offs while writing.)

For embedded systems, consider using industrial flash. It is slower
but more reliable, especially in the case of a power cut. SD cards are
notorious for corruption and for going permanently read-only when power
is cut, but I've had this problem with USB sticks too.

-- 
Chris Murphy
