On 01.09.2019 6:28, Sean Greenslade wrote:

> I decided to do a bit of experimentation to test this theory. The
> primary goal was to see if a filesystem could suffer a failed disk and
> have that disk removed and rebalanced among the remaining disks without
> the filesystem losing data or going read-only. Tested on kernel
> 5.2.5-arch1-1-ARCH, progs: v5.2.1.
>
> I was actually quite impressed. When I ripped one of the block devices
> out from under btrfs, the kernel started spewing tons of BTRFS errors,
> but seemed to keep on trucking. I didn't leave it in this state for too
> long, but I was reading, writing, and syncing the fs without issue.
> After performing a btrfs device delete <MISSING_DEVID>, the filesystem
> rebalanced and stopped reporting errors.

How many devices did the filesystem have? What profiles did the original
filesystem use, and what profiles were present after deleting the device?
Just to be sure there was no silent downgrade, e.g. from raid1 to dup or
single.

> Looks like this may be a viable
> strategy for high-availability filesystems assuming you have adequate
> monitoring in place to catch the disk failures quickly. I personally
> wouldn't want to fully automate the disk deletion, but it's certainly
> possible.

This would be a valid strategy if we could tell btrfs to reserve enough
spare space. But even that is not enough: every allocation btrfs makes
would have to leave enough spare space to reconstruct every other chunk
if a device goes missing.

Actually, I now ask myself: what happens when btrfs sees unusable disk
sector(s) in some chunk? Will it automatically reconstruct the content of
this chunk somewhere else? If not, what is the option besides a full
device replacement?
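As an aside, one way to check for such a silent profile downgrade is to
grep the per-chunk profiles reported by `btrfs filesystem df`. A sketch
(the sample output embedded below is hypothetical, included only so the
check can be illustrated without a live filesystem; on a real system you
would pipe `btrfs filesystem df /mnt` into the grep instead):

```shell
#!/bin/sh
# Hypothetical output of `btrfs filesystem df /mnt` after a device delete,
# embedded here as sample data for illustration:
sample_output='Data, RAID1: total=10.00GiB, used=8.21GiB
Data, single: total=1.00GiB, used=512.00MiB
Metadata, RAID1: total=1.00GiB, used=300.00MiB
System, RAID1: total=32.00MiB, used=16.00KiB'

# Flag any chunk whose profile is "single" or "DUP": on a filesystem that
# is supposed to be raid1 throughout, such a line after a device delete
# means some chunks were silently written without redundancy.
downgraded=$(printf '%s\n' "$sample_output" | grep -Ei ', (single|DUP):' || true)

if [ -n "$downgraded" ]; then
    echo "WARNING: non-RAID1 chunks present:"
    echo "$downgraded"
fi
```

With the sample data above this prints a warning about the "Data, single"
chunk; a clean raid1 filesystem would produce no output.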
