Removing Andrei and Qu.

On Wed, Mar 27, 2019 at 12:52 PM Chris Murphy <lists@xxxxxxxxxxxxxxxxx> wrote:
>
> And then also in the meantime, prepare for having to rebuild this array.

a. Check with the manufacturer of the hardware raid for firmware
updates for all the controllers. Also check whether the new version is
backward compatible with an array made with the version you have, and
if not, whether downgrade is possible. That way you have the option of
pulling the drives you have, putting them on a shelf, buying new
drives, and creating a new array with new hardware raid controller
firmware, without having to blow away this broken Btrfs file system
just yet.

b. If you go with Btrfs again, I suggest using metadata raid1. It's
speculation whether that would have helped recovery in this failure
case, but it probably wouldn't have made it any worse, and it wouldn't
meaningfully impact performance. For the first mount after mkfs, use
the mount option `space_cache=v2` to create the free space tree; it's
soon to be the default anyway, and for large file systems it offers
improved performance with the same reliability. If you don't care
about performance, you could instead always use the `nospace_cache`
mount option, in addition to `noatime,notreelog` and optionally a
compression option like `compress=zstd` (a minimal command sketch is
below the sig). I would not use the nodatacow or nodatasum options;
if you're considering those, you should just consider using ext4 or
XFS the next time around.

c. If it turns out the current Btrfs can be repaired, of course update
backups ASAP. But then I'd personally consider the file system still
suspect for anything other than short term use, and you'll want to
rebuild it from scratch eventually anyway, which lands you back at a.
and b. above. The most recent ext4 and XFS upstream work enables
metadata checksumming, so you'd be in the same boat as you were with
Btrfs using nodatacow; there are still some older tools that create
those file systems without metadata checksumming, so I'd watch out for
that (a quick check is also sketched below). And I'd say it's a coin
toss which one to pick; I'm not really sure offhand which one has a
greater chance of surviving a hard reset with in-flight data.

d. Back to the hardware raid6 controller: make sure it's really
configured per the manufacturer's expectations with respect to drive
write caching. Something got lost in the hard reset. Should the
individual drive write caches be disabled? It's possible the hardware
raid vendor expects this: if the controller does its own caching and
guarantees flushing to disk in the order the file system expects,
individual drive write caches can thwart that ordering. If the
controller has a write cache, is it battery backed? If not, does the
manufacturer recommend disabling write caching? Something didn't work,
and these are just some of the questions to ask to find the settings
that will avoid this happening again (checking the drives directly is
also sketched below), because even with a backup, restoring this much
data is a PITA.

--
Chris Murphy
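
For b., a minimal command sketch, assuming the hardware raid exports two
logical devices /dev/sdX and /dev/sdY (placeholders; with only one logical
device, -m dup is the closest single-device analogue to raid1 metadata):

    # raid1 metadata, single data profile
    mkfs.btrfs -m raid1 -d single /dev/sdX /dev/sdY

    # first mount after mkfs: creates the free space tree
    mount -o noatime,notreelog,space_cache=v2,compress=zstd /dev/sdX /mnt/array

    # or, if you don't care about performance, skip the cache entirely
    mount -o noatime,notreelog,nospace_cache,compress=zstd /dev/sdX /mnt/array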
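
For c., a sketch of how to confirm metadata checksumming actually ends up
enabled if you go the ext4 or XFS route (/dev/sdX and the mount point are
placeholders; recent mkfs versions enable these features by default):

    # ext4: request metadata_csum explicitly, then verify it is in the feature list
    mkfs.ext4 -O metadata_csum /dev/sdX
    dumpe2fs -h /dev/sdX | grep -i features

    # xfs: crc=1 is the default in current xfsprogs, but verify after mounting
    mkfs.xfs -m crc=1 /dev/sdX
    xfs_info /mnt/array | grep crc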
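
For d., a sketch of how to read per-drive write cache state for any drives
the kernel can see directly (drives hidden behind the raid controller
usually have to be queried and changed through the vendor's own management
tool instead; /dev/sdX is a placeholder):

    # report the drive's write cache state
    hdparm -W /dev/sdX
    smartctl -g wcache /dev/sdX

    # disable the drive write cache, if that's what the raid vendor expects
    hdparm -W 0 /dev/sdX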
