On 27.05.2020 04:20, Chris Murphy wrote:
>
> On Sun, May 24, 2020 at 7:13 PM Justin Engwer <justin@xxxxxxxxxxx> wrote:
>>
>> Hi, I'm the guy who lost all his VMs due to a massive configuration oversight.
>>
>> I'm looking to implement the remaining 4 x 3tb drives into a new fs
>> and just want someone to look over things. I'm intending to use them
>> for backup storage (veeam).
>>
>> Centos 7 Kernel 5.5.2-1.el7.elrepo.x86_64
>> btrfs-progs v4.9.1
>
> I suggest updating the btrfs-progs, that's old.
>>
>> mkfs.btrfs -m raid1c4 -d raid1 /dev/disk/by-id/ata-ST3000*-part1
>> echo "UUID=whatever /mnt/btrfs/ btrfs defaults,space_cache=v2 0 2" >> /etc/fstab
>> mount /mnt/btrfs
>
> Add noatime.
> https://lwn.net/Articles/499293/
>
> I don't recommend space_cache=v2 in fstab. Use it once manually with
> clear_cache,space_cache=v2, and a feature flag will be set to use it
> from that point on. Soon v2 will be the default and you won't have to
> worry about this at all.
>
> fs_passno should be 0 for btrfs. man fsck.btrfs - it's a no-op; it's
> not designed for unattended use during startup. XFS is the same.
>
>> RAID1 over 4 disks and RAID1C4 metadata. Mounting with space_cache=v2.
>> Any other mount switches or btrfs creation switches I should be aware
>> of? Should I consider RAID5/6 instead? 6tb should be sufficient, so
>> it's not like I'd get anything out of RAID5, but RAID6 I suppose could
>> provide a little more safety in the case of multiple drive failures at
>> once.
>
> single, dup, raid0, raid1 (all), raid10 are safe and stable. raid56
> has caveats and you need to take precautions that kinda amount to
> hand-holding. If there is a crash or power fail you need to do a scrub
> (full file system scrub) with raid56. It's a good idea, but not
> strictly necessary, with other profiles. If you mount raid56 degraded,
> you seriously need to consider not doing writes, or at least being
> very skeptical of depending on those writes, because there's some
> evidence of degraded writes being corrupted.
>
> You can check the archives for more information from Zygo about
> raid56 pitfalls. It is stable on stable storage. But the point of any
> raid is to withstand a non-stable situation like a device failure, and
> there's still work needed on raid56 to get to that point without
> hand-holding.
>
> If you need raid5, you might consider mdadm for the raid5, and then
> format it with btrfs using defaults, which will get you DUP metadata
> and single-copy data. You'll get cheap snapshots, faster scrubs, and
> warnings for any corruption of metadata or data.

Does btrfs allow scrubbing single files? If that's the case, I think
offline healing could be possible with btrfs on top of mdadm RAID5.

*If* the corruption is at the mdadm level, it would be reported under
/sys/block/mdX/md/mismatch_cnt after an mdadm scrub. mdadm doesn't know
which device holds the corrupt data and thus will (afaik) randomly pick
one and return it. But btrfs knows exactly what is corrupt.

Assuming you have mdadm with N devices in RAID5 and a corrupt file
"/somefile.raw", the following could identify the corrupted drive:

1. Unmount the btrfs volume and stop the mdadm device.
2. Re-assemble the mdadm device as read-only (!), but with only N-1 devices.
3. Mount btrfs as read-only.
4. Let btrfs scrub the corrupt file "/somefile.raw".
5. If btrfs reports no checksum error -> congrats. You can update your
   backup and recover your data.
6. If btrfs still reports a checksum error, repeat the whole process,
   but pick another group of N-1 devices that you haven't scanned yet.

For further clarification: in an example with 3 devices, you would
scrub sda+sdb, sda+sdc and sdb+sdc.

This could be a simple bash script (a rough sketch follows below). And
since the btrfs scrub only needs to cover the corrupted files (and not
the whole filesystem), the process would be fast. Furthermore, since
everything is mounted read-only, it's non-destructive and worth a try.
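Such a script could look like the sketch below (completely untested,
just to illustrate the idea). Everything in it is a placeholder
assumption: a 3-device RAID5 made of /dev/sda1, /dev/sdb1, /dev/sdc1,
assembled as /dev/md0, mounted on /mnt/btrfs, with /mnt/btrfs/somefile.raw
as the file btrfs complained about. And since btrfs scrub (afaik) has no
single-file mode, the sketch simply re-reads the suspect file and relies
on btrfs returning an I/O error on a checksum mismatch:

  #!/bin/bash
  # Sketch only, untested. Adjust these placeholders to your setup.
  DEVICES=(/dev/sda1 /dev/sdb1 /dev/sdc1)
  MD=/dev/md0
  MNT=/mnt/btrfs
  FILE=$MNT/somefile.raw   # the file btrfs reported as corrupt

  umount "$MNT" 2>/dev/null
  mdadm --stop "$MD" 2>/dev/null

  for skip in "${DEVICES[@]}"; do
      # Build the N-1 subset that leaves out "$skip".
      subset=()
      for d in "${DEVICES[@]}"; do
          [ "$d" != "$skip" ] && subset+=("$d")
      done

      # Assemble degraded (N-1 devices) and strictly read-only, then mount ro.
      mdadm --assemble --readonly --run "$MD" "${subset[@]}" || continue
      mount -o ro "$MD" "$MNT" || { mdadm --stop "$MD"; continue; }

      # Fresh mount, so nothing is cached: the read goes to disk and btrfs
      # returns an I/O error if the data fails its checksum.
      if cat "$FILE" > /dev/null 2>&1; then
          echo "clean read without $skip -> $skip likely holds the corrupt data"
      else
          echo "checksum error persists without $skip"
      fi

      umount "$MNT"
      mdadm --stop "$MD"
  done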
Please note, I'm not an expert. This could be a very bad idea and trash
your data. I just had this idea some time ago and wanted to discuss it
here.

The interesting part is what happens after identifying the corrupt
device. If there are no further (mdadm) mismatches, re-adding the "bad"
device as new and resyncing might get you up and running again. To be
safe, you'd need to mdadm-scrub the good set of N-1 devices first,
slowing down the whole process. And you're out of options if mismatches
occur. Optimally, mdadm would allow you to flag the underlying block of
the corrupted data as bad and resync it from the known-good devices
(which we identified above). In that case, even more complex corruption
scenarios could be recoverable. But I'm not aware that mdadm offers
such tools.

Either way, I think this is only interesting for data recovery. Any
thoughts on this?

Cheers,
Tolga

>
> Also consider mkfs.btrfs --checksum=xxhash, but you definitely need
> btrfs-progs 5.5 or newer, and kernel 5.6 or newer. If those are too
> new for your use case, skip it. crc32c is fine, but it is intended
> for detection of casual incidental corruption and can't be used for
> dedup. xxhash64 is about as fast, but has much better collision
> resistance.
>
