On 2020/1/16 11:06 AM, Sabrina Cathey wrote:
> Up front required information
>
> uname -a;btrfs --version;btrfs fi show;btrfs fi df /shizzle/
> Linux babel.thegnomedev.com 5.3.8-arch1-1 #1 SMP PREEMPT @1572357769
> x86_64 GNU/Linux
> btrfs-progs v5.3.1
> Label: 'shizzle'  uuid: 92b267f2-c8af-40eb-b433-e53e140ebd01
>         Total devices 10 FS bytes used 34.18TiB
>         devid    2 size 5.46TiB used 4.28TiB path /dev/sdb1
>         devid    3 size 5.46TiB used 4.28TiB path /dev/sdg1
>         devid    4 size 5.46TiB used 4.28TiB path /dev/sdh1
>         devid    5 size 5.46TiB used 4.28TiB path /dev/sdi1
>         devid    6 size 5.46TiB used 4.28TiB path /dev/sdj1
>         devid    7 size 5.46TiB used 4.28TiB path /dev/sdf1
>         devid    8 size 5.46TiB used 4.28TiB path /dev/sda1
>         devid    9 size 5.46TiB used 4.28TiB path /dev/sdd1
>         devid   10 size 5.46TiB used 4.28TiB path /dev/sde1
>         devid   11 size 5.46TiB used 4.28TiB path /dev/sdc1

RAID6 with btrfs is going to be messy when anything doesn't go
correctly. With 10 devices, btrfs may need to try up to C(10,2) = 45
device combinations to reassemble a good stripe, not to mention the
infamous write hole problem.

And unfortunately, btrfs-progs doesn't have full RAID6 recovery (it can
handle two missing devices, but it doesn't try all possible
combinations). This means we have lost the most powerful tool for
locating the problem, so it's pretty hard to pin down the root cause.

> Data, RAID6: total=34.18TiB, used=34.13TiB
> System, RAID6: total=256.00MiB, used=1.73MiB
> Metadata, RAID6: total=60.00GiB, used=54.65GiB
> GlobalReserve, single: total=512.00MiB, used=0.00B
>
> ----
>
> dmesg output is over 100k and my understanding is that you have a size
> limit, so here is a pastebin: https://pastebin.com/d4BPRS6m
>
> ----
>
> The story is that I found the server unresponsive, and when I rebooted
> I ended up seeing that a disk was missing: https://i.imgur.com/iLgnNBM.jpg
>
> I mucked about trying to figure out what to do. I ended up rebooting
> to see if I could see an issue in the drive controller BIOS, and when I
> got back into the OS things seemed okay at first. It was mounted and
> looked okay, but then I noticed issues in dmesg related to "parent
> transid verify failed" errors.

Normally this means something is definitely not correct.

> It's late and I was grasping at straws and randomly googling. I tried
> a scrub and it failed and the filesystem went RO. I retried a few
> times, because insanity.
>
> I tried btrfsck (default non-destructive) and it also bailed out:
> https://i.imgur.com/ZEq0RjU.jpg

As mentioned, btrfs-progs can't help much for RAID6, especially with so
many devices.

Just to be clear, it's not recommended to use RAID5/6 for metadata, or
for this many devices.

> Looking at btrfs device stats it looks like one of the devices
> (/dev/sde) is bad - probably the one that was found missing initially.
> I'm attaching the output of that command. I'm way out of my depth
> here - my thought is to use btrfs device delete /dev/sde1

If you're pretty sure which device is the culprit, then you can try
removing that device gracefully first (power down the system, unplug
that device). To be extra safe, never mount the fs RW after that.

Then try mounting the fs *RO* and run scrub again. This would at least
reduce the complexity of rebuilding a good stripe.

After that, you can also try btrfs check to see if the result changes.

But considering it's a transid problem, it's recommended to salvage
your data asap. Rough command sketches for all of this are below.
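For the mount/scrub/check steps, something along these lines is what I
mean. /dev/sdb1 and /shizzle below are just examples taken from your
'fi show' output; any surviving member device will do:

  # with the suspect disk already unplugged, mount read-only and degraded
  mount -o ro,degraded /dev/sdb1 /shizzle

  # read-only scrub: -B stay in foreground, -d per-device stats,
  # -r check only, don't try to repair anything
  btrfs scrub start -Bdr /shizzle

  # unmount again before running check; --readonly is the default mode,
  # just being explicit about it
  umount /shizzle
  btrfs check --readonly /dev/sdb1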
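For salvaging, if the fs still mounts RO then plain cp or rsync from
that RO mount to other storage is the simplest route. If it stops
mounting at all, btrfs restore can pull files directly from an
unmounted member device. A rough sketch, where /mnt/backup is just a
placeholder for wherever your rescue storage is mounted:

  # dry run first: -D only lists what would be restored, -v is verbose
  btrfs restore -D -v /dev/sdb1 /mnt/backup

  # real run: -i ignores errors and keeps going
  btrfs restore -i -v /dev/sdb1 /mnt/backup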
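And on the metadata point: once your data is safe and you rebuild (or
on any other healthy btrfs array), a common layout is data on RAID6
with metadata on RAID1, which a balance filter can convert in place.
Something like this, but only on a healthy, RW-mounted filesystem,
never on this one in its current state:

  # convert existing metadata chunks to RAID1; data chunks are untouched
  btrfs balance start -mconvert=raid1 /shizzle

  # system chunks can be converted the same way, but the -s filter
  # additionally requires --force
  btrfs balance start -sconvert=raid1 -f /shizzle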
To repeat, don't use RAID5/6 for metadata, or for this many devices.

Thanks,
Qu

> Please can you help me to not lose my data? With this large an amount
> of data, I have yet to invest in another set of disks for backup (I
> know that RAID isn't backups and I should have them).
>
> Any help would be most appreciated
>
> Thanks
>
> Sabrina
