On Fri, May 15, 2020 at 12:03 AM Emil Heimpel <broetchenrackete@xxxxxxxxx> wrote:
>
> Hi,
>
> I hope this is the right place to ask for help. I am unable to mount my BTRFS array and wanted to know, if it is possible to recover (some) data from it.

Hi, yes it is!

> I have a RAID1-Metadata/RAID5-Data array consisting of 6 drives, 2x8TB, 5TB, 4TB and 2x3TB. It was running fine for the last 3 months. Because I expanded it drive by drive I wanted to do a full balance the other day, when after around 40% completion (ca 1.5 days) I noticed, that one drive was missing from the array (If I remember correctly, it was the 5TB one). I tried to cancel the balance, but even after a few hours it didn't cancel, so I tried to do a reboot. That didn't work either, so I did a hard reset. Probably not the best idea, I know....

The file system should be power fail safe (with some limited data loss), but the hardware can betray everything. Your configuration is better off due to the raid1 metadata.

> After the reboot all drives appeared again but now I can't mount the array anymore, it gives me the following error in dmesg:
>
> [ 858.554594] BTRFS info (device sdc1): disk space caching is enabled
> [ 858.554596] BTRFS info (device sdc1): has skinny extents
> [ 858.556165] BTRFS error (device sdc1): parent transid verify failed on 23219912048640 wanted 116443 found 116484
> [ 858.556516] BTRFS error (device sdc1): parent transid verify failed on 23219912048640 wanted 116443 found 116484
> [ 858.556527] BTRFS error (device sdc1): failed to read chunk root
> [ 858.588332] BTRFS error (device sdc1): open_ctree failed

The chunk tree is damaged, but it's unexpected that a newer transid is found than is wanted. Something happened out of order. Both copies are bad.

What do you get for:

# btrfs rescue super -v /dev/anydevice
# btrfs insp dump-s -fa /dev/anydevice
# btrfs insp dump-t -b 30122546839552 /dev/anydevice
# mount -o ro,nologreplay,degraded /dev/anydevice

> [bluemond@BlueQ btrfslogs]$ sudo btrfs check /dev/sdd1

For what it's worth, btrfs check does find all member devices, so you only have to run check on any one of them. However, scrub is different: you can run it individually per block device to work around some performance problems with raid56 that show up when running it on the volume's mount point.
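Once the filesystem is mountable again, per-device scrub would look something like this (the device names are just examples):

# btrfs scrub start /dev/sdc1
# btrfs scrub status /dev/sdc1

and repeat for each member device, rather than a single scrub started on the mount point, which scrubs all devices at once.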
> And how can I prevent it from happening again? Would using the new multi-parity raid1 for Metadata help?

Difficult to know yet what went wrong. Do you have dmesg/journalctl -k for the time period from when the problem drive began acting up all the way to the forced power off? It might give a hint.

Before doing a forced poweroff while writes are happening, it might help to disable the write cache on all the drives; or alternatively, always keep it disabled.

> I'm running arch on an ssd.
> [bluemond@BlueQ btrfslogs]$ uname -a
> Linux BlueQ 5.6.12-arch1-1 #1 SMP PREEMPT Sun, 10 May 2020 10:43:42 +0000 x86_64 GNU/Linux
>
> [bluemond@BlueQ btrfslogs]$ btrfs --version
> btrfs-progs v5.6

5.6.1 is current, but I don't think there's anything in the minor update that applies here.

Post that info and maybe a dev will have time to take a look. If it does mount ro,degraded, take the chance to update backups, just in case. Yeah, ~21TB will be really inconvenient to lose.

Also, since it's over the weekend and there's some time, it might be useful to have a btrfs image:

# btrfs-image -ss -c9 -t4 /dev/anydevice ~/problemvolume.btrfs.bin

This file will be roughly 1/2 the size of the file system's metadata. I guess you could have around 140G of metadata, depending on the nodesize chosen at mkfs time and how many small files this filesystem has.

Still another option that might make it possible to mount, if the above doesn't work: build a kernel with this patch

https://patchwork.kernel.org/project/linux-btrfs/list/?series=170715

and mount using -o ro,nologreplay,rescue=skipbg

This doesn't actually fix the problem either; it just might make it possible to mount the file system, mainly for updating backups in case it's not possible to fix.
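If it does mount that way, something along these lines would do for refreshing the backup (the device, mount point and destination paths are just placeholders):

# mount -o ro,nologreplay,rescue=skipbg /dev/sdc1 /mnt/recovery
# rsync -aHAX --progress /mnt/recovery/ /path/to/backup/

Read-only plus nologreplay means nothing on the damaged volume gets modified while you copy data off.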
--
Chris Murphy