Thanks for all the help! If I get a chance later today, I may try the patch set, but in the interest of getting things back online quicker, I may just have to recreate the filesystem and restore the recovered data. The snapshots are no great loss; they're just one level of daily backups.

tw

On 08/15/2019 22:45 +0800, Qu Wenruo wrote:
>>
>> On 2019/8/15 10:21 PM, Tim Walberg wrote:
>> > 'dump-super -Ffa' from all three devices attached.
>> >
>> > 'btrfs restore' did appear to recover most of the main data, minus
>> > snapshots, which would have greatly increased the required time and
>> > capacity, since I was recovering to XFS.
>>
>> That's why I recommend that experimental patchset: it will make the fs
>> mountable (RO, though), with all btrfs snapshots available.
>>
>> >
>> > 'btrfs rescue chunk-recover' ran, but failed to fix anything.
>> > 'btrfs rescue super-recover' says all supers are fine.
>>
>> Those are useless for your case.
>>
>> >
>> > Initial corruption was due to a hard hang, which didn't leave enough
>> > crumbs to determine the source; it might have been btrfs, might have
>> > been nvidia, might have been something completely different.
>>
>> Anyway, the corruption is a little strange.
>>
>> First of all, even a hard hang/power loss shouldn't cause btrfs to
>> overwrite its tree blocks, so even if a hard hang/power loss happens,
>> btrfs should not be corrupted.
>>
>> But that's definitely not the case here. (We have had quite a few such
>> reports, but haven't pinned down the cause yet.)
>>
>> Secondly, the generation of your fs is strange.
>> The latest generation of your tree root is 49750, which matches your
>> corrupted tree block, but your extent tree is definitely older.
>>
>> So it looks like your super blocks (all nine!) reached disk before some
>> tree blocks reached the disk.
>>
>> Finally, the superblock doesn't record the previous transaction correctly.
>> It doesn't have transaction 49749 in its backup roots.
>>
>> Not 100% sure, but it looks somewhat like the problem fixed by this patch:
>> Btrfs: fix race leading to fs corruption after transaction abortion
>>
>> It should have been backported to all stable releases recently.
>>
>> Thanks,
>> Qu
>>
>> >
>> >
>> > On 08/15/2019 22:07 +0800, Qu Wenruo wrote:
>> >>>
>> >>>
>> >>> On 2019/8/15 9:52 PM, Tim Walberg wrote:
>> >>> > Had to wait for 'btrfs recover' to finish before I could proceed further.
>> >>> >
>> >>> > Kernel is 4.19.45, tools are 4.19.1
>> >>> >
>> >>> > File system is a 3-disk RAID10 with WD3003FZEX (WD Black 3TB)
>> >>> >
>> >>> > Output from attempting to mount:
>> >>> >
>> >>> > # mount -o ro,usebackuproot /dev/sdc1 /mnt
>> >>> > mount: wrong fs type, bad option, bad superblock on /dev/sdc1,
>> >>> > missing codepage or helper program, or other error
>> >>> >
>> >>> > In some cases useful info is found in syslog - try
>> >>> > dmesg | tail or so.
>> >>> >
>> >>> > Kernel messages from the mount attempt:
>> >>> >
>> >>> > [Thu Aug 15 08:47:42 2019] BTRFS info (device sdc1): trying to use backup root at mount time
>> >>> > [Thu Aug 15 08:47:42 2019] BTRFS info (device sdc1): disk space caching is enabled
>> >>> > [Thu Aug 15 08:47:42 2019] BTRFS info (device sdc1): has skinny extents
>> >>> > [Thu Aug 15 08:47:42 2019] BTRFS error (device sdc1): parent transid verify failed on 229846466560 wanted 49749 found 49750
>> >>> > [Thu Aug 15 08:47:42 2019] BTRFS error (device sdc1): parent transid verify failed on 229846466560 wanted 49749 found 49750
>> >>> > [Thu Aug 15 08:47:42 2019] BTRFS error (device sdc1): failed to read block groups: -5
>> >>>
>> >>> Extent tree corruption.
>> >>>
>> >>> So if that's the only corruption, you have a very high chance of recovering
>> >>> most of your data.
>> >>>
>> >>> Btrfs rescue can work, or you can try the experimental patches, which
>> >>> provide a rescue=skip_bg mount option that lets you mount the fs RO and
>> >>> retrieve your data (the latter is way faster than user-space rescue):
>> >>> https://patchwork.kernel.org/project/linux-btrfs/list/?series=130637
>> >>>
>> >>> Also, your dump-super output doesn't provide much info.
>> >>>
>> >>> You would want to use the -Ffa option for more info.
>> >>> You could also try that on all 3 devices, to find out which one
>> >>> has a lower generation.
>> >>>
>> >>> Also, please provide the history of the corruption.
>> >>> One-generation corruption is a little rare. Was a sudden power loss
>> >>> involved in this case?
>> >>>
>> >>> Thanks,
>> >>> Qu
>> >>>
>> >>> > [Thu Aug 15 08:47:42 2019] BTRFS error (device sdc1): open_ctree failed
>> >>> >
>> >>> > Output from 'btrfs check -p /dev/sdc1':
>> >>> >
>> >>> > # btrfs check -p /dev/sdc1
>> >>> > Opening filesystem to check...
>> >>> > parent transid verify failed on 229846466560 wanted 49749 found 49750
>> >>> > Ignoring transid failure
>> >>> > ERROR: child eb corrupted: parent bytenr=229845336064 item=0 parent level=1 child level=2
>> >>> > ERROR: cannot open file system
>> >>> >
>> >>> >
>> >>> >
>> >>> > On 08/15/2019 10:35 +0800, Qu Wenruo wrote:
>> >>> >>>
>> >>> >>>
>> >>> >>> On 2019/8/15 2:32 AM, Tim Walberg wrote:
>> >>> >>> > Most of the recommendations I've found online deal with when "wanted" is
>> >>> >>> > greater than "found", which, if I understand correctly, means that one or
>> >>> >>> > more transactions were interrupted/lost before being fully committed.
>> >>> >>>
>> >>> >>> No matter what the case is, a proper transaction shouldn't have any tree
>> >>> >>> block overwritten.
>> >>> >>>
>> >>> >>> That means either the FLUSH/FUA of the hardware/lower block layer is
>> >>> >>> screwed up, or the COW of tree blocks is already screwed up.
>> >>> >>>
>> >>> >>> >
>> >>> >>> > Are the recommendations for recovery the same if the system is reporting a
>> >>> >>> > "wanted" that is less than "found"?
>> >>> >>> >
>> >>> >>> The salvage is no different from any other transid mismatch, no matter
>> >>> >>> whether it's larger or smaller.
>> >>> >>>
>> >>> >>> It depends on the tree block.
>> >>> >>>
>> >>> >>> Please provide full dmesg output and btrfs check output for further advice.
>> >>> >>>
>> >>> >>> Thanks,
>> >>> >>> Qu
>> >>> >>>
>> >>> >
>> >>>
>> >
>> > End of included message
>> >
>>
>> End of included message

--
+----------------------+
| Tim Walberg          |
| 830 Carriage Dr.     |
| Algonquin, IL 60102  |
| twalberg@xxxxxxxxxxx |
+----------------------+
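For anyone following the "wanted 49749 found 49750" discussion, here is a minimal Python sketch of the idea behind that message. It is a simplified model, not btrfs kernel or btrfs-progs code. The class and function names (TreeBlock, ChildPointer, check_parent_transid, describe_mismatch) are hypothetical. The model relies on one fact: a parent node records the generation it expects for each child. A read compares that value against the generation stamped in the child block's header.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TreeBlock:
    """Hypothetical, simplified model of a btrfs tree block header."""
    bytenr: int      # logical address of the block
    generation: int  # transaction ID stamped in the header when the block was written


@dataclass
class ChildPointer:
    """Hypothetical model of a parent node's pointer to one child block."""
    bytenr: int        # where the parent expects the child to live
    expected_gen: int  # generation the parent recorded for that child


def check_parent_transid(ptr: ChildPointer, child: TreeBlock) -> Optional[str]:
    """Return None if the child matches what the parent expects, else an error string."""
    if child.generation == ptr.expected_gen:
        return None
    return (f"parent transid verify failed on {ptr.bytenr} "
            f"wanted {ptr.expected_gen} found {child.generation}")


def describe_mismatch(wanted: int, found: int) -> str:
    """Rough reading of the two mismatch directions discussed in this thread."""
    if found > wanted:
        # A later transaction wrote to an address that a committed tree still
        # references: the tree block was overwritten instead of COWed.
        return "block newer than expected (tree block overwritten)"
    if found < wanted:
        # The parent expects a newer copy than the one on disk: the newer
        # write never reached the disk.
        return "block older than expected (newer write lost)"
    return "no mismatch"


if __name__ == "__main__":
    # Values taken from the dmesg output quoted in this thread.
    ptr = ChildPointer(bytenr=229846466560, expected_gen=49749)
    child = TreeBlock(bytenr=229846466560, generation=49750)
    print(check_parent_transid(ptr, child))
    print(describe_mismatch(ptr.expected_gen, child.generation))
```

With the values from the log above, the first line printed matches the tail of the kernel message, and the mismatch is classified as found > wanted. As Qu says, the salvage approach doesn't depend on the direction. The classification is only an aid to reading the numbers.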

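Qu's remark that transaction 49749 is missing from the backup roots can be sketched the same way. This assumes a simplified model with the following properties:

- The superblock holds four backup root slots (BTRFS_NUM_BACKUP_ROOTS in the kernel).
- The slots are filled round-robin, one per committed transaction.
- A healthy superblock at generation N therefore normally lists N, N-1, N-2 and N-3.

These are the older roots that the usebackuproot mount option can fall back on. The function names are hypothetical. The backup list in the usage example is made up, not taken from the attached dump-super output.

```python
from typing import Iterable, List

# Assumption: four backup root slots, as in the kernel's BTRFS_NUM_BACKUP_ROOTS.
NUM_BACKUP_ROOTS = 4


def expected_backup_gens(super_gen: int, slots: int = NUM_BACKUP_ROOTS) -> List[int]:
    """Generations a healthy ring of backup roots would normally cover (simplified)."""
    return [super_gen - i for i in range(slots)]


def missing_backup_gens(super_gen: int, backup_gens: Iterable[int]) -> List[int]:
    """Expected generations that do not appear among the recorded backup roots."""
    recorded = set(backup_gens)
    return [g for g in expected_backup_gens(super_gen) if g not in recorded]


if __name__ == "__main__":
    print(expected_backup_gens(49750))  # [49750, 49749, 49748, 49747]
    # Hypothetical backup-root generations, NOT from the real dump-super output.
    hypothetical = [49750, 49748, 49747, 49746]
    print(missing_backup_gens(49750, hypothetical))  # [49749]
```

Under this model, a gap like 49749 means the superblock has no intact root for the immediately preceding transaction. That is consistent with 'mount -o ro,usebackuproot' failing here.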