Re: recovering from "parent transid verify failed"

Thanks for all the help!

If I get a chance later today, I may try the patch set, but in
the interest of getting things back online quicker, I may just
have to recreate and restore the recovered data. The snapshots
are no great loss - they're just one level of daily backups.

			tw



On 08/15/2019 22:45 +0800, Qu Wenruo wrote:
>>	
>>	
>>	On 2019/8/15 10:21 PM, Tim Walberg wrote:
>>	> 'dump-super -Ffa' from all three devices attached.
>>	> 
>>	> 'btrfs restore' did appear to recover most of the main data, minus
>>	> snapshots, which would have greatly increased the required time and
>>	> capacity, since I was recovering to XFS.
>>	
>>	That's why I recommend that experimental patchset, it will make the fs
>>	mountable (RO though), with all btrfs snapshots available.
>>	
>>	> 
>>	> 'btrfs rescue chunk-recover' ran, but failed to fix anything.
>>	> 'btrfs rescue super-recover' says all supers are fine.
>>	
>>	Those are useless for your case.
>>	
>>	> 
>>	> Initial corruption was due to a hard hang, which didn't leave enough
>>	> crumbs to determine the source - might have been btrfs, might have
>>	> been nvidia, might have been something completely different.
>>	
>>	Anyway, the corruption is a little strange.
>>	
>>	First of all, even a hard hang/power loss shouldn't cause btrfs to
>>	overwrite its tree blocks, so even if a hard hang/power loss happens,
>>	btrfs shouldn't end up corrupted.
>>	
>>	But that's definitely not the case here. (We have quite a few such
>>	reports, but haven't pinned down the cause yet)
>>	
>>	Secondly, the generation of your fs is strange.
>>	The latest generation of your tree root is 49750, which matches your
>>	corrupted tree block, but your extent tree is definitely older.
>>	
>>	So it looks like your superblocks (all nine!) reached disk before some
>>	tree blocks did.
>>	
>>	Finally, the superblock doesn't record the previous transaction correctly.
>>	It doesn't have transaction 49749 in its backup roots.
>>	
>>	Not 100% sure, but it looks somewhat like the problem fixed by this patch:
>>	Btrfs: fix race leading to fs corruption after transaction abortion
>>	
>>	It should have been backported to all stable releases recently.
>>	
>>	Thanks,
>>	Qu
>>	
>>	> 
>>	> 
>>	> On 08/15/2019 22:07 +0800, Qu Wenruo wrote:
>>	>>> 	
>>	>>> 	
>>	>>> 	On 2019/8/15 9:52 PM, Tim Walberg wrote:
>>	>>> 	> Had to wait for 'btrfs restore' to finish before I proceeded further.
>>	>>> 	> 
>>	>>> 	> Kernel is 4.19.45, tools are 4.19.1
>>	>>> 	> 
>>	>>> 	> File system is a 3-disk RAID10 with WD3003FZEX (WD Black 3TB)
>>	>>> 	> 
>>	>>> 	> Output from attempting to mount:
>>	>>> 	> 
>>	>>> 	> # mount -o ro,usebackuproot /dev/sdc1 /mnt
>>	>>> 	> mount: wrong fs type, bad option, bad superblock on /dev/sdc1,
>>	>>> 	>        missing codepage or helper program, or other error
>>	>>> 	> 
>>	>>> 	>        In some cases useful info is found in syslog - try
>>	>>> 	>        dmesg | tail or so.
>>	>>> 	> 
>>	>>> 	> Kernel messages from the mount attempt:
>>	>>> 	> 
>>	>>> 	> [Thu Aug 15 08:47:42 2019] BTRFS info (device sdc1): trying to use backup root at mount time
>>	>>> 	> [Thu Aug 15 08:47:42 2019] BTRFS info (device sdc1): disk space caching is enabled
>>	>>> 	> [Thu Aug 15 08:47:42 2019] BTRFS info (device sdc1): has skinny extents
>>	>>> 	> [Thu Aug 15 08:47:42 2019] BTRFS error (device sdc1): parent transid verify failed on 229846466560 wanted 49749 found 49750
>>	>>> 	> [Thu Aug 15 08:47:42 2019] BTRFS error (device sdc1): parent transid verify failed on 229846466560 wanted 49749 found 49750
>>	>>> 	> [Thu Aug 15 08:47:42 2019] BTRFS error (device sdc1): failed to read block groups: -5
>>	>>> 	
>>	>>> 	Extent tree corruption.
>>	>>> 	
>>	>>> 	So if that's the only corruption, you have a very high chance of recovering
>>	>>> 	most of your data.
>>	>>> 	
>>	>>> 	Btrfs rescue can work, or you can try the experimental patches, which
>>	>>> 	provide a rescue=skip_bg mount option that lets you mount the fs RO and
>>	>>> 	retrieve your data (the latter is way faster than user space rescue)
>>	>>> 	https://patchwork.kernel.org/project/linux-btrfs/list/?series=130637
>>	>>> 	
>>	>>> 	Also, your dump-super output doesn't provide much info.
>>	>>> 	
>>	>>> 	Please use the -Ffa options for more info.
>>	>>> 	You could also run that on all 3 devices, to find out which one
>>	>>> 	has a lower generation.
>>	>>> 	
>>	>>> 	Also, please provide the history of the corruption.
>>	>>> 	Corruption by a single generation is a little rare. Is sudden power loss
>>	>>> 	involved in this case?
>>	>>> 	
>>	>>> 	Thanks,
>>	>>> 	Qu
>>	>>> 	
>>	>>> 	> [Thu Aug 15 08:47:42 2019] BTRFS error (device sdc1): open_ctree failed
>>	>>> 	> 
>>	>>> 	> Output from 'btrfs check -p /dev/sdc1':
>>	>>> 	> 
>>	>>> 	> # btrfs check -p /dev/sdc1
>>	>>> 	> Opening filesystem to check...
>>	>>> 	> parent transid verify failed on 229846466560 wanted 49749 found 49750
>>	>>> 	> Ignoring transid failure
>>	>>> 	> ERROR: child eb corrupted: parent bytenr=229845336064 item=0 parent level=1 child level=2
>>	>>> 	> ERROR: cannot open file system
>>	>>> 	> 
>>	>>> 	> 
>>	>>> 	> 
>>	>>> 	> On 08/15/2019 10:35 +0800, Qu Wenruo wrote:
>>	>>> 	>>> 	
>>	>>> 	>>> 	
>>	>>> 	>>> 	On 2019/8/15 2:32 AM, Tim Walberg wrote:
>>	>>> 	>>> 	> Most of the recommendations I've found online deal with when "wanted" is
>>	>>> 	>>> 	> greater than "found", which, if I understand correctly means that one or
>>	>>> 	>>> 	> more transactions were interrupted/lost before fully committed.
>>	>>> 	>>> 	
>>	>>> 	>>> 	No matter what the case is, a proper transaction shouldn't have any tree
>>	>>> 	>>> 	block overwritten.
>>	>>> 	>>> 	
>>	>>> 	>>> 	That means, either the FLUSH/FUA of the hardware/lower block layer is
>>	>>> 	>>> 	screwed up, or the COW of tree block is already screwed up.
>>	>>> 	>>> 	
>>	>>> 	>>> 	> 
>>	>>> 	>>> 	> Are the recommendations for recovery the same if the system is reporting a
>>	>>> 	>>> 	> "wanted" that is less than "found"?
>>	>>> 	>>> 	> 
>>	>>> 	>>> 	The salvage procedure is no different from that for any other transid
>>	>>> 	>>> 	mismatch, no matter whether "wanted" is larger or smaller.
>>	>>> 	>>> 	
>>	>>> 	>>> 	It depends on the tree block.
>>	>>> 	>>> 	
>>	>>> 	>>> 	Please provide full dmesg output and btrfs check for further advice.
>>	>>> 	>>> 	
>>	>>> 	>>> 	Thanks,
>>	>>> 	>>> 	Qu
>>	>>> 	>>> 	
>>	>>> 	> 
>>	>>> 	> 
>>	>>> 	> 
>>	>>> 	> 
>>	>>> 	
>>	> 
>>	> 
>>	> 
>>	> End of included message
>>	> 
>>	> 
>>	> 
>>	
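The advice quoted above, to run dump-super on all three devices and see which one has the lower generation, can be scripted. What follows is only a rough sketch: it assumes each dump contains a line starting with "generation" followed by a number, the helper names super_generation and lagging_devices are made up for illustration, and the sample text and its numbers are invented rather than taken from the attachments in this thread.

```python
import re
from typing import Dict, List


def super_generation(dump_text: str) -> int:
    """Return the first top-level 'generation' value found in a dump.

    With several superblock copies in one dump, this only looks at the
    first copy. Lines such as 'chunk_root_generation' are skipped because
    the pattern is anchored at the start of the line.
    """
    match = re.search(r"^generation\s+(\d+)\s*$", dump_text, re.MULTILINE)
    if match is None:
        raise ValueError("no generation field found")
    return int(match.group(1))


def lagging_devices(dumps: Dict[str, str]) -> List[str]:
    """List the devices whose generation is lower than the newest one seen."""
    gens = {dev: super_generation(text) for dev, text in dumps.items()}
    newest = max(gens.values())
    return sorted(dev for dev, gen in gens.items() if gen < newest)


# Hypothetical input: device names and numbers are invented for this sketch.
sample = {
    "/dev/sda1": "chunk_root_generation\t90\ngeneration\t\t100\n",
    "/dev/sdb1": "chunk_root_generation\t90\ngeneration\t\t99\n",
    "/dev/sdc1": "chunk_root_generation\t90\ngeneration\t\t100\n",
}

print(lagging_devices(sample))
```

An empty list means all devices agree on the generation, which by itself says nothing about whether the tree blocks behind that generation made it to disk.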



End of included message
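
For anyone scripting around this failure, the kernel line quoted above ("parent transid verify failed on ... wanted ... found ...") can be picked apart mechanically. This is a minimal sketch that assumes the message keeps exactly the wording shown in the dmesg output in this thread; the names TransidError and parse_transid_error are made up for illustration and are not part of btrfs-progs.

```python
import re
from typing import NamedTuple, Optional


class TransidError(NamedTuple):
    bytenr: int   # logical address of the tree block that failed the check
    wanted: int   # generation the parent node expected
    found: int    # generation actually stored in the tree block


# Wording assumed from the kernel messages quoted in this thread.
_PATTERN = re.compile(
    r"parent transid verify failed on (\d+) wanted (\d+) found (\d+)"
)


def parse_transid_error(line: str) -> Optional[TransidError]:
    """Extract bytenr/wanted/found from one log line, or None if absent."""
    match = _PATTERN.search(line)
    if match is None:
        return None
    return TransidError(*(int(group) for group in match.groups()))


line = (
    "[Thu Aug 15 08:47:42 2019] BTRFS error (device sdc1): "
    "parent transid verify failed on 229846466560 wanted 49749 found 49750"
)
err = parse_transid_error(line)
if err is not None:
    # A positive difference means the block on disk is newer than expected.
    print(err.bytenr, err.wanted, err.found, err.found - err.wanted)
```

The sign of found minus wanted only distinguishes the two cases discussed in the thread; as noted in the quoted replies, the salvage procedure is the same either way.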



-- 
+----------------------+
| Tim Walberg          |
| 830 Carriage Dr.     |
| Algonquin, IL 60102  |
| twalberg@xxxxxxxxxxx |
+----------------------+


