Re: Need help with potential ~45TB dataloss

On 2018/11/30 9:53 PM, Patrick Dijkgraaf wrote:
> Hi all,
> 
> I have been a happy BTRFS user for quite some time. But now I'm facing
> a potential ~45TB dataloss... :-(
> I hope someone can help!
> 
> I have Server A and Server B, each with a 20-device BTRFS RAID6
> filesystem. Because of the known RAID5/6 risks, Server B was a backup of
> Server A.
> After applying updates to Server B and rebooting, the FS would not mount
> anymore. Because it was "just" a backup, I decided to recreate the FS
> and perform a new backup. Later, I discovered that the FS was not
> broken, but that I had hit this issue:
> https://patchwork.kernel.org/patch/10694997/

Sorry for the inconvenience.

I didn't realize back then that the max_chunk_size limit isn't reliable.

> 
> Anyway, the FS was already recreated, so I needed to do a new backup.
> During the backup (using rsync -vah), Server A (the source) encountered
> an I/O error and my rsync failed. In an attempt to "quick fix" the
> issue, I rebooted Server A after which the FS would not mount anymore.

Do you have any dmesg output from that I/O error?

And how was the reboot performed: a forced power-off or a normal reboot command?

> 
> I documented what I have tried, below. I have not yet tried anything
> except what is shown, because I am afraid of causing more harm to
> the FS.

Pretty clever: not running btrfs check --repair was a good move.

> I hope somebody here can give me advice on how to (hopefully)
> retrieve my data...
> 
> Thanks in advance!
> 
> ==========================================
> 
> [root@cornelis ~]# btrfs fi show
> Label: 'cornelis-btrfs'  uuid: ac643516-670e-40f3-aa4c-f329fc3795fd
> 	Total devices 1 FS bytes used 463.92GiB
> 	devid    1 size 800.00GiB used 493.02GiB path
> /dev/mapper/cornelis-cornelis--btrfs
> 
> Label: 'data'  uuid: 4c66fa8b-8fc6-4bba-9d83-02a2a1d69ad5
> 	Total devices 20 FS bytes used 44.85TiB
> 	devid    1 size 3.64TiB used 3.64TiB path /dev/sdn2
> 	devid    2 size 3.64TiB used 3.64TiB path /dev/sdp2
> 	devid    3 size 3.64TiB used 3.64TiB path /dev/sdu2
> 	devid    4 size 3.64TiB used 3.64TiB path /dev/sdx2
> 	devid    5 size 3.64TiB used 3.64TiB path /dev/sdh2
> 	devid    6 size 3.64TiB used 3.64TiB path /dev/sdg2
> 	devid    7 size 3.64TiB used 3.64TiB path /dev/sdm2
> 	devid    8 size 3.64TiB used 3.64TiB path /dev/sdw2
> 	devid    9 size 3.64TiB used 3.64TiB path /dev/sdj2
> 	devid   10 size 3.64TiB used 3.64TiB path /dev/sdt2
> 	devid   11 size 3.64TiB used 3.64TiB path /dev/sdk2
> 	devid   12 size 3.64TiB used 3.64TiB path /dev/sdq2
> 	devid   13 size 3.64TiB used 3.64TiB path /dev/sds2
> 	devid   14 size 3.64TiB used 3.64TiB path /dev/sdf2
> 	devid   15 size 7.28TiB used 588.80GiB path /dev/sdr2
> 	devid   16 size 7.28TiB used 588.80GiB path /dev/sdo2
> 	devid   17 size 7.28TiB used 588.80GiB path /dev/sdv2
> 	devid   18 size 7.28TiB used 588.80GiB path /dev/sdi2
> 	devid   19 size 7.28TiB used 588.80GiB path /dev/sdl2
> 	devid   20 size 7.28TiB used 588.80GiB path /dev/sde2
> 
> [root@cornelis ~]# mount /dev/sdn2 /mnt/data
> mount: /mnt/data: wrong fs type, bad option, bad superblock on
> /dev/sdn2, missing codepage or helper program, or other error.

What does dmesg show for the mount failure?

And have you tried mounting with -o ro,degraded?
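
A minimal sketch of that attempt, assuming the device and mount point from the report (/dev/sdn2 and /mnt/data). The function name try_ro_degraded_mount is hypothetical; it only wraps mount and dmesg, and the mount is read-only so it should not write to the filesystem:

```shell
# Hypothetical helper: try a read-only, degraded mount and, on failure,
# print the most recent kernel messages so they can be pasted to the list.
# Defaults are the device and mount point from the report (an assumption).
try_ro_degraded_mount() {
    dev="${1:-/dev/sdn2}"
    mnt="${2:-/mnt/data}"
    if mount -o ro,degraded "$dev" "$mnt"; then
        echo "mounted $dev read-only at $mnt"
    else
        echo "mount failed; recent kernel messages:"
        dmesg | tail -n 50
        return 1
    fi
}

# Usage (as root): try_ro_degraded_mount /dev/sdn2 /mnt/data
```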

> 
> [root@cornelis ~]# btrfs check /dev/sdn2
> Opening filesystem to check...
> parent transid verify failed on 46451963543552 wanted 114401 found
> 114173
> parent transid verify failed on 46451963543552 wanted 114401 found
> 114173
> checksum verify failed on 46451963543552 found A8F2A769 wanted 4C111ADF
> checksum verify failed on 46451963543552 found 32153BE8 wanted 8B07ABE4
> checksum verify failed on 46451963543552 found 32153BE8 wanted 8B07ABE4
> bad tree block 46451963543552, bytenr mismatch, want=46451963543552,
> have=75208089814272
> Couldn't read tree root

Would you please also paste the output of "btrfs ins dump-super /dev/sdn2"?

It looks like your tree root (or at least some of its nodes/leaves)
is corrupted.

> ERROR: cannot open file system

And since it's your tree root that is corrupted, you could also try
"btrfs-find-root <device>" to look for a good older copy of your tree root.

But I suspect the corruption happened before you noticed it, so an old
tree root may not help much.
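
One rough way to see how stale the block at that address is: subtract the generation found from the generation wanted in the "parent transid verify failed" message. The sketch below is only illustrative; transid_gap is a hypothetical name, it expects each message on a single unwrapped line, and the result does not by itself date the corruption.

```shell
# Hypothetical helper: read btrfs-progs messages on stdin and print the
# difference between the wanted and found generation of the first
# "parent transid verify failed" line (messages must not be line-wrapped).
transid_gap() {
    sed -n 's/.*parent transid verify failed on [0-9][0-9]* wanted \([0-9][0-9]*\) found \([0-9][0-9]*\).*/\1 \2/p' |
        awk '{ print $1 - $2; exit }'
}

# Usage: btrfs check /dev/sdn2 2>&1 | transid_gap
```

With the values in the report (wanted 114401, found 114173) the block is 228 generations behind what its parent expects.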

Also, the output of "btrfs ins dump-tree -t root <device>" will help.
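
To gather everything requested above in one go, something like the following sketch could be used. The function name collect_btrfs_diag and the output directory are hypothetical; the three commands are the ones named in this reply and only read from the device.

```shell
# Hypothetical helper: save the read-only diagnostics requested in this
# reply into one directory, one file per command.
collect_btrfs_diag() {
    dev="$1"
    out="${2:-./btrfs-diag}"   # output directory (an assumption)
    mkdir -p "$out" || return 1
    # Superblock contents, including the tree root and chunk root pointers.
    btrfs inspect-internal dump-super "$dev" > "$out/dump-super.txt" 2>&1
    # Scan for older copies of the tree root.
    btrfs-find-root "$dev" > "$out/find-root.txt" 2>&1
    # Dump the root tree itself (may fail if the tree root is unreadable).
    btrfs inspect-internal dump-tree -t root "$dev" > "$out/dump-tree-root.txt" 2>&1
    echo "diagnostics written to $out"
}

# Usage (as root): collect_btrfs_diag /dev/sdn2 ./btrfs-diag
```

Each command's errors are kept in the same file as its output, since the error messages are part of what is useful here.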

Thanks,
Qu
> 
> [root@cornelis ~]# btrfs restore /dev/sdn2 /mnt/data/
> parent transid verify failed on 46451963543552 wanted 114401 found
> 114173
> parent transid verify failed on 46451963543552 wanted 114401 found
> 114173
> checksum verify failed on 46451963543552 found A8F2A769 wanted 4C111ADF
> checksum verify failed on 46451963543552 found 32153BE8 wanted 8B07ABE4
> checksum verify failed on 46451963543552 found 32153BE8 wanted 8B07ABE4
> bad tree block 46451963543552, bytenr mismatch, want=46451963543552,
> have=75208089814272
> Couldn't read tree root
> Could not open root, trying backup super
> warning, device 14 is missing
> warning, device 13 is missing
> warning, device 12 is missing
> warning, device 11 is missing
> warning, device 10 is missing
> warning, device 9 is missing
> warning, device 8 is missing
> warning, device 7 is missing
> warning, device 6 is missing
> warning, device 5 is missing
> warning, device 4 is missing
> warning, device 3 is missing
> warning, device 2 is missing
> checksum verify failed on 22085632 found 5630EA32 wanted 1AA6FFF0
> checksum verify failed on 22085632 found 5630EA32 wanted 1AA6FFF0
> bad tree block 22085632, bytenr mismatch, want=22085632,
> have=1147797504
> ERROR: cannot read chunk root
> Could not open root, trying backup super
> warning, device 14 is missing
> warning, device 13 is missing
> warning, device 12 is missing
> warning, device 11 is missing
> warning, device 10 is missing
> warning, device 9 is missing
> warning, device 8 is missing
> warning, device 7 is missing
> warning, device 6 is missing
> warning, device 5 is missing
> warning, device 4 is missing
> warning, device 3 is missing
> warning, device 2 is missing
> checksum verify failed on 22085632 found 5630EA32 wanted 1AA6FFF0
> checksum verify failed on 22085632 found 5630EA32 wanted 1AA6FFF0
> bad tree block 22085632, bytenr mismatch, want=22085632,
> have=1147797504
> ERROR: cannot read chunk root
> Could not open root, trying backup super
> 
> [root@cornelis ~]# uname -r
> 4.18.16-arch1-1-ARCH
> 
> [root@cornelis ~]# btrfs --version
> btrfs-progs v4.19
> 
