Re: BTRFS errors, and won't mount

On 2019/10/4 3:41 PM, Patrick Dijkgraaf wrote:
> Hi Qu,
> 
> I know about RAID5/6 risks, so I won't blame anyone but myself. I'm
> currently working on another solution, but I was not quite there yet...
> 
> mount -o ro /dev/sdh2 /mnt/data gives me:
> 
> [Fri Oct  4 09:36:27 2019] BTRFS info (device sde2): disk space caching is enabled
> [Fri Oct  4 09:36:27 2019] BTRFS info (device sde2): has skinny extents
> [Fri Oct  4 09:36:27 2019] BTRFS error (device sde2): parent transid verify failed on 5483020828672 wanted 470169 found 470108
> [Fri Oct  4 09:36:27 2019] btree_readpage_end_io_hook: 5 callbacks suppressed
> [Fri Oct  4 09:36:27 2019] BTRFS error (device sde2): bad tree block start 2286352011705795888 5483020828672
> [Fri Oct  4 09:36:27 2019] BTRFS error (device sde2): bad tree block start 2286318771218040112 5483020828672
> [Fri Oct  4 09:36:27 2019] BTRFS error (device sde2): bad tree block start 2286363934109025584 5483020828672
> [Fri Oct  4 09:36:27 2019] BTRFS error (device sde2): bad tree block start 2286229742125204784 5483020828672
> [Fri Oct  4 09:36:27 2019] BTRFS error (device sde2): bad tree block start 2286353230849918256 5483020828672
> [Fri Oct  4 09:36:27 2019] BTRFS error (device sde2): bad tree block start 2286246155688035632 5483020828672
> [Fri Oct  4 09:36:27 2019] BTRFS error (device sde2): bad tree block start 2286321695890425136 5483020828672
> [Fri Oct  4 09:36:27 2019] BTRFS error (device sde2): bad tree block start 2286384677254874416 5483020828672
> [Fri Oct  4 09:36:27 2019] BTRFS error (device sde2): bad tree block start 2286386365024912688 5483020828672
> [Fri Oct  4 09:36:27 2019] BTRFS error (device sde2): bad tree block start 2286284400752608560 5483020828672
> [Fri Oct  4 09:36:27 2019] BTRFS error (device sde2): failed to recover balance: -5
> [Fri Oct  4 09:36:27 2019] BTRFS error (device sde2): open_ctree failed
> 
> Do you think there is any chance to recover?

This means the tail end of the root tree got corrupted.

You can comment out the btrfs_recover_balance() call in open_ctree() in
fs/btrfs/disk-io.c, rebuild the kernel, then try mounting RO again.

Even then, the corruption means some of your subvolumes can't be read.


Another way to salvage data is to try the backup roots.

You can get the bytenr of all backup roots with "btrfs ins dump-super -f".
E.g.:

$ btrfs ins dump-super -f /dev/nvme/btrfs | grep backup_tree_root
                backup_tree_root:       5259264         gen: 5  level: 0
                backup_tree_root:       24641536        gen: 6  level: 0
                backup_tree_root:       26378240        gen: 7  level: 0
                backup_tree_root:       5341184         gen: 8  level: 0

Then pass each bytenr to "btrfs check --tree-root <bytenr>" to see
which one lets the check get further.
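
The two steps above could be scripted roughly as follows. This is only a
sketch, assuming a POSIX shell with awk available; the function names
backup_root_bytenrs and try_backup_roots are made up for illustration.
"btrfs check" without --repair only reads from the device.

```shell
# Hypothetical helper: read "btrfs ins dump-super -f" output on stdin and
# print the bytenr of every backup_tree_root line, one per line.
backup_root_bytenrs() {
    awk '$1 == "backup_tree_root:" { print $2 }'
}

# Hypothetical helper: run a read-only "btrfs check" against each backup
# tree root of the given device and report which ones pass.
try_backup_roots() {
    dev="$1"
    btrfs inspect-internal dump-super -f "$dev" | backup_root_bytenrs |
    while read -r bytenr; do
        echo "=== trying tree root $bytenr ==="
        if btrfs check --tree-root "$bytenr" "$dev"; then
            echo "tree root $bytenr passed btrfs check"
        else
            echo "tree root $bytenr failed btrfs check"
        fi
    done
}

# Example call (run as root; the device name is taken from the mount
# command quoted above):
#   try_backup_roots /dev/sdh2
```

A root that fails outright is not useful; one where the check gets through
most of the trees is the better candidate.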

Thanks,
Qu
> 
> Thanks,
> Patrick.
> 
> 
> On Fri, 2019-10-04 at 15:22 +0800, Qu Wenruo wrote:
>> On 2019/10/4 2:59 PM, Patrick Dijkgraaf wrote:
>>> Hi guys,
>>>
>>> During the night, I started getting the following errors and data was
>>> no longer accessible:
>>>
>>> [Fri Oct  4 08:04:26 2019] btree_readpage_end_io_hook: 2522 callbacks suppressed
>>> [Fri Oct  4 08:04:26 2019] BTRFS error (device sde2): bad tree block start 17686343003259060482 7808404996096
>>
>> Tree block at address 7808404996096 is completely broken.
>>
>> All the other messages with 7808404996096 show that btrfs is trying all
>> possible device combinations to rebuild that tree block, but
>> obviously all of them failed.
>>
>> Not sure why the tree block is corrupted, but it's quite possible that
>> the RAID5/6 write hole ruined your chance of recovery.
>>
>>> [Fri Oct  4 08:04:26 2019] BTRFS error (device sde2): bad tree block start 254095834002432 7808404996096
>>> [Fri Oct  4 08:04:26 2019] BTRFS error (device sde2): bad tree block start 2574563607252646368 7808404996096
>>> [Fri Oct  4 08:04:26 2019] BTRFS error (device sde2): bad tree block start 17873260189421384017 7808404996096
>>> [Fri Oct  4 08:04:26 2019] BTRFS error (device sde2): bad tree block start 9965805624054187110 7808404996096
>>> [Fri Oct  4 08:04:26 2019] BTRFS error (device sde2): bad tree block start 15108378087789580224 7808404996096
>>> [Fri Oct  4 08:04:26 2019] BTRFS error (device sde2): bad tree block start 7914705769619568652 7808404996096
>>> [Fri Oct  4 08:04:26 2019] BTRFS error (device sde2): bad tree block start 16752645757091223687 7808404996096
>>> [Fri Oct  4 08:04:26 2019] BTRFS error (device sde2): bad tree block start 9617669583708276649 7808404996096
>>> [Fri Oct  4 08:04:26 2019] BTRFS error (device sde2): bad tree block start 3384408928046898608 7808404996096
>>
>> [...]
>>> Decided to reboot (for another reason) and tried to mount afterwards:
>>>
>>> [Fri Oct  4 08:29:42 2019] BTRFS info (device sde2): disk space caching is enabled
>>> [Fri Oct  4 08:29:42 2019] BTRFS info (device sde2): has skinny extents
>>> [Fri Oct  4 08:29:44 2019] BTRFS error (device sde2): parent transid verify failed on 5483020828672 wanted 470169 found 470108
>>> [Fri Oct  4 08:29:45 2019] BTRFS error (device sde2): bad tree block start 2286352011705795888 5483020828672
>>> [Fri Oct  4 08:29:45 2019] BTRFS error (device sde2): bad tree block start 2286318771218040112 5483020828672
>>> [Fri Oct  4 08:29:45 2019] BTRFS error (device sde2): bad tree block start 2286363934109025584 5483020828672
>>> [Fri Oct  4 08:29:45 2019] BTRFS error (device sde2): bad tree block start 2286229742125204784 5483020828672
>>> [Fri Oct  4 08:29:45 2019] BTRFS error (device sde2): bad tree block start 2286353230849918256 5483020828672
>>> [Fri Oct  4 08:29:45 2019] BTRFS error (device sde2): bad tree block start 2286246155688035632 5483020828672
>>> [Fri Oct  4 08:29:45 2019] BTRFS error (device sde2): bad tree block start 2286321695890425136 5483020828672
>>> [Fri Oct  4 08:29:45 2019] BTRFS error (device sde2): bad tree block start 2286384677254874416 5483020828672
>>> [Fri Oct  4 08:29:45 2019] BTRFS error (device sde2): bad tree block start 2286386365024912688 5483020828672
>>> [Fri Oct  4 08:29:45 2019] BTRFS error (device sde2): bad tree block start 2286284400752608560 5483020828672
>>> [Fri Oct  4 08:29:45 2019] BTRFS error (device sde2): failed to recover balance: -5
>>> [Fri Oct  4 08:29:45 2019] BTRFS error (device sde2): open_ctree failed
>>
>> You're lucky: the problem comes from balance recovery, so you may
>> have a chance to mount the fs RO.
>> As your fs can progress to btrfs_recover_relocation(), most essential
>> trees should be OK.
>>
>>> The FS info is shown below. It is a RAID6.
>>>
>>> Label: 'data'  uuid: 43472491-7bb3-418c-b476-874a52e8b2b0
>>> 	Total devices 16 FS bytes used 36.73TiB
>>
>> You won't want to salvage data from a nearly 40T fs...
>>
>>> 	devid    1 size 7.28TiB used 2.66TiB path /dev/sde2
>>> 	devid    2 size 3.64TiB used 2.66TiB path /dev/sdf2
>>> 	devid    3 size 3.64TiB used 2.66TiB path /dev/sdg2
>>> 	devid    4 size 7.28TiB used 2.66TiB path /dev/sdh2
>>> 	devid    5 size 3.64TiB used 2.66TiB path /dev/sdi2
>>> 	devid    6 size 7.28TiB used 2.66TiB path /dev/sdj2
>>> 	devid    7 size 3.64TiB used 2.66TiB path /dev/sdk2
>>> 	devid    8 size 3.64TiB used 2.66TiB path /dev/sdl2
>>> 	devid    9 size 7.28TiB used 2.66TiB path /dev/sdm2
>>> 	devid   10 size 3.64TiB used 2.66TiB path /dev/sdn2
>>> 	devid   11 size 7.28TiB used 2.66TiB path /dev/sdo2
>>> 	devid   12 size 3.64TiB used 2.66TiB path /dev/sdp2
>>> 	devid   13 size 7.28TiB used 2.66TiB path /dev/sdq2
>>> 	devid   14 size 7.28TiB used 2.66TiB path /dev/sdr2
>>> 	devid   15 size 3.64TiB used 2.66TiB path /dev/sds2
>>> 	devid   16 size 3.64TiB used 2.66TiB path /dev/sdt2
>>
>> And you won't want to use RAID6 if you're expecting it to tolerate
>> two malfunctioning disks.
>>
>> As btrfs RAID5/6 has the write-hole problem, any unexpected power loss
>> or disk error could reduce the error tolerance step by step if you're
>> not running scrub regularly.
>>
>>> The initial error refers to sdw, so possibly something happened that
>>> caused one or more disks in the external cabinet to disappear and
>>> reappear.
>>>
>>> Kernel is 4.18.16-arch1-1-ARCH. Very hesitant to upgrade it, because
>>> previously I had to downgrade the kernel to get the volume mounted
>>> again.
>>>
>>> Question: I know that running checks on BTRFS can be dangerous, what
>>> can you recommend me doing to get the volume back online?
>>
>> "btrfs check" is not dangerous at all. In fact it's pretty safe, and
>> it's the main tool we use to expose any problem.
>>
>> It's "btrfs check --repair" that is dangerous, though far less so in
>> recent years. (In your case, however, --repair is completely unrelated
>> and won't help at all.)
>>
>> The "btrfs check" output from the latest btrfs-progs would help.
>>
>> Thanks,
>> Qu
> 
> 
> 


