On 2019/10/4 2:59 PM, Patrick Dijkgraaf wrote:
> Hi guys,
>
> During the night, I started getting the following errors and data was
> no longer accessible:
>
> [Fri Oct 4 08:04:26 2019] btree_readpage_end_io_hook: 2522 callbacks suppressed
> [Fri Oct 4 08:04:26 2019] BTRFS error (device sde2): bad tree block start 17686343003259060482 7808404996096

The tree block at address 7808404996096 is completely broken. All the
other messages mentioning 7808404996096 show btrfs trying every possible
device combination to rebuild that tree block, but all of them failed.

I'm not sure why the tree block got corrupted, but it's quite possible
that the RAID5/6 write hole has ruined your chance of recovering it.

> [Fri Oct 4 08:04:26 2019] BTRFS error (device sde2): bad tree block start 254095834002432 7808404996096
> [Fri Oct 4 08:04:26 2019] BTRFS error (device sde2): bad tree block start 2574563607252646368 7808404996096
> [Fri Oct 4 08:04:26 2019] BTRFS error (device sde2): bad tree block start 17873260189421384017 7808404996096
> [Fri Oct 4 08:04:26 2019] BTRFS error (device sde2): bad tree block start 9965805624054187110 7808404996096
> [Fri Oct 4 08:04:26 2019] BTRFS error (device sde2): bad tree block start 15108378087789580224 7808404996096
> [Fri Oct 4 08:04:26 2019] BTRFS error (device sde2): bad tree block start 7914705769619568652 7808404996096
> [Fri Oct 4 08:04:26 2019] BTRFS error (device sde2): bad tree block start 16752645757091223687 7808404996096
> [Fri Oct 4 08:04:26 2019] BTRFS error (device sde2): bad tree block start 9617669583708276649 7808404996096
> [Fri Oct 4 08:04:26 2019] BTRFS error (device sde2): bad tree block start 3384408928046898608 7808404996096
[...]
> Decided to reboot (for another reason) and tried to mount afterwards:
>
> [Fri Oct 4 08:29:42 2019] BTRFS info (device sde2): disk space caching is enabled
> [Fri Oct 4 08:29:42 2019] BTRFS info (device sde2): has skinny extents
> [Fri Oct 4 08:29:44 2019] BTRFS error (device sde2): parent transid verify failed on 5483020828672 wanted 470169 found 470108
> [Fri Oct 4 08:29:45 2019] BTRFS error (device sde2): bad tree block start 2286352011705795888 5483020828672
> [Fri Oct 4 08:29:45 2019] BTRFS error (device sde2): bad tree block start 2286318771218040112 5483020828672
> [Fri Oct 4 08:29:45 2019] BTRFS error (device sde2): bad tree block start 2286363934109025584 5483020828672
> [Fri Oct 4 08:29:45 2019] BTRFS error (device sde2): bad tree block start 2286229742125204784 5483020828672
> [Fri Oct 4 08:29:45 2019] BTRFS error (device sde2): bad tree block start 2286353230849918256 5483020828672
> [Fri Oct 4 08:29:45 2019] BTRFS error (device sde2): bad tree block start 2286246155688035632 5483020828672
> [Fri Oct 4 08:29:45 2019] BTRFS error (device sde2): bad tree block start 2286321695890425136 5483020828672
> [Fri Oct 4 08:29:45 2019] BTRFS error (device sde2): bad tree block start 2286384677254874416 5483020828672
> [Fri Oct 4 08:29:45 2019] BTRFS error (device sde2): bad tree block start 2286386365024912688 5483020828672
> [Fri Oct 4 08:29:45 2019] BTRFS error (device sde2): bad tree block start 2286284400752608560 5483020828672
> [Fri Oct 4 08:29:45 2019] BTRFS error (device sde2): failed to recover balance: -5
> [Fri Oct 4 08:29:45 2019] BTRFS error (device sde2): open_ctree failed

You're somewhat lucky here: the failure happens during balance recovery,
so you may still be able to mount the filesystem read-only. Since the
mount can progress as far as btrfs_recover_relocation(), the most
essential trees should be OK, so a read-only mount has a chance of
working.
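If you want to try that, something like the following is the usual
starting point (the /mnt mount point is only an example, and there is
no guarantee these options are enough for this level of corruption):

  # read-only, skip log replay, don't try to resume the interrupted balance
  mount -o ro,nologreplay,skip_balance /dev/sde2 /mnt

  # if that still fails, additionally fall back to an older tree root
  mount -o ro,nologreplay,usebackuproot,skip_balance /dev/sde2 /mnt

If a read-only mount succeeds, copy the important data out before trying
anything that writes to the filesystem.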
> The FS info is shown below. It is a RAID6.
>
> Label: 'data'  uuid: 43472491-7bb3-418c-b476-874a52e8b2b0
>   Total devices 16 FS bytes used 36.73TiB

You really don't want to end up salvaging data from a nearly 40 TiB
filesystem...

>   devid  1 size 7.28TiB used 2.66TiB path /dev/sde2
>   devid  2 size 3.64TiB used 2.66TiB path /dev/sdf2
>   devid  3 size 3.64TiB used 2.66TiB path /dev/sdg2
>   devid  4 size 7.28TiB used 2.66TiB path /dev/sdh2
>   devid  5 size 3.64TiB used 2.66TiB path /dev/sdi2
>   devid  6 size 7.28TiB used 2.66TiB path /dev/sdj2
>   devid  7 size 3.64TiB used 2.66TiB path /dev/sdk2
>   devid  8 size 3.64TiB used 2.66TiB path /dev/sdl2
>   devid  9 size 7.28TiB used 2.66TiB path /dev/sdm2
>   devid 10 size 3.64TiB used 2.66TiB path /dev/sdn2
>   devid 11 size 7.28TiB used 2.66TiB path /dev/sdo2
>   devid 12 size 3.64TiB used 2.66TiB path /dev/sdp2
>   devid 13 size 7.28TiB used 2.66TiB path /dev/sdq2
>   devid 14 size 7.28TiB used 2.66TiB path /dev/sdr2
>   devid 15 size 3.64TiB used 2.66TiB path /dev/sds2
>   devid 16 size 3.64TiB used 2.66TiB path /dev/sdt2

And you won't want to use btrfs RAID6 if you expect it to tolerate two
failed disks. Because btrfs RAID5/6 suffers from the write-hole problem,
any unexpected power loss or disk error can reduce the error tolerance
step by step if you're not running scrub regularly.

> The initial error refers to sdw, so possibly something happened that
> caused one or more disks in the external cabinet to disappear and
> reappear.
>
> Kernel is 4.18.16-arch1-1-ARCH. Very hesitant to upgrade it, because
> previously I had to downgrade the kernel to get the volume mounted
> again.
>
> Question: I know that running checks on BTRFS can be dangerous, what
> can you recommend me doing to get the volume back online?

"btrfs check" by itself is not dangerous at all. In fact it's pretty
safe, and it's the main tool we use to expose any problem. It's "btrfs
check --repair" that is dangerous, although it has become much less so
in recent years. (In your case, --repair is unrelated to the problem
anyway and won't help at all.)

The output of "btrfs check" from the latest btrfs-progs would help.
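For example (this only reads from the devices; --readonly is the default
mode anyway, and the filesystem must stay unmounted while it runs, which
is already the case here; expect it to take a while on a fs this size):

  # read-only check with the latest btrfs-progs, e.g. from a recent
  # live image; any one of the member devices can be passed
  btrfs check --readonly /dev/sde2

And once the filesystem is healthy and writable again, running scrub
regularly (e.g. "btrfs scrub start <mountpoint>") helps keep the
RAID5/6 redundancy from silently degrading like this.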
Thanks,
Qu
