Hi Qu,

I know about the RAID5/6 risks, so I won't blame anyone but myself. I'm
currently working on another solution, but I wasn't quite there yet...

mount -o ro /dev/sdh2 /mnt/data gives me:

[Fri Oct 4 09:36:27 2019] BTRFS info (device sde2): disk space caching is enabled
[Fri Oct 4 09:36:27 2019] BTRFS info (device sde2): has skinny extents
[Fri Oct 4 09:36:27 2019] BTRFS error (device sde2): parent transid verify failed on 5483020828672 wanted 470169 found 470108
[Fri Oct 4 09:36:27 2019] btree_readpage_end_io_hook: 5 callbacks suppressed
[Fri Oct 4 09:36:27 2019] BTRFS error (device sde2): bad tree block start 2286352011705795888 5483020828672
[Fri Oct 4 09:36:27 2019] BTRFS error (device sde2): bad tree block start 2286318771218040112 5483020828672
[Fri Oct 4 09:36:27 2019] BTRFS error (device sde2): bad tree block start 2286363934109025584 5483020828672
[Fri Oct 4 09:36:27 2019] BTRFS error (device sde2): bad tree block start 2286229742125204784 5483020828672
[Fri Oct 4 09:36:27 2019] BTRFS error (device sde2): bad tree block start 2286353230849918256 5483020828672
[Fri Oct 4 09:36:27 2019] BTRFS error (device sde2): bad tree block start 2286246155688035632 5483020828672
[Fri Oct 4 09:36:27 2019] BTRFS error (device sde2): bad tree block start 2286321695890425136 5483020828672
[Fri Oct 4 09:36:27 2019] BTRFS error (device sde2): bad tree block start 2286384677254874416 5483020828672
[Fri Oct 4 09:36:27 2019] BTRFS error (device sde2): bad tree block start 2286386365024912688 5483020828672
[Fri Oct 4 09:36:27 2019] BTRFS error (device sde2): bad tree block start 2286284400752608560 5483020828672
[Fri Oct 4 09:36:27 2019] BTRFS error (device sde2): failed to recover balance: -5
[Fri Oct 4 09:36:27 2019] BTRFS error (device sde2): open_ctree failed

Do you think there is any chance to recover?

Thanks,
Patrick

On Fri, 2019-10-04 at 15:22 +0800, Qu Wenruo wrote:
> On 2019/10/4 at 2:59 PM, Patrick Dijkgraaf wrote:
> > Hi guys,
> >
> > During the night, I started getting the following errors and data
> > was no longer accessible:
> >
> > [Fri Oct 4 08:04:26 2019] btree_readpage_end_io_hook: 2522 callbacks suppressed
> > [Fri Oct 4 08:04:26 2019] BTRFS error (device sde2): bad tree block start 17686343003259060482 7808404996096
>
> Tree block at address 7808404996096 is completely broken.
>
> All the other messages with 7808404996096 show that btrfs tried every
> possible device combination to rebuild that tree block, but obviously
> all of them failed.
>
> Not sure why the tree block is corrupted, but it's quite possible that
> the RAID5/6 write hole ruined your chance of recovering it.
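
Coming back to the read-only mount at the top of this mail: since it
still dies at "failed to recover balance: -5", would it make sense to
retry with the interrupted balance skipped, i.e. something like

  mount -o ro,skip_balance /dev/sdh2 /mnt/data

As far as I understand, skip_balance only stops a paused or interrupted
balance from being resumed at mount time, so I'm not sure it applies to
this particular failure; please correct me if that's a dead end.
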
> > [Fri Oct 4 08:04:26 2019] BTRFS error (device sde2): bad tree block start 254095834002432 7808404996096
> > [Fri Oct 4 08:04:26 2019] BTRFS error (device sde2): bad tree block start 2574563607252646368 7808404996096
> > [Fri Oct 4 08:04:26 2019] BTRFS error (device sde2): bad tree block start 17873260189421384017 7808404996096
> > [Fri Oct 4 08:04:26 2019] BTRFS error (device sde2): bad tree block start 9965805624054187110 7808404996096
> > [Fri Oct 4 08:04:26 2019] BTRFS error (device sde2): bad tree block start 15108378087789580224 7808404996096
> > [Fri Oct 4 08:04:26 2019] BTRFS error (device sde2): bad tree block start 7914705769619568652 7808404996096
> > [Fri Oct 4 08:04:26 2019] BTRFS error (device sde2): bad tree block start 16752645757091223687 7808404996096
> > [Fri Oct 4 08:04:26 2019] BTRFS error (device sde2): bad tree block start 9617669583708276649 7808404996096
> > [Fri Oct 4 08:04:26 2019] BTRFS error (device sde2): bad tree block start 3384408928046898608 7808404996096
> > [...]
> >
> > Decided to reboot (for another reason) and tried to mount
> > afterwards:
> >
> > [Fri Oct 4 08:29:42 2019] BTRFS info (device sde2): disk space caching is enabled
> > [Fri Oct 4 08:29:42 2019] BTRFS info (device sde2): has skinny extents
> > [Fri Oct 4 08:29:44 2019] BTRFS error (device sde2): parent transid verify failed on 5483020828672 wanted 470169 found 470108
> > [Fri Oct 4 08:29:45 2019] BTRFS error (device sde2): bad tree block start 2286352011705795888 5483020828672
> > [Fri Oct 4 08:29:45 2019] BTRFS error (device sde2): bad tree block start 2286318771218040112 5483020828672
> > [Fri Oct 4 08:29:45 2019] BTRFS error (device sde2): bad tree block start 2286363934109025584 5483020828672
> > [Fri Oct 4 08:29:45 2019] BTRFS error (device sde2): bad tree block start 2286229742125204784 5483020828672
> > [Fri Oct 4 08:29:45 2019] BTRFS error (device sde2): bad tree block start 2286353230849918256 5483020828672
> > [Fri Oct 4 08:29:45 2019] BTRFS error (device sde2): bad tree block start 2286246155688035632 5483020828672
> > [Fri Oct 4 08:29:45 2019] BTRFS error (device sde2): bad tree block start 2286321695890425136 5483020828672
> > [Fri Oct 4 08:29:45 2019] BTRFS error (device sde2): bad tree block start 2286384677254874416 5483020828672
> > [Fri Oct 4 08:29:45 2019] BTRFS error (device sde2): bad tree block start 2286386365024912688 5483020828672
> > [Fri Oct 4 08:29:45 2019] BTRFS error (device sde2): bad tree block start 2286284400752608560 5483020828672
> > [Fri Oct 4 08:29:45 2019] BTRFS error (device sde2): failed to recover balance: -5
> > [Fri Oct 4 08:29:45 2019] BTRFS error (device sde2): open_ctree failed
>
> You're lucky: as the problem is from balance recovery, you may have a
> chance to mount it RO.
> Since your fs can progress as far as btrfs_recover_relocation(), the
> most essential trees should be OK, so you have a chance to mount it RO.
>
> > The FS info is shown below. It is a RAID6.
> >
> > Label: 'data'  uuid: 43472491-7bb3-418c-b476-874a52e8b2b0
> >         Total devices 16 FS bytes used 36.73TiB
>
> You won't want to salvage data from a nearly 40T fs...
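
Agreed, copying ~37TiB somewhere else is not something I look forward
to. If a read-only mount does work, I'll simply copy off the important
data. If it never does, am I right that the fallback is "btrfs restore"
against the unmounted devices? I'd start with a dry run just to see
what it thinks it can reach, something like

  btrfs restore -D /dev/sdh2 /some/empty/dir

(-D being the dry-run option; the target path is just a placeholder).
I've never used restore on a filesystem in this state, so I have no
idea how well it copes with the damaged tree.
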
> >         devid    1 size 7.28TiB used 2.66TiB path /dev/sde2
> >         devid    2 size 3.64TiB used 2.66TiB path /dev/sdf2
> >         devid    3 size 3.64TiB used 2.66TiB path /dev/sdg2
> >         devid    4 size 7.28TiB used 2.66TiB path /dev/sdh2
> >         devid    5 size 3.64TiB used 2.66TiB path /dev/sdi2
> >         devid    6 size 7.28TiB used 2.66TiB path /dev/sdj2
> >         devid    7 size 3.64TiB used 2.66TiB path /dev/sdk2
> >         devid    8 size 3.64TiB used 2.66TiB path /dev/sdl2
> >         devid    9 size 7.28TiB used 2.66TiB path /dev/sdm2
> >         devid   10 size 3.64TiB used 2.66TiB path /dev/sdn2
> >         devid   11 size 7.28TiB used 2.66TiB path /dev/sdo2
> >         devid   12 size 3.64TiB used 2.66TiB path /dev/sdp2
> >         devid   13 size 7.28TiB used 2.66TiB path /dev/sdq2
> >         devid   14 size 7.28TiB used 2.66TiB path /dev/sdr2
> >         devid   15 size 3.64TiB used 2.66TiB path /dev/sds2
> >         devid   16 size 3.64TiB used 2.66TiB path /dev/sdt2
>
> And you won't want to use RAID6 if you're expecting it to tolerate a
> 2-disk failure.
>
> As btrfs RAID5/6 has the write-hole problem, any unexpected power loss
> or disk error can reduce the error tolerance step by step if you're
> not running scrub regularly.
>
> > The initial error refers to sdw, so possibly something happened
> > that caused one or more disks in the external cabinet to disappear
> > and reappear.
> >
> > Kernel is 4.18.16-arch1-1-ARCH. Very hesitant to upgrade it,
> > because previously I had to downgrade the kernel to get the volume
> > mounted again.
> >
> > Question: I know that running checks on BTRFS can be dangerous, so
> > what do you recommend I do to get the volume back online?
>
> "btrfs check" is not dangerous at all. In fact it's pretty safe, and
> it's the main tool we use to expose any problems.
>
> It's "btrfs check --repair" that is dangerous, though it has become
> much less so in recent years. (Although in your case, --repair is
> completely unrelated and won't help at all.)
>
> "btrfs check" output from the latest btrfs-progs would help.
>
> Thanks,
> Qu
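
I'll grab the latest btrfs-progs and run a read-only check, something
like

  btrfs check --readonly /dev/sdh2

and post the full output here. As far as I know, --readonly is the
default mode anyway and only reports problems without writing anything
to the devices; let me know if a different invocation would be more
useful.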

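Also, point taken about scrubbing regularly. If I get this filesystem
back into a usable state, I plan to schedule something like

  btrfs scrub start -Bd /mnt/data

on a regular basis (-B keeps it in the foreground so the exit status
can be checked, -d prints per-device statistics), rather than relying
on the parity redundancy alone.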