Re: first it froze, now the (btrfs) root fs won't mount ...

On 2019/10/21 at 6:47 PM, Christian Pernegger wrote:
> [Please CC me, I'm not on the list.]
> 
> On Sun, 20 Oct 2019 at 12:28, Qu Wenruo <quwenruo.btrfs@xxxxxxx> wrote:
>>> Question: Can I work with the mounted backup image on the machine that
>>> also contains the original disc? I vaguely recall something about
>>> btrfs really not liking clones.
>>
>> If your fs only contains one device (single fs on single device), then
>> you should be mostly fine. [...] mostly OK.
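
(For context: the "clones" concern is about two block devices carrying the
same btrfs fsid being visible to the kernel at once. A minimal, hedged way
to see what btrfs has registered before mounting anything -- the loop-device
image path below is only a placeholder -- would be something like:

# losetup -fP --show /path/to/backup.img   # hypothetical image path
# btrfs device scan
# btrfs filesystem show

If the same UUID shows up with two devices, it's safer to work with the
copy on a machine that doesn't also have the original attached.)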
> 
> Should? Mostly? What a nightmare-inducing, yet pleasantly Adams-esque
> way of putting things ... :-)
> 
> Anyway, I have an image of the whole disk on a server now and am
> feeling all the more adventurous for it. (The first try failed a
> couple of MB from completion due to spurious network issues, which is
> why I've taken so long to reply.)
> 
>>> You wouldn't happen to know of a [suitable] bootable rescue image [...]?
>>
>> Archlinux iso at least has the latest btrfs-progs.
> 
> I'm on the Ubuntu 19.10 live CD (btrfs-progs 5.2.1, kernel 5.3.0)
> until further notice. Exploring other options (incl. running your
> rescue kernel on another machine and serving the disk via nbd) in
> parallel.
> 
>> I'd recommend the following safer methods before trying --init-extent-tree:
>>
>> - Dump backup roots first:
>>   # btrfs ins dump-super -f <dev> | grep backup_tree_root
>>   Then grab all the big numbers.
> 
> # btrfs inspect-internal dump-super -f /dev/nvme0n1p2 | grep backup_tree_root
> backup_tree_root:    284041969664    gen: 58600    level: 1
> backup_tree_root:    284041953280    gen: 58601    level: 1
> backup_tree_root:    284042706944    gen: 58602    level: 1
> backup_tree_root:    284045410304    gen: 58603    level: 1
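
(As an aside, a minimal sketch to list those candidates highest-generation
first, assuming the dump-super field layout shown above, would be:

# btrfs inspect-internal dump-super -f /dev/nvme0n1p2 \
      | awk '/backup_tree_root/ {print $4, $2}' | sort -rn

which prints "generation bytenr" pairs, newest first.)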
> 
>> - Try backup_extent_root numbers in btrfs check first
>>   # btrfs check -r <above big number> <dev>
>>   Use the number with highest generation first.
> 
> Assuming backup_extent_root == backup_tree_root ...
> 
> # btrfs check --tree-root 284045410304 /dev/nvme0n1p2
> Opening filesystem to check...
> checksum verify failed on 284041084928 found E4E3BDB6 wanted 00000000
> checksum verify failed on 284041084928 found E4E3BDB6 wanted 00000000
> bad tree block 284041084928, bytenr mismatch, want=284041084928, have=0
> ERROR: cannot open file system
> 
> # btrfs check --tree-root 284042706944 /dev/nvme0n1p2
> Opening filesystem to check...
> checksum verify failed on 284042706944 found E4E3BDB6 wanted 00000000
> checksum verify failed on 284042706944 found E4E3BDB6 wanted 00000000
> bad tree block 284042706944, bytenr mismatch, want=284042706944, have=0
> Couldn't read tree root
> ERROR: cannot open file system
> 
> # btrfs check --tree-root 284041953280 /dev/nvme0n1p2
> Opening filesystem to check...
> checksum verify failed on 284041953280 found E4E3BDB6 wanted 00000000
> checksum verify failed on 284041953280 found E4E3BDB6 wanted 00000000
> bad tree block 284041953280, bytenr mismatch, want=284041953280, have=0
> Couldn't read tree root
> ERROR: cannot open file system
> 
> # btrfs check --tree-root 284041969664 /dev/nvme0n1p2
> Opening filesystem to check...
> checksum verify failed on 284041969664 found E4E3BDB6 wanted 00000000
> checksum verify failed on 284041969664 found E4E3BDB6 wanted 00000000
> bad tree block 284041969664, bytenr mismatch, want=284041969664, have=0
> Couldn't read tree root
> ERROR: cannot open file system

This doesn't look good at all.

All 4 copies are wiped out, so this doesn't look like a bug in btrfs, but
like some other problem wiping out a whole range of tree blocks.
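
A minimal way to poke at one of those addresses directly, assuming the
chunk tree still maps the logical address (the bytenr below is just one of
the backup roots from your list), would be something like:

# btrfs inspect-internal dump-tree -b 284041969664 /dev/nvme0n1p2

If the block really was overwritten with zeroes, this should fail with the
same "wanted 00000000" checksum complaint instead of printing a node.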

> 
>>   If all backups fail to pass basic btrfs check, and all happen to have
>>   the same "wanted 00000000", then it means a big range of tree blocks
>>   got wiped out, not really related to btrfs but to some hardware-level wipe.
> 
> Doesn't look good, does it? Any further ideas at all or is this the
> end of the line? TBH, at this point, I don't mind having to re-install
> the box so much as the idea that the same thing might happen again --

I don't have a good idea. The result looks like something has wiped out
part of your tree blocks (not a single one, but a whole range).

> either to this one, or to my work machine, which is very similar. If
> nothing else, I'd really appreciate knowing what exactly happened here
> and why -- a bug in the GPU and/or its driver shouldn't cause this --
> and an avoidance strategy that goes beyond upgrade-and-pray.

At this stage, I'm sorry that I have no idea at all.

If you're 100% sure that you haven't had discard enabled for a while, then
I guess it doesn't look like a btrfs problem, at least.
Btrfs shouldn't wipe that many tree blocks at all, even on a v5.0
kernel.
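
For what it's worth, a quick way to check whether discard was ever in play
(run on the installed system, or against its fstab from the live
environment) might be:

# findmnt -no OPTIONS / | tr ',' '\n' | grep discard
# grep discard /etc/fstab
# systemctl is-enabled fstrim.timer

The first two look for a discard mount option; the last shows whether the
periodic fstrim timer is enabled (it typically is by default on Ubuntu).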

Thanks,
Qu

> 
> Cheers,
> Christian
> 


