Re: Fwd: Fw: kernel oops when mounting btrfs

On 2019/3/24 6:49 PM, Thorsten Hirsch wrote:
> Hi Qu,
> 
> thank you once again for your advice. I could indeed recover all my
> data, even the snapshots docker had created. Everything's working as
> if nothing had ever happened. Here's what I did in the end:
> 
> btrfs restore <src> <dest> worked flawlessly, but only recovered some data.
> mount -o ro,notreelog,nologreplay was the only way to mount the broken
> partition and it showed me a lot more data than btrfs restore could
> recover. However, when trying to access these additional files I got
> input/output errors.

This means part of the csum tree got corrupted.

I have seen several reports about csum and extent tree corruption, so
it's quite possible.
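
For reference, the read-only mount quoted above can be sketched as a small
helper. This is only an illustration: the function name rescue_mount and the
DRY_RUN switch are hypothetical, and the device and mount point are
placeholders. The mount options are the ones named in the thread.

```shell
#!/bin/sh
# Sketch only: build the read-only rescue mount command from the thread.
# rescue_mount and DRY_RUN are hypothetical names, not part of btrfs-progs.

rescue_mount() {
    device=$1
    mountpoint=$2
    # ro          : never write to the damaged filesystem
    # notreelog   : do not use the tree log
    # nologreplay : do not replay the tree log at mount time
    set -- mount -o ro,notreelog,nologreplay "$device" "$mountpoint"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        # Default: only print the command so it can be reviewed first.
        echo "$*"
    else
        "$@"
    fi
}
```

Calling rescue_mount with a device and a mount point prints the command by
default; setting DRY_RUN=0 would actually run it (as root). Files whose
checksums cannot be verified may still return input/output errors when read.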

> 
> btrfs restore -sxmS <src> <dest> was the magic command that recovered
> all my data (which I could "cp -a" back to my device after creating a
> new btrfs file system). After reading the help of btrfs-restore it's
> obvious that the arguments are required, but in the btrfs wiki it says
> "If you're really lucky, this might be enough"[1] describing the
> command w/o arguments. I think this is misleading. The arguments are
> always necessary if you want to recover all your data. Well, at least
> I think the wiki page would make more sense if the arguments were
> included.

After looking into the man page, I strongly believe that restoring file
owner, mode, and symlinks should be the default behavior.

At least we should enhance either the manpage or btrfs-restore.
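
To make the flags discussed above concrete, here is a minimal sketch of the
restore invocation with each option spelled out. The helper name restore_all
and the DRY_RUN switch are hypothetical; the source and destination are
placeholders. The options are the ones from the quoted command (-sxmS).

```shell
#!/bin/sh
# Sketch only: the "btrfs restore -sxmS" call from the thread, one flag each.
# restore_all and DRY_RUN are hypothetical names used for illustration.

restore_all() {
    src=$1     # damaged btrfs device, e.g. a partition
    dest=$2    # existing directory on a healthy filesystem
    # -s : also restore snapshots (skipped by default)
    # -x : restore extended attributes
    # -m : restore owner, mode and times
    # -S : restore symbolic links as well as regular files
    set -- btrfs restore -s -x -m -S "$src" "$dest"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        # Default: only print the command so it can be reviewed first.
        echo "$*"
    else
        "$@"
    fi
}
```

Without these options the tool copies file contents only, which matches the
observation above that the plain command recovered just part of the data.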

Thanks,
Qu

> 
> If there's anything I can provide to help you improve btrfs or its
> recovery tools please don't hesitate to ask. Although I don't have an
> image of the broken partition, at least I still have the core dump of
> "btrfs check --clear-space-cache v1".
> 
> [1] https://btrfs.wiki.kernel.org/index.php/Restore
> 
> Thorsten Hirsch
> 
> P.S.: btrfs check --repair was of no use. It crashed almost
> immediately. I tried it only after recovering all my data, to see if
> it would've helped as well.
> 
> On Sat, 23 March 2019 at 14:57, Qu Wenruo <quwenruo.btrfs@xxxxxxx> wrote:
>>
>>
>>
>> On 2019/3/23 6:48 PM, Thorsten Hirsch wrote:
>>> Hi Qu,
>>>
>>> sorry for this direct reply. I've been trying to reply to the mailing
>>> list since yesterday, but my mails seem to get dropped. So please see
>>> my answer to your mail enclosed.
>>>
>>> Thorsten
>>>
>>>
>>> ---------- Forwarded message ---------
>>> From: Thorsten Hirsch <t.hirsch@xxxxxx>
>>> Date: Sat, 23 March 2019 at 09:29
>>> Subject: Re: Fw: kernel oops when mounting btrfs
>>> To: <linux-btrfs@xxxxxxxxxxxxxxx>
>>>
>>>
>>> Hi Qu,
>>>
>>> thank you, but unfortunately that didn't work out so well. The tree
>>> dump was no problem [1], but clearing the space cache resulted in a
>>> core dump. Now btrfs check --readonly reports some errors. I attached
>>> the output of these commands.
>>>
>>> Thorsten
>>>
>>> [1] https://gist.github.com/thorstenhirsch/65d4308ce54729c902cb09c0d4ad2baf
>>
>> This explains why a lot of things don't go correctly.
>>
>> The inode item of your free space cache tree is wrong.
>> According to my experiments with the latest kernel, it looks like some
>> older kernel is the culprit.
>>
>> Your free space cache inode lacks the correct mode.
>> Normally the mode should be 0100600. But your fs only has 0, and the
>> kernel panics for that reason.
>>
>>>
>>> # btrfs check --clear-space-cache v1 /dev/nvme0n1p3
>>> Opening filesystem to check...
>>> Checking filesystem on /dev/nvme0n1p3
>>> UUID: 4284a794-ad75-450d-b023-ebc5e75f31f5
>>> Failed to find [544448348160, 168, 16384]
>>
>> Then this means something bad happened in the extent tree.
>>
>>> btrfs unable to find ref byte nr 544448364544 parent 0 root 2  owner 0 offset 0
>>> transaction.c:195: btrfs_commit_transaction: BUG_ON `ret` triggered, value -5
>>> btrfs(+0x3be68)[0x556936269e68]
>>> btrfs(btrfs_commit_transaction+0x12a)[0x55693626a2ec]
>>> btrfs(btrfs_clear_free_space_cache+0x32a)[0x55693625fecf]
>>> btrfs(+0x4be5b)[0x556936279e5b]
>>> btrfs(cmd_check+0x5c2)[0x556936284d86]
>>> btrfs(main+0x1f6)[0x556936241ef6]
>>> /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xeb)[0x7fb9a7911b6b]
>>> btrfs(_start+0x2a)[0x556936241f3a]
>>> Aborted (core dumped)
>>>
>>>
>>> # btrfs check --readonly /dev/nvme0n1p3
>>> Opening filesystem to check...
>>> parent transid verify failed on 419860414464 wanted 30188 found 30105
>>> parent transid verify failed on 419860414464 wanted 30188 found 30105
>>
>> So the extent tree got corrupted in that repair attempt, which looks
>> pretty strange, as an aborted transaction shouldn't have any impact on
>> the existing fs.
>>
>> I'm afraid you can only try btrfs check --repair.
>>
>> If that gives no good result, then I'm afraid you have to salvage the
>> data; I believe over 99% of your data should be safe.
>>
>> To salvage the data, either use btrfs-restore, or use my experimental
>> 'skip_bg' kernel patches:
>> https://github.com/adam900710/linux/tree/rescue_options
>>
>> The 'skip_bg' kernel patches introduce a new mount option,
>> 'ro,rescue=skip_bg', which can skip the whole (corrupted) extent tree.
>> Since all your trees except the extent tree are consistent, you get all
>> the read-only btrfs features, like subvolume listing and csum checks.
>>
>> Thanks,
>> Qu
>>
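
A note on the inode mode mentioned in the quoted reply: 0100600 is the
regular-file type bit (S_IFREG, 0100000) combined with the permission bits
0600, so a mode of 0 has no file type at all. The sketch below decodes such
an octal mode value; the function names are hypothetical and it only does
arithmetic, it does not touch any filesystem.

```shell
#!/bin/sh
# Sketch only: decode an octal inode mode such as 0100600.
# Constants follow <sys/stat.h>; the function names are hypothetical.

S_IFMT=0170000    # mask selecting the file-type bits
S_IFREG=0100000   # file-type value for a regular file

# Succeeds if the octal mode in $1 describes a regular file.
is_regular_mode() {
    [ $(( $1 & $S_IFMT )) -eq $(( $S_IFREG )) ]
}

# Prints the permission bits of the octal mode in $1, in octal.
perm_bits() {
    printf '%o\n' $(( $1 & 07777 ))
}
```

With these helpers, is_regular_mode 0100600 succeeds while is_regular_mode 0
fails, which is the difference between the expected mode and the one found
on the damaged filesystem.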
