Re: "parent transid verify failed" and mount usebackuproot does not seem to work

On 7/1/2020 3:48 AM, Qu Wenruo wrote:
> On 2020/7/1 6:16 PM, Illia Bobyr wrote:
>> On 6/30/2020 6:36 PM, Qu Wenruo wrote:
>>>> On 2020/7/1 3:41 AM, Illia Bobyr wrote:
>>>> [...]
>>> Looks like some tree blocks were not written back correctly.
>>>
>>> Considering we don't have any known write-back related bugs in 5.6, I
>>> guess bcache may be involved again?
>> A bit more detail: the system started to misbehave.
>> The interactive session was saying that the main file system became
>> read-only.
> Any dmesg of that RO event?
> That would be the most valuable info to help us to locate the bug and
> fix it.
>
> I guess something went wrong before that, and somehow it corrupted the
> extent tree, breaking the life-keeping COW of metadata and screwing up
> everything.

After I restore the data, I will check the kernel log to see if there
are any messages in there.
I will post here if I find anything.
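
For reference, this is roughly how I plan to dig the messages out (the
journalctl invocation assumes a persistent systemd journal, and the
grep pattern is just my guess at what is relevant):

    # kernel messages from the previous boot, filtered for btrfs/bcache
    journalctl -k -b -1 | grep -iE 'btrfs|bcache'

    # or, if the machine has not been rebooted since the RO event
    dmesg | grep -iE 'btrfs|bcache'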

>> [...]
>>> In this case, I guess "btrfs ins dump-super -fFa" output would help to
>>> show if it's possible to recover.
>> Here is the output: https://pastebin.com/raw/DtJd813y
> OK, the backup root is fine.
>
> So this means metadata COW is corrupted, which caused the transid mismatch.
>
>>> Anyway, something looks strange.
>>>
>>> The backup roots having a newer generation while the super block is
>>> still old doesn't look correct at all.
>> Just in case, here is the output of "btrfs check", as suggested by "A L
>> <mail@xxxxxxxxxxxxxx>".  It does not seem to contain any new information.
>>
>> parent transid verify failed on 16984014372864 wanted 138350 found 131117
>> parent transid verify failed on 16984014405632 wanted 138350 found 131127
>> parent transid verify failed on 16984013406208 wanted 138350 found 131112
>> parent transid verify failed on 16984075436032 wanted 138384 found 131136
>> parent transid verify failed on 16984075436032 wanted 138384 found 131136
>> parent transid verify failed on 16984075436032 wanted 138384 found 131136
>> Ignoring transid failure
>> ERROR: child eb corrupted: parent bytenr=16984175853568 item=8 parent
>> level=2 child level=0
>> ERROR: failed to read block groups: Input/output error
> The extent tree is completely screwed up; no wonder the transid error
> happens.
>
> I don't believe it's reasonably possible to restore the fs to RW status.
> The only remaining method left is btrfs-restore then.

There are no more available SATA connections in the system, and there
is a lot of data in that FS (~7TB).
I do not immediately have another disk that would be able to hold this
much.

At the same time, this FS is RAID0.
I wonder if there is a way to first check whether the restore will work
if I disconnect half of the disks, as each half contains all the data.
Then, if it does, I would be able to restore by reusing the space on
one of the mirrors.

I see "-D: Dry run" that can be passed to "btrfs restore", but, I guess,
it would not really do a full check of the data, making sure that the
restore would really succeed, does it?
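
For concreteness, this is the kind of invocation I mean (the device and
destination paths are placeholders; with -D nothing should actually be
written):

    # walk the trees and only list what would be restored
    btrfs restore -D -v /dev/sdb /mnt/scratch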

Is there a way to perform this kind of check?
Or is "btrfs restore" the only option at the moment?



