Re: corrupt leaf; invalid root item size

On 2020/6/4 5:45 PM, Thorsten Rehm wrote:
> Thank you for your answer.
> I've just updated my system, did a reboot and it's running with a
> 5.6.0-2-amd64 now.
> So, this is what my kern.log looks like, right after the start:
> 

> 
> There are too many blocks. I just picked three randomly:

Looks like we need more results, especially since some of these dumps
don't match the reported error at all.
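
In case it helps with collecting the rest, here is a rough sketch for
pulling every reported block number out of kern.log and printing the
matching dump-tree command. The regex only follows the message format
quoted in this thread; the function names and the default device path
are just assumptions for illustration.

```python
import re

# Matches the message format quoted in this thread, e.g.
# "corrupt leaf: root=1 block=35799040 slot=32, invalid root item size, have 239 expect 439"
CORRUPT_LEAF = re.compile(
    r"corrupt leaf: root=(?P<root>\d+) block=(?P<block>\d+) slot=(?P<slot>\d+), "
    r"invalid root item size, have (?P<have>\d+) expect (?P<expect>\d+)"
)


def reported_blocks(log_text):
    """Return the sorted, de-duplicated block numbers from corrupt-leaf messages."""
    # Collapse all whitespace so a message wrapped over several lines still matches.
    flat = " ".join(log_text.split())
    return sorted({int(m.group("block")) for m in CORRUPT_LEAF.finditer(flat)})


def dump_commands(log_text, device="/dev/dm-0"):
    """Build one 'btrfs ins dump-tree -b <block> <device>' line per reported block."""
    return [f"btrfs ins dump-tree -b {block} {device}"
            for block in reported_blocks(log_text)]
```

Feeding it the contents of kern.log and printing the result of
dump_commands() gives a list of commands that can be run one by one.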

> 
> === Block 33017856 ===
> $ btrfs ins dump-tree -b 33017856 /dev/dm-0
> btrfs-progs v5.6
> leaf 33017856 items 51 free space 17 generation 24749502 owner FS_TREE
> leaf 33017856 flags 0x1(WRITTEN) backref revision 1
> fs uuid 65005d0f-f8ea-4f77-8372-eb8b53198685
> chunk uuid 137764f6-c8e6-45b3-b275-82d8558c1ff9
...
>         item 31 key (4000670 EXTENT_DATA 1933312) itemoff 2299 itemsize 53
>                 generation 24749502 type 1 (regular)
>                 extent data disk byte 1126502400 nr 4096
>                 extent data offset 0 nr 8192 ram 8192
>                 extent compression 2 (lzo)
>         item 32 key (4000670 EXTENT_DATA 1941504) itemoff 2246 itemsize 53
>                 generation 24749502 type 1 (regular)
>                 extent data disk byte 0 nr 0
>                 extent data offset 1937408 nr 4096 ram 4194304
>                 extent compression 0 (none)
This is not a root item at all.
At least for this block, it looks like the kernel read one completely bad
copy, then discarded it and found a good copy.

That's very strange, especially since the other blocks involved seem
random, yet the error always being at slot 32 is no coincidence.


> === Block 44900352  ===
> btrfs ins dump-tree -b 44900352 /dev/dm-0
> btrfs-progs v5.6
> leaf 44900352 items 19 free space 591 generation 24749527 owner FS_TREE
> leaf 44900352 flags 0x1(WRITTEN) backref revision 1

This block doesn't even have slot 32. It only has 19 items, so its slots
run from 0 to 18.
And its owner, FS_TREE, shouldn't contain any ROOT_ITEM.
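
The same check can be done mechanically on the header line of each dump.
This is only a sketch of the reasoning above: the regex follows the
"leaf ... items ... owner ..." line as printed by dump-tree in this
thread, and the function names are made up.

```python
import re

# Header line as printed in the dumps in this thread, e.g.
# "leaf 44900352 items 19 free space 591 generation 24749527 owner FS_TREE"
LEAF_HEADER = re.compile(
    r"leaf (?P<bytenr>\d+) items (?P<items>\d+) free space \d+ "
    r"generation \d+ owner (?P<owner>\S+)"
)


def parse_leaf_header(line):
    """Extract block number, item count and owner from a dump-tree header line."""
    m = LEAF_HEADER.search(line)
    if m is None:
        raise ValueError("not a leaf header line: %r" % line)
    return {
        "bytenr": int(m.group("bytenr")),
        "items": int(m.group("items")),
        "owner": m.group("owner"),
    }


def could_hold_root_item(header, slot):
    """True only if the leaf belongs to ROOT_TREE and the slot exists.

    Slots are numbered from 0, so a leaf with N items has slots 0 .. N-1.
    """
    return header["owner"] == "ROOT_TREE" and 0 <= slot < header["items"]
```

For the two FS_TREE leaves dumped above, could_hold_root_item(header, 32)
is False in both cases, so neither can be the block the kernel is
complaining about.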

> 
> 
> === Block 55352561664 ===
> $ btrfs ins dump-tree -b 55352561664 /dev/dm-0
> btrfs-progs v5.6
> leaf 55352561664 items 33 free space 1095 generation 24749497 owner ROOT_TREE
> leaf 55352561664 flags 0x1(WRITTEN) backref revision 1
> fs uuid 65005d0f-f8ea-4f77-8372-eb8b53198685
> chunk uuid 137764f6-c8e6-45b3-b275-82d8558c1ff9
...
>         item 32 key (DATA_RELOC_TREE ROOT_ITEM 0) itemoff 1920 itemsize 239
>                 generation 4 root_dirid 256 bytenr 29380608 level 0 refs 1
>                 lastsnap 0 byte_limit 0 bytes_used 4096 flags 0x0(none)
>                 drop key (0 UNKNOWN.0 0) level 0

This looks like the offending tree block.
Slot 32 is a ROOT_ITEM with item size 239, which is an invalid size (the
kernel expects 439).
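
To find other items like this one in a longer dump, something along
these lines could be used. It is a sketch only: the expected size of 439
is taken from the kernel message in this thread, the regex follows the
"item N key (...) itemoff ... itemsize ..." lines shown above, and the
function name is hypothetical.

```python
import re

# Item line as printed in the dumps in this thread, e.g.
# "item 32 key (DATA_RELOC_TREE ROOT_ITEM 0) itemoff 1920 itemsize 239"
ITEM_LINE = re.compile(
    r"item (?P<slot>\d+) key \((?P<objectid>\S+) (?P<type>\S+) (?P<offset>[^\s)]+)\) "
    r"itemoff (?P<itemoff>\d+) itemsize (?P<itemsize>\d+)"
)

# Taken from the kernel message "have 239 expect 439".
EXPECTED_ROOT_ITEM_SIZE = 439


def mismatched_root_items(dump_text, expected=EXPECTED_ROOT_ITEM_SIZE):
    """Return (slot, objectid, itemsize) for every ROOT_ITEM whose size differs."""
    hits = []
    for m in ITEM_LINE.finditer(dump_text):
        size = int(m.group("itemsize"))
        if m.group("type") == "ROOT_ITEM" and size != expected:
            hits.append((int(m.group("slot")), m.group("objectid"), size))
    return hits
```

Run over the dump of block 55352561664, this should report slot 32 of
DATA_RELOC_TREE with size 239, while the EXTENT_DATA items in the
FS_TREE leaves are ignored.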

Since you're here, I guess running btrfs check without --repair on the
unmounted fs would help to identify the real damage.

And again, the fs looks very damaged, so it's highly recommended to back
up your data ASAP.

Thanks,
Qu

> --- snap ---
> 
> 
> 
> On Thu, Jun 4, 2020 at 3:31 AM Qu Wenruo <quwenruo.btrfs@xxxxxxx> wrote:
>>
>>
>>
>> On 2020/6/3 9:37 PM, Thorsten Rehm wrote:
>>> Hi,
>>>
>>> I've updated my system (Debian testing) [1] several months ago (~
>>> December) and I noticed a lot of corrupt leaf messages flooding my
>>> kern.log [2]. Furthermore my system had some trouble, e.g.
>>> applications were terminated after some uptime, due to the btrfs
>>> filesystem errors. This was with kernel 5.3.
>>> The last time I tried was with Kernel 5.6.0-1-amd64 and the problem persists.
>>>
>>> I've downgraded my kernel to 4.19.0-8-amd64 from the Debian Stable
>>> release and with this kernel there aren't any corrupt leaf messages
>>> and the problem is gone. IMHO, it must be something coming with kernel
>>> 5.3 (or 5.x).
>>
>> V5.3 introduced a lot of enhanced metadata sanity checks, and they catch
>> such *obviously* wrong metadata.
>>>
>>> My hard disk is an SSD which holds the root partition. I've
>>> encrypted my filesystem with LUKS and right after I enter my
>>> password at boot, the first corrupt leaf errors appear.
>>>
>>> An error message looks like this:
>>> May  7 14:39:34 foo kernel: [  100.162145] BTRFS critical (device
>>> dm-0): corrupt leaf: root=1 block=35799040 slot=32, invalid root item
>>> size, have 239 expect 439
>>
>> Btrfs root items have a fixed size, so this is already something very bad.
>>
>> Furthermore, the item size is smaller than expected, which means we can
>> easily read garbage. I'm a little surprised that the older kernel can
>> even work without crashing.
>>
>> Some extra info could help us to find out how badly the fs is corrupted.
>> # btrfs ins dump-tree -b 35799040 /dev/dm-0
>>
>>>
>>> "root=1", "slot=32", "have 239 expect 439" is always the same at every
>>> error line. Only the block number changes.
>>
>> Please also provide dumps for the other block numbers.
>>
>>>
>>> Interestingly it's the very same as reported to the ML here [3]. I've
>>> contacted the reporter, but he didn't have a solution for me, because
>>> he changed to a different filesystem.
>>>
>>> I've already tried "btrfs scrub" and "btrfs check --readonly /" in
>>> rescue mode, but w/o any errors. I've also checked the S.M.A.R.T.
>>> values of the SSD, which are fine. Furthermore I've tested my RAM, but
>>> again, w/o any errors.
>>
>> This doesn't look like a bit flip, so it's not a RAM problem.
>>
>> I don't have any better advice until we get the dumps, but I'd recommend
>> backing up your data while it's still possible.
>>
>> Thanks,
>> Qu
>>
>>>
>>> So, I have no more ideas what I can do. Could you please help me to
>>> investigate this further? Could it be a bug?
>>>
>>> Thank you very much.
>>>
>>> Best regards,
>>> Thorsten
>>>
>>>
>>>
>>> 1:
>>> $ cat /etc/debian_version
>>> bullseye/sid
>>>
>>> $ uname -a
>>> [no problem with this kernel]
>>> Linux foo 4.19.0-8-amd64 #1 SMP Debian 4.19.98-1 (2020-01-26) x86_64 GNU/Linux
>>>
>>> $ btrfs --version
>>> btrfs-progs v5.6
>>>
>>> $ sudo btrfs fi show
>>> Label: 'slash'  uuid: 65005d0f-f8ea-4f77-8372-eb8b53198685
>>>         Total devices 1 FS bytes used 7.33GiB
>>>         devid    1 size 115.23GiB used 26.08GiB path /dev/mapper/sda5_crypt
>>>
>>> $ btrfs fi df /
>>> Data, single: total=22.01GiB, used=7.16GiB
>>> System, DUP: total=32.00MiB, used=4.00KiB
>>> System, single: total=4.00MiB, used=0.00B
>>> Metadata, DUP: total=2.00GiB, used=168.19MiB
>>> Metadata, single: total=8.00MiB, used=0.00B
>>> GlobalReserve, single: total=25.42MiB, used=0.00B
>>>
>>>
>>> 2:
>>> [several messages per second]
>>> May  7 14:39:34 foo kernel: [  100.162145] BTRFS critical (device
>>> dm-0): corrupt leaf: root=1 block=35799040 slot=32, invalid root item
>>> size, have 239 expect 439
>>> May  7 14:39:35 foo kernel: [  100.998530] BTRFS critical (device
>>> dm-0): corrupt leaf: root=1 block=35885056 slot=32, invalid root item
>>> size, have 239 expect 439
>>> May  7 14:39:35 foo kernel: [  101.348650] BTRFS critical (device
>>> dm-0): corrupt leaf: root=1 block=35926016 slot=32, invalid root item
>>> size, have 239 expect 439
>>> May  7 14:39:36 foo kernel: [  101.619437] BTRFS critical (device
>>> dm-0): corrupt leaf: root=1 block=35995648 slot=32, invalid root item
>>> size, have 239 expect 439
>>> May  7 14:39:36 foo kernel: [  101.874069] BTRFS critical (device
>>> dm-0): corrupt leaf: root=1 block=36184064 slot=32, invalid root item
>>> size, have 239 expect 439
>>> May  7 14:39:36 foo kernel: [  102.339087] BTRFS critical (device
>>> dm-0): corrupt leaf: root=1 block=36319232 slot=32, invalid root item
>>> size, have 239 expect 439
>>> May  7 14:39:37 foo kernel: [  102.629429] BTRFS critical (device
>>> dm-0): corrupt leaf: root=1 block=36380672 slot=32, invalid root item
>>> size, have 239 expect 439
>>> May  7 14:39:37 foo kernel: [  102.839669] BTRFS critical (device
>>> dm-0): corrupt leaf: root=1 block=36487168 slot=32, invalid root item
>>> size, have 239 expect 439
>>> May  7 14:39:37 foo kernel: [  103.109183] BTRFS critical (device
>>> dm-0): corrupt leaf: root=1 block=36597760 slot=32, invalid root item
>>> size, have 239 expect 439
>>> May  7 14:39:37 foo kernel: [  103.299101] BTRFS critical (device
>>> dm-0): corrupt leaf: root=1 block=36626432 slot=32, invalid root item
>>> size, have 239 expect 439
>>>
>>> 3:
>>> https://lore.kernel.org/linux-btrfs/19acbd39-475f-bd72-e280-5f6c6496035c@xxxxxx/
>>>
>>
