On 2020/6/4 5:45 PM, Thorsten Rehm wrote:
> Thank you for your answer.
> I've just updated my system, did a reboot and it's running with a
> 5.6.0-2-amd64 now.
> So, this is what my kern.log looks like, right after the start:
>
>
> There are too many blocks. I just picked three randomly:

Looks like we need more results, especially since some of them don't
match at all (a small loop to dump every reported block in one go is
sketched further below).

>
> === Block 33017856 ===
> $ btrfs ins dump-tree -b 33017856 /dev/dm-0
> btrfs-progs v5.6
> leaf 33017856 items 51 free space 17 generation 24749502 owner FS_TREE
> leaf 33017856 flags 0x1(WRITTEN) backref revision 1
> fs uuid 65005d0f-f8ea-4f77-8372-eb8b53198685
> chunk uuid 137764f6-c8e6-45b3-b275-82d8558c1ff9
...
>         item 31 key (4000670 EXTENT_DATA 1933312) itemoff 2299 itemsize 53
>                 generation 24749502 type 1 (regular)
>                 extent data disk byte 1126502400 nr 4096
>                 extent data offset 0 nr 8192 ram 8192
>                 extent compression 2 (lzo)
>         item 32 key (4000670 EXTENT_DATA 1941504) itemoff 2246 itemsize 53
>                 generation 24749502 type 1 (regular)
>                 extent data disk byte 0 nr 0
>                 extent data offset 1937408 nr 4096 ram 4194304
>                 extent compression 0 (none)

Not a root item at all. At least for this copy, it looks like the
kernel got one completely bad copy, discarded it, and then found a good
copy. That's very strange, especially since the other involved block
numbers look random, yet the fact that they all fail at slot 32 is not
a coincidence.

> === Block 44900352 ===
> btrfs ins dump-tree -b 44900352 /dev/dm-0
> btrfs-progs v5.6
> leaf 44900352 items 19 free space 591 generation 24749527 owner FS_TREE
> leaf 44900352 flags 0x1(WRITTEN) backref revision 1

This block doesn't even have a slot 32. It only has 19 items, thus
slot 0 ~ slot 18. And its owner, FS_TREE, shouldn't contain a
ROOT_ITEM in the first place.

>
>
> === Block 55352561664 ===
> $ btrfs ins dump-tree -b 55352561664 /dev/dm-0
> btrfs-progs v5.6
> leaf 55352561664 items 33 free space 1095 generation 24749497 owner ROOT_TREE
> leaf 55352561664 flags 0x1(WRITTEN) backref revision 1
> fs uuid 65005d0f-f8ea-4f77-8372-eb8b53198685
> chunk uuid 137764f6-c8e6-45b3-b275-82d8558c1ff9
...
>         item 32 key (DATA_RELOC_TREE ROOT_ITEM 0) itemoff 1920 itemsize 239
>                 generation 4 root_dirid 256 bytenr 29380608 level 0 refs 1
>                 lastsnap 0 byte_limit 0 bytes_used 4096 flags 0x0(none)
>                 drop key (0 UNKNOWN.0 0) level 0

This looks like the offending tree block: slot 32, item size 239, which
is a ROOT_ITEM, but with an invalid size.

Since you're here, I guess a btrfs check (without --repair) on the
unmounted fs would help to identify the real damage; a rough example of
running it from a rescue shell is at the end of this mail.

And again, the fs looks badly damaged; it's highly recommended to back
up your data ASAP.

Thanks,
Qu

> --- snap ---
>
>
>
> On Thu, Jun 4, 2020 at 3:31 AM Qu Wenruo <quwenruo.btrfs@xxxxxxx> wrote:
>>
>>
>>
>> On 2020/6/3 9:37 PM, Thorsten Rehm wrote:
>>> Hi,
>>>
>>> I updated my system (Debian testing) [1] several months ago (~
>>> December) and I noticed a lot of corrupt leaf messages flooding my
>>> kern.log [2]. Furthermore, my system had some trouble, e.g.
>>> applications were terminated after some uptime, due to the btrfs
>>> filesystem errors. This was with kernel 5.3.
>>> The last time I tried was with kernel 5.6.0-1-amd64 and the problem persists.
>>>
>>> I've downgraded my kernel to 4.19.0-8-amd64 from the Debian stable
>>> release, and with this kernel there aren't any corrupt leaf messages
>>> and the problem is gone. IMHO, it must be something coming with kernel
>>> 5.3 (or 5.x).
>>
>> V5.3 introduced a lot of enhanced metadata sanity checks, and they
>> catch such *obviously* wrong metadata.
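
Since the checker prints the offending bytenr in every "corrupt leaf"
message, you can also dump every reported block in one go instead of
picking a few by hand. A rough sketch (the log path and device are
assumptions based on your description; adjust them to your setup):

  # collect all block numbers from the "corrupt leaf" messages and dump each one
  grep -o 'block=[0-9]*' /var/log/kern.log | cut -d= -f2 | sort -un |
  while read -r blk; do
          btrfs ins dump-tree -b "$blk" /dev/dm-0 > "dump-$blk.txt"
  done

Then just attach (or paste) the resulting dump-*.txt files.
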
>>>
>>> My hard disk is an SSD, which holds the root partition. I've
>>> encrypted my filesystem with LUKS, and right after I enter my
>>> password at boot, the first corrupt leaf errors appear.
>>>
>>> An error message looks like this:
>>> May 7 14:39:34 foo kernel: [ 100.162145] BTRFS critical (device dm-0): corrupt leaf: root=1 block=35799040 slot=32, invalid root item size, have 239 expect 439
>>
>> Btrfs root items have a fixed size. This is already something very bad.
>>
>> Furthermore, the item size is smaller than expected, which means we
>> can easily get garbage. I'm a little surprised that older kernels can
>> even work without crashing.
>>
>> Some extra info could help us to find out how badly the fs is corrupted:
>> # btrfs ins dump-tree -b 35799040 /dev/dm-0
>>
>>>
>>> "root=1", "slot=32", "have 239 expect 439" is always the same at every
>>> error line. Only the block number changes.
>>
>> And dumps for the other block numbers too.
>>
>>>
>>> Interestingly, it's the very same as reported to the ML here [3]. I've
>>> contacted the reporter, but he didn't have a solution for me, because
>>> he changed to a different filesystem.
>>>
>>> I've already tried "btrfs scrub" and "btrfs check --readonly /" in
>>> rescue mode, but w/o any errors. I've also checked the S.M.A.R.T.
>>> values of the SSD, which are fine. Furthermore, I've tested my RAM,
>>> but again, w/o any errors.
>>
>> This doesn't look like a bit flip, so not a RAM problem.
>>
>> I don't have any better advice until we get the dumps, but I'd
>> recommend backing up your data while it's still possible.
>>
>> Thanks,
>> Qu
>>
>>>
>>> So, I have no more ideas what I can do. Could you please help me
>>> investigate this further? Could it be a bug?
>>>
>>> Thank you very much.
>>>
>>> Best regards,
>>> Thorsten
>>>
>>>
>>>
>>> 1:
>>> $ cat /etc/debian_version
>>> bullseye/sid
>>>
>>> $ uname -a
>>> [no problem with this kernel]
>>> Linux foo 4.19.0-8-amd64 #1 SMP Debian 4.19.98-1 (2020-01-26) x86_64 GNU/Linux
>>>
>>> $ btrfs --version
>>> btrfs-progs v5.6
>>>
>>> $ sudo btrfs fi show
>>> Label: 'slash'  uuid: 65005d0f-f8ea-4f77-8372-eb8b53198685
>>>         Total devices 1 FS bytes used 7.33GiB
>>>         devid 1 size 115.23GiB used 26.08GiB path /dev/mapper/sda5_crypt
>>>
>>> $ btrfs fi df /
>>> Data, single: total=22.01GiB, used=7.16GiB
>>> System, DUP: total=32.00MiB, used=4.00KiB
>>> System, single: total=4.00MiB, used=0.00B
>>> Metadata, DUP: total=2.00GiB, used=168.19MiB
>>> Metadata, single: total=8.00MiB, used=0.00B
>>> GlobalReserve, single: total=25.42MiB, used=0.00B
>>>
>>>
>>> 2:
>>> [several messages per second]
>>> May 7 14:39:34 foo kernel: [ 100.162145] BTRFS critical (device dm-0): corrupt leaf: root=1 block=35799040 slot=32, invalid root item size, have 239 expect 439
>>> May 7 14:39:35 foo kernel: [ 100.998530] BTRFS critical (device dm-0): corrupt leaf: root=1 block=35885056 slot=32, invalid root item size, have 239 expect 439
>>> May 7 14:39:35 foo kernel: [ 101.348650] BTRFS critical (device dm-0): corrupt leaf: root=1 block=35926016 slot=32, invalid root item size, have 239 expect 439
>>> May 7 14:39:36 foo kernel: [ 101.619437] BTRFS critical (device dm-0): corrupt leaf: root=1 block=35995648 slot=32, invalid root item size, have 239 expect 439
>>> May 7 14:39:36 foo kernel: [ 101.874069] BTRFS critical (device dm-0): corrupt leaf: root=1 block=36184064 slot=32, invalid root item size, have 239 expect 439
>>> May 7 14:39:36 foo kernel: [ 102.339087] BTRFS critical (device dm-0): corrupt leaf: root=1 block=36319232 slot=32, invalid root item size, have 239 expect 439
>>> May 7 14:39:37 foo kernel: [ 102.629429] BTRFS critical (device dm-0): corrupt leaf: root=1 block=36380672 slot=32, invalid root item size, have 239 expect 439
>>> May 7 14:39:37 foo kernel: [ 102.839669] BTRFS critical (device dm-0): corrupt leaf: root=1 block=36487168 slot=32, invalid root item size, have 239 expect 439
>>> May 7 14:39:37 foo kernel: [ 103.109183] BTRFS critical (device dm-0): corrupt leaf: root=1 block=36597760 slot=32, invalid root item size, have 239 expect 439
>>> May 7 14:39:37 foo kernel: [ 103.299101] BTRFS critical (device dm-0): corrupt leaf: root=1 block=36626432 slot=32, invalid root item size, have 239 expect 439
>>>
>>> 3:
>>> https://lore.kernel.org/linux-btrfs/19acbd39-475f-bd72-e280-5f6c6496035c@xxxxxx/
>>>
>>
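
P.S. For the read-only check mentioned above, something along these
lines from a rescue shell / live system should do, with the root fs
left unmounted. The underlying partition (/dev/sda5) is only a guess
based on the /dev/mapper/sda5_crypt path in your 'btrfs fi show'
output, so adjust it to your layout:

  # open the LUKS container (name/device assumed from sda5_crypt above)
  cryptsetup luksOpen /dev/sda5 sda5_crypt

  # read-only check; reports problems but never writes to the fs
  btrfs check --readonly /dev/mapper/sda5_crypt

  # and while you're there, grab a backup while the data is still readable
  mkdir -p /mnt/rescue /mnt/backup
  mount -o ro /dev/mapper/sda5_crypt /mnt/rescue
  mount /dev/sdX1 /mnt/backup        # some other disk, just an example
  cp -a /mnt/rescue/. /mnt/backup/
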
