Re: Corrupted filesystem, looking for guidance

I have the same issue (RAID5 over 4 disks):
https://marc.info/?l=linux-btrfs&m=154815802313248&w=2

The HDDs are perfectly healthy, so it seems to be caused by bit flips
in SDRAM, which is unfortunately non-ECC in my case. I tried --repair,
which didn't help, and neither did --init-csum-tree. I am now using the
fs in ro mode (data is fully available) and preparing for a total rebuild.

 -- Artem
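
As a rough illustration of why a single flipped bit in RAM shows up as a
"checksum verify failed ... wanted X found Y" message like the ones quoted
below: btrfs metadata blocks are protected by CRC32C, and CRC32C changes
whenever any one bit of the input changes. The sketch below is only an
assumption-laden demo: it uses a plain pure-Python CRC-32C over a made-up
16 KiB buffer, not the real on-disk block and not btrfs's exact checksum
layout. The helper names (crc32c, flip_bit) are hypothetical.

```python
# Minimal sketch: one flipped bit in a 16 KiB buffer changes its CRC-32C.
# The buffer contents are invented; this does not read any real filesystem.

def _build_table():
    """Build the 256-entry lookup table for reflected CRC-32C (Castagnoli)."""
    table = []
    for i in range(256):
        crc = i
        for _ in range(8):
            crc = (crc >> 1) ^ 0x82F63B78 if crc & 1 else crc >> 1
        table.append(crc)
    return table


_TABLE = _build_table()


def crc32c(data: bytes) -> int:
    """Return the standard CRC-32C of data (init and final XOR 0xFFFFFFFF)."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc = _TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with exactly one bit inverted."""
    buf = bytearray(data)
    buf[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(buf)


# Hypothetical 16 KiB "metadata block" (same size as the failing block below).
block = bytes(range(256)) * 64
wanted = crc32c(block)
found = crc32c(flip_bit(block, 12345))
print(f"wanted {wanted:08X} found {found:08X}")
```

The two values printed differ, but the checksum alone does not tell how many
bits were damaged or where, which is why a bit flip cannot be confirmed from
the log message by itself.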

On Tue, Feb 12, 2019 at 5:17 AM Sébastien Luttringer <seblu@xxxxxxxxx> wrote:
>
> Hello,
>
> The context is a BTRFS filesystem on top of an md device (raid5 on 6 disks).
> The system is Arch Linux and the kernel was a vanilla 4.20.2.
>
> # btrfs fi us /home
> Overall:
>     Device size:                  27.29TiB
>     Device allocated:              5.01TiB
>     Device unallocated:           22.28TiB
>     Device missing:                  0.00B
>     Used:                          5.00TiB
>     Free (estimated):             22.28TiB      (min: 22.28TiB)
>     Data ratio:                       1.00
>     Metadata ratio:                   1.00
>     Global reserve:              512.00MiB      (used: 0.00B)
>
> Data,single: Size:4.95TiB, Used:4.95TiB
>    /dev/md127      4.95TiB
>
> Metadata,single: Size:61.01GiB, Used:57.72GiB
>    /dev/md127     61.01GiB
>
> System,single: Size:36.00MiB, Used:560.00KiB
>    /dev/md127     36.00MiB
>
> Unallocated:
>    /dev/md127     22.28TiB
>
> I'm not able to find the root cause of the btrfs corruption. All disks look
> healthy (selftest ok, no errors logged), and there is no kernel trace of a
> link failure or anything similar.
> I ran a check on the md layer, and 2 mismatches were discovered:
> Feb 11 04:02:35 kernel: md127: mismatch sector in range 490387096-490387104
> Feb 11 04:31:14 kernel: md127: mismatch sector in range 1024770720-1024770728
> I ran a repair (resync), but the mismatches are still there afterwards.
>
> The first BTRFS warning was:
> Feb 07 11:27:57 kernel: BTRFS warning (device md127): md127 checksum verify
> failed on 4140883394560 wanted 7B4B0431 found B809FBEE level 0
>
>
> After that, the userland process crashed. A few days ago, I ran it again. It
> crashed again, but this time the filesystem became read-only:
>
> Feb 10 01:07:02 kernel: BTRFS warning (device md127): md127 checksum verify
> failed on 4140883394560 wanted 7B4B0431 found B809FBEE level 0
> Feb 10 01:07:03 kernel: BTRFS error (device md127): error loading props for ino
> 9930722 (root 5): -5
> Feb 10 01:07:03 kernel: BTRFS error (device md127): error loading props for ino
> 9930722 (root 5): -5
> Feb 10 01:07:03 kernel: BTRFS warning (device md127): md127 checksum verify
> failed on 4140883394560 wanted 7B4B0431 found B809FBEE level 0
> Feb 10 01:07:03 kernel: BTRFS warning (device md127): md127 checksum verify
> failed on 4140883394560 wanted 7B4B0431 found B809FBEE level 0
> Feb 10 01:07:03 kernel: BTRFS warning (device md127): md127 checksum verify
> failed on 4140883394560 wanted 7B4B0431 found B809FBEE level 0
> Feb 10 01:07:03 kernel: BTRFS warning (device md127): md127 checksum verify
> failed on 4140883394560 wanted 7B4B0431 found B809FBEE level 0
> Feb 10 01:07:03 kernel: BTRFS warning (device md127): md127 checksum verify
> failed on 4140883394560 wanted 7B4B0431 found B809FBEE level 0
> Feb 10 01:07:03 kernel: BTRFS warning (device md127): md127 checksum verify
> failed on 4140883394560 wanted 7B4B0431 found B809FBEE level 0
> Feb 10 01:07:03 kernel: BTRFS warning (device md127): md127 checksum verify
> failed on 4140883394560 wanted 7B4B0431 found B809FBEE level 0
> Feb 10 01:07:03 kernel: BTRFS warning (device md127): md127 checksum verify
> failed on 4140883394560 wanted 7B4B0431 found B809FBEE level 0
> Feb 10 03:16:24 kernel: BTRFS warning (device md127): md127 checksum verify
> failed on 4140883394560 wanted 7B4B0431 found B809FBEE level 0
> Feb 10 03:16:28 kernel: BTRFS warning (device md127): md127 checksum verify
> failed on 4140883394560 wanted 7B4B0431 found B809FBEE level 0
> Feb 10 03:27:34 kernel: BTRFS warning (device md127): md127 checksum verify
> failed on 4140883394560 wanted 7B4B0431 found B809FBEE level 0
> Feb 10 03:27:40 kernel: BTRFS warning (device md127): md127 checksum verify
> failed on 4140883394560 wanted 7B4B0431 found B809FBEE level 0
> Feb 10 05:59:34 kernel: BTRFS warning (device md127): md127 checksum verify
> failed on 4140883394560 wanted 7B4B0431 found B809FBEE level 0
> Feb 10 05:59:34 kernel: BTRFS error (device md127): error loading props for ino
> 9930722 (root 5): -5
> Feb 10 05:59:34 kernel: BTRFS warning (device md127): md127 checksum verify
> failed on 4140883394560 wanted 7B4B0431 found B809FBEE level 0
> Feb 10 05:59:34 kernel: BTRFS info (device md127): failed to delete reference
> to fImage%252057(1).jpg, inode 9930722 parent 58718826
> Feb 10 05:59:34 kernel: BTRFS: error (device md127) in
> __btrfs_unlink_inode:3971: errno=-5 IO failure
> Feb 10 05:59:34 kernel: BTRFS info (device md127): forced readonly
>
> The btrfs check report:
>
> # btrfs check -p /dev/md127
> Opening filesystem to check...
> Checking filesystem on /dev/md127
> UUID: 64403592-5a24-4851-bda2-ce4b3844c168
> [1/7] checking root items                      (0:10:21 elapsed, 10056723 items
> checked)
> [2/7] checking extents                         (0:04:59 elapsed, 155136 items
> checked)
> checksum verify failed on 4140883394560 found B809FBEE wanted 7B4B0431
> checksum verify failed on 4140883394560 found B809FBEE wanted 7B4B0431
> Csum didn't match
> ref mismatch on [2622304964608 28672] extent item 1, found 0
> checksum verify failed on 4140883394560 found B809FBEE wanted 7B4B0431
> checksum verify failed on 4140883394560 found B809FBEE wanted 7B4B0431
> Csum didn't match
> incorrect local backref count on 2622304964608 root 5 owner 9930722 offset 0
> found 0 wanted 1 back 0x55d61387cd40
> backref disk bytenr does not match extent record, bytenr=2622304964608, ref
> bytenr=0
> backpointer mismatch on [2622304964608 28672]
> owner ref check failed [2622304964608 28672]
> ref mismatch on [2622304993280 262144] extent item 1, found 0
> checksum verify failed on 4140883394560 found B809FBEE wanted 7B4B0431
> checksum verify failed on 4140883394560 found B809FBEE wanted 7B4B0431
> Csum didn't match
> incorrect local backref count on 2622304993280 root 5 owner 9930724 offset 0
> found 0 wanted 1 back 0x55d61387ce70
> backref disk bytenr does not match extent record, bytenr=2622304993280, ref
> bytenr=0
> backpointer mismatch on [2622304993280 262144]
> owner ref check failed [2622304993280 262144]
> ref mismatch on [2622305255424 4096] extent item 1, found 0
> checksum verify failed on 4140883394560 found B809FBEE wanted 7B4B0431
> checksum verify failed on 4140883394560 found B809FBEE wanted 7B4B0431
> Csum didn't match
> incorrect local backref count on 2622305255424 root 5 owner 9930727 offset 0
> found 0 wanted 1 back 0x55d61387cfa0
> backref disk bytenr does not match extent record, bytenr=2622305255424, ref
> bytenr=0
> backpointer mismatch on [2622305255424 4096]
> owner ref check failed [2622305255424 4096]
> ref mismatch on [2622305259520 8192] extent item 1, found 0
> checksum verify failed on 4140883394560 found B809FBEE wanted 7B4B0431
> checksum verify failed on 4140883394560 found B809FBEE wanted 7B4B0431
> Csum didn't match
> incorrect local backref count on 2622305259520 root 5 owner 9930731 offset 0
> found 0 wanted 1 back 0x55d61387d0d0
> backref disk bytenr does not match extent record, bytenr=2622305259520, ref
> bytenr=0
> backpointer mismatch on [2622305259520 8192]
> owner ref check failed [2622305259520 8192]
> ref mismatch on [2622305267712 188416] extent item 1, found 0
> checksum verify failed on 4140883394560 found B809FBEE wanted 7B4B0431
> checksum verify failed on 4140883394560 found B809FBEE wanted 7B4B0431
> Csum didn't match
> incorrect local backref count on 2622305267712 root 5 owner 9930733 offset 0
> found 0 wanted 1 back 0x55d61387d200
> backref disk bytenr does not match extent record, bytenr=2622305267712, ref
> bytenr=0
> backpointer mismatch on [2622305267712 188416]
> owner ref check failed [2622305267712 188416]
> ref mismatch on [2622305456128 4096] extent item 1, found 0
> checksum verify failed on 4140883394560 found B809FBEE wanted 7B4B0431
> checksum verify failed on 4140883394560 found B809FBEE wanted 7B4B0431
> Csum didn't match
> incorrect local backref count on 2622305456128 root 5 owner 9930734 offset 0
> found 0 wanted 1 back 0x55d61387d330
> backref disk bytenr does not match extent record, bytenr=2622305456128, ref
> bytenr=0
> backpointer mismatch on [2622305456128 4096]
> owner ref check failed [2622305456128 4096]
> owner ref check failed [4140883394560 16384]
> [2/7] checking extents                         (0:31:38 elapsed, 3783074 items
> checked)
> ERROR: errors found in extent allocation tree or chunk allocation
> [3/7] checking free space cache                (0:03:58 elapsed, 5135 items
> checked)
> [4/7] checking fs roots                        (1:02:53 elapsed, 139654 items
> checked)
>
> I tried to mount the filesystem with nodatasum, but I was not able to delete the
> suspect directory, and the FS was remounted RO.
> btrfs inspect-internal logical-resolve and btrfs inspect-internal inode-resolve
> are not able to resolve the logical addresses and inodes from the above errors
> to paths.
>
> How could I save my filesystem? Should I try --repair or --init-csum-tree?
>
> Regards,
>
> Sébastien "Seblu" Luttringer
>
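
The kernel log quoted above repeats the same warning many times, so it can
help to reduce it to the distinct failing blocks before deciding what to do.
A minimal sketch, assuming the log was saved as quoted mail text (with "> "
prefixes and wrapped lines); the function name summarize and the SAMPLE
excerpt are only for illustration, and the regular expression matches the
message wording shown in this thread, not every btrfs message format:

```python
import re
from collections import Counter

# Short excerpt copied from the quoted log; wrapped lines are kept as-is.
SAMPLE = """\
> Feb 10 01:07:02 kernel: BTRFS warning (device md127): md127 checksum verify
> failed on 4140883394560 wanted 7B4B0431 found B809FBEE level 0
> Feb 10 01:07:03 kernel: BTRFS error (device md127): error loading props for ino
> 9930722 (root 5): -5
> Feb 10 05:59:34 kernel: BTRFS warning (device md127): md127 checksum verify
> failed on 4140883394560 wanted 7B4B0431 found B809FBEE level 0
"""

# \s+ between words lets the pattern match across the mail client's line wraps.
PATTERN = re.compile(
    r"checksum\s+verify\s+failed\s+on\s+(\d+)\s+"
    r"wanted\s+([0-9A-Fa-f]+)\s+found\s+([0-9A-Fa-f]+)\s+level\s+(\d+)"
)


def summarize(text: str) -> Counter:
    """Count warnings per (logical bytenr, wanted csum, found csum, level)."""
    # Drop the leading "> " quote marker and rejoin the wrapped lines.
    unquoted = " ".join(re.sub(r"^>\s?", "", line) for line in text.splitlines())
    return Counter(
        (int(bytenr), wanted.upper(), found.upper(), int(level))
        for bytenr, wanted, found, level in PATTERN.findall(unquoted)
    )


for (bytenr, wanted, found, level), count in summarize(SAMPLE).items():
    print(f"block {bytenr} level {level}: wanted {wanted} found {found} x{count}")
```

On the excerpt, every warning collapses to a single entry for block
4140883394560 at level 0, which matches the full log above: all the warnings
shown there name that one metadata block.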



