I have the same issue (RAID5 over 4 disks):
https://marc.info/?l=linux-btrfs&m=154815802313248&w=2

The HDDs are perfectly healthy, so it seems to be caused by bit flips in
SDRAM, which is non-ECC in my case, unfortunately. I tried --repair, which
didn't help; same for --init-csum-tree. I'm now using the fs in ro mode
(data is fully available) and preparing for a total rebuild.

--
Artem

On Tue, Feb 12, 2019 at 5:17 AM Sébastien Luttringer <seblu@xxxxxxxxx> wrote:
>
> Hello,
>
> The context is a BTRFS filesystem on top of an md device (raid5 on 6 disks).
> System is an Arch Linux and the kernel was a vanilla 4.20.2.
>
> # btrfs fi us /home
> Overall:
>     Device size: 27.29TiB
>     Device allocated: 5.01TiB
>     Device unallocated: 22.28TiB
>     Device missing: 0.00B
>     Used: 5.00TiB
>     Free (estimated): 22.28TiB (min: 22.28TiB)
>     Data ratio: 1.00
>     Metadata ratio: 1.00
>     Global reserve: 512.00MiB (used: 0.00B)
>
> Data,single: Size:4.95TiB, Used:4.95TiB
>     /dev/md127 4.95TiB
>
> Metadata,single: Size:61.01GiB, Used:57.72GiB
>     /dev/md127 61.01GiB
>
> System,single: Size:36.00MiB, Used:560.00KiB
>     /dev/md127 36.00MiB
>
> Unallocated:
>     /dev/md127 22.28TiB
>
> I'm not able to find the root cause of the btrfs corruption. All disks look
> healthy (selftest ok, no error logged), no kernel trace of link failure or
> something.
> I ran a check on the md layer, and 2 mismatches were discovered:
> Feb 11 04:02:35 kernel: md127: mismatch sector in range 490387096-490387104
> Feb 11 04:31:14 kernel: md127: mismatch sector in range 1024770720-1024770728
> I ran a repair (resync), but the mismatches are still there afterwards.
>
> The first BTRFS warning was:
> Feb 07 11:27:57 kernel: BTRFS warning (device md127): md127 checksum verify
> failed on 4140883394560 wanted 7B4B0431 found B809FBEE level 0
>
>
> After that, the userland process crashed. A few days ago, I ran it again. It
> crashed again, but the filesystem became read-only:
>
> Feb 10 01:07:02 kernel: BTRFS warning (device md127): md127 checksum verify
> failed on 4140883394560 wanted 7B4B0431 found B809FBEE level 0
> Feb 10 01:07:03 kernel: BTRFS error (device md127): error loading props for ino
> 9930722 (root 5): -5
> Feb 10 01:07:03 kernel: BTRFS error (device md127): error loading props for ino
> 9930722 (root 5): -5
> Feb 10 01:07:03 kernel: BTRFS warning (device md127): md127 checksum verify
> failed on 4140883394560 wanted 7B4B0431 found B809FBEE level 0
> Feb 10 01:07:03 kernel: BTRFS warning (device md127): md127 checksum verify
> failed on 4140883394560 wanted 7B4B0431 found B809FBEE level 0
> Feb 10 01:07:03 kernel: BTRFS warning (device md127): md127 checksum verify
> failed on 4140883394560 wanted 7B4B0431 found B809FBEE level 0
> Feb 10 01:07:03 kernel: BTRFS warning (device md127): md127 checksum verify
> failed on 4140883394560 wanted 7B4B0431 found B809FBEE level 0
> Feb 10 01:07:03 kernel: BTRFS warning (device md127): md127 checksum verify
> failed on 4140883394560 wanted 7B4B0431 found B809FBEE level 0
> Feb 10 01:07:03 kernel: BTRFS warning (device md127): md127 checksum verify
> failed on 4140883394560 wanted 7B4B0431 found B809FBEE level 0
> Feb 10 01:07:03 kernel: BTRFS warning (device md127): md127 checksum verify
> failed on 4140883394560 wanted 7B4B0431 found B809FBEE level 0
> Feb 10 01:07:03 kernel: BTRFS warning (device md127): md127 checksum verify
> failed on 4140883394560 wanted 7B4B0431 found B809FBEE level 0
> Feb 10 03:16:24 kernel: BTRFS warning (device md127): md127 checksum verify
> failed on 4140883394560 wanted 7B4B0431 found B809FBEE level 0
> Feb 10 03:16:28 kernel: BTRFS warning (device md127): md127 checksum verify
> failed on 4140883394560 wanted 7B4B0431 found B809FBEE level 0
> Feb 10 03:27:34 kernel: BTRFS warning (device md127): md127 checksum verify
> failed on 4140883394560 wanted 7B4B0431 found B809FBEE level 0
> Feb 10 03:27:40 kernel: BTRFS warning (device md127): md127 checksum verify
> failed on 4140883394560 wanted 7B4B0431 found B809FBEE level 0
> Feb 10 05:59:34 kernel: BTRFS warning (device md127): md127 checksum verify
> failed on 4140883394560 wanted 7B4B0431 found B809FBEE level 0
> Feb 10 05:59:34 kernel: BTRFS error (device md127): error loading props for ino
> 9930722 (root 5): -5
> Feb 10 05:59:34 kernel: BTRFS warning (device md127): md127 checksum verify
> failed on 4140883394560 wanted 7B4B0431 found B809FBEE level 0
> Feb 10 05:59:34 kernel: BTRFS info (device md127): failed to delete reference
> to fImage%252057(1).jpg, inode 9930722 parent 58718826
> Feb 10 05:59:34 kernel: BTRFS: error (device md127) in
> __btrfs_unlink_inode:3971: errno=-5 IO failure
> Feb 10 05:59:34 kernel: BTRFS info (device md127): forced readonly
>
> The btrfs check report:
>
> # btrfs check -p /dev/md127
> Opening filesystem to check...
> Checking filesystem on /dev/md127
> UUID: 64403592-5a24-4851-bda2-ce4b3844c168
> [1/7] checking root items (0:10:21 elapsed, 10056723 items checked)
> [2/7] checking extents (0:04:59 elapsed, 155136 items checked)
> checksum verify failed on 4140883394560 found B809FBEE wanted 7B4B0431
> checksum verify failed on 4140883394560 found B809FBEE wanted 7B4B0431
> Csum didn't match
> ref mismatch on [2622304964608 28672] extent item 1, found 0
> checksum verify failed on 4140883394560 found B809FBEE wanted 7B4B0431
> checksum verify failed on 4140883394560 found B809FBEE wanted 7B4B0431
> Csum didn't match
> incorrect local backref count on 2622304964608 root 5 owner 9930722 offset 0
> found 0 wanted 1 back 0x55d61387cd40
> backref disk bytenr does not match extent record, bytenr=2622304964608, ref
> bytenr=0
> backpointer mismatch on [2622304964608 28672]
> owner ref check failed [2622304964608 28672]
> ref mismatch on [2622304993280 262144] extent item 1, found 0
> checksum verify failed on 4140883394560 found B809FBEE wanted 7B4B0431
> checksum verify failed on 4140883394560 found B809FBEE wanted 7B4B0431
> Csum didn't match
> incorrect local backref count on 2622304993280 root 5 owner 9930724 offset 0
> found 0 wanted 1 back 0x55d61387ce70
> backref disk bytenr does not match extent record, bytenr=2622304993280, ref
> bytenr=0
> backpointer mismatch on [2622304993280 262144]
> owner ref check failed [2622304993280 262144]
> ref mismatch on [2622305255424 4096] extent item 1, found 0
> checksum verify failed on 4140883394560 found B809FBEE wanted 7B4B0431
> checksum verify failed on 4140883394560 found B809FBEE wanted 7B4B0431
> Csum didn't match
> incorrect local backref count on 2622305255424 root 5 owner 9930727 offset 0
> found 0 wanted 1 back 0x55d61387cfa0
> backref disk bytenr does not match extent record, bytenr=2622305255424, ref
> bytenr=0
> backpointer mismatch on [2622305255424 4096]
> owner ref check failed [2622305255424 4096]
> ref mismatch on [2622305259520 8192] extent item 1, found 0
> checksum verify failed on 4140883394560 found B809FBEE wanted 7B4B0431
> checksum verify failed on 4140883394560 found B809FBEE wanted 7B4B0431
> Csum didn't match
> incorrect local backref count on 2622305259520 root 5 owner 9930731 offset 0
> found 0 wanted 1 back 0x55d61387d0d0
> backref disk bytenr does not match extent record, bytenr=2622305259520, ref
> bytenr=0
> backpointer mismatch on [2622305259520 8192]
> owner ref check failed [2622305259520 8192]
> ref mismatch on [2622305267712 188416] extent item 1, found 0
> checksum verify failed on 4140883394560 found B809FBEE wanted 7B4B0431
> checksum verify failed on 4140883394560 found B809FBEE wanted 7B4B0431
> Csum didn't match
> incorrect local backref count on 2622305267712 root 5 owner 9930733 offset 0
> found 0 wanted 1 back 0x55d61387d200
> backref disk bytenr does not match extent record, bytenr=2622305267712, ref
> bytenr=0
> backpointer mismatch on [2622305267712 188416]
> owner ref check failed [2622305267712 188416]
> ref mismatch on [2622305456128 4096] extent item 1, found 0
> checksum verify failed on 4140883394560 found B809FBEE wanted 7B4B0431
> checksum verify failed on 4140883394560 found B809FBEE wanted 7B4B0431
> Csum didn't match
> incorrect local backref count on 2622305456128 root 5 owner 9930734 offset 0
> found 0 wanted 1 back 0x55d61387d330
> backref disk bytenr does not match extent record, bytenr=2622305456128, ref
> bytenr=0
> backpointer mismatch on [2622305456128 4096]
> owner ref check failed [2622305456128 4096]
> owner ref check failed [4140883394560 16384]
> [2/7] checking extents (0:31:38 elapsed, 3783074 items checked)
> ERROR: errors found in extent allocation tree or chunk allocation
> [3/7] checking free space cache (0:03:58 elapsed, 5135 items checked)
> [4/7] checking fs roots (1:02:53 elapsed, 139654 items checked)
>
> I tried to mount the filesystem with nodatasum, but I was not able to delete
> the suspected bad directory. The FS was remounted RO.
> btrfs inspect-internal logical-resolve and btrfs inspect-internal inode-resolve
> are not able to resolve the logical and inode paths from the above errors.
>
> How could I save my filesystem? Should I try --repair or --init-csum-tree?
>
> Regards,
>
> Sébastien "Seblu" Luttringer
>
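
Every checksum failure quoted above points at the same logical address
(4140883394560), which suggests a single damaged metadata block rather than
widespread corruption. A minimal sketch for tallying this from a saved log
follows. It is an illustration only: the function name and the sample text
are hypothetical, and the pattern assumes log lines look like the ones quoted
in this thread (including lines wrapped by a "> " quote marker).

```python
import re
from collections import Counter

# Matches "checksum verify failed on <address>" in kernel-log or btrfs-check
# style lines; tolerates an email-quote wrap ("\n> ") after "verify".
CSUM_RE = re.compile(r"checksum verify(?:\s*>)?\s+failed on (\d+)")


def count_failing_blocks(log_text):
    """Return a Counter mapping logical address -> number of failure lines."""
    return Counter(int(m.group(1)) for m in CSUM_RE.finditer(log_text))


# Hypothetical sample assembled from lines quoted in this thread.
sample = (
    "kernel: BTRFS warning (device md127): md127 checksum verify\n"
    "> failed on 4140883394560 wanted 7B4B0431 found B809FBEE level 0\n"
    "checksum verify failed on 4140883394560 found B809FBEE wanted 7B4B0431\n"
)

counts = count_failing_blocks(sample)
for address, hits in counts.items():
    print(address, hits)  # prints "4140883394560 2"
```

If the counter ends up with one key, all reported failures concern one block;
several keys would mean the damage is spread over more than one block.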
