Re: BTRFS checksum mismatch - false positives

On Mon, Sep 23, 2019 at 2:24 PM <hoegge@xxxxxxxxx> wrote:
>
> Hi Chris
>
> uname:
> Linux MHPNAS 3.10.105 #24922 SMP Wed Jul 3 16:37:24 CST 2019 x86_64 GNU/Linux synology_avoton_1815+
>
> btrfs --version
> btrfs-progs v4.0
>
> ash-4.3# btrfs device stats .
> [/dev/mapper/vg1-volume_1].write_io_errs   0
> [/dev/mapper/vg1-volume_1].read_io_errs    0
> [/dev/mapper/vg1-volume_1].flush_io_errs   0
> [/dev/mapper/vg1-volume_1].corruption_errs 1014
> [/dev/mapper/vg1-volume_1].generation_errs 0

I'm pretty sure these values are per 4KiB block on x86. If that's
correct, this is ~4MiB of corruption.
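For what it's worth, here is that arithmetic as a quick sketch. It
assumes, as above, that each corruption_errs count corresponds to one
4 KiB block; that is my guess, not something I've confirmed for this
kernel.

```shell
# corruption_errs value taken from the 'btrfs device stats' output above.
corruption_errs=1014
# Assumption: one error is counted per 4 KiB (4096-byte) block.
block_bytes=4096

total_bytes=$((corruption_errs * block_bytes))
total_kib=$((total_bytes / 1024))
echo "${total_bytes} bytes = ${total_kib} KiB"
```

That prints "4153344 bytes = 4056 KiB", i.e. just under 4 MiB.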


> Concerning self-healing: Synology runs BTRFS on top of their SHR, which means that is where the redundancy is (like RAID5 / RAID6). I don't think they use any BTRFS RAID (likely due to the RAID5/6 issues with BTRFS). Does that then mean there is no redundancy / self-healing available for data?

That's correct. What do you get for

# btrfs fi show
# btrfs fi df <mountpoint>

<mountpoint> is wherever the Btrfs volume is mounted - any location
it's mounted on will do



> How would you like the log files - in private mail. I assume it is the kern.log. To make them useful, I suppose I should also pinpoint which files seem to be intact?

You could use Firefox Send, which will encrypt it locally and allow
you to put a limit on the number of times it can be downloaded, if you
want to keep bots from seeing it. *shrug*

>
> I gather it is the "BTRFS: (null) at logical ... " lines that indicate mismatch errors? Not sure why they state "(null)". Like:
>
> 2019-09-22T16:52:09+02:00 MHPNAS kernel: [1208505.999676] BTRFS: (null) at logical 1123177283584 on dev /dev/vg1/volume_1, sector 2246150816, root 259, inode 305979, offset 1316306944, length 4096, links 1 (path: Backup/Virtual Machines/Kan slettes/Smaller Clone of Windows 7 x64 for win 10 upgrade.vmwarevm/Windows 7 x64-cl1.vmdk)

If they're all like this one, this is strictly a data corruption
issue. You can resolve it by replacing the file with a known good
copy. Or you can unmount the Btrfs file system and use 'btrfs restore'
to scrape out the "bad" copy. Whenever there's a checksum error like
this, Btrfs returns EIO to user space; it will not let you copy out a
file it thinks is corrupt, whereas 'btrfs restore' will. I'm not sure
whether the particular version you have will complain, but if so,
there's a flag (-i) to make it ignore errors so you can still get that
file out. Then remount, and copy that file right on top of itself. Of
course this doesn't fix the corruption if it's real; it just makes the
checksum warnings go away.
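To see whether they really are all like this one, and which files are
involved, something along these lines should pull the distinct paths
out of the log. It's only a sketch: the function name is made up, and
it assumes every error line ends in "(path: ...)" the way your example
does.

```shell
# List the distinct file paths named in Btrfs checksum-error log lines.
# $1: kernel log file to scan (e.g. a copy of kern.log; the name is an
#     assumption, use whatever file the messages actually land in).
btrfs_bad_paths() {
    grep 'BTRFS: .* at logical ' "$1" \
        | sed -n 's/.*(path: \(.*\))$/\1/p' \
        | sort -u
}

# Example (hypothetical location):
#   btrfs_bad_paths /var/log/kern.log
```

Error lines that don't end in a "(path: ...)" field are silently
skipped by the sed step, so compare the result against a plain grep
count if you want to be sure nothing else is being reported.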

I'm gonna guess Synology has a way to do a scrub and check the results
but I don't know how it works, whether it does a Btrfs only scrub or
also an md scrub. You'd need to ask them or infer it from how this
whole stack is assembled and what processes get used. But you can do
an md scrub on your own. From 'man 4 md':

 "      md arrays can be scrubbed by writing either check or repair to
the file md/sync_action in the sysfs directory for the device."

You'd probably want to do a check. If you write repair, then md
assumes data chunks are good, and merely rewrites the parity chunks.
A check will compare data chunks to parity chunks and report any
mismatch in md/mismatch_cnt:

"       A count of mismatches is recorded in the sysfs file
md/mismatch_cnt.  This is set to zero when a scrub starts and is
incremented whenever a  sector "

That should be 0.

If it's not 0, then there's a chance there's been some form of
silent data corruption since that file was originally copied to the
NAS. But offhand I can't account for why these files trigger checksum
mismatches on Btrfs and yet their md5 sums match the original files
elsewhere.
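Roughly, the md check could look like the following. Treat it as a
sketch: the function names are made up, md2 is only a placeholder (look
in /proc/mdstat for the array that actually sits under vg1), and it
needs to be run as root.

```shell
# $1 in each function is the array's md sysfs directory,
# e.g. /sys/block/md2/md (md2 is a placeholder, not your real array).

# Start a read-only consistency check (does not rewrite parity).
md_start_check() {
    echo check > "$1/sync_action"
}

# Print the current scrub state; it reads "idle" once the check is done.
md_scrub_state() {
    cat "$1/sync_action"
}

# Print the mismatch count recorded by the last scrub; this should be 0.
md_mismatch_count() {
    cat "$1/mismatch_cnt"
}

# Example:
#   md_start_check /sys/block/md2/md
#   md_scrub_state /sys/block/md2/md
#   md_mismatch_count /sys/block/md2/md
```

Progress also shows up in /proc/mdstat while the check runs. On a big
array this takes hours, so wait for the state to go back to idle
before reading the mismatch count.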

Are you sharing the vmdk over the network to a VM? Or is it static and
totally unused while on the NAS?



Chris Murphy



