RE: BTRFS checksum mismatch - false positives

# btrfs fi show
gives no result - not even when I add a path

# btrfs fi df /volume1
Data, single: total=4.38TiB, used=4.30TiB
System, DUP: total=8.00MiB, used=96.00KiB
System, single: total=4.00MiB, used=0.00B
Metadata, DUP: total=89.50GiB, used=6.63GiB
Metadata, single: total=8.00MiB, used=0.00B
GlobalReserve, single: total=512.00MiB, used=0.00B

Here is the log:
https://send.firefox.com/download/5a19aee66a42c04e/#PTt0UkT53Wrxe9EjCQfrWA (password in separate e-mail)
I have removed a few MAC addresses and everything before a certain date (that part contained all kinds of other info). Let me know if it is too little.

Concerning restoring files: I should have all the originals backed up, so I assume I can just delete the bad files and restore the originals. That would also take care of all the checksums, right? But BTRFS does not do anything to prevent the bad blocks from being used again, right?
I'll ask Synology about their stack.

I can't find the md sysfs directory on the system - sysfs should be mounted under /sys, right? This is what I have:
morten@MHPNAS:/$ cd sys
morten@MHPNAS:/sys$ ls
block  bus  class  dev  devices  firmware  fs  kernel  module  power
morten@MHPNAS:/sys$ cd fs
morten@MHPNAS:/sys/fs$ ls
btrfs  cgroup  ecryptfs  ext4  fuse  pstore
morten@MHPNAS:/sys/fs$


With respect to the vmdk, I only store it on the NAS for backup. 

Thanks a lot

Best 
Hoegge

-----Original Message-----
From: Chris Murphy <lists@xxxxxxxxxxxxxxxxx> 
Sent: 2019-09-23 22:59
To: hoegge@xxxxxxxxx
Cc: Chris Murphy <lists@xxxxxxxxxxxxxxxxx>; Btrfs BTRFS <linux-btrfs@xxxxxxxxxxxxxxx>
Subject: Re: BTRFS checksum mismatch - false positives

On Mon, Sep 23, 2019 at 2:24 PM <hoegge@xxxxxxxxx> wrote:
>
> Hi Chris
>
> uname:
> Linux MHPNAS 3.10.105 #24922 SMP Wed Jul 3 16:37:24 CST 2019 x86_64 
> GNU/Linux synology_avoton_1815+
>
> btrfs --version
> btrfs-progs v4.0
>
> ash-4.3# btrfs device stats .
> [/dev/mapper/vg1-volume_1].write_io_errs   0
> [/dev/mapper/vg1-volume_1].read_io_errs    0
> [/dev/mapper/vg1-volume_1].flush_io_errs   0
> [/dev/mapper/vg1-volume_1].corruption_errs 1014 
> [/dev/mapper/vg1-volume_1].generation_errs 0

I'm pretty sure these values are per 4KiB block on x86. If that's correct, this is ~4MiB of corruption.
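
For reference, the arithmetic: 1014 corruption_errs × 4 KiB per block = 4056 KiB ≈ 3.96 MiB.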


> Concerning self healing? Synology runs BTRFS on top of their SHR - which means that is where the redundancy (like RAID5 / RAID6) lives. I don't think they use any BTRFS RAID (likely due to the RAID5/6 issues in BTRFS). Does that then mean there is no redundancy / self-healing available for data?

That's correct. What do you get for

# btrfs fi show
# btrfs fi df <mountpoint>

The mountpoint is for the Btrfs volume - any location it's mounted on will do.
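
If you're not sure where it's mounted, this will list any Btrfs mounts (/proc/mounts is available on any normal Linux system):

# grep btrfs /proc/mounts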



> How would you like the log files - in a private mail? I assume it is kern.log. To make them useful, I suppose I should also pinpoint which files seem to be intact?

You could do a Firefox Send, which encrypts it locally and lets you limit the number of times it can be downloaded, if you want to keep bots from seeing it. *shrug*

>
> I gather it is the "BTRFS: (null) at logical ..." lines that indicate mismatch errors? Not sure why they state "(null)". Like:
>
> 2019-09-22T16:52:09+02:00 MHPNAS kernel: [1208505.999676] BTRFS: 
> (null) at logical 1123177283584 on dev /dev/vg1/volume_1, sector 
> 2246150816, root 259, inode 305979, offset 1316306944, length 4096, 
> links 1 (path: Backup/Virtual Machines/Kan slettes/Smaller Clone of 
> Windows 7 x64 for win 10 upgrade.vmwarevm/Windows 7 x64-cl1.vmdk)

If they're all like this one, this is strictly a data corruption issue. You can resolve it by replacing the file with a known good copy, or you can unmount the Btrfs file system and use 'btrfs restore' to scrape out the "bad" copy.

Whenever there's a checksum error like this on Btrfs, it returns EIO to user space: it will not let you copy out a file it considers corrupt. 'btrfs restore', on the other hand, will. I'm not sure whether the particular version you have will complain, but if it does, there's a flag to make it ignore errors so you can still get the file out. Then remount, and copy that file right on top of itself. Of course this doesn't fix the corruption if it's real; it just makes the checksum warnings go away.
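
A sketch of that workflow - the device path is the one from your log, but the restore target is a placeholder, and your old progs version may spell the options differently (check 'btrfs restore --help'):

# umount /volume1
# btrfs restore -v -i /dev/vg1/volume_1 /path/to/scratch/dir/

'-i' is the ignore-errors flag mentioned above. Without further options restore scrapes out everything it can reach, so you'd probably want its --path-regex option to limit it to the affected files.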

I'm gonna guess Synology has a way to do a scrub and check the results, but I don't know how it works - whether it does a Btrfs-only scrub or also an md scrub. You'd need to ask them or infer it from how this whole stack is assembled and what processes get used. But you can do an md scrub on your own. From 'man 4 md':

 "      md arrays can be scrubbed by writing either check or repair to
the file md/sync_action in the sysfs directory for the device."

You'd probably want to do a check. If you write repair, then md assumes the data chunks are good and merely rewrites fresh parity chunks. A check will compare data chunks to parity chunks and report any mismatches:

"       A count of mismatches is recorded in the sysfs file
md/mismatch_cnt.  This is set to zero when a scrub starts and is incremented whenever a  sector "

That should be 0.
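
Concretely, something along these lines - note that md2 is an assumption here, check /proc/mdstat for the real array name behind vg1:

# cat /proc/mdstat                             # find the md device name
# echo check > /sys/block/md2/md/sync_action   # start the scrub
# cat /proc/mdstat                             # shows progress while it runs
# cat /sys/block/md2/md/mismatch_cnt           # should read 0 when finished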

If that is not 0, then there's a chance there's been some form of silent data corruption since those files were originally copied to the NAS. But offhand I can't account for why they trigger checksum mismatches on Btrfs and yet their md5 sums match the original files elsewhere.
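
If you want to re-check that, the comparison is just this (paths are placeholders; reading a flagged file through the mounted filesystem may return EIO, so you may have to use the copy scraped out with 'btrfs restore'):

$ md5sum "/path/to/restored/Windows 7 x64-cl1.vmdk"
$ md5sum "/path/to/original/Windows 7 x64-cl1.vmdk"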

Are you sharing the vmdk over the network to a VM? Or is it static and totally unused while on the NAS?



Chris Murphy




