On Mon, Sep 23, 2019 at 2:24 PM <hoegge@xxxxxxxxx> wrote:
>
> Hi Chris
>
> uname:
> Linux MHPNAS 3.10.105 #24922 SMP Wed Jul 3 16:37:24 CST 2019 x86_64 GNU/Linux synology_avoton_1815+
>
> btrfs --version
> btrfs-progs v4.0
>
> ash-4.3# btrfs device stats .
> [/dev/mapper/vg1-volume_1].write_io_errs 0
> [/dev/mapper/vg1-volume_1].read_io_errs 0
> [/dev/mapper/vg1-volume_1].flush_io_errs 0
> [/dev/mapper/vg1-volume_1].corruption_errs 1014
> [/dev/mapper/vg1-volume_1].generation_errs 0

I'm pretty sure these values are per 4KiB block on x86. If that's
correct, 1014 corruption errors is roughly 4MiB of corruption.

> Concerning self healing? Synology runs BTRFS on top of their SHR, which means that is where the redundancy (like RAID5 / RAID6) lives. I don't think they use any BTRFS RAID (likely due to the RAID5/6 issues with BTRFS). Does that then mean there is no redundancy / self-healing available for data?

That's correct.

What do you get for

# btrfs fi show
# btrfs fi df <mountpoint>

The mountpoint is for the Btrfs volume; any location it's mounted on will do.

> How would you like the log files - in a private mail? I assume it is the kern.log. To make them useful, I suppose I should also pinpoint which files seem to be intact?

You could do a Firefox Send, which encrypts the file locally and lets
you limit the number of times it can be downloaded, if you want to
avoid bots seeing it. *shrug*

> I gather it is the "BTRFS: (null) at logical ... " lines that indicate mismatch errors? Not sure why they say "(null)". Like:
>
> 2019-09-22T16:52:09+02:00 MHPNAS kernel: [1208505.999676] BTRFS: (null) at logical 1123177283584 on dev /dev/vg1/volume_1, sector 2246150816, root 259, inode 305979, offset 1316306944, length 4096, links 1 (path: Backup/Virtual Machines/Kan slettes/Smaller Clone of Windows 7 x64 for win 10 upgrade.vmwarevm/Windows 7 x64-cl1.vmdk)

If they're all like this one, this is strictly a data corruption
issue. You can resolve it by replacing the file with a known good
copy. Or you can unmount the Btrfs file system and use 'btrfs restore'
to scrape out the "bad" copy. Whenever there's a checksum error like
this, Btrfs returns EIO to user space, so it will not let you copy out
a file it thinks is corrupt, whereas 'btrfs restore' will. With the
particular version you have, I'm not sure if it'll complain, but if so
there's a flag to make it ignore errors so you can still get the file
out. Then remount, and copy that file right on top of itself. Of
course this isn't fixing the corruption if it's real, it just makes
the checksum warnings go away.

I'm gonna guess Synology has a way to do a scrub and check the
results, but I don't know how it works: whether it does a Btrfs-only
scrub or also an md scrub. You'd need to ask them or infer it from how
this whole stack is assembled and what processes get used. But you can
do an md scrub on your own. From 'man 4 md':

"md arrays can be scrubbed by writing either check or repair to the
file md/sync_action in the sysfs directory for the device."

You'd probably want to do a check. If you write repair, md assumes the
data chunks are good and merely rewrites all new parity chunks. A
check compares data chunks to parity chunks and reports any mismatch:

"A count of mismatches is recorded in the sysfs file md/mismatch_cnt.
This is set to zero when a scrub starts and is incremented whenever a
sector is found that is a mismatch."

That count should be 0. If it is not 0, there's a chance there has
been some form of silent data corruption since that file was
originally copied to the NAS.
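
For reference, the check looks something like this from a root shell.
This is only a sketch: the md device name below (md2) is a guess, so
look at /proc/mdstat or 'mdadm --detail' first to see which array
actually sits underneath vg1, and expect the check to take hours on a
large array.

# cat /proc/mdstat
# echo check > /sys/block/md2/md/sync_action
# cat /proc/mdstat
# cat /sys/block/md2/md/mismatch_cnt

/proc/mdstat will show the check progressing; read mismatch_cnt once
it has finished. If you need to abort a running check, writing idle to
the same sync_action file stops it.
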
But offhand I can't account for why these files trigger checksum mismatches on Btrfs and yet their md5 sums match the original files elsewhere. Are you sharing the vmdk over the network to a VM? Or is it static and totally unused while on the NAS?

Chris Murphy
