Re: Monitoring not working as "dev stats" returns 0 after read error occurred

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Jan 08, 2020 at 06:41:36PM +0100, philip@xxxxxxxxxxxxxxxx wrote:
> A bad drive caused i/o errors, but no notifications were sent out because
> "btrfs dev stats" fails to increase the error counter.
> 
> When checking dmesg, looking for something completely different, there were
> some error messages indicating that this drive is causing i/o errors and may
> die soon:
> 
> BTRFS info (device sda3): read error corrected: ino 194473 off 2170880
> 
> But checking the stats (as generally recommended to get notified about such
> errors) claims that no errors have occurred (nothing to see here, keep
> moving):
> 
> # btrfs dev stats / | grep sda3 | grep read
> [/dev/sda3].read_io_errs 0
> 
> Why?
> Isn't that the most important feature of a RAID system - to get notified
> about errors, to be able to replace such a drive...?
> 
> The fs is mirrored, so those errors didn't cause any data loss.

This seems to be a result of commit
0cc068e6ee59c1fffbfa977d8bf868b7551d80ac "btrfs: don't report readahead
errors and don't update statistics" which assumes that readahead errors
are ignored and have no side effects.  It turns out this is not true: in
the event of a recoverable read or csum failure, the filesystem overwrites
the bad block with good data recovered by using mirror copies or parity,
in both normal reads and readahead reads.  Later, when a non-readahead
read comes along to read the block, there's no error on the device any
more because btrfs already corrected it, so no counters are updated.

This is very bad--the whole point of dev stats is to make failure events
observable so failing hardware can be identified and replaced, and commit
0cc068e6ee makes some of those events not count.

This commit appears in kernels 5.1-rc3 and later.  A simple workaround
is to revert the commit, and let printk ratelimiting deal with the dmesg
spam issue.  I don't know if there are other errors that readahead can
generate that are not IO errors (e.g. running out of memory).  If there
are, then there will be some false positive read error counts.  These
are still better than false negative read error counts.

An alternative fix would be to have readahead not correct the data,
so readahead truly has no side-effects.  This would make the rationale
of this commit more supportable--except we'd still want to know if a
device was intermittently failing to read, so we should still count the
readahead errors anyway.

It would be nice to have a dev stats counter for corrections too.  It's a
pretty significant event for monitoring if we overwrite committed data
for any reason, but especially so if we are doing it in response to a
detected device failure.  We should definitely count those.

It would also be nice to throw some data into a tree every time an error
occurs (extent bytenr, offset, generation, error type, event count,
device or mirror ID) every time an error occurs.  A tool can read this
tree to pull out the extent bytenrs, map them back to paths (if the
generation matches; otherwise, it's an old error and the file has since
been overwritten), and produce something like the zfs status output.
But that's way beyond a bug fix.

> 
> # uname -sr
> Linux 5.2.7-100.fc29.x86_64
> # btrfs --version
> btrfs-progs v5.1
> 
> 

Attachment: signature.asc
Description: PGP signature


[Index of Archives]     [Linux Filesystem Development]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux