On 2020-01-08 20:35, Graham Cobb wrote:
>> BTRFS info (device sda3): read error corrected: ino 194473 off 2170880
> I am not convinced that that message is telling you that the error
> happened on device sda3. Certainly in some other cases Btrfs error
> messages identify the **filesystem** using the name of the device the
> kernel thinks is mounted, which might be sda3.
You're right, it looks like I copied the wrong piece, my bad. This btrfs
filesystem is a mirror with two drives:
# btrfs fi show / | grep devid
devid 1 size 100.00GiB used 81.03GiB path /dev/sda3
devid 2 size 100.00GiB used 81.03GiB path /dev/nvme0n1p3
And this is from dmesg:
print_req_error: critical medium error, dev nvme0n1, sector 40910720 flags 84700
BTRFS info (device sda3): read error corrected: ino 194473 off 2134016 (dev /dev/nvme0n1p3 sector 36711808)
So it's nvme0n1 that's about to die. But whichever device it is, dev stats
prints 0 for all error counts, as if nothing had ever happened.
# btrfs dev stats /
[/dev/sda3].write_io_errs 0
[/dev/sda3].read_io_errs 0
[/dev/sda3].flush_io_errs 0
[/dev/sda3].corruption_errs 0
[/dev/sda3].generation_errs 0
[/dev/nvme0n1p3].write_io_errs 0
[/dev/nvme0n1p3].read_io_errs 0
[/dev/nvme0n1p3].flush_io_errs 0
[/dev/nvme0n1p3].corruption_errs 0
[/dev/nvme0n1p3].generation_errs 0
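Since the kernel log names the failing device even though the counters do not, a monitoring job could also scan the log itself. The following is only a minimal sketch, assuming the message format is exactly the one in the dmesg excerpt quoted in this mail; the function name `corrected_reads` and the field names are made up for illustration, and other kernel versions may word the message differently.

```python
import re

# Pattern for the "read error corrected" message as it appears in the
# dmesg excerpt in this mail. The format is an assumption based on that
# single sample, not on a documented kernel interface.
CORRECTED = re.compile(
    r"BTRFS info \(device (?P<fs_dev>[^)]+)\): read error corrected: "
    r"ino (?P<ino>\d+) off (?P<off>\d+) "
    r"\(dev (?P<dev>\S+) sector (?P<sector>\d+)\)"
)


def corrected_reads(log_text):
    """Return one dict per corrected read error found in log_text.

    fs_dev is the device name used to label the filesystem; dev is the
    device the corrected read was actually attributed to.
    """
    # Collapse all whitespace so a message wrapped over two lines still matches.
    flat = " ".join(log_text.split())
    return [m.groupdict() for m in CORRECTED.finditer(flat)]
```

Here `fs_dev` would be sda3 (the filesystem label) while `dev` would be /dev/nvme0n1p3, which is the distinction Graham pointed out.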
>> # btrfs dev stats / | grep sda3 | grep read
>> [/dev/sda3].read_io_errs 0
> Have you checked the stats for the other devices as well?
Yes, of course. Never mind that grep. The monitoring cron job checks all
error counts returned by the stats command and sends out an alert if an
error is reported (just like with zpool status on zfs filesystems, which
also returns error counts for read/write/cksum errors). But as you can
see, it didn't send out anything, as dev stats says that all error counts
are at zero.
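For reference, the check done by the cron job amounts to something like the sketch below. It is not the actual script; the function names `nonzero_counters` and `check_mountpoint` are hypothetical, and it assumes the output format shown in this mail ("[device].counter value", one per line).

```python
import re
import subprocess

# Matches lines such as "[/dev/sda3].read_io_errs 0".
STAT_LINE = re.compile(r"^\[(?P<dev>.+)\]\.(?P<counter>\w+)\s+(?P<value>\d+)$")


def nonzero_counters(stats_output):
    """Return (device, counter, value) for every counter above zero."""
    hits = []
    for line in stats_output.splitlines():
        match = STAT_LINE.match(line.strip())
        if match and int(match["value"]) > 0:
            hits.append((match["dev"], match["counter"], int(match["value"])))
    return hits


def check_mountpoint(mountpoint="/"):
    """Run `btrfs device stats` on mountpoint and return non-zero counters.

    Usually needs root. An empty list means every counter is zero, in
    which case the job stays silent.
    """
    result = subprocess.run(
        ["btrfs", "device", "stats", mountpoint],
        capture_output=True, text=True, check=True,
    )
    return nonzero_counters(result.stdout)
```

With the output above, such a check finds nothing to report for either device, which is why no alert went out.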