On 23/11/13 11:35, Duncan wrote:
> Daniel Pocock posted on Sat, 23 Nov 2013 09:37:50 +0100 as excerpted:
>
>> What about when btrfs detects a bad block checksum and recovers data
>> from the equivalent block on another disk? The wiki says there will be
>> a syslog event. Does btrfs keep any stats on the number of blocks that
>> it considers unreliable and can this be queried from user space?
>
> The way you phrased that question is strange to me (considers unreliable?
> does that mean ones that it had to fix, or ones that it had to fix more
> than once, or...), so I'm not sure this answers it, but from the btrfs
> manpage...
Let me clarify: by "unreliable" I mean blocks that the block device
driver reads without reporting any error, but whose checksum btrfs
finds to be bad, so it does not use the data from that block.

Such blocks definitely exist. Sometimes the data was corrupted at the
moment of writing, and no matter how many times you read the block, you
always get a bad checksum.
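
To make the case concrete, here is a rough sketch of what I mean. It is
not btrfs code: read_verified and the stats dict are names I made up,
and zlib.crc32 only stands in for the real checksum (btrfs uses crc32c,
a different polynomial). Every read "succeeds" at the driver level, and
only the checksum comparison reveals the bad copy.

```python
import zlib


def read_verified(copies, expected_crc, stats):
    """Return the first copy whose checksum matches, counting the bad ones.

    copies: list of (device_name, data_bytes) pairs, each returned by a
            read that reported no I/O error (hypothetical interface).
    stats:  dict mapping device_name to a count of checksum mismatches.
    """
    for device, data in copies:
        if zlib.crc32(data) == expected_crc:
            return data
        # The driver reported success, but the data is wrong.
        stats[device] = stats.get(device, 0) + 1
    raise IOError("no copy with a valid checksum")
```

The per-device counter in stats is the kind of number I would like to
be able to query from user space.
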
> >>>>
>
> btrfs device stats [-z] {<path>|<device>}
>
> Read and print the device IO stats for all devices of the filesystem
> identified by <path> or for a single <device>.
>
> Options
>
> -z Reset stats to zero after reading them.
>
> <<<<
>
> Here's the output for my (dual device btrfs raid1) rootfs, here:
>
> btrfs dev stat /
> [/dev/sdc5].write_io_errs 0
> [/dev/sdc5].read_io_errs 0
> [/dev/sdc5].flush_io_errs 0
> [/dev/sdc5].corruption_errs 0
> [/dev/sdc5].generation_errs 0
> [/dev/sda5].write_io_errs 0
> [/dev/sda5].read_io_errs 0
> [/dev/sda5].flush_io_errs 0
> [/dev/sda5].corruption_errs 0
> [/dev/sda5].generation_errs 0
>
> As you can see, for multi-device filesystems it gives the stats per
> component device. Any errors accumulate until a reset using -z, so you
> can easily see if the numbers are increasing over time and by how much.
>
That looks interesting. Are these counters explained anywhere?

Should a Nagios plugin alert on any non-zero value, or just focus on
some of them?

Are they runtime stats (since system boot), or are they maintained in
the filesystem on disk?

My own version of the btrfs utility doesn't have that command, though;
I am using a Debian stable system. I tried a newer version and it gives

ERROR: ioctl(BTRFS_IOC_GET_DEV_STATS)

so I probably need to update my kernel too.
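
For the Nagios side, this is roughly what I have in mind, as a sketch
only. It assumes the output format shown in the quoted example above
("[device].counter value"), and it treats any non-zero counter as
CRITICAL, which is just a placeholder policy until I know which
counters matter. The function names are my own.

```python
import re
import subprocess

# Matches lines such as "[/dev/sdc5].corruption_errs 0"
STAT_LINE = re.compile(
    r"^\[(?P<device>[^\]]+)\]\.(?P<counter>\w+)\s+(?P<value>\d+)$"
)

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3  # standard Nagios exit codes


def parse_device_stats(text):
    """Return {device: {counter: value}} from `btrfs device stats` output."""
    stats = {}
    for line in text.splitlines():
        match = STAT_LINE.match(line.strip())
        if match:
            counters = stats.setdefault(match.group("device"), {})
            counters[match.group("counter")] = int(match.group("value"))
    return stats


def evaluate(stats):
    """Map parsed counters to a Nagios (exit_code, message) pair."""
    if not stats:
        return UNKNOWN, "UNKNOWN: no device stats found"
    nonzero = [
        f"{device}.{counter}={value}"
        for device, counters in sorted(stats.items())
        for counter, value in sorted(counters.items())
        if value > 0
    ]
    if nonzero:
        return CRITICAL, "CRITICAL: " + ", ".join(nonzero)
    return OK, f"OK: all counters zero on {len(stats)} device(s)"


def main(argv):
    """Run `btrfs device stats <path>` and return a Nagios exit code."""
    path = argv[1] if len(argv) > 1 else "/"
    try:
        result = subprocess.run(
            ["btrfs", "device", "stats", path],
            capture_output=True, text=True,
        )
    except OSError as exc:
        print(f"UNKNOWN: could not run btrfs: {exc}")
        return UNKNOWN
    code, message = evaluate(parse_device_stats(result.stdout))
    print(message)
    return code
```

A wrapper script would call sys.exit(main(sys.argv)). If only some of
the counters deserve an alert, the filter in evaluate() is the place to
change.
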