Re: Monitoring not working as "dev stats" returns 0 after read error occurred

On 9.01.20 at 12:33, Philip Seeger wrote:
> On 2020-01-08 20:35, Graham Cobb wrote:
>>> BTRFS info (device sda3): read error corrected: ino 194473 off 2170880
>>
>> I am not convinced that that message is telling you that the error
>> happened on device sda3. Certainly in some other cases Btrfs error
>> messages identify the **filesystem** using the name of the device the
>> kernel thinks is mounted, which might be sda3.
> 
> You're right, it looks like I copied the wrong piece, my bad. This btrfs
> filesystem is a mirror with two drives:
> 
> # btrfs fi show / | grep devid
>     devid    1 size 100.00GiB used 81.03GiB path /dev/sda3
>     devid    2 size 100.00GiB used 81.03GiB path /dev/nvme0n1p3
> 
> And this is from dmesg:
> 
> print_req_error: critical medium error, dev nvme0n1, sector 40910720
> flags 84700
> BTRFS info (device sda3): read error corrected: ino 194473 off 2134016
> (dev /dev/nvme0n1p3 sector 36711808)
> 
> So it's nvme0n1 that's about to die. But it doesn't matter, dev stats
> prints 0 for all error counts as if nothing had ever happened.
> 

According to the log provided, the error returned from the NVMe device is
BLK_STS_MEDIUM/-ENODATA, hence the "critical medium" string there. Btrfs'
code, OTOH, only records an error if it gets BLK_STS_IOERR or BLK_STS_TARGET
from the block layer. It seems there are other status codes which are
also ignored but can signify errors, e.g. BLK_STS_NEXUS/BLK_STS_TRANSPORT.
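
To make the filtering described above concrete, here is a rough userspace
model of it in Python. This is only a sketch, not the kernel code: the
BlkStatus enum, COUNTED set and both helper functions are hypothetical names
made up for illustration, and only the status codes mentioned in this mail
are listed.

```python
from enum import Enum


class BlkStatus(Enum):
    """Illustrative subset of block-layer status codes named in this thread."""
    OK = "BLK_STS_OK"
    IOERR = "BLK_STS_IOERR"
    TARGET = "BLK_STS_TARGET"
    MEDIUM = "BLK_STS_MEDIUM"
    NEXUS = "BLK_STS_NEXUS"
    TRANSPORT = "BLK_STS_TRANSPORT"


# Assumption taken from the description above: only these two statuses
# cause the per-device error counters to be incremented.
COUNTED = frozenset({BlkStatus.IOERR, BlkStatus.TARGET})


def is_counted(status):
    """Return True if a failed I/O with this status would be counted."""
    return status in COUNTED


def uncounted_failures():
    """List the failure statuses in this model that would not be counted."""
    return sorted(
        s.value for s in BlkStatus
        if s is not BlkStatus.OK and s not in COUNTED
    )
```

In this model is_counted(BlkStatus.MEDIUM) is False, which matches the
report: the medium error is corrected from the mirror but never shows up in
the counters.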

So as it stands this is expected, but I'm not sure it's the correct behavior.
Perhaps we need to extend the range of conditions we record as errors.
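
Until that changes, a monitoring script could fall back to scanning the
kernel log for the two messages quoted above. The following is a minimal
sketch, assuming each message appears on a single line in the log (they are
only wrapped in the mail) and that the wording matches what was reported
here; other kernel versions may print it differently. The function and
pattern names are hypothetical.

```python
import re
from collections import Counter

# Block-layer message, e.g.
# "print_req_error: critical medium error, dev nvme0n1, sector 40910720"
MEDIUM_RE = re.compile(r"critical medium error, dev ([^,\s]+), sector \d+")

# Btrfs repair message, e.g.
# "BTRFS info (device sda3): read error corrected: ino ... (dev /dev/nvme0n1p3 sector ...)"
CORRECTED_RE = re.compile(
    r"BTRFS info \(device \S+\): read error corrected: "
    r"ino \d+ off \d+ \(dev (\S+) sector \d+\)"
)


def count_read_errors(log_text):
    """Count medium errors and corrected reads per device in kernel log text."""
    counts = Counter()
    for line in log_text.splitlines():
        m = MEDIUM_RE.search(line)
        if m:
            counts[("medium_error", m.group(1))] += 1
            continue
        m = CORRECTED_RE.search(line)
        if m:
            counts[("btrfs_corrected", m.group(1))] += 1
    return counts


if __name__ == "__main__":
    sample = (
        "print_req_error: critical medium error, dev nvme0n1, "
        "sector 40910720 flags 84700\n"
        "BTRFS info (device sda3): read error corrected: ino 194473 "
        "off 2134016 (dev /dev/nvme0n1p3 sector 36711808)\n"
    )
    for (kind, dev), n in sorted(count_read_errors(sample).items()):
        print(kind, dev, n)
```

Note that the device in parentheses at the end of the btrfs message is the
one that was repaired, not the "device sda3" at the start of the line, which
only identifies the filesystem.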

Thanks for the report.


<snip>