Re: btrfs dev sta not updating

On 23.06.20 at 9:17, waxhead wrote:
> 
> 
> Nikolay Borisov wrote:
>>
>>
>> On 23.06.20 at 5:09, Russell Coker wrote:
>>> [395198.926320] BTRFS warning (device sdc1): csum failed root 5 ino
>>> 276 off
>>> 19267584 csum 0x8941f998 expected csum 0xccd545e0 mirror 1
>>> [395199.147439] BTRFS warning (device sdc1): csum failed root 5 ino
>>> 276 off
>>> 20611072 csum 0x8941f998 expected csum 0xdaf657cb mirror 1
>>> [395199.183680] BTRFS warning (device sdc1): csum failed root 5 ino
>>> 276 off
>>> 24190976 csum 0x8941f998 expected csum 0xcddce0b1 mirror 1
>>> [395199.185172] BTRFS warning (device sdc1): csum failed root 5 ino
>>> 276 off
>>> 19267584 csum 0x8941f998 expected csum 0xccd545e0 mirror 1
>>> [395199.330841] BTRFS warning (device sdc1): csum failed root 5 ino
>>> 277 off 0
>>> csum 0x8941f998 expected csum 0xa54d865c mirror 1
>>>
>>> I have a USB stick that's corrupted, I get the above kernel messages
>>> when I
>>> try to copy files from it.  But according to btrfs dev sta it has had
>>> 0 read
>>> and 0 corruption errors.
>>>
>>> root@xev:/mnt/tmp# btrfs dev sta .
>>> [/dev/sdc1].write_io_errs    0
>>> [/dev/sdc1].read_io_errs     0
>>> [/dev/sdc1].flush_io_errs    0
>>> [/dev/sdc1].corruption_errs  0
>>> [/dev/sdc1].generation_errs  0
>>> root@xev:/mnt/tmp# uname -a
>>> Linux xev 5.6.0-2-amd64 #1 SMP Debian 5.6.14-1 (2020-05-23) x86_64
>>> GNU/Linux
>>>
>>
>> The read/write io err counters are updated only when even the repair
>> bio has failed. So in your case you had some checksum errors, but btrfs
>> managed to repair them by reading from a different mirror. In this case
>> those aren't really counted as io errs, since in the end you did get the
>> correct data.
>>
> I don't think this is what most people expect.
> A simple way to solve this could be to put the non-fatal errors in
> parentheses if this can be done easily.
> 
> For example:
> [/dev/sdc1].write_io_errs    0 (5)
> 
> IMHO this would be more readable and more useful.

Frankly, just by looking at this example output, without having read any
accompanying documentation, it would be hard to deduce what the
difference between the two numbers is. Furthermore, those error counters
are persisted on disk, so if we want to add new persistent counters the
disk format would have to be changed. On the other hand, we *could* make
even transient errors count as persistent ones, e.g. in
read_io_errs. But this opens a different can of worms: if a user
sees a non-zero read_io_errs, should they be worried that some data
is potentially stale or not (given that we won't be distinguishing
between unrepairable and transient errors)?
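
To make that ambiguity concrete, here is a toy model in Python. It is
not kernel code: the class, its field names and the merge_transient
switch are all made up for illustration, and the counting rule is only
the one described in this thread (bump the counter when the repair has
failed too).

```python
from dataclasses import dataclass


@dataclass
class ToyDevStats:
    """Toy stand-in for one persisted per-device counter (hypothetical)."""
    read_io_errs: int = 0
    # Hypothetical switch: also count errors that a repair read fixed.
    merge_transient: bool = False

    def record_read(self, first_copy_ok: bool, repair_ok: bool) -> None:
        if first_copy_ok:
            return  # clean read, nothing to count
        if not repair_ok or self.merge_transient:
            self.read_io_errs += 1


def simulate(events, merge_transient):
    """Feed (first_copy_ok, repair_ok) pairs through the toy counter."""
    stats = ToyDevStats(merge_transient=merge_transient)
    for first_copy_ok, repair_ok in events:
        stats.record_read(first_copy_ok, repair_ok)
    return stats.read_io_errs


# One repaired (transient) failure and one unrepairable failure.
mixed = [(False, True), (False, False)]
# Two repaired failures: the user got correct data both times.
all_repaired = [(False, True), (False, True)]

print(simulate(mixed, merge_transient=False))         # 1
print(simulate(mixed, merge_transient=True))          # 2
print(simulate(all_repaired, merge_transient=True))   # 2
```

With the merged counter, the last two cases both report 2, although
only one of them involves data the user never got back.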

Weighing the pros and cons of adding "transient" errors, I'd say the
effort would be better invested in clearly documenting how errors are
counted. Admittedly, that's a department we are severely lacking in!
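
In the meantime, anyone who wants a number for the failures that do not
show up in dev stats could count the csum warnings in the kernel log.
A rough Python sketch follows. It assumes the message format is exactly
the one in the warnings quoted at the top of this thread (other kernel
versions may word it differently), and the function name and sample
text are mine, not part of any btrfs tool.

```python
import re
from collections import Counter

# Pattern derived from the warnings quoted in this thread; an assumption,
# not a stable kernel interface.
CSUM_RE = re.compile(
    r"BTRFS warning \(device (?P<dev>[^)]+)\): csum failed "
    r"root (?P<root>\d+) ino (?P<ino>\d+) off (?P<off>\d+) "
    r"csum (?P<got>0x[0-9a-f]+) expected csum (?P<want>0x[0-9a-f]+) "
    r"mirror (?P<mirror>\d+)"
)


def count_csum_failures(log_text):
    """Return (messages per device, set of distinct failing blocks)."""
    # Collapse all whitespace so messages wrapped across lines still match.
    flat = " ".join(log_text.split())
    per_device = Counter()
    blocks = set()
    for m in CSUM_RE.finditer(flat):
        per_device[m["dev"]] += 1
        blocks.add((m["dev"], int(m["root"]), int(m["ino"]), int(m["off"])))
    return per_device, blocks


# The five warnings from the original report, one per line.
SAMPLE = """
[395198.926320] BTRFS warning (device sdc1): csum failed root 5 ino 276 off 19267584 csum 0x8941f998 expected csum 0xccd545e0 mirror 1
[395199.147439] BTRFS warning (device sdc1): csum failed root 5 ino 276 off 20611072 csum 0x8941f998 expected csum 0xdaf657cb mirror 1
[395199.183680] BTRFS warning (device sdc1): csum failed root 5 ino 276 off 24190976 csum 0x8941f998 expected csum 0xcddce0b1 mirror 1
[395199.185172] BTRFS warning (device sdc1): csum failed root 5 ino 276 off 19267584 csum 0x8941f998 expected csum 0xccd545e0 mirror 1
[395199.330841] BTRFS warning (device sdc1): csum failed root 5 ino 277 off 0 csum 0x8941f998 expected csum 0xa54d865c mirror 1
"""

per_device, blocks = count_csum_failures(SAMPLE)
print(per_device["sdc1"], "messages,", len(blocks), "distinct blocks")
```

On the sample this reports 5 messages but 4 distinct blocks, since
offset 19267584 of inode 276 was hit twice. Unlike the dev stats, the
kernel log is not persistent, so this only covers what is still in the
log.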



