Re: How to recover uncorrectable errors ?

On Mar 20, 2013, at 7:33 AM, Frédéric COIFFIER <frederic.coiffier@xxxxxxx> wrote:

> 
> 195 Hardware_ECC_Recovered  0x001a   057   055   000    Old_age   Always       -       63508940
> 197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
> 198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
> 199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       1
> 200 Multi_Zone_Error_Rate   0x0000   100   253   000    Old_age   Offline      -       0
> 202 Data_Address_Mark_Errs  0x0032   100   253   000    Old_age   Always       -       0

With ECC-recovered counts this high, I suspect silent data corruption (SDC). The value is within the manufacturer's tolerance, so the drive isn't failed outright, but the ECC in a consumer SATA drive isn't foolproof. It will fail to detect some errors and return bad data to the file system, and it will detect others and "correct" them incorrectly. Even if most errors are detected and corrected properly, the bottom line is that you have a file system that knows better, and it's saying something is significantly wrong.

If you're going to continue to use the drive, I would at least use hdparm to issue an ATA enhanced security erase (SECURITY ERASE UNIT in enhanced mode, hdparm's --security-erase-enhanced option). Then I'd take a smartctl -x capture for reference, and run an extended offline SMART test with smartctl -t long, which this drive has never had in its lifetime. After that, take another smartctl -x capture, compare it to the reference, and check whether the test completed or failed and whether any of the attributes changed appreciably during the test. Otherwise, get a replacement.
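
To make the before/after comparison less error-prone, here is a minimal Python sketch that diffs the raw values of two attribute captures. It assumes the captures use the classic attribute table layout quoted above (what smartctl -A prints); smartctl -x prints the table in a different "brief" layout, so for that you'd either save a separate smartctl -A capture each time or adapt the pattern. The function names are my own, not part of smartmontools.

```python
import re

# One row of the classic smartctl attribute table:
# ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+"   # ID, name, flag
    r"\s+(\d+)\s+(\d+)\s+(\d+)"             # value, worst, thresh
    r"\s+\S+\s+\S+\s+\S+"                   # type, updated, when_failed
    r"\s+(\d+)"                             # leading integer of the raw value
)


def parse_attributes(text):
    """Return {attribute_id: {...}} for every attribute row found in text."""
    attrs = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            attr_id, name, value, worst, thresh, raw = m.groups()
            attrs[int(attr_id)] = {
                "name": name,
                "value": int(value),
                "worst": int(worst),
                "thresh": int(thresh),
                "raw": int(raw),
            }
    return attrs


def diff_raw(before, after):
    """Return {attribute_name: raw_delta} for attributes whose raw value changed."""
    changes = {}
    for attr_id, new in after.items():
        old = before.get(attr_id)
        if old is not None and new["raw"] != old["raw"]:
            changes[new["name"]] = new["raw"] - old["raw"]
    return changes
```

Feed it the text of the reference capture and the post-test capture, e.g. diff_raw(parse_attributes(ref_text), parse_attributes(post_text)). Any growth in Current_Pending_Sector or Offline_Uncorrectable after the long test would be the thing to look at first. Raw values are vendor-specific, so treat the deltas as a hint rather than a verdict.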

The one-off UDMA CRC error isn't a media error; it's a communication error between the drive and the controller. I wouldn't be overly concerned about it.

> The most annoying thing is that we can't delete these files. So, the only way to solve these problems is to replace the filesystem.

The storage media isn't reliable. Replacing the file system will eventually get you right back where you are now, unless you use multiple devices and the second device is reliable.


Chris Murphy