On Sun, Dec 21, 2014 at 01:56:54PM -0800, Robert White wrote:
> On 12/21/2014 11:34 AM, constantine wrote:
> >Some months ago I had 6 uncorrectable errors. I deleted the files that
> >contained them and then after scrubbing I had 0 uncorrectable errors.
> >After some weeks I encountered new uncorrectable errors.
> >
> >Question 1:
> >Why do I have uncorrectable errors on a RAID-1 filesystem in the first place?
>
> These are disk/platter/hardware errors. They happen for one of two
> reasons. (most likely) There is a flaw, new or existing, on the
> platter itself and data just cannot live in that spot. (least
> likely) You suffered an environmental hazard (hard jolt) while a
> sector was being written and the drive is just choking on the
> digital wreckage.
>
> >
> >Question 2:
> >How do I properly correct them? (Again by deleting their files? :( )
>
> You have to _force_ the system to write the sector. If the disk can
> correct the sector (not a hardware flaw) the problem goes away
> forever. If it can't the drive will re-map the sector with a spare
> sector and it will seem to go away forever.

   Note that one of the drives already has reallocated sectors, so
it's on its way to failing, and you should start saving up your
pennies for a new one now, even if it hasn't gone properly boom
yet. However, that doesn't explain on its own why you're getting
unrecoverable errors -- the FS should be able to deal with that.

[snip]
> The good news is that since you are using RAID1 and checksums you
> shouldn't need to delete any files. Just coerce the write and then
> btrfs scrub your filesystem and the checksum/rewrite thing should
> recover the degraded copy from the good copy in the mirror.

   If btrfs detects a checksum error, it will try to fix it by reading
the other copy and then writing good data to the broken copy
again. You don't have to force a write to the FS in order to make it
fix broken data this way.
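
A toy model of the check-and-repair described above may help. This is
only a sketch in Python, not btrfs code: it assumes a two-device mirror
held in plain lists, uses zlib's CRC-32 as a stand-in checksum, and the
names repair_block, scrub and UncorrectableError are made up for
illustration. Unlike a normal read, which only falls back to the other
copy when the first one fails, the sketch checks every copy of every
block, which is closer to what a scrub does.

```python
import zlib


class UncorrectableError(Exception):
    """No copy of the block matches its stored checksum (hypothetical name)."""


def checksum(data: bytes) -> int:
    # Stand-in checksum for this sketch only (an assumption, not btrfs's own).
    return zlib.crc32(data)


def repair_block(mirrors, index, expected):
    """Verify block `index` on every device and rewrite bad copies.

    mirrors  -- list of devices, each a list of bytes blocks
    expected -- the checksum stored for this block
    Returns the number of copies that were rewritten.
    """
    good = None
    bad_devices = []
    for dev, blocks in enumerate(mirrors):
        if checksum(blocks[index]) == expected:
            if good is None:
                good = blocks[index]
        else:
            bad_devices.append(dev)

    if good is None:
        # Every copy failed its checksum: nothing to repair from.
        raise UncorrectableError(f"block {index}: all copies fail checksum")

    for dev in bad_devices:
        # Overwrite the broken copy with the verified one.
        mirrors[dev][index] = good
    return len(bad_devices)


def scrub(mirrors, sums):
    """Run repair_block over every block; return (corrected, uncorrectable)."""
    corrected = 0
    uncorrectable = []
    for index, expected in enumerate(sums):
        try:
            corrected += repair_block(mirrors, index, expected)
        except UncorrectableError:
            uncorrectable.append(index)
    return corrected, uncorrectable


# Illustrative data: three blocks mirrored on two devices.
original = [b"alpha", b"beta", b"gamma"]
sums = [checksum(b) for b in original]
dev_a = list(original)
dev_b = list(original)

dev_a[1] = b"bXta"    # damaged on one device only
dev_a[2] = b"gXmma"   # damaged on both devices
dev_b[2] = b"gaXma"

corrected, uncorrectable = scrub([dev_a, dev_b], sums)
print("corrected:", corrected, "uncorrectable blocks:", uncorrectable)
```

In this run block 1 is damaged on one device only, so it is rewritten
from the good copy. Block 2 is damaged on both devices, so it lands in
the uncorrectable list and could only be restored from a backup.
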
A scrub will do this check-and-repair on all content of the
filesystem.

   If the FS is reporting uncorrectable errors, then it's tried both
copies and both fail their checksums. This is basically not fixable
without removing the files and replacing them with copies from your
backup. It's not obvious why you've got correlated errors on two
devices, though, and I'm not sure how to work it out. I'd suggest
running the full SMART tests on the disks, and running a scrub on the
FS, and checking your logs for SATA errors and similar problems.

   Hugo.

[snip]

--
Hugo Mills             | I must be musical: I've got *loads* of CDs
hugo@... carfax.org.uk |
http://carfax.org.uk/  |
PGP: 65E74AC0          | Fran, Black Books
