Martin Monperrus posted on Fri, 24 Apr 2015 19:44:47 +0200 as excerpted:
> Hi Duncan,
>
>> The kernel log (dmesg, also logged to syslog/journald on most systems)
>> from during the scrub should capture more information on those errors.
> Thanks. The dmesg log indeed contains the file path (see below).
>
> The error is in /home/martin/XXXXX. It is related to a low-level error
> ("failed command: READ DMA").
>
> Beyond this corrupted file, is my disk dead?
> Can I repair the file system or re-create a new one on the same disk?

A direct answer is beyond my knowledge level, certainly without SMART
status information, etc. What I do know is that, assuming the rest of
the device is responding fine, most drives keep a number of reserved
sectors available and will automatically substitute one in on a *write*
to an affected dead sector.

So if the device in general appears to be working fine, and assuming the
SMART status still passes, I'd back up everything else on that partition,
unmount it, then do something like a badblocks destructive write (-w)
test on the partition. If it comes back clean, I'd consider the device
usable again.

Also note that if you run smartctl -A (attributes) on the device before
attempting anything else and check the raw value for ID 5
(Reallocated_Sector_Ct), then check again after doing something like
that badblocks -w, you can see whether it actually reallocated any
sectors. Finally, while a one-off reallocation is possible, a drive that
starts reallocating sectors often fails relatively quickly, since
reallocation can indicate failing media, and once that starts it often
doesn't stop. So once you see that value move off zero, keep an eye on
it, and if it starts to climb, get the data off that thing as soon as
possible.
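For example, assuming smartmontools is installed, the raw value is the
last field of the attribute line in smartctl -A output, so it can be
pulled out with a small awk filter. The device name and the sample line
below are illustrative, not from the poster's drive:

```shell
# Filter smartctl -A output down to the raw value of attribute 5
# (Reallocated_Sector_Ct): the attribute ID is the first field on the
# line, the raw value is the last.
realloc_count() {
    awk '$1 == 5 { print $NF }'
}

# Against the live device (hypothetical name):
#   smartctl -A /dev/sdX | realloc_count

# Against a captured sample attribute line, to show the extraction:
sample='  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0'
echo "$sample" | realloc_count    # prints 0
```

Recording that number before and after the badblocks run tells you
whether the drive actually remapped anything.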

And of course it should go without saying, but I'll repeat the sysadmin's
data value rule of thumb anyway, for the benefit of others reading as
well. If you care about the data, by definition, you have a (tested)
backup (a corollary rule states that an untested backup isn't a backup at
all). If you don't have a backup, by definition you do NOT care about
that data, /regardless/ of any claims to the contrary. Unfortunately,
many (most?) people end up learning this the hard way, finding out too
late how much more value the data had than they thought, and thus that
they /should/ have cared about it more (more backups, more testing of
them) than they did.

(For those who end up in that situation...) On the flip side, there's the
big picture. During Hurricane Katrina, a data hosting firm in New Orleans
made (tech) headlines by blogging live their struggle to stay powered and
online. I was one of thousands watching that, along with the mainstream
news about the flooding, looting and dying going on. Obviously losing a
bit of data ends up pretty far down the list when you're wet and cold and
just lost your house and possibly members of your family! A bit of data
loss might hurt a bit, but in the big picture, if you're still healthy,
and have a job and a home and family, it's /not/ the end of the world. A
bit of perspective helps! =:^)
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html