Re: A Big Thank You, and some Notes on Current Recovery Tools.

On Mon, Jan 1, 2018 at 7:15 AM, Kai Krakow <hurikhan77@xxxxxxxxx> wrote:
> On Mon, 01 Jan 2018 18:13:10 +0800, Qu Wenruo wrote:
>
>> On 2018-01-01 08:48, Stirling Westrup wrote:
>>>
>>> 1) I had a 2T drive die with exactly 3 hard-sector errors and those 3
>>> errors exactly coincided with the 3 super-blocks on the drive.
>>
>> WTF, why does all this corruption happen at the btrfs superblocks?!
>>
>> What a coincidence.
>
> Maybe it's a hybrid drive with flash? Or something that went wrong in the
> drive-internal cache memory at the very time the superblocks were updated?
>
> I bet that the sectors aren't really broken, just the on-disk checksum
> didn't match the sector. I remember such things happening to me more than
> once back in the days when drives were still connected by molex power
> connectors. Those connectors started to get loose over time, due to
> thermals or repeated disconnect and connect. That is, drives sometimes
> no longer had a reliable power source, which led to all sorts
> of very strange problems, mostly resulting in pseudo-defective sectors.
>
> That said, the OP might want to check the power supply after this
> coincidence... Maybe it's aging and no longer able to support all four
> drives, CPU, GPU and stuff with stable power.

You may be right about the cause of the error being a power-supply issue.
For those that are curious, the drive that failed was a Seagate Barracuda
LP 2000G drive (ST2000DL003).

I hadn't gone into the particulars of the failure, but the BTRFS in
question is my file server, and it mostly holds ripped DVDs, so the
storage tends to grow in size but existing files seldom change unless I
reorganize things. The intent is for it to be backed up weekly to a proper
RAIDed BTRFS system, but I have to admit that I've never gotten around to
automating the start of backups and have just been running them whenever I
make large changes to the file server or reorganize things.

I was starting to run out of space on the file server, and I had noticed a
few transient drive errors in the logs (from the 2T device that failed), so
I had decided to add another 2T device to the array temporarily, and then
replace both the failing device and the temp device with a new 4T drive
once I'd had a chance to go buy one.

In hindsight (which is always 20/20), I should have updated the backups
before starting to make my changes, but as I'd just added a new 4T drive to
the BTRFS RAID6 in my backup system a week before, and it went as smooth as
butter, I guess I was feeling insufficiently paranoid.

I shut down the system, installed the 5th drive, rebooted... and nothing.
The system made some horrible sounds and refused to boot. It wouldn't even
get past POST. Not being a hardware guy, I wasn't sure what killed my
server box, but I assume it was the power supply. Once I get the chance,
I'll take it to my local computer shop and have someone look at it.

Luckily I had an identical system lying idle, so I swapped over all the
drives, along with the extra SATA controller needed to handle them, and
booted it up, only to find that the failing drive had now definitely
failed.

Interestingly, the various tools I used kept reporting an 'unknown error'
for the 3 bad sectors. IIRC, one of the diagnostic tools reported it as
"Error 11 (Unknown)". In any case, there appeared to be many errors on the
disk, but when I used ddrescue to make a full copy of it, all of the
sectors were (eventually) fully recovered, except for the 3 superblocks.
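
For anyone wondering where those 3 superblocks sit on a drive this size,
here is a minimal sketch. The three byte offsets (64 KiB, 64 MiB and
256 GiB) and the 4 KiB superblock size come from the btrfs on-disk format;
the 512-byte logical sector size and the 2 * 10**12 byte capacity are
assumptions for illustration, not values read from the failed drive, and
the function name is made up for this example.

```python
# Byte offsets of the primary btrfs superblock and its two mirrors,
# as defined by the btrfs on-disk format.
SUPERBLOCK_OFFSETS = [64 * 1024, 64 * 1024**2, 256 * 1024**3]
SUPERBLOCK_SIZE = 4096  # bytes occupied by each superblock copy


def superblock_sectors(device_bytes, sector_size=512):
    """Return (byte_offset, first_sector, sector_count) for every
    superblock copy that fits entirely on a device of the given size.

    sector_size=512 is an assumed logical sector size.
    """
    copies = []
    for offset in SUPERBLOCK_OFFSETS:
        # A copy only exists if it fits completely on the device.
        if offset + SUPERBLOCK_SIZE <= device_bytes:
            first_sector = offset // sector_size
            sector_count = SUPERBLOCK_SIZE // sector_size
            copies.append((offset, first_sector, sector_count))
    return copies


if __name__ == "__main__":
    # Assumed nominal capacity of a "2T" drive.
    for offset, first, count in superblock_sectors(2 * 10**12):
        print(f"offset {offset:>14} bytes -> sectors {first}..{first + count - 1}")
```

A device smaller than 256 GiB would only carry the first two copies, which
is why a 2T drive ends up with exactly three.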

After a few days of non-destructive tests and googling for information on
BTRFS multi-drive systems, I finally decided I had to contact this list for
advice, and the rest is well documented.