Re: btrfs csum failed on git .pack file

On Tue, Sep 8, 2009 at 5:53 PM, Tracy Reed<treed@xxxxxxxxxxxxxxx> wrote:
> On Tue, Sep 08, 2009 at 10:22:11PM +0200, Markus Trippelsdorf spake thusly:
>> I've already deleted the file in question unfortunately.
>> On IRC Chris decided that either bad RAM or a harddrive error was the
>> most likely reason for this checksum mismatch.
>
> Which raises an interesting point: I know reiserfs had its problems
> but it also turned up a lot of machines with bad RAM which contributed
> to giving the fs a bad name. With more and more complicated and memory
> consuming filesystem datastructures being stored in RAM, larger volumes
> of RAM in systems, and RAM not really getting any more reliable will
> we ever see a day where something like btrfs is not recommended for
> use in any machine that doesn't have ECC? Does the filesystem do
> anything to protect itself from bad hardware?

Such as the checksums that started this thread?  Those *are* a
feature that protects against bad hardware.
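
To make that concrete, here is a rough sketch of the idea, not btrfs's
actual code. It assumes a 4096-byte block size and uses Python's
zlib.crc32 (plain CRC-32) as a stand-in for the crc32c that btrfs uses
by default; the helper names checksum_blocks, find_bad_blocks and
flip_bit are made up for illustration.

```python
import zlib

BLOCK_SIZE = 4096  # assumed block size for this sketch


def checksum_blocks(data: bytes) -> list[int]:
    """Compute one CRC-32 per fixed-size block (stand-in for crc32c)."""
    return [
        zlib.crc32(data[i:i + BLOCK_SIZE])
        for i in range(0, len(data), BLOCK_SIZE)
    ]


def find_bad_blocks(data: bytes, stored: list[int]) -> list[int]:
    """Return indices of blocks whose checksum no longer matches the stored one."""
    current = checksum_blocks(data)
    return [i for i, (now, then) in enumerate(zip(current, stored)) if now != then]


def flip_bit(data: bytes, byte_offset: int, bit: int) -> bytes:
    """Simulate bad RAM or a bad disk by flipping a single bit."""
    corrupted = bytearray(data)
    corrupted[byte_offset] ^= 1 << bit
    return bytes(corrupted)


# Checksums are recorded at write time...
original = bytes(range(256)) * 48  # 12288 bytes = 3 blocks
stored_sums = checksum_blocks(original)

# ...and re-verified at read time, after the hardware has had its way.
damaged = flip_bit(original, byte_offset=5000, bit=3)

print(find_bad_blocks(original, stored_sums))  # []
print(find_bad_blocks(damaged, stored_sums))   # [1]
```

A single flipped bit is enough to make the read fail loudly for that
one block, which is the kind of "csum failed" report that started this
thread. A file system without data checksums would have handed back
the damaged block without complaint.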

A large part of reiserfs' problem was its religious degree of "panic on
inconsistency!", so failures that might slip by unnoticed on other file
systems were more likely to be noticed. Sadly, shooting the messenger is
still a popular sport, and the very qualities that make btrfs more
resistant to bad hardware may well give it a bad reputation.  I don't
know that there is much that can be done about that.


On Wed, Sep 9, 2009 at 3:01 AM, Jens Axboe<jens.axboe@xxxxxxxxxx> wrote:
> On Wed, Sep 09 2009, Markus Trippelsdorf wrote:
>> What a strange coincidence that it affected git pack files in both cases.
>> It's almost too improbable...
>
> Probably more than a coincidence I think, the question is what though...

Could this have been the same data in both cases?  Either way, if the
hardware was randomly corrupting high-entropy blocks with very low
probability, it's quite possible that you two are the ones who noticed
it, while anyone else who hit it chalked it up to some other problem.

I've encountered telecom equipment where particular packet data
interacted poorly with the clock recovery hardware. "Any file
transfers fine, except for this one. This one stalls and never
finishes, but if I unzip it, it's fine!" Ugh. Or it could be some
busted ECC that always 'corrects' a particular class of perfectly
valid blocks to something wrong... or it could be a million other
things. At the end of the day you just need to accept that the
hardware is junk. Blacklist it, give the vendor the best black eye
that you can, and move on.

I can only expect that this is going to get worse over time. I really
wish that it had become the norm for drive makers to expose an
optional raw interface to the flash. Alas, we're stuck with the
equivalent of running Linux on a hypervisor provided by Microsoft...
except the SSD makers are less experienced.