Re: btrfs csum failed on git .pack file

On Wed, Sep 09 2009, Markus Trippelsdorf wrote:
> On Wed, Sep 09, 2009 at 09:01:41AM +0200, Jens Axboe wrote:
> > On Wed, Sep 09 2009, Markus Trippelsdorf wrote:
> > > On Tue, Sep 08, 2009 at 10:32:14PM +0200, Jens Axboe wrote:
> > > > On Tue, Sep 08 2009, Markus Trippelsdorf wrote:
> > > > > On Tue, Sep 08, 2009 at 10:00:42PM +0200, Jens Axboe wrote:
> > > > > > On Mon, Sep 07 2009, Markus Trippelsdorf wrote:
> > > > > > > Just got this error today in my dmesg:
> > > > > > > btrfs csum failed ino 1483065 off 158482432 csum 4283543305 private 43905798
> > > > > > > 
> > > > > > > linux % find . -inum 1483065
> > > > > > > ./.git/objects/pack/pack-f9251bcc6a8afe3c92193e14d1d742f2f0182ce5.pack
> > > > > > > 
> > > > > > > It's the main pack file from my git linux kernel tree:
> > > > > > > 
> > > > > > 
> > > > > > Hmm, I ran into something very similar. Care to check what the corrupted
> > > > > > block of data looks like (and how big it is)?
> > > > > 
> > > > > I've already deleted the file in question unfortunately.
> > > > > On IRC Chris decided that either bad RAM or a hard drive error was the
> > > > > most likely reason for this checksum mismatch.
> > > > 
> > > > Darn, that's too bad. The corruption issue I had was also in a git pack
> > > > file. It was fine one day, bad the next. Turned out to be 16kb of 0xff
> > > > in the file, and I blamed it on the (cheap) SSD drive that hosted the
> > > > local git repo. It's still the most likely explanation given the nature
> > > > of the problem, however it would have been really interesting to see
> > > > what corruption you had.
> > > 
> > > If by cheap SSD drive you mean an Indilinx Barefoot based one, we might
> > > be using the same hardware (30GB Vertex in my case). 
> > 
> > Spooky, yes indeed that's the very same drive I'm using. Also see my
> > postings on this very issue here, top two entries:
> > 
> > http://axboe.livejournal.com/
> > 
> > So that pretty much looks like it reaffirms some of my suspicions. Is
> > the drive in a laptop that you suspend and resume?
> 
> No. I use it in my workstation, that I never switch off normally.

OK, so we can rule out any interactions between suspending and resuming
the drive. That's at least something.

> > > What a strange coincidence that it affected git pack files in both cases.
> > > It's almost too improbable...
> > 
> > Probably more than a coincidence I think, the question is what though...
> 
> If it really was an SSD error, then it should happen randomly, messing up
> random files. But (contrary to your experience) I never had any issues with 
> this SSD until this single failed checksum.

Not necessarily, there may be some pattern to how the pack files are
accessed (that propagates through to the drive). The fact is, 0xff is an
extremely weird piece of corruption that just reeks of bad flash blocks.
It's almost impossible that it is a software error. If it was all
zeroes, or a bit flip, the likely causes would be very different.
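
For anyone who hits this again and still has the file, here is a minimal
sketch of one way to answer the "what does the corrupted block look like
and how big is it" question. It assumes the file can still be read (for
example from a copy) and that the damage is aligned to 4096-byte blocks,
which is an assumption and not something established in this thread. The
function name find_filled_blocks and its parameters are made up for
illustration. It reports every run of blocks that consist entirely of one
byte value, 0xff by default.

```python
def find_filled_blocks(path, fill=0xFF, block_size=4096):
    """Return (offset, length) tuples for runs of blocks made only of `fill`.

    Only block-aligned runs are detected; a run that starts or ends in the
    middle of a block is reported without its partial edges.
    """
    runs = []    # each entry is [start_offset, length_in_bytes]
    offset = 0   # file offset of the block currently being examined
    with open(path, "rb") as f:
        while True:
            block = f.read(block_size)
            if not block:
                break
            # Compare against a same-sized buffer of the fill byte, so a
            # short final block is handled correctly too.
            if block == bytes([fill]) * len(block):
                if runs and runs[-1][0] + runs[-1][1] == offset:
                    # Directly follows the previous run: extend it.
                    runs[-1][1] += len(block)
                else:
                    runs.append([offset, len(block)])
            offset += len(block)
    return [tuple(run) for run in runs]
```

Calling find_filled_blocks("some.pack") lists the 0xff runs, and passing
fill=0x00 looks for zeroed blocks instead. Short runs may simply be
legitimate file contents, so only long ones (like the 16kb I saw) are
really suspicious.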

-- 
Jens Axboe
