Re: Adventures in btrfs raid5 disk recovery

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Just wiping the slate clean to summarize:


1. We have a consistent ~1 in 3 maybe 1 in 2, reproducible corruption
of *data extent* parity during a scrub with raid5. Goffredo and I have
both reproduced it. It's a big bug. It might still be useful if
someone else can reproduce it too.

Goffredo, can you file a bug at bugzilla.kernel.org and reference your
bug thread?  I don't know if the key developers know about this, it
might be worth pinging them on IRC once the bug is filed.

Unknown if it affects balance, or raid 6. And if it affects raid 6, is
p or q corrupted, or both? Unknown how this manifests on metadata
raid5 profile (only tested was data raid5). Presumably if there is
metadata corruption that's fixed during a scrub, and its parity is
overwritten with corrupt parity, the next time there's a degraded
state, the file system would face plant somehow. And we've seen quite
a few degraded raid5's (and even 6's) face plant in inexplicable ways
and we just kinda go, shit. Which is what the fs is doing when it
encounters a pile of csum errors. It treats the csum errors as a
signal to disregard the fs rather than maybe only being suspicious of
the fs. Could it turn out that these file systems were recoverable,
just that Btrfs wasn't tolerating any csum error and wouldn't proceed
further?

2. The existing scrub code computes parity on-the-fly, compares it
with what's on-disk, and overwrites if there's a mismatch. If there's
a mismatch, there's no message anywhere. It's a feature request to get
a message on parity mismatches. An additional feature request would be
to get a parity_error counter along the lines of the other error
counters we have for scrub stats and dev stats.

3. I think it's a more significant change to get parity checksums
stored some where. Right now the csum tree holds item type EXTENT_CSUM
but parity is not an extent, it's also not data, it's a variant of
data. So it seems to me we'd need a new item type PARITY_CSUM to get
it into the existing csum tree. And I'm not sure what incompatibility
that brings; presumably older kernels could mount such a volume ro
safely, but shouldn't write to it, including btrfs check --repair
should probably fail.


Chris Murphy
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux Filesystem Development]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux