Kai Krakow wrote on 2015/12/22 02:48 +0100:
On Tue, 22 Dec 2015 09:22:20 +0800,
Qu Wenruo <quwenruo@xxxxxxxxxxxxxx> wrote:
Kai Krakow wrote on 2015/12/22 02:05 +0100:
On Mon, 21 Dec 2015 10:23:31 +0800,
Qu Wenruo <quwenruo@xxxxxxxxxxxxxx> wrote:
Chris Murphy wrote on 2015/12/20 19:12 -0700:
On Sun, Dec 20, 2015 at 6:43 PM, Qu Wenruo
<quwenruo@xxxxxxxxxxxxxx> wrote:
Chris Murphy wrote on 2015/12/20 15:31 -0700:
I think the cause is related to bus power with buggy USB 3 LPM
firmware (these enclosures are cheap, maybe $6). I've found some
threads about this being a problem, but it's not expected to
cause any corruption. So the fact that Btrfs picks up on some
problems might prove that (somewhat) incorrect.
Seems possible. Maybe some metadata just failed to reach disk.
BTW, did I ask for btrfs-show-super output?
Nope. I will attach to this email below for both devices.
If that's the case, the superblock on device 2 may be older than
the superblock on device 1.
Yes, looks like devid 1 transid 4924, and devid 2 transid 4923.
And it's devid 2 that had device reset and write errors when it
vanished and reappeared as a different block device.
Now the whole problem is explained.
You should be good to mount it rw, as RAID1 will handle the
problem.
How should RAID1 handle this if both copies have valid checksums
(as I would assume here unless shown otherwise)? This is an even
bigger problem with block based RAID1 which does not have checksums
at all. Luckily, btrfs works differently here.
No, these two devices don't have the same generation, which means
they point to *different* bytenr.
Like the following:
Super of Dev1:
gen: X + 1
root bytenr: A (Btrfs logical)
logical A is mapped to A1 on dev1 and A2 on dev2.
Super of Dev2:
gen: X
root bytenr: B
Here we don't need to bother bytenr B though.
Due to the power bug, A2 and the updated super were not written to dev2.
So you should see the problem now.
A1 on dev1 contains a *valid* tree block, but A2 on dev2 doesn't (empty
data only).
And your assumption that "both have valid copies" is wrong.
Check all 4 attachments in the previous mail.
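The layout described above can be modelled in a few lines. This is only
a sketch of the reasoning, not btrfs code: the Super class, chunk_map,
on_disk and the helper names are made up for illustration, and the
generation numbers are the ones quoted from the show-super output.

```python
from dataclasses import dataclass


@dataclass
class Super:
    devid: int
    generation: int
    root_bytenr: str  # btrfs logical address of the root tree block


X = 4923  # generation values taken from the show-super output quoted above

supers = [
    Super(devid=1, generation=X + 1, root_bytenr="A"),
    Super(devid=2, generation=X, root_bytenr="B"),
]

# RAID1 chunk mapping (illustrative): logical address -> physical location per device.
chunk_map = {"A": {1: "A1", 2: "A2"}}

# What actually reached each disk. None models a write that was lost.
on_disk = {
    1: {"A1": "tree block (gen X+1)"},
    2: {"A2": None},
}


def newest_super(candidates):
    """Pick the superblock with the highest generation."""
    return max(candidates, key=lambda s: s.generation)


def read_copies(logical):
    """Return what each mirror holds for one logical address."""
    return {
        devid: on_disk[devid].get(physical)
        for devid, physical in chunk_map[logical].items()
    }


best = newest_super(supers)
copies = read_copies(best.root_bytenr)
print(f"using super of devid {best.devid}, gen {best.generation}")
for devid, content in copies.items():
    print(f"  devid {devid}: {content or 'nothing valid'}")
```

The newer super (dev1) points at logical A, and only the A1 copy holds
a tree block, so the two mirrors are not "both valid".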
I only saw those attachments at second glance. Sorry.
Primarily I just wanted to note that RAID1 per-se doesn't mean anything
more than: we have two readable copies but we don't know which one is
correct. As in: let the admin think twice about it before blindly
following a guide.
This is why I pointed out btrfs csums, which make this a little better,
which in turn has further consequences as you describe (for the
tree block).
In contrast to block-level RAID, btrfs usually knows which
block is correct and which is not.
I just wondered if btrfs allows for the case where both stripes could
have valid checksums despite btrfs RAID, just because a failure
occurred right on the spot.
Is this possible? What happens then? If yes, it would mean not to
blindly trust the RAID without doing your homework.
Very interesting question.
Although btrfs goes a little beyond what you'd expect from block-based
RAID1.
1) Yes, it is possible.
2) Btrfs still detects it as a transid error and won't trust the
metadata (kernel behavior).
And since it's raid1, it will try the next copy and carry on.
The trick here is that btrfs metadata records not only the bytenr of its
child tree block, but also the transid (generation) of that tree block.
So even if such a case happens, the transid won't match, and btrfs
detects the error.
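A rough sketch of that check, for illustration only. It is not the
kernel code: TreeBlock, ParentPtr, TransidError and read_tree_block are
hypothetical names, BYTENR is an arbitrary address, and zlib.crc32 is a
stand-in for the real on-disk checksum. It assumes the parent pointer
carries (bytenr, generation) and that each mirror is tried in turn.

```python
import zlib
from dataclasses import dataclass


def block_csum(generation: int, payload: bytes) -> int:
    # zlib.crc32 is a stand-in; it is not the checksum btrfs uses on disk.
    return zlib.crc32(generation.to_bytes(8, "little") + payload)


@dataclass
class TreeBlock:
    generation: int
    payload: bytes
    csum: int


def make_block(generation: int, payload: bytes) -> TreeBlock:
    return TreeBlock(generation, payload, block_csum(generation, payload))


@dataclass
class ParentPtr:
    bytenr: int      # where the child tree block lives
    generation: int  # transid the child is expected to have


class TransidError(Exception):
    pass


def read_tree_block(ptr: ParentPtr, mirrors: list) -> tuple:
    """Try each mirror; accept a copy only if csum AND generation match."""
    for index, mirror in enumerate(mirrors):
        block = mirror.get(ptr.bytenr)
        if block is None:
            continue  # nothing was ever written here
        if block.csum != block_csum(block.generation, block.payload):
            continue  # csum error
        if block.generation != ptr.generation:
            continue  # valid csum, but stale: transid mismatch
        return index, block
    raise TransidError(f"no good copy of block {ptr.bytenr}")


BYTENR = 4096  # hypothetical address
stale = make_block(4923, b"old contents")  # self-consistent, but old
fresh = make_block(4924, b"new contents")
ptr = ParentPtr(BYTENR, 4924)

# The stale copy is tried first and rejected, then the fresh one is used.
index, block = read_tree_block(ptr, [{BYTENR: stale}, {BYTENR: fresh}])
print(index, block.generation)
```

The stale copy passes its own checksum, yet it is still rejected because
the generation recorded in the parent does not match.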
Thanks,
Qu
Then you can use scrub on dev2 to fix all the
generation mismatches.
I better understand why this could fix a problem...
Why not?
The tree block/data copy on dev1 is valid, but the tree block/data copy
on dev2 is empty (not written), so btrfs detects the csum error, and
scrub will try to rewrite it.
After the rewrite, both copies on dev1 and dev2 will match, which fixes
the problem.
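The repair step can be sketched as follows. This is a simplified model
under stated assumptions, not how scrub is implemented: the scrub and
is_good functions, the dict-based mirrors and the expected-checksum
table are all hypothetical, and zlib.crc32 stands in for the real
checksum.

```python
import zlib


def csum(data: bytes) -> int:
    # Stand-in checksum for the sketch, not the btrfs on-disk one.
    return zlib.crc32(data)


def is_good(mirror: dict, bytenr: int, want: int) -> bool:
    data = mirror.get(bytenr)
    return data is not None and csum(data) == want


def scrub(mirrors: list, expected: dict) -> list:
    """Rewrite every bad copy from a good one; return (mirror, bytenr) repairs."""
    repaired = []
    for bytenr, want in expected.items():
        good = next(
            (m[bytenr] for m in mirrors if is_good(m, bytenr, want)), None
        )
        if good is None:
            continue  # no valid copy anywhere: cannot be repaired
        for index, mirror in enumerate(mirrors):
            if not is_good(mirror, bytenr, want):
                mirror[bytenr] = good
                repaired.append((index, bytenr))
    return repaired


dev1 = {100: b"tree block A", 200: b"data"}
dev2 = {100: b"\x00" * 12, 200: b"data"}  # block 100 was never written
expected = {100: csum(b"tree block A"), 200: csum(b"data")}

repairs = scrub([dev1, dev2], expected)
print(repairs)
print(dev1 == dev2)
```

Only the copy that fails its checksum is rewritten; the block that was
already identical on both devices is left alone.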
Exactly. ;-) Didn't say anything against it.