Austin S. Hemmelgarn posted on Tue, 15 Dec 2015 11:00:40 -0500 as excerpted:

> AFAIUI, checksums are stored per-instance for every block.  This is
> important in a multi-device filesystem in case you lose a device, so
> that you still have a checksum for the block.  There should be no
> difference between extent layout and compression between devices
> however.

I don't believe that's quite correct.  What is correct, to the best of my knowledge, is that checksums are metadata, and thus have whatever duplication/parity level metadata is assigned.

For single devices, metadata is of course dup by default, so that's 2X the metadata and thus 2X the checksums. That holds both for the checksums covering data, which is single (effectively the only choice on a single device, at least thru 4.3, tho there's a patch adding dup data as an option that I think should be in 4.4), and for the checksums covering the dup metadata itself.

For multiple devices, it's default raid1 metadata, default single data, so the picture doesn't differ much by default from the single-device default picture.

It's also possible to do single metadata with raidN data, which really doesn't make sense except for raid0 data, and thus I believe there's a warning about that sort of layout in newer mkfs.btrfs, or when lowering the metadata redundancy using balance filters.

But of course it's possible to do raid1 data and metadata, which would be two copies of each, regardless of the number of devices (as long as there are at least two, of course).

But the copies aren't assigned 1:1.  That is, if they're equal generation, btrfs can read either checksum and apply it to either data/metadata block.  (Of course if they're not equal generation, btrfs will choose the higher one, thus covering the case of writing at the time of a crash. Either both will be the same generation, if the root block wasn't yet updated to the new one on either device, or one will be a higher/newer generation than the other, if it had already finished writing one but not the other at the time of the crash.)
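To make that selection rule a bit more concrete, here's a toy Python model of it.  This is only a sketch of the behavior as I described it above, not btrfs code: the BlockCopy and CsumCopy classes, the newest() and readable_copies() helpers, and the use of crc32 as the checksum are all made up for the illustration.

```python
import zlib
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class BlockCopy:
    """One stored copy of a data/metadata block (hypothetical model)."""
    device: str
    generation: int
    data: bytes


@dataclass(frozen=True)
class CsumCopy:
    """One stored copy of the checksum for that block (hypothetical model)."""
    device: str
    generation: int
    value: int


def newest(items):
    """Keep only the items at the highest generation present."""
    if not items:
        return []
    top = max(i.generation for i in items)
    return [i for i in items if i.generation == top]


def readable_copies(blocks: List[BlockCopy], csums: List[CsumCopy]) -> List[BlockCopy]:
    """Block copies at the newest generation that match any newest checksum.

    Checksum copies are not tied 1:1 to block copies: any checksum of the
    winning generation may validate any block copy of that generation.
    crc32 is a stand-in checksum chosen for this sketch.
    """
    good = {c.value for c in newest(csums)}
    return [b for b in newest(blocks) if zlib.crc32(b.data) in good]


old, new = b"old contents", b"new contents"

# Equal generations: either copy is acceptable.
blocks = [BlockCopy("devA", 10, old), BlockCopy("devB", 10, old)]
csums = [CsumCopy("devA", 10, zlib.crc32(old)), CsumCopy("devB", 10, zlib.crc32(old))]
print([b.device for b in readable_copies(blocks, csums)])  # ['devA', 'devB']

# Crash mid-write: devA reached generation 11, devB is still at 10.
blocks = [BlockCopy("devA", 11, new), BlockCopy("devB", 10, old)]
csums = [CsumCopy("devA", 11, zlib.crc32(new)), CsumCopy("devB", 10, zlib.crc32(old))]
print([b.device for b in readable_copies(blocks, csums)])  # ['devA']
```

In the second case only the higher-generation copy survives the filter, which is the crash scenario from the parenthetical above.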

This is why, if you have a pair of devices in raid1 and you mount one of them degraded/writable with the other unavailable for some reason, it's an extremely good idea not to also mount the other one writable and then try to recombine them.  Chances are the generations wouldn't match and it'd pick the one with the higher generation, but if they did for some reason match, and both checksums were valid on their data, but the data differed... either one could be chosen, and a scrub might choose either one to fix the other, as well, which could in theory result in a file with intermixed blocks from the two different versions!

Just ensure that if one is mounted writable, it's the only one mounted writable if there's a chance of recombining, and you'll be fine, as it'll be the only one with advancing generations.  And if by some accident both are mounted writable separately, the best bet is to wipe one of them and then add it back as a new device, if you're going to reintroduce it to the same filesystem.

Of course this gets a bit more complicated with 3+ device raid1, since currently there are still only two copies of each block and two copies of the checksum, meaning there's at least one device without a copy of each block, and if the filesystem is mounted degraded writable repeatedly with a random device missing...

Similarly, the permutations can be calculated for the other raid types, and for mixed raid types like raid6 data (specified) and raid1 metadata (unspecified, so the default is used), but I won't attempt that here.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman
