Donald Pearson posted on Wed, 23 Dec 2015 09:53:41 -0600 as excerpted:

> Additionally real Raid10 will run circles around what BTRFS is doing in
> terms of performance.  In the 20 drive array you're striping across 10
> drives, in BTRFS right now you're striping across 2 no matter what.  So
> not only do I lose in terms of resilience I lose in terms of
> performance.  I assume that N-way-mirroring used with BTRFS Raid10 will
> also increase the stripe width so that will level out the performance
> but you're always going to be short a drive for equal resilience.

No, with btrfs raid10, you're /mirroring/ across two drives no matter 
what.  With 20 devices, you're /striping/ across 10 two-way mirrors.  
It's the same as a standard raid10, in that regard.

Though it's a bit different in that the mix of devices forming those 
mirror pairs can differ from chunk to chunk.  IOW, the first chunk might 
be mirrored a/b c/d e/f g/h i/j k/l m/n o/p q/r s/t, with the stripe 
across each mirror pair, but the next chunk might be mirrored a/l g/o 
f/k b/n c/d e/s j/q h/t i/p m/r (I think I got each letter once...), and 
striped across those pairs.

So you get the same performance as a normal raid10 (well, to the extent 
that btrfs has been optimized, which in large part it hasn't been, yet), 
but, as should always be the case in a raid10, randomized loss of more 
than a single device can mean data loss.

However, because each chunk's pair assignment is more or less 
randomized, you lose one guarantee a conventional raid10 gives you.  A 
conventional raid10 lets you map all of one mirror set to one cabinet 
and all of the second mirror set to another cabinet, so you can reliably 
lose an entire cabinet and be fine, since it's known to correspond 
exactly to a single mirror set.  You can't do that with btrfs raid10, 
because there's no way to specify individual chunk mirroring, and what 
might be precisely one mirror set for one chunk is very likely to be 
both copies of some mirrors and no copies of other mirrors for another 
chunk.
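The cabinet point above can be sketched with a quick simulation (purely illustrative, not how btrfs actually allocates chunks): with one fixed pairing, only 10 of the 190 possible two-device losses kill both copies of a mirror, but when each chunk gets its own random pairing, nearly every two-device loss is fatal to *some* chunk once there are many chunks.

```python
import random
from itertools import combinations

random.seed(0)  # reproducible illustration

def random_pairing(devices):
    """One chunk's mirror pairs: shuffle the devices, pair them off."""
    d = devices[:]
    random.shuffle(d)
    return [frozenset(d[i:i + 2]) for i in range(0, len(d), 2)]

devices = list("abcdefghijklmnopqrst")          # 20 devices, a..t

# Conventional raid10: the same a/b c/d ... s/t pairing for everything.
fixed = [frozenset(devices[i:i + 2]) for i in range(0, 20, 2)]

# btrfs-raid10-style: a fresh, more or less random pairing per chunk.
chunks = [random_pairing(devices) for _ in range(1000)]

def fatal(pairs, lost):
    """Losing both members of any one mirror pair loses that chunk."""
    return frozenset(lost) in pairs

two_losses = list(combinations(devices, 2))     # 190 possible failures
fatal_fixed = sum(fatal(fixed, c) for c in two_losses)
fatal_btrfs = sum(any(fatal(p, c) for p in chunks) for c in two_losses)

print(fatal_fixed, "of", len(two_losses))   # 10 of 190
print(fatal_btrfs, "of", len(two_losses))   # almost always 190 of 190
```

With the fixed pairing you can put each mirror set in its own cabinet and know a cabinet loss hits only one copy of everything; with per-chunk pairing, any two devices are mirror partners in some chunk almost surely.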
What I was suggesting as a solution was a setup that:

(a) has btrfs raid1 at the top level,

(b) has a pair of mdraid0s underneath, in this case a pair of 10-device 
mdraid0s, and

(c) has each of the pair of mdraid0s presented to btrfs as one of its 
raid1 mirrors.

While this is actually raid01, not raid10, in this case it makes more 
sense than a mixed raid10, because by doing it that way, you'd:

1) keep btrfs' data integrity and error correction at the top level, as 
it could pull from the second copy if the first failed checksum, and

2) be able to stick each mdraid0 in its own cabinet, so loss of an 
entire cabinet wouldn't be data loss, only redundancy loss.

(Reversing that, btrfs raid0 on top of mdraid1, would lose btrfs' 
ability to correct checksum errors, since at the btrfs level it'd be 
non-redundant, and mdraid1 doesn't have checksumming, so it couldn't 
provide the same data integrity service.  Without checksumming and the 
ability to pull from the other copy in case of error, you could scrub 
the mdraid1 to make its mirrors identical again, but you'd be just as 
likely to copy the bad copy over the good one as the reverse.  Thus 
btrfs really needs to be the raid1 layer unless you simply don't care 
about data integrity, and because btrfs is the filesystem layer, it has 
to be the top layer, so you're left doing a raid01 instead of the 
raid10 that's ordinarily preferred for the locality of a rebuild, 
absent other factors like this data integrity factor.)

And what btrfs N-way-mirroring will provide, in the longer term once 
btrfs gets that feature and it stabilizes to usability, is the ability 
to actually have three cabinets and sustain the loss of two, or four 
cabinets and sustain the loss of three, etc.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."
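For reference, the layered setup described above might be created roughly as follows; the device names, mount point, and exact device split are illustrative assumptions, not taken from the post:

```shell
# Sketch of the proposed raid01 layout (illustrative device names).
# Two 10-device mdraid0 stripes, one per cabinet:
mdadm --create /dev/md0 --level=0 --raid-devices=10 /dev/sd[a-j]   # cabinet 1
mdadm --create /dev/md1 --level=0 --raid-devices=10 /dev/sd[k-t]   # cabinet 2

# btrfs raid1 across the two md devices, so checksum verification and
# error correction stay at the btrfs level:
mkfs.btrfs -d raid1 -m raid1 /dev/md0 /dev/md1
mount /dev/md0 /mnt
```

Because btrfs sees only two (large) devices here, its raid1 keeps exactly one copy of every block on each mdraid0, which is what makes the whole-cabinet-loss guarantee work.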
Richard Stallman
