Re: BTRFS did its job nicely (thanks!)

On Mon, Nov 5, 2018 at 6:27 AM, Austin S. Hemmelgarn
<ahferroin7@xxxxxxxxx> wrote:
> On 11/4/2018 11:44 AM, waxhead wrote:
>>
>> Sterling Windmill wrote:
>>>
>>> Out of curiosity, what led to you choosing RAID1 for data but RAID10
>>> for metadata?
>>>
>>> I've flip-flopped between these two modes myself after finding out
>>> that BTRFS RAID10 doesn't work the way I would've expected.
>>>
>>> Wondering what made you choose your configuration.
>>>
>>> Thanks!
>>
>> Sure,
>>
>> The "RAID"1 profile for data was chosen to maximize disk space utilization,
>> since I have a lot of mixed-size devices.
>>
>> The "RAID"10 profile for metadata was chosen simply because it *feels* a
>> bit faster for some of my (previous) workload, which involved reading a lot
>> of small files (which I guess were embedded in the metadata). While I don't
>> recall any measurable performance increase, the system simply felt smoother
>> (which is strange, since "RAID"10 should hog more disks at once).
>>
>> I would love to try "RAID"10 for both data and metadata, but I have to
>> delete some files first (or add yet another drive).
>>
>> Would you like to elaborate a bit more on how BTRFS "RAID"10 does not
>> work as you expected?
>>
>> As far as I know, BTRFS' version of "RAID"10 ensures that 2 copies (1
>> replica) are striped over as many disks as it can (as long as there is
>> free space).
>>
>> So if I am not terribly mistaken, a "RAID"10 with 20 devices will stripe
>> over (20/2) x 2, and if you run out of space on 10 of the devices it will
>> continue to stripe over (10/2) x 2. So your stripe width essentially varies
>> with the available space... I may be terribly wrong about this (until
>> someone corrects me, that is...)
>
> He's probably referring to the fact that instead of having a roughly
> 50% chance of surviving the failure of at least 2 devices, as classical
> RAID10 is technically able to do, it's currently functionally 100% certain
> that it won't survive more than one device failing.

Right. Classic RAID10 keeps *two block device* copies: you have
mirror1 drives and mirror2 drives, each mirror pair becomes a single
virtual block device, and those virtual devices are then striped
across. If you lose a single mirror1 drive, its mirror2 data is still
available and statistically unlikely to also go away.
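
To make that layout concrete, here's a toy Python sketch (just an
illustration with made-up names, not btrfs or md-raid code) of fixed
mirror pairs with the stripe rotating across them:

def classic_raid10_layout(num_drives, num_chunks):
    """Map each logical chunk index to the mirror pair of drives holding it."""
    assert num_drives % 2 == 0
    # Fixed pairs: (0, 1), (2, 3), ... -- the mirror1/mirror2 drives above.
    pairs = [(d, d + 1) for d in range(0, num_drives, 2)]
    return {chunk: pairs[chunk % len(pairs)] for chunk in range(num_chunks)}

# With 4 drives: chunk 0 -> (0, 1), chunk 1 -> (2, 3), chunk 2 -> (0, 1), ...
# Losing drive 0 only affects chunks on pair (0, 1), and drive 1 still has them.
print(classic_raid10_layout(4, 6))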

Whereas with Btrfs raid10, it's *two block group* copies, and it is
the block group that's striped. That means block group copy 1 is
striped across half the available drives (at the time the block group
is allocated), and block group copy 2 is striped across the other
drives. When a drive dies, there is no single remaining drive that
contains all the missing copies; they're distributed. Which means that
in a 2-drive failure you've got a very good chance of losing both
copies of some metadata, or data, or both. While I'm not certain it's
100% unsurvivable, the real gotcha is that it's possible, maybe even
likely, that it'll mount and seem to work fine, but as soon as it runs
into two missing block groups it'll face plant.
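
For anyone who wants to poke at the odds, here's a rough Monte Carlo
sketch of that argument. It assumes, purely for illustration, that each
block group shuffles the drives and mirrors consecutive devices in that
shuffle; the real allocator picks devices by free space, but the point
is the same: the pairings differ from block group to block group. The
function names are made up.

import itertools
import random

def btrfs_like_pairings(num_drives, num_block_groups, seed=0):
    """Collect every drive pair that mirrors the same stripe element somewhere."""
    rng = random.Random(seed)
    fatal_pairs = set()
    for _ in range(num_block_groups):
        # Illustrative assumption: each block group gets its own device
        # ordering, and consecutive devices in that ordering mirror each other.
        order = rng.sample(range(num_drives), num_drives)
        for i in range(0, num_drives, 2):
            fatal_pairs.add(frozenset(order[i:i + 2]))
    return fatal_pairs

def two_drive_survival_fraction(num_drives, num_block_groups):
    """Fraction of 2-drive failures that do NOT take out both copies of anything."""
    fatal = btrfs_like_pairings(num_drives, num_block_groups)
    combos = list(itertools.combinations(range(num_drives), 2))
    return sum(frozenset(c) not in fatal for c in combos) / len(combos)

for bgs in (1, 10, 100):
    print(bgs, "block groups ->", two_drive_survival_fraction(8, bgs))

In this toy model, with 8 drives and a single block group roughly 24 of
the 28 two-drive combinations are still survivable; after a hundred
block groups essentially none are, which is the "functionally 100%"
point above.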


-- 
Chris Murphy


