Re: RAID-1 refuses to balance large drive

BTW, I decided to follow the original double-replace strategy
suggested -- replace the 6TB with the 8TB, then replace the 4TB with
the freed 6TB.  That should leave the 2 large drives with about 2TB
free each once expanded, and thus able to fully use all the space.
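
For reference, the rough command sequence for that (the mount point is
a placeholder for my real one, and the devids are taken from the fi
show output quoted below):

   # replace the 6TB (devid 3) with the 8TB, then grow it to full size
   btrfs replace start 3 /dev/NEW8TB /mnt/butter
   btrfs filesystem resize 3:max /mnt/butter

   # then replace a 4TB (devid 1) with the freed 6TB, and grow it too
   # (-f in case the freed 6TB still carries its old btrfs signature)
   btrfs replace start -f 1 /dev/OLD6TB /mnt/butter
   btrfs filesystem resize 1:max /mnt/butter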

However, the first replace has been running for 9 hours and reports
"189.7% done" -- and it is still going.  Obviously some sort of bug in
calculating the completion status.  With luck 200% will be enough?
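
For anyone following along, that figure comes from polling the replace
status (mount point again a placeholder):

   btrfs replace status -1 /mnt/butter    # print the current status once
   btrfs replace status /mnt/butter       # or keep refreshing until it finishes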

On Sat, May 26, 2018 at 7:21 PM, Brad Templeton <bradtem@xxxxxxxxx> wrote:
> Certainly.  My apologies for not including them before.  As
> described, the disks are reasonably balanced -- not as full as the
> last time.  As such, a balance might (slowly) free up enough chunks
> to get things going.  And if I have to, I will partially convert to
> single again.  Certainly btrfs replace seems like the most
> predictable and simple path, but it will result in a strange
> distribution of the chunks.
>
> Label: 'butter'  uuid: a91755d4-87d8-4acd-ae08-c11e7f1f5438
>        Total devices 3 FS bytes used 6.11TiB
>        devid    1 size 3.62TiB used 3.47TiB path /dev/sdj2
>        devid    2 size 3.64TiB used 3.49TiB path /dev/sda
>        devid    3 size 5.43TiB used 5.28TiB path /dev/sdi2
>
> Overall:
>    Device size:                  12.70TiB
>    Device allocated:             12.25TiB
>    Device unallocated:          459.95GiB
>    Device missing:                  0.00B
>    Used:                         12.21TiB
>    Free (estimated):            246.35GiB      (min: 246.35GiB)
>    Data ratio:                       2.00
>    Metadata ratio:                   2.00
>    Global reserve:              512.00MiB      (used: 1.32MiB)
>
> Data,RAID1: Size:6.11TiB, Used:6.09TiB
>   /dev/sda        3.48TiB
>   /dev/sdi2       5.28TiB
>   /dev/sdj2       3.46TiB
>
> Metadata,RAID1: Size:14.00GiB, Used:12.38GiB
>   /dev/sda        8.00GiB
>   /dev/sdi2       7.00GiB
>   /dev/sdj2      13.00GiB
>
> System,RAID1: Size:32.00MiB, Used:888.00KiB
>   /dev/sdi2      32.00MiB
>   /dev/sdj2      32.00MiB
>
> Unallocated:
>   /dev/sda      153.02GiB
>   /dev/sdi2     154.56GiB
>   /dev/sdj2     152.36GiB
>
>
> On Sat, May 26, 2018 at 7:16 PM, Qu Wenruo <quwenruo.btrfs@xxxxxxx> wrote:
>>
>>
>> On 2018-05-27 10:06, Brad Templeton wrote:
>>> Thanks.  These are all things which take substantial fractions of a
>>> day to try, unfortunately.
>>
>> Normally I would suggest just using a VM and several small disks (~10G),
>> along with fallocate (the fastest way to fill space), to get a basic view
>> of the procedure.
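>>
>> A minimal sketch of such a test setup, using loop devices instead of a
>> full VM (sizes, paths and the mount point are just examples):
>>
>>    truncate -s 10G disk1.img disk2.img disk3.img
>>    losetup -f --show disk1.img       # repeat for each image; note the /dev/loopN names
>>    mkfs.btrfs -d raid1 -m raid1 /dev/loop0 /dev/loop1 /dev/loop2
>>    mount /dev/loop0 /mnt/test
>>    fallocate -l 4G /mnt/test/filler  # quickly consume space to reproduce a nearly full array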
>>
>>> Last time I ended up fixing it in a
>>> fairly kludged way, which was to convert from raid-1 to single long
>>> enough to get enough single blocks that, when I converted back to
>>> raid-1, they got distributed to the right drives.
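>>>
>>> For completeness, that conversion kludge is roughly the following
>>> (mount point is a placeholder):
>>>
>>>    # convert data chunks to single so each relocated chunk only needs room on one drive
>>>    btrfs balance start -dconvert=single /mnt/butter
>>>    # ...and once space is spread across the drives, convert back...
>>>    btrfs balance start -dconvert=raid1 /mnt/butter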
>>
>> Yep, that's the ultimate one-size-fits-all solution.
>> Also, this reminds me that we could do the
>> RAID1->Single/DUP->Single downgrade in a much, much faster way.
>> I think it's worth considering as a later enhancement.
>>
>>>  But this is, aside
>>> from being a kludge, a procedure with some minor risk.  Of course I am
>>> taking a backup first, but still...
>>>
>>> This strikes me as something that should be a fairly common event --
>>> your raid is filling up, and so you expand it by replacing the oldest
>>> and smallest drive with a new much bigger one.   In the old days of
>>> RAID, you could not do that, you had to grow all drives at the same
>>> time, and this is one of the ways that BTRFS is quite superior.
>>> When I had MD raid, I went through a strange process of always having
>>> a raid 5 that consisted of different sized drives.  The raid-5 was
>>> based on the smallest of the 3 drives, and then the larger ones had
>>> extra space which could either be in raid-1, or more simply was in solo
>>> disk mode and used for less critical data (such as backups and old
>>> archives.)   Slowly, and in a messy way, each time I replaced the
>>> smallest drive, I could then grow the raid 5.  Yuck.     BTRFS is so
>>> much better, except for this issue.
>>>
>>> So if somebody has a thought on a procedure that is fairly sure to
>>> work and doesn't involve too many copying passes -- copying 4TB is not
>>> a quick operation -- it would be much appreciated, and it might be a
>>> good thing to add to a wiki page, which I would be happy to do.
>>
>> Anyway, "btrfs fi show" and "btrfs fi usage" output would help before any
>> further advice from the community.
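>>
>> That is, something like the following, with the mount point
>> substituted for the real one:
>>
>>    btrfs fi show
>>    btrfs fi usage /mnt/butter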
>>
>> Thanks,
>> Qu
>>
>>>
>>> On Sat, May 26, 2018 at 6:56 PM, Qu Wenruo <quwenruo.btrfs@xxxxxxx> wrote:
>>>>
>>>>
>>>> On 2018-05-27 09:49, Brad Templeton wrote:
>>>>> That is what did not work last time.
>>>>>
>>>>> I say I think there can be a "fix" because I hope the goal of BTRFS
>>>>> raid is to be superior to traditional RAID -- that if one replaces a
>>>>> drive and asks for a balance, it figures out what needs to be done to
>>>>> make that work.  I understand that the current balance algorithm may
>>>>> have trouble with that.  In this situation, the ideal result would be
>>>>> for the system to take the 3 drives (4TB and 6TB full, 8TB with 4TB
>>>>> free) and move extents strictly from the 4TB and 6TB to the 8TB -- i.e.
>>>>> extents which are currently on both the 4TB and 6TB -- by moving only
>>>>> one copy.
>>>>
>>>> Btrfs can only do balance in whole-chunk units.
>>>> Thus btrfs can only do:
>>>> 1) Create a new chunk
>>>> 2) Copy the data
>>>> 3) Remove the old chunk
>>>>
>>>> So it can't do it the way you mentioned.
>>>> But your purpose sounds pretty valid, and maybe we could enhance btrfs
>>>> to do such a thing.
>>>> (Currently only replace can behave like that.)
>>>>
>>>>> It is not strictly a "bug", in that the code is operating
>>>>> as designed, but it is undesired behavior.
>>>>>
>>>>> The problem is that the approach you describe did not work in the prior upgrade.
>>>>
>>>> Would you please try 4/4/6 + 4 or 4/4/6 + 2 and then balance?
>>>> And before/after balance, "btrfs fi usage" and "btrfs fi show" output
>>>> could also help.
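>>>>
>>>> As a rough sketch (spare device path and mount point are placeholders):
>>>>
>>>>    btrfs fi show > before-show.txt
>>>>    btrfs fi usage /mnt/butter > before-usage.txt
>>>>    btrfs device add /dev/SPARE /mnt/butter
>>>>    btrfs balance start -d /mnt/butter       # relocate data chunks
>>>>    btrfs fi show > after-show.txt
>>>>    btrfs fi usage /mnt/butter > after-usage.txt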
>>>>
>>>> Thanks,
>>>> Qu
>>>>
>>>>>
>>>>> On Sat, May 26, 2018 at 6:41 PM, Qu Wenruo <quwenruo.btrfs@xxxxxxx> wrote:
>>>>>>
>>>>>>
>>>>>> On 2018-05-27 09:27, Brad Templeton wrote:
>>>>>>> A few years ago, I encountered an issue (halfway between a bug and a
>>>>>>> problem) with attempting to grow a BTRFS 3 disk Raid 1 which was
>>>>>>> fairly full.   The problem was that after replacing (by add/delete) a
>>>>>>> small drive with a larger one, there were now 2 full drives and one
>>>>>>> new half-full one, and balance was not able to correct this situation
>>>>>>> to produce the desired result, which is 3 drives, each with a roughly
>>>>>>> even amount of free space.  It can't do it because the 2 smaller
>>>>>>> drives are full, and it doesn't realize it could just move one of the
>>>>>>> copies of a block off the smaller drive onto the larger drive to free
>>>>>>> space on the smaller drive; it wants to move them both, and there is
>>>>>>> nowhere to put them both.
>>>>>>
>>>>>> It's not that easy.
>>>>>> For balance, btrfs must first find enough space to hold both
>>>>>> copies, then copy the data.
>>>>>> Otherwise, if a power loss happens, it would cause data corruption.
>>>>>>
>>>>>> So in your case, btrfs can only find enough space for one copy, and is
>>>>>> thus unable to relocate any chunk.
>>>>>>
>>>>>>>
>>>>>>> I'm about to do it again, taking my nearly full array, which is 4TB,
>>>>>>> 4TB, 6TB, and replacing one of the 4TB drives with an 8TB.  I don't want
>>>>>>> to repeat the very time-consuming situation, so I wanted to find out if
>>>>>>> things were fixed now.  I am running Xenial (kernel 4.4.0) and could
>>>>>>> consider the upgrade to Bionic (4.15), though that adds a lot more to
>>>>>>> my plate before a long trip, and I would prefer to avoid it if I can.
>>>>>>
>>>>>> Since there is nothing to fix, the behavior will not change at all.
>>>>>>
>>>>>>>
>>>>>>> So what is the best strategy:
>>>>>>>
>>>>>>> a) Replace 4TB with 8TB, resize up and balance?  (This is the "basic" strategy)
>>>>>>> b) Add 8TB, balance, remove 4TB (automatic distribution of some blocks
>>>>>>> from 4TB but possibly not enough)
>>>>>>> c) Replace 6TB with 8TB, resize/balance, then replace 4TB with
>>>>>>> recently vacated 6TB -- much longer procedure but possibly better
>>>>>>>
>>>>>>> Or has this all been fixed and method A will work fine and get to the
>>>>>>> ideal goal -- 3 drives, with available space suitably distributed to
>>>>>>> allow full utilization over time?
>>>>>>
>>>>>> The btrfs chunk allocator has been trying to utilize all drives for a
>>>>>> long, long time.
>>>>>> When allocating chunks, btrfs will choose the device with the most free
>>>>>> space. However, the nature of RAID1 requires btrfs to allocate extents
>>>>>> from 2 different devices, which makes your replaced 4/4/6 array a little
>>>>>> complex.
>>>>>> (If your 4/4/6 array had been set up first and then filled to the
>>>>>> current stage, btrfs should be able to utilize all the space.)
>>>>>>
>>>>>>
>>>>>> Personally speaking, if you're confident enough, just add a new device
>>>>>> and then do a balance.
>>>>>> If enough chunks get balanced, there should be enough space freed on the
>>>>>> existing disks.
>>>>>> Then remove the newly added device, and btrfs should handle the
>>>>>> remaining space well.
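>>>>>>
>>>>>> As a rough sketch (spare device path and mount point are placeholders):
>>>>>>
>>>>>>    btrfs device add /dev/SPARE /mnt/butter
>>>>>>    btrfs balance start -d /mnt/butter        # or stop early, e.g. -dlimit=200
>>>>>>    btrfs device delete /dev/SPARE /mnt/butter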
>>>>>>
>>>>>> Thanks,
>>>>>> Qu
>>>>>>
>>>>>>>
>>>>>>> On Sat, May 26, 2018 at 6:24 PM, Brad Templeton <bradtem@xxxxxxxxx> wrote:
>>>>>>>> A few years ago, I encountered an issue (halfway between a bug and a
>>>>>>>> problem) with attempting to grow a BTRFS 3 disk Raid 1 which was fairly
>>>>>>>> full.   The problem was that after replacing (by add/delete) a small drive
>>>>>>>> with a larger one, there were now 2 full drives and one new half-full one,
>>>>>>>> and balance was not able to correct this situation to produce the desired
>>>>>>>> result, which is 3 drives, each with a roughly even amount of free space.
>>>>>>>> It can't do it because the 2 smaller drives are full, and it doesn't realize
>>>>>>>> it could just move one of the copies of a block off the smaller drive onto
>>>>>>>> the larger drive to free space on the smaller drive; it wants to move them
>>>>>>>> both, and there is nowhere to put them both.
>>>>>>>>
>>>>>>>> I'm about to do it again, taking my nearly full array which is 4TB, 4TB, 6TB
>>>>>>>> and replacing one of the 4TB with an 8TB.  I don't want to repeat the very
>>>>>>>> time consuming situation, so I wanted to find out if things were fixed now.
>>>>>>>> I am running Xenial (kernel 4.4.0) and could consider the upgrade to  bionic
>>>>>>>> (4.15) though that adds a lot more to my plate before a long trip and I
>>>>>>>> would prefer to avoid if I can.
>>>>>>>>
>>>>>>>> So what is the best strategy:
>>>>>>>>
>>>>>>>> a) Replace 4TB with 8TB, resize up and balance?  (This is the "basic"
>>>>>>>> strategy)
>>>>>>>> b) Add 8TB, balance, remove 4TB (automatic distribution of some blocks from
>>>>>>>> 4TB but possibly not enough)
>>>>>>>> c) Replace 6TB with 8TB, resize/balance, then replace 4TB with recently
>>>>>>>> vacated 6TB -- much longer procedure but possibly better
>>>>>>>>
>>>>>>>> Or has this all been fixed and method A will work fine and get to the ideal
>>>>>>>> goal -- 3 drives, with available space suitably distributed to allow full
>>>>>>>> utilization over time?
>>>>>>>>
>>>>>>>> On Fri, Mar 25, 2016 at 7:35 AM, Henk Slager <eye1tm@xxxxxxxxx> wrote:
>>>>>>>>>
>>>>>>>>> On Fri, Mar 25, 2016 at 2:16 PM, Patrik Lundquist
>>>>>>>>> <patrik.lundquist@xxxxxxxxx> wrote:
>>>>>>>>>> On 23 March 2016 at 20:33, Chris Murphy <lists@xxxxxxxxxxxxxxxxx> wrote:
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Mar 23, 2016 at 1:10 PM, Brad Templeton <bradtem@xxxxxxxxx>
>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> I am surprised to hear it said that having the mixed sizes is an odd
>>>>>>>>>>>> case.
>>>>>>>>>>>
>>>>>>>>>>> Not odd as in wrong, just uncommon compared to other arrangements being
>>>>>>>>>>> tested.
>>>>>>>>>>
>>>>>>>>>> I think mixed drive sizes in raid1 is a killer feature for a home NAS,
>>>>>>>>>> where you replace an old smaller drive with the latest and largest
>>>>>>>>>> when you need more storage.
>>>>>>>>>>
>>>>>>>>>> My raid1 currently consists of 6TB+3TB+3*2TB.
>>>>>>>>>
>>>>>>>>> For the original OP situation, with chunks all filled up with extents
>>>>>>>>> and devices all filled up with chunks, 'integrating' a new 6TB drive
>>>>>>>>> into a 4TB+3TB+2TB raid1 array could probably be done in a somewhat
>>>>>>>>> unusual way in order to avoid immediate balancing needs:
>>>>>>>>> - 'plug-in' the 6TB
>>>>>>>>> - btrfs-replace  4TB by 6TB
>>>>>>>>> - btrfs fi resize max 6TB_devID
>>>>>>>>> - btrfs-replace  2TB by 4TB
>>>>>>>>> - btrfs fi resize max 4TB_devID
>>>>>>>>> - 'unplug' the 2TB
>>>>>>>>>
>>>>>>>>> So then there would be 2 devices, each with roughly 2TB of space
>>>>>>>>> available, which is good for continued btrfs raid1 writes.
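>>>>>>>>>
>>>>>>>>> In concrete command form, that sequence is roughly (device paths and
>>>>>>>>> the mount point are only examples; the devids come from btrfs fi show):
>>>>>>>>>
>>>>>>>>>    btrfs replace start /dev/OLD4TB /dev/NEW6TB /mnt
>>>>>>>>>    btrfs fi resize <devid_of_6TB>:max /mnt
>>>>>>>>>    # -f in case the freed 4TB still carries a btrfs signature
>>>>>>>>>    btrfs replace start -f /dev/OLD2TB /dev/OLD4TB /mnt
>>>>>>>>>    btrfs fi resize <devid_of_4TB>:max /mnt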
>>>>>>>>>
>>>>>>>>> An offline variant with dd instead of btrfs-replace could also be done
>>>>>>>>> (I used to do that sometimes, before btrfs-replace was implemented).
>>>>>>>>> My experience is that btrfs-replace runs at roughly maximum speed (i.e.
>>>>>>>>> the hard disk's sequential transfer speed) during the whole replace
>>>>>>>>> process, and it does in a more direct way what you actually want. So in
>>>>>>>>> total the device replace/upgrade is usually much faster than with the
>>>>>>>>> add+delete method, and raid1 redundancy stays active all the time. Of
>>>>>>>>> course it means first making sure the system runs an up-to-date/latest
>>>>>>>>> kernel+tools.
>>>>>>>>
>>>>>>>>
>>>>>>
>>>>
>>



