On 3/15/19 7:01 PM, Zygo Blaxell wrote:
> On Wed, Mar 13, 2019 at 11:11:02PM +0100, Jakub Husák wrote:
>> Sorry, fighting with this technology called "email" :)
>>
>>
>> Hopefully better wrapped outputs:
>>
>> On 13. 03. 19 22:58, Jakub Husák wrote:
>>
>>
>>> Hi,
>>>
>>> I added another disk to my 3-disk raid5 and ran a balance command. After a
>>> few hours I looked at the output of `fi usage` and saw that no data was
>>> being stored on the new disk. I got the same result even when balancing my
>>> raid5 data or metadata.
>>>
>>> Next I tried to convert my raid5 metadata to raid1 (a good idea anyway)
>>> and the new disk started to fill immediately (even though it received the
>>> whole amount of metadata, with the replicas spread among the other drives,
>>> instead of being really "balanced". I know why this happened; I don't like
>>> it, but I can live with it, so let's not go off topic here :)).
>>>
>>> Now my usage output looks like this:
>>>
>> # btrfs filesystem usage /mnt/data1
>> WARNING: RAID56 detected, not implemented
>> Overall:
>> Device size: 10.91TiB
>> Device allocated: 316.12GiB
>> Device unallocated: 10.61TiB
>> Device missing: 0.00B
>> Used: 58.86GiB
>> Free (estimated): 0.00B (min: 8.00EiB)
>> Data ratio: 0.00
>> Metadata ratio: 2.00
>> Global reserve: 512.00MiB (used: 0.00B)
>>
>> Data,RAID5: Size:4.59TiB, Used:4.06TiB
>> /dev/mapper/crypt-sdb 2.29TiB
>> /dev/mapper/crypt-sdc 2.29TiB
>> /dev/mapper/crypt-sde 2.29TiB
>>
>> Metadata,RAID1: Size:158.00GiB, Used:29.43GiB
>> /dev/mapper/crypt-sdb 53.00GiB
>> /dev/mapper/crypt-sdc 53.00GiB
>> /dev/mapper/crypt-sdd 158.00GiB
>> /dev/mapper/crypt-sde 52.00GiB
>>
>> System,RAID1: Size:64.00MiB, Used:528.00KiB
>> /dev/mapper/crypt-sdc 32.00MiB
>> /dev/mapper/crypt-sdd 64.00MiB
>> /dev/mapper/crypt-sde 32.00MiB
>>
>> Unallocated:
>> /dev/mapper/crypt-sdb 393.04GiB
>> /dev/mapper/crypt-sdc 393.01GiB
>> /dev/mapper/crypt-sdd 2.57TiB
>> /dev/mapper/crypt-sde 394.01GiB
>>
>>>
>>> I'm now running `fi balance -dusage=10` (and raising the usage limit). I
>>> can see that the unallocated space is growing as it frees the little-used
>>> chunks, but still no data is being stored on the new disk.
>
> That is exactly what is happening: you are moving tiny amounts of data
> into existing big empty spaces, so no new chunk allocations (which should
> use the new drive) are happening. You have 470GB of data allocated
> but not used, so you have up to 235 block groups to fill before the new
> drive gets any data.
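(Back-of-the-envelope, assuming the usual 1GiB-per-device data stripes:
a 3-device raid5 data block group holds about 2GiB of usable data, so
the ~470GB above works out to roughly 470 / 2 = 235 block groups that a
low -dusage balance can refill before any new chunk gets allocated.)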
>
> Also note that you always have to do a full data balance when adding
> devices to raid5 in order to make use of all the space, so you might
> as well get started on that now. It'll take a while. 'btrfs balance
> start -dstripes=1..3 /mnt/data1' will work for this case.
>
>>> Is it some bug? Is `fi usage` not showing me something (as it states
>>> "WARNING: RAID56 detected, not implemented")?
>
> The warning just means the fields in the 'fi usage' output header,
> like "Free (estimate)", have bogus values because they're not computed
> correctly.
The output of btrfs-usage-report, which ships with the python-btrfs
library (since v11), might be interesting for you here.
It will show you pretty accurate numbers, and it also contains a section
that shows exactly how much currently unallocatable raw disk space you
have on each disk. While moving things around with balance, you can
watch those numbers change.
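If you want to poke at those numbers yourself, a quick sketch using the
same python-btrfs calls as the script change below (untested, and
assuming the /mnt/data1 mountpoint from your mails) could look roughly
like this:

#!/usr/bin/python3
# Rough sketch: total up allocated vs. used bytes of data block groups,
# so you can watch the allocated-but-unused space shrink during balance.
import btrfs

fs = btrfs.FileSystem('/mnt/data1')
allocated = 0
used = 0
for chunk in fs.chunks():
    if not (chunk.type & btrfs.BLOCK_GROUP_DATA):
        continue
    try:
        block_group = fs.block_group(chunk.vaddr, chunk.length)
    except IndexError:
        continue
    allocated += block_group.length
    used += block_group.used
print("data allocated: {:.2f}GiB used: {:.2f}GiB".format(
    allocated / 2**30, used / 2**30))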
>>> Or is there just so much
>>> free space on the first set of disks that the balance doesn't bother
>>> moving any data?
>
> Yes. ;)
>
>>> If so, shouldn't it be really balancing (spreading) the data among all
>>> the drives to use all the IOPS capacity, even when the raid5 redundancy
>>> constraint is currently satisfied?
>
> btrfs divides the disks into chunks first, then spreads the data across
> the chunks. The chunk allocation behavior spreads chunks across all the
> disks. When you are adding a disk to raid5, you have to redistribute all
> the old data across all the disks to get balanced IOPS and space usage,
> hence the full balance requirement.
>
> If you don't do a full balance, it will eventually allocate data on
> all disks, but it will run out of space on sdb, sdc, and sde first,
> and then be unable to use the remaining 2TB+ on sdd.
Also, if you have a lot of empty space in the current allocations, btrfs
balance has a tendency to first pack everything together before it
allocates new (4-disk-wide) block groups.
This is annoying, because it can result in moving the same data multiple
times during a balance (into the empty space of another existing block
group, and then again when that one gets its turn, etc.).
So you want to get rid of the empty space in existing block groups as
soon as possible. btrfs-balance-least-used (also an example from
python-btrfs) can do this, by processing block groups in order,
emptiest first.
A copy of the script with the following change will filter out block
groups that already span 4 drives:
diff --git a/bin/btrfs-balance-least-used b/bin/btrfs-balance-least-used
index 7005347..0b243a3 100755
--- a/bin/btrfs-balance-least-used
+++ b/bin/btrfs-balance-least-used
@@ -41,6 +41,8 @@ def load_block_groups(fs, max_used_pct):
     for chunk in fs.chunks():
         if not (chunk.type & btrfs.BLOCK_GROUP_DATA):
             continue
+        if len(chunk.stripes) > 3:
+            continue
         try:
             block_group = fs.block_group(chunk.vaddr, chunk.length)
             if block_group.used_pct <= max_used_pct:
https://github.com/knorrie/python-btrfs/tree/master/bin
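To see how far along the 3-disk-to-4-disk restripe is, you can also
just count data chunks by how many device stripes they span, using the
same chunk attributes the filter above looks at. Another small sketch
(again assuming /mnt/data1):

#!/usr/bin/python3
# Rough sketch: count data chunks per stripe width, to watch the old
# 3-device raid5 chunks disappear as they get rewritten across 4 devices.
from collections import Counter
import btrfs

fs = btrfs.FileSystem('/mnt/data1')
widths = Counter()
for chunk in fs.chunks():
    if not (chunk.type & btrfs.BLOCK_GROUP_DATA):
        continue
    widths[len(chunk.stripes)] += 1
for num_stripes, count in sorted(widths.items()):
    print("{} data chunks spanning {} devices".format(count, num_stripes))

When the count of 3-device chunks reaches zero, the -dstripes=1..3
balance Zygo suggested has nothing left to do.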
>> # uname -a
>> Linux storage 4.19.0-0.bpo.2-amd64 #1 SMP Debian 4.19.16-1~bpo9+1
>> (2019-02-07) x86_64 GNU/Linux
>> # btrfs --version
>> btrfs-progs v4.17
>> # btrfs fi show
>> Label: none uuid: xxxxxxxxxxxxxxxxx
>> Total devices 4 FS bytes used 4.09TiB
>> devid 2 size 2.73TiB used 2.34TiB path /dev/mapper/crypt-sdc
>> devid 3 size 2.73TiB used 2.34TiB path /dev/mapper/crypt-sdb
>> devid 4 size 2.73TiB used 2.34TiB path /dev/mapper/crypt-sde
>> devid 5 size 2.73TiB used 158.06GiB path /dev/mapper/crypt-sdd
>>
>> # btrfs fi df .
>> Data, RAID5: total=4.59TiB, used=4.06TiB
>> System, RAID1: total=64.00MiB, used=528.00KiB
>> Metadata, RAID1: total=158.00GiB, used=29.43GiB
>> GlobalReserve, single: total=512.00MiB, used=0.00B
</commercial break>
Hans