On 20.02.20 14:59, Qu Wenruo wrote:
>
>
> On 2020/2/20 8:49 PM, Nikolay Borisov wrote:
>> <snip>
>>
>>>
>>> Suggested-by: Josef Bacik <josef@xxxxxxxxxxxxxx>
>>> Signed-off-by: Qu Wenruo <wqu@xxxxxxxx>
>>> ---
>>> fs/btrfs/volumes.c | 216 ++++++++++++++++++++++++++++++++++++++++-----
>>> fs/btrfs/volumes.h | 11 +++
>>> 2 files changed, 207 insertions(+), 20 deletions(-)
>>>
>>
<snip>
>>> +	sort(devices_info, ndevs, sizeof(struct btrfs_device_info),
>>> +	     btrfs_cmp_device_info, NULL);
>>> +	ndevs -= ndevs % raid_attr->devs_increment;
>>
>> nit: ndevs = rounddown(ndevs, raid_attr->devs_increment);
>
> IIRC round_down() can only be used when the alignment is power of 2.
>
> Don't forget we have RAID1C3 now.
Sure, but I'm referring to rounddown and not round_down :)
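
For reference, simplified versions of the two helpers from the kernel
headers (the real ones are macro/statement-expression wrapped; this is
just to show the difference):

	/* mask-based: the divisor must be a power of 2 */
	#define round_down(x, y)	((x) & ~((y) - 1))

	/* plain modulo arithmetic: works for any divisor, e.g. 3 */
	#define rounddown(x, y)		((x) - ((x) % (y)))

So rounddown(ndevs, 3) is fine for RAID1C3's devs_increment, and it is
exactly the "ndevs -= ndevs % ..." above, just with a self-documenting
name.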
>
>> makes it clearer what's going on. Since you are working with at most
>> an int, it's not a problem on 32 bits.
>>
>>
>>> +	if (ndevs < raid_attr->devs_min)
>>> +		return -ENOSPC;
>>> +	if (raid_attr->devs_max)
>>> +		ndevs = min(ndevs, (int)raid_attr->devs_max);
>>> +	else
>>> +		ndevs = min(ndevs, (int)BTRFS_MAX_DEVS(fs_info));
>>
>> Instead of casting simply use min_t(int, ndevs, BTRFS_MAX_DEVS...)
>
> That looks the same to me...
I guess it's a matter of preference so I will defer to David to decide.
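
FWIW, min_t() just casts both arguments to the given type before
comparing, roughly:

	#define min_t(type, x, y)	min((type)(x), (type)(y))

so the two spellings generate the same code here; min_t(int, ...)
merely hides the cast at the call site.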
>
>>
>>> +
>>> +	/*
>>> +	 * Now allocate a virtual chunk using the unallocate space of the
>>
>> nit: missing d after 'unallocate'
>>
>>> +	 * device with the least unallocated space.
>>> +	 */
>>> +	stripe_size = round_down(devices_info[ndevs - 1].total_avail,
>>> +				 fs_info->sectorsize);
>>> +	if (stripe_size == 0)
>>> +		return -ENOSPC;
>>
>> Isn't this check redundant? In the loop where you iterate the devices
>> you always ensure total_avail is at least a sector size, which
>> guarantees that stripe_size cannot be 0 at this point.
>>
>>> +
>>> +	for (i = 0; i < ndevs; i++)
>>> +		devices_info[i].dev->virtual_allocated += stripe_size;
>>> +	*allocated = stripe_size * (ndevs - raid_attr->nparity) /
>>> +		     raid_attr->ncopies;
>>> +	return 0;
>>> +}
>>> +
>>> +static int calc_one_profile_avail(struct btrfs_fs_info *fs_info,
>>> +				  enum btrfs_raid_types type,
>>> +				  u64 *result_ret)
>>> +{
>>> +	struct btrfs_device_info *devices_info = NULL;
>>> +	struct btrfs_fs_devices *fs_devices = fs_info->fs_devices;
>>> +	struct btrfs_device *device;
>>> +	u64 allocated;
>>> +	u64 result = 0;
>>> +	int ret = 0;
>>> +
>>
>> lockdep assert that chunk mutex is held since you access alloc_list.
>>
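Concretely I had something like this in mind at the top of the function
(assuming chunk_mutex is indeed what protects ->alloc_list here):

	lockdep_assert_held(&fs_info->chunk_mutex);
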
>>> +	ASSERT(type >= 0 && type < BTRFS_NR_RAID_TYPES);
>>> +
>>> +	/* Not enough devices, quick exit, just update the result */
>>> +	if (fs_devices->rw_devices < btrfs_raid_array[type].devs_min)
>>> +		goto out;
>>
>> You can directly return.
>>
>>> +
>>> +	devices_info = kcalloc(fs_devices->rw_devices, sizeof(*devices_info),
>>> +			       GFP_NOFS);
>>> +	if (!devices_info) {
>>> +		ret = -ENOMEM;
>>> +		goto out;
>>
>> Same here.
>>
>>> +	}
>>> +	/* Clear virtual chunk used space for each device */
>>> +	list_for_each_entry(device, &fs_devices->alloc_list, dev_alloc_list)
>>> +		device->virtual_allocated = 0;
>>> +	while (ret == 0) {
>>> +		ret = alloc_virtual_chunk(fs_info, devices_info, type,
>>> +					  &allocated);
>> The 'allocated' variable is used only in this loop so declare it in the
>> loop. Ideally we want variables to have the shortest possible lifecycle.
>>
>>> +		if (ret == 0)
>>> +			result += allocated;
>>> +	}
>>
After the explanation on IRC I think it's better if this is written as:

	while (!alloc_virtual_chunk(fs_info, devices_info, type, &allocated))
		result += allocated;

That way it's easier (at least for me) to spot that you are "draining"
something. In this case you simulate multiple allocations to calculate
the real available space.
>
> For this case, we must go several loops:
> Dev1: 10G
> Dev2: 5G
> Dev3: 5G
> Type: RAID1.
>
> The first loop will use 5G from dev1, 5G from dev2.
> Then the 2nd loop will use the remaining 5G from dev1, 5G from dev3.
>
> And that's the core problem the per-profile available space system
> wants to address.
>
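
To make the draining concrete, here is a stand-alone userspace toy
model of that exact RAID1 case (made-up names, not the kernel code,
just the same algorithm):

	#include <stdio.h>
	#include <stdlib.h>

	#define NDEV	3
	#define NCOPIES	2	/* RAID1: two mirrored stripes */

	/* sort descending, like btrfs_cmp_device_info on total_avail */
	static int cmp_desc(const void *a, const void *b)
	{
		unsigned long long x = *(const unsigned long long *)a;
		unsigned long long y = *(const unsigned long long *)b;

		return (x < y) - (x > y);
	}

	int main(void)
	{
		unsigned long long avail[NDEV] = { 10, 5, 5 }; /* GiB free */
		unsigned long long result = 0;

		for (;;) {
			unsigned long long stripe;
			int i;

			qsort(avail, NDEV, sizeof(avail[0]), cmp_desc);
			/* least-free of the NCOPIES chosen devices */
			stripe = avail[NCOPIES - 1];
			if (stripe == 0)
				break;
			/* "allocate" the virtual chunk on those devices */
			for (i = 0; i < NCOPIES; i++)
				avail[i] -= stripe;
			/* RAID1 exposes one copy's worth of space */
			result += stripe;
		}
		printf("available: %lluG\n", result); /* prints 10G */
		return 0;
	}

The first pass grabs 5G from dev1+dev2, the second the remaining 5G
from dev1+dev3, so the simulation converges to the 10G you describe.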
<snip>