Hi,
On 5/26/20 10:19 PM, Goffredo Baroncelli wrote:
> On 5/25/20 7:14 PM, David Sterba wrote:
>> I'll start with the data structures
>>
>> On Thu, Mar 19, 2020 at 09:39:13PM +0100, Goffredo Baroncelli wrote:
>>> From: Goffredo Baroncelli <kreijack@xxxxxxxxx>
>>> +struct btrfs_chunk_info_stripe {
>>> + __u64 devid;
>>> + __u64 offset;
>>> + __u8 dev_uuid[BTRFS_UUID_SIZE];
>>> +};
>>> +
>>> +struct btrfs_chunk_info {
>>> + /* logical start of this chunk */
>>> + __u64 offset;
>>> + /* size of this chunk in bytes */
>>> + __u64 length;
>>> +
>>> + __u64 stripe_len;
>>> + __u64 type;
>>> +
>>> + /* 2^16 stripes is quite a lot, a second limit is the size of a single
>>> + * item in the btree
>>> + */
>>> + __u16 num_stripes;
>>> +
>>> + /* sub stripes only matter for raid10 */
>>> + __u16 sub_stripes;
>>> +
>>> + struct btrfs_chunk_info_stripe stripes[1];
>>> + /* additional stripes go here */
>>> +};
>>
>> This looks like a copy of btrfs_chunk and stripe, only removing items
>> not needed for the chunk information. Rather than copying the
>> unnecessary fileds like dev_uuid in stripe, this should be designed for
>> data exchange with the usecase in mind.
>
> There are two clients for this api:
> - btrfs fi us
> - btrfs dev us
>
> We can get rid of:
> - "offset" fields (2x)
> - "uuid" fields
>
> However the "offset" fields can be used to understand where a logical map
> is on the physical disks. I am thinking about a graphical tool to show this
> mapping, which doesn't exits yet -).
> The offset field may be used as key to get further information (like the chunk
> usage, see below)
>
> Regarding the UUID field, I agree it can be removed because it is redundant (there
> is already the devid)
>
>
>>
>> The format does not need follow the exact layout that kernel uses, ie.
>> chunk info with one embedded stripe and then followed by variable length
>> array of further stripes. This is convenient in one way but not in
>> another one. Alternatively each chunk can be emitted as a single entry,
>> duplicating part of the common fields and adding the stripe-specific
>> ones. This is for consideration.
>>
>> I've looked at my old code doing the chunk dump based on the search
>> ioctl and found that it also allows to read the chunk usage, with one
>> extra search to the block group item where the usage is stored. As this
>> is can be slow, it should be optional. Ie. the main ioctl structure
>> needs flags where this can be requested.
>
> This info could be very useful. I think to something like a balance of
> chunks which are near filled (or near empty). The question is if we
> should have a different ioctl.
Do you mean that you want to allow to a non root user to run btrfs balance?
Otherwise, no. IMO convenience functions for quickly retrieving a
specific subset of data should be created as reusable library functions
in the calling code, not as a redundant extra IOCTL that has to be
maintained.
Hans