Re: [PATCH 2/3] btrfs: add a comment describing delalloc space reservation

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On 2020/2/4 下午8:27, Nikolay Borisov wrote:
>
>
> On 4.02.20 г. 11:48 ч., Qu Wenruo wrote:
>>
>
>
> <snip>
>
>>
>>> + *
>>> + *   Once this space is reserved, it is added to space_info->bytes_may_use.  The
>>> + *   caller must keep track of this reservation and free it up if it is never
>>> + *   used.  With the buffered IO case this is handled via the EXTENT_DELALLOC
>>> + *   bit's on the inode's io_tree.  For direct IO it's more straightforward, we
>>> + *   take the reservation at the start of the operation, and if we write less
>>> + *   than we reserved we free the excess.
>>
>> This part involves the lifespan and state machine of data.
>> I guess more explanation on the state machine would help a lot.
>>
>> Like:
>> Page clean
>> |
>> +- btrfs_buffered_write()
>> |  Reserve data space for data, metadata space for csum/file
>> |  extents/inodes.
>> |
>> Page dirty
>> |
>> +- run_delalloc_range()
>> |  Allocate data extents, submit ordered extents to do csum calculation
>> |  and bio submission
>> Page write back
>> |
>> +- finish_oredred_io()
>> |  Insert csum and file extents
>> |
>> Page clean
>>
>> Although I'm not sure if such lifespan should belong to delalloc-space.c.
>
> This omits a lot of critical details. FOr example it should be noted
> that in btrfs_buffered_write we reserve as much space as is requested by
> the user. Then in run_delalloc_range it must be mentioned that in case
> of compressed extents it can be called to allocate an extent which is
> less than the space reserved in btrfs_buffered_write => that's where the
> possible space savings in case of compressed come from.

If you spoiler everything in the introduction, I guess it's no longer
introduction.
An introduction should only tell the overall picture, not every details.
For details, we go read mentioned functions.

And too many details would make the introduction pretty hard to
maintain. What if one day we don't the current always reserve behavior
for buffered write?

So I tend to have just a overview, and entrance function. With minimal
details unless it's a really complex design.

Thanks,
Qu

>
> As a matter of fact running ordered io doesn't really clean any space
> apart from a bit of metadata space (unless we do overwrites, as per our
> discussion with josef in the slack channel).
>
> <snip>
>
>>> + *
>>> + *   1) Updating the inode item.  We hold a reservation for this inode as long
>>> + *   as there are dirty bytes outstanding for this inode.  This is because we
>>> + *   may update the inode multiple times throughout an operation, and there is
>>> + *   no telling when we may have to do a full cow back to that inode item.  Thus
>>> + *   we must always hold a reservation.
>>> + *
>>> + *   2) Adding an extent item.  This is trickier, so a few sub points
>>> + *
>>> + *     a) We keep track of how many extents an inode may need to create in
>>> + *     inode->outstanding_extents.  This is how many items we will have reserved
>>> + *     for the extents for this inode.
>>> + *
>>> + *     b) count_max_extents() is used to figure out how many extent items we
>>> + *     will need based on the contiguous area we have dirtied.  Thus if we are
>>> + *     writing 4k extents but they coalesce into a very large extent, we will
>
> I THe way you have worded this is a bit confusing because first you
> mention that count_max_extents calcs how many extent items we'll need
> for a contiguous area. Then you mention that if we make a bunch of 4k
> writes that coalesce to a single extent i.e create 1 large contiguous
> (that's what coalescing implies in this context) we'll have to split it
> them. This is counter-intuitive.
>
> I guess what you meant here is physically contiguous as opposed to
> logically contiguous?
>
>>> + *     break this into smaller extents which means we'll need a reservation for
>>> + *     each of those extents.
>>> + *
>>> + *     c) When we set EXTENT_DELALLOC on the inode io_tree we will figure out
>>> + *     the nummber of extents needed for the contiguous area we just created,
>
> nit: s/nummber/number
>
>>> + *     and add that to inode->outstanding_extents.
>
> <snip>
>
>>> + *
>>> + *   3) Adding csums for the range.  This is more straightforward than the
>>> + *   extent items, as we just want to hold the number of bytes we'll need for
>>> + *   checksums until the ordered extent is removed.  If there is an error it is
>>> + *   cleared via the EXTENT_CLEAR_META_RESV bit when clearning EXTENT_DELALLOC
>
> nit: s/clearning/clearing
>
>>> + *   on the inode io_tree.
>>> + */
>>> +
>>>  int btrfs_alloc_data_chunk_ondemand(struct btrfs_inode *inode, u64 bytes)
>>>  {
>>>  	struct btrfs_root *root = inode->root;
>>>
>>




[Index of Archives]     [Linux Filesystem Development]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux