On 4.02.20 11:48, Qu Wenruo wrote:
>
<snip>
>
>> + *
>> + * Once this space is reserved, it is added to space_info->bytes_may_use. The
>> + * caller must keep track of this reservation and free it up if it is never
>> + * used. With the buffered IO case this is handled via the EXTENT_DELALLOC
>> + * bit's on the inode's io_tree. For direct IO it's more straightforward, we
>> + * take the reservation at the start of the operation, and if we write less
>> + * than we reserved we free the excess.
>
> This part involves the lifespan and state machine of data.
> I guess more explanation on the state machine would help a lot.
>
> Like:
> Page clean
> |
> +- btrfs_buffered_write()
> | Reserve data space for data, metadata space for csum/file
> | extents/inodes.
> |
> Page dirty
> |
> +- run_delalloc_range()
> | Allocate data extents, submit ordered extents to do csum calculation
> | and bio submission
> Page write back
> |
> +- finish_ordered_io()
> | Insert csum and file extents
> |
> Page clean
>
> Although I'm not sure if such lifespan should belong to delalloc-space.c.
This omits a lot of critical details. For example, it should be noted
that in btrfs_buffered_write we reserve as much space as is requested by
the user. Then for run_delalloc_range it must be mentioned that in the
case of compressed extents it can be called to allocate an extent which
is smaller than the space reserved in btrfs_buffered_write; that's where
the possible space savings in the compressed case come from.
As a matter of fact, running ordered io doesn't really free any space
apart from a bit of metadata space (unless we do overwrites, as per our
discussion with Josef in the Slack channel).
<snip>
>> + *
>> + * 1) Updating the inode item. We hold a reservation for this inode as long
>> + * as there are dirty bytes outstanding for this inode. This is because we
>> + * may update the inode multiple times throughout an operation, and there is
>> + * no telling when we may have to do a full cow back to that inode item. Thus
>> + * we must always hold a reservation.
>> + *
>> + * 2) Adding an extent item. This is trickier, so a few sub points
>> + *
>> + * a) We keep track of how many extents an inode may need to create in
>> + * inode->outstanding_extents. This is how many items we will have reserved
>> + * for the extents for this inode.
>> + *
>> + * b) count_max_extents() is used to figure out how many extent items we
>> + * will need based on the contiguous area we have dirtied. Thus if we are
>> + * writing 4k extents but they coalesce into a very large extent, we will
The way you have worded this is a bit confusing, because first you
mention that count_max_extents calculates how many extent items we'll
need for a contiguous area. Then you mention that if we make a bunch of
4k writes that coalesce into a single large contiguous extent (that's
what coalescing implies in this context), we'll have to split it. This
is counter-intuitive.
I guess what you meant here is physically contiguous as opposed to
logically contiguous?
>> + * break this into smaller extents which means we'll need a reservation for
>> + * each of those extents.
>> + *
>> + * c) When we set EXTENT_DELALLOC on the inode io_tree we will figure out
>> + * the nummber of extents needed for the contiguous area we just created,
nit: s/nummber/number
>> + * and add that to inode->outstanding_extents.
<snip>
>> + *
>> + * 3) Adding csums for the range. This is more straightforward than the
>> + * extent items, as we just want to hold the number of bytes we'll need for
>> + * checksums until the ordered extent is removed. If there is an error it is
>> + * cleared via the EXTENT_CLEAR_META_RESV bit when clearning EXTENT_DELALLOC
nit: s/clearning/clearing
>> + * on the inode io_tree.
>> + */
>> +
>> int btrfs_alloc_data_chunk_ondemand(struct btrfs_inode *inode, u64 bytes)
>> {
>> struct btrfs_root *root = inode->root;
>>
>