On 25.03.19 at 18:26, David Sterba wrote:
> On Mon, Mar 25, 2019 at 02:31:27PM +0200, Nikolay Borisov wrote:
>> From: Jeff Mahoney <jeffm@xxxxxxxx>
>>
>> The pending chunks list contains chunks that are allocated in the
>> current transaction but haven't been created yet. The pinned chunks
>> list contains chunks that are being released in the current transaction.
>> Both describe chunks that are not reflected on disk as in use but are
>> unavailable just the same.
>>
>> The pending chunks list is anchored by the transaction handle, which
>> means that we need to hold a reference to a transaction when working
>> with the list.
>>
>> We use these lists to ensure that we don't end up discarding chunks
>> that are allocated or released in the current transaction. What we r
>>
>> The way we use them is by iterating over both lists to perform
>> comparisons on the stripes they describe for each device. This is
>> backwards and requires that we keep a transaction handle open while
>> we're trimming.
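To make the cost of the list-based scheme concrete, here is a minimal userspace model of the check it implies. The struct stripe_ref type and the range_is_busy()/list_overlaps() helpers are hypothetical illustrations, not btrfs code. A candidate trim range on one device is rejected if any stripe in either list overlaps it, and the scan is only meaningful while both lists stay stable, i.e. while the transaction is held.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one stripe of a chunk on a device. */
struct stripe_ref {
	int devid;
	uint64_t physical;	/* start offset on the device, in bytes */
	uint64_t length;	/* stripe length, in bytes */
};

/* Half-open overlap test: [a, a + alen) vs [b, b + blen). */
static bool ranges_overlap(uint64_t a, uint64_t alen, uint64_t b, uint64_t blen)
{
	return a < b + blen && b < a + alen;
}

/* Does any stripe in list[0..n) on devid overlap [start, start + len)? */
static bool list_overlaps(const struct stripe_ref *list, size_t n, int devid,
			  uint64_t start, uint64_t len)
{
	for (size_t i = 0; i < n; i++)
		if (list[i].devid == devid &&
		    ranges_overlap(start, len, list[i].physical, list[i].length))
			return true;
	return false;
}

/*
 * Model of the list-based check: a trim candidate is "busy" if a pending
 * (allocated, not yet created) or pinned (being released) stripe on the
 * same device overlaps it. Both lists must stay stable during the walk,
 * which is why a transaction handle had to be held while trimming.
 */
static bool range_is_busy(int devid, uint64_t start, uint64_t len,
			  const struct stripe_ref *pending, size_t npending,
			  const struct stripe_ref *pinned, size_t npinned)
{
	assert(len > 0);
	return list_overlaps(pending, npending, devid, start, len) ||
	       list_overlaps(pinned, npinned, devid, start, len);
}
```

Every trim candidate pays for a walk over both lists, per device, which is the "backwards" part: the question is asked of the lists instead of the device.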
>>
>> This patchset adds an extent_io_tree to btrfs_device that maintains
>> the allocation state of the device. Extents are set dirty when
>> chunks are first allocated -- when the extent maps are added to the
>> mapping tree. They're cleared when last removed -- when the extent
>> maps are removed from the mapping tree. This matches the lifespan
>> of the pending and pinned chunks lists and allows us to do trims
>> on unallocated space safely without pinning the transaction for what
>> may be a lengthy operation. We can also use this io tree to mark
>> which chunks have already been trimmed so we don't repeat the operation.
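A rough sketch of the replacement idea, assuming a tiny device modelled as an array of per-block flags rather than an rbtree of extent_state records. DEV_BLOCKS, BIT_ALLOCATED, BIT_TRIMMED and the helper functions are hypothetical names for the states the description mentions. The trim loop needs no transaction because the allocated flag alone says whether a block may be in use, and the trimmed flag lets a second pass skip work already done.

```c
#include <assert.h>
#include <stdint.h>

#define DEV_BLOCKS 64	/* hypothetical device size, in blocks */

/* Hypothetical flag names modelled on the patch description. */
enum {
	BIT_ALLOCATED = 1 << 0,	/* covered by a chunk (pending, live or pinned) */
	BIT_TRIMMED   = 1 << 1,	/* already discarded while unallocated */
};

struct dev_alloc_state {
	uint8_t bits[DEV_BLOCKS];
};

/* Inclusive [start, end], like the extent_io_tree helpers. */
static void set_bits(struct dev_alloc_state *s, unsigned start, unsigned end, uint8_t b)
{
	assert(start <= end && end < DEV_BLOCKS);
	for (unsigned i = start; i <= end; i++)
		s->bits[i] |= b;
}

static void clear_bits(struct dev_alloc_state *s, unsigned start, unsigned end, uint8_t b)
{
	assert(start <= end && end < DEV_BLOCKS);
	for (unsigned i = start; i <= end; i++)
		s->bits[i] &= (uint8_t)~b;
}

/* Extent map added to the mapping tree: the range is now unavailable. */
static void chunk_mapped(struct dev_alloc_state *s, unsigned start, unsigned end)
{
	set_bits(s, start, end, BIT_ALLOCATED);
}

/* Extent map finally removed: free again, and it may hold stale data. */
static void chunk_unmapped(struct dev_alloc_state *s, unsigned start, unsigned end)
{
	clear_bits(s, start, end, BIT_ALLOCATED | BIT_TRIMMED);
}

/*
 * Discard every block that is neither allocated nor already trimmed and
 * mark it trimmed. Returns the number of blocks discarded in this pass.
 */
static unsigned trim_free_space(struct dev_alloc_state *s)
{
	unsigned trimmed = 0;

	for (unsigned i = 0; i < DEV_BLOCKS; i++) {
		if (s->bits[i] & (BIT_ALLOCATED | BIT_TRIMMED))
			continue;
		/* a real implementation would issue one discard per free run here */
		s->bits[i] |= BIT_TRIMMED;
		trimmed++;
	}
	return trimmed;
}
```

Clearing the trimmed flag when a chunk is removed, rather than when it is added, is a choice of this model: the point is only that freed space becomes eligible for trimming again.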
>>
>> Signed-off-by: Nikolay Borisov <nborisov@xxxxxxxx>
>> ---
>> fs/btrfs/ctree.h | 6 ---
>> fs/btrfs/disk-io.c | 11 -----
>> fs/btrfs/extent-tree.c | 28 -----------
>> fs/btrfs/extent_io.c | 2 +-
>> fs/btrfs/extent_io.h | 6 ++-
>> fs/btrfs/extent_map.c | 36 ++++++++++++++
>> fs/btrfs/extent_map.h | 1 -
>> fs/btrfs/free-space-cache.c | 4 --
>> fs/btrfs/transaction.c | 9 ----
>> fs/btrfs/transaction.h | 1 -
>> fs/btrfs/volumes.c | 96 +++++++++++++------------------------
>> fs/btrfs/volumes.h | 2 +
>> 12 files changed, 76 insertions(+), 126 deletions(-)
>>
>> --- a/fs/btrfs/extent_io.c
>> +++ b/fs/btrfs/extent_io.c
>> @@ -918,7 +918,7 @@ static void cache_state(struct extent_state *state,
>> * [start, end] is inclusive This takes the tree lock.
>> */
>>
>> -static int __must_check
>> +int __must_check
>> __set_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
>> unsigned bits, unsigned exclusive_bits,
>> u64 *failed_start, struct extent_state **cached_state,
>
> Does this really need to be exported again? There are helpers that can
> wrap specific combinations of parameters.
This is exported so that __set_extent_bit can be called with GFP_NOWAIT.
Otherwise I was getting lockdep splats, because extent_map_device_set_bits
(called from add_extent_mapping) runs under a write_lock and hence must
not sleep.
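For comparison, a sketch of the wrapper approach suggested above: the generic worker stays file-local and a thin helper fixes the non-sleeping mode. Everything here is a hypothetical userspace model. enum alloc_mode stands in for gfp_t, write_locked stands in for the lock held around add_extent_mapping, and the -1 return replaces what would be a lockdep warning in the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for gfp_t: only the "may it sleep?" bit is modelled. */
enum alloc_mode { ALLOC_MAY_SLEEP, ALLOC_NOWAIT };

struct range_tree {
	bool write_locked;	/* models "caller holds a write_lock" */
	uint64_t last_start, last_end;
	unsigned last_bits;
};

/* Generic worker: stays static, like __set_extent_bit before the patch. */
static int set_bits_generic(struct range_tree *t, uint64_t start, uint64_t end,
			    unsigned bits, enum alloc_mode mode)
{
	assert(start <= end);
	/* A possibly-sleeping allocation under the lock is what lockdep flags. */
	if (t->write_locked && mode == ALLOC_MAY_SLEEP)
		return -1;
	t->last_start = start;
	t->last_end = end;
	t->last_bits |= bits;
	return 0;
}

/* Narrow exported helpers: each one fixes a single parameter combination. */
int set_bits_sleepable(struct range_tree *t, uint64_t start, uint64_t end, unsigned bits)
{
	return set_bits_generic(t, start, end, bits, ALLOC_MAY_SLEEP);
}

int set_bits_nowait(struct range_tree *t, uint64_t start, uint64_t end, unsigned bits)
{
	return set_bits_generic(t, start, end, bits, ALLOC_NOWAIT);
}
```

The trade-off is one extra small helper per combination versus widening the exported surface of the generic function; callers under the lock still have to handle allocation failure either way.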
>
>> @@ -335,6 +335,8 @@ void btrfs_free_device(struct btrfs_device *device)
>> {
>> WARN_ON(!list_empty(&device->post_commit_list));
>> rcu_string_free(device->name);
>> + if (!in_softirq())
>> + extent_io_tree_release(&device->alloc_state);
The in_softirq() check distinguishes btrfs_free_device being called from
btrfs_close_devices in close_ctree (i.e. non-RCU, hence non-softirq,
context) or from any of the error handlers, from it being called from
free_device_rcu. In the latter case the extent io tree has already been
freed in btrfs_close_one_device, so there is no need to do it in the RCU
callback. Furthermore, there is also a comment stating that the extent io
tree cannot be destroyed in RCU context, because extent_io_tree_release
calls cond_resched_lock, which may sleep, and sleeping is forbidden in
RCU context.
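A userspace model of the intended invariant, that the tree is released exactly once and never from the RCU callback, may help when writing the comment for this hunk. device_model, release_alloc_state(), close_one_device(), free_device() and the in_rcu_callback flag are all hypothetical; the flag stands in for the in_softirq() test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model: 'nodes' stands in for the records held in alloc_state. */
struct device_model {
	int *nodes;	/* NULL once released */
	int releases;	/* how many times a release actually freed something */
};

/* Must only run where sleeping is allowed (the real release may reschedule). */
static void release_alloc_state(struct device_model *d)
{
	if (d->nodes) {
		free(d->nodes);
		d->nodes = NULL;
		d->releases++;
	}
}

/* Normal close path: release before the device is handed to RCU. */
static void close_one_device(struct device_model *d)
{
	release_alloc_state(d);
}

/*
 * 'in_rcu_callback' plays the role of in_softirq(). From the RCU callback
 * the tree was already released and sleeping is forbidden, so skip it;
 * from close_ctree or an error path, release it here.
 */
static void free_device(struct device_model *d, bool in_rcu_callback)
{
	if (!in_rcu_callback)
		release_alloc_state(d);
	assert(d->nodes == NULL);	/* nothing may leak on either path */
}
```

Passing the context explicitly is a choice of the sketch: it makes the "already released, do not sleep" reasoning visible at the call site instead of relying on in_softirq() as a proxy for "called from free_device_rcu".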
>
> This needs a comment
>
>> bio_put(device->flush_bio);
>> kfree(device);
>> }
>
> The commit is quite big, but I don't see how to shrink it; the changes
> need to be done in several places. So, probably OK.
>