On 19.08.19 18:06, Josef Bacik wrote:
> On Mon, Aug 19, 2019 at 05:49:45PM +0300, Nikolay Borisov wrote:
>>
>>
>> On 16.08.19 17:19, Josef Bacik wrote:
>>> Now that we no longer partially fill tickets, we need to rework
>>> wake_all_tickets to call btrfs_try_to_wakeup_tickets() in order to see
>>> if any subsequent tickets can be satisfied. If our tickets_id
>>> changes, we know something happened and we can keep flushing.
>>>
>>> Also, if we find a ticket that is smaller than the first ticket in our
>>> queue, then we want to go through the flushing loop again in case
>>> may_commit_transaction() decides we could satisfy the ticket by
>>> committing the transaction.
>>>
>>> Signed-off-by: Josef Bacik <josef@xxxxxxxxxxxxxx>
>>> ---
>>> fs/btrfs/space-info.c | 34 +++++++++++++++++++++++++++-------
>>> 1 file changed, 27 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
>>> index 8a1c7ada67cb..bd485be783b8 100644
>>> --- a/fs/btrfs/space-info.c
>>> +++ b/fs/btrfs/space-info.c
>>> @@ -676,19 +676,39 @@ static inline int need_do_async_reclaim(struct btrfs_fs_info *fs_info,
>>>  		!test_bit(BTRFS_FS_STATE_REMOUNTING, &fs_info->fs_state));
>>>  }
>>>
>>> -static bool wake_all_tickets(struct list_head *head)
>>> +static bool wake_all_tickets(struct btrfs_fs_info *fs_info,
>>> +			     struct btrfs_space_info *space_info)
>>>  {
>>>  	struct reserve_ticket *ticket;
>>> +	u64 tickets_id = space_info->tickets_id;
>>> +	u64 first_ticket_bytes = 0;
>>> +
>>> +	while (!list_empty(&space_info->tickets) &&
>>> +	       tickets_id == space_info->tickets_id) {
>>> +		ticket = list_first_entry(&space_info->tickets,
>>> +					  struct reserve_ticket, list);
>>> +
>>> +		/*
>>> +		 * may_commit_transaction will avoid committing the transaction
>>> +		 * if it doesn't feel like the space reclaimed by the commit
>>> +		 * would result in the ticket succeeding. However if we have a
>>> +		 * smaller ticket in the queue it may be small enough to be
>>> +		 * satisfied by committing the transaction, so if any
>>> +		 * subsequent ticket is smaller than the first ticket go ahead
>>> +		 * and send us back for another loop through the enospc flushing
>>> +		 * code.
>>> +		 */
>>> +		if (first_ticket_bytes == 0)
>>> +			first_ticket_bytes = ticket->bytes;
>>> +		else if (first_ticket_bytes > ticket->bytes)
>>> +			return true;
>>>
>>> -	while (!list_empty(head)) {
>>> -		ticket = list_first_entry(head, struct reserve_ticket, list);
>>>  		list_del_init(&ticket->list);
>>>  		ticket->error = -ENOSPC;
>>>  		wake_up(&ticket->wait);
>>> -		if (ticket->bytes != ticket->orig_bytes)
>>> -			return true;
>>> +		btrfs_try_to_wakeup_tickets(fs_info, space_info);
>>
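The reworked loop can be modelled in userspace to see when it asks for another flushing pass. This is a toy sketch, not the btrfs code: tickets sit in a fixed array instead of a list_head, toy_try_to_wakeup() is a hypothetical stand-in for btrfs_try_to_wakeup_tickets() that simply grants head tickets while they fit in the free space, and -1 stands in for -ENOSPC. The quoted hunk stops before the function's return statement, so returning whether tickets_id changed is an assumption here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_TICKETS 8

/* Hypothetical, simplified ticket: not struct reserve_ticket. */
struct toy_ticket {
	uint64_t bytes;
	int error;	/* 0 = pending or granted, -1 stands in for -ENOSPC */
	bool woken;
};

/* Hypothetical, simplified space_info with an array-based FIFO queue. */
struct toy_space_info {
	struct toy_ticket *tickets[MAX_TICKETS];
	size_t nr;		/* number of queued tickets */
	uint64_t free_bytes;
	uint64_t tickets_id;	/* bumped whenever a ticket is granted */
};

/* Remove the head of the queue; caller guarantees nr > 0. */
static void pop_head(struct toy_space_info *si)
{
	for (size_t i = 1; i < si->nr; i++)
		si->tickets[i - 1] = si->tickets[i];
	si->nr--;
}

/* Stand-in for the wakeup helper: grant head tickets while they fit. */
static void toy_try_to_wakeup(struct toy_space_info *si)
{
	while (si->nr > 0 && si->tickets[0]->bytes <= si->free_bytes) {
		struct toy_ticket *t = si->tickets[0];

		si->free_bytes -= t->bytes;
		t->woken = true;
		pop_head(si);
		si->tickets_id++;
	}
}

/* Returns true if the caller should go around the flushing loop again. */
static bool toy_wake_all_tickets(struct toy_space_info *si)
{
	uint64_t tickets_id = si->tickets_id;
	uint64_t first_ticket_bytes = 0;

	while (si->nr > 0 && tickets_id == si->tickets_id) {
		struct toy_ticket *t = si->tickets[0];

		/* A later ticket smaller than the first one: flush again. */
		if (first_ticket_bytes == 0)
			first_ticket_bytes = t->bytes;
		else if (first_ticket_bytes > t->bytes)
			return true;

		/* Fail the head ticket, then see if that unblocks the rest. */
		pop_head(si);
		t->error = -1;
		t->woken = true;
		toy_try_to_wakeup(si);
	}
	/* Assumed: progress was made if any ticket got granted. */
	return tickets_id != si->tickets_id;
}
```

For example, with no free space and queued tickets of 100 and 50 bytes, the 100-byte ticket is failed and the function returns true as soon as it reaches the smaller 50-byte ticket, leaving that ticket queued for another flushing pass.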
>> So the change in this logic is directly related to the implementation of
>> btrfs_try_to_wakeup_tickets: when we fail and remove a ticket in this
>> function, the next ticket *could* then be satisfied. But how well does
>> that work in practice, given that you fail normal-priority tickets here,
>> whereas btrfs_try_to_wakeup_tickets checks the priority tickets first?
>> Even if you fail a normal ticket, a single unsatisfiable priority ticket
>> means nothing really changes.
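The ordering concern above can be made concrete with a toy model. It assumes, as the review does, that the wakeup helper services the priority queue first and stops at the first priority ticket it cannot satisfy; the quoted hunk does not show whether the real btrfs_try_to_wakeup_tickets() behaves that way. Queues are plain arrays of byte counts and every name here is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical FIFO of ticket sizes in bytes. */
struct toy_queue {
	uint64_t bytes[8];
	size_t nr;
};

/* Remove the head of the queue; caller guarantees nr > 0. */
static void toy_pop(struct toy_queue *q)
{
	for (size_t i = 1; i < q->nr; i++)
		q->bytes[i - 1] = q->bytes[i];
	q->nr--;
}

/* Grant from @q while the head fits; return false if the head is stuck. */
static bool toy_grant(struct toy_queue *q, uint64_t *free_bytes,
		      size_t *granted)
{
	while (q->nr > 0) {
		if (q->bytes[0] > *free_bytes)
			return false;
		*free_bytes -= q->bytes[0];
		toy_pop(q);
		(*granted)++;
	}
	return true;
}

/*
 * Assumed ordering: priority queue first, and the normal queue is only
 * looked at once the priority queue has fully drained.
 */
static size_t toy_try_to_wakeup(struct toy_queue *prio,
				struct toy_queue *normal, uint64_t *free_bytes)
{
	size_t granted = 0;

	if (toy_grant(prio, free_bytes, &granted))
		toy_grant(normal, free_bytes, &granted);
	return granted;
}

/* Fail the head normal ticket, then retry wakeups, as the patch does. */
static size_t toy_fail_head_and_retry(struct toy_queue *prio,
				      struct toy_queue *normal,
				      uint64_t *free_bytes)
{
	if (normal->nr > 0)
		toy_pop(normal);	/* this ticket would get -ENOSPC */
	return toy_try_to_wakeup(prio, normal, free_bytes);
}
```

With 50 bytes free, a stuck 500-byte priority ticket, and normal tickets of 100 and 20 bytes, failing the 100-byte ticket grants nothing even though the 20-byte ticket would fit; with the priority queue empty, the same call grants it.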
>
> In practice we don't get to this state with high priority tickets on the list.
> Anything that would sit on the priority list for any length of time is an
> evict, and we wait for iput()s in the normal flushing code. By the time we hit
> wake_all_tickets, we should generally only have tickets on the normal list.
Be that as it may, I think this assumption needs to be codified via an
assert or WARN_ON.
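In the kernel this would plausibly be a one-liner such as WARN_ON(!list_empty(&space_info->priority_tickets)) at the top of wake_all_tickets(). The sketch below is a userspace stand-in showing the intended behaviour: TOY_WARN_ON() is a hypothetical substitute for WARN_ON() that logs, keeps going, and evaluates to the condition, and the space_info struct is reduced to two hypothetical counters.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Number of warnings fired so far (lets a caller check the behaviour). */
static int toy_warn_count;

/* Log the failed condition and carry on, like WARN_ON() does. */
static bool toy_warn_on(bool cond, const char *expr, const char *file,
			int line)
{
	if (cond) {
		toy_warn_count++;
		fprintf(stderr, "WARNING at %s:%d: %s\n", file, line, expr);
	}
	return cond;
}

/* Hypothetical stand-in for the kernel's WARN_ON(). */
#define TOY_WARN_ON(cond) toy_warn_on(!!(cond), #cond, __FILE__, __LINE__)

/* Hypothetical, reduced space_info: only queue lengths are tracked. */
struct toy_space_info {
	size_t nr_priority_tickets;
	size_t nr_tickets;
};

/* Returns the number of normal tickets failed. */
static size_t toy_wake_all_tickets(struct toy_space_info *si)
{
	size_t failed = si->nr_tickets;

	/* Codify the assumption: no priority tickets should be queued here. */
	TOY_WARN_ON(si->nr_priority_tickets != 0);

	si->nr_tickets = 0;	/* every normal ticket gets -ENOSPC */
	return failed;
}
```

WARN_ON() suits an assumption that is expected to hold but is not fatal if broken: it leaves a backtrace in the log even on production kernels, whereas btrfs's ASSERT() has no effect unless CONFIG_BTRFS_ASSERT is enabled.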
>
> I suppose we could get into this situation, but again, the high priority
> tickets are going to come from evict, truncate block, and relocate, all of
> which reserve significantly less than operations like create or unlink. If
> those are unable to get reservations, then we are truly out of space.
> Thanks,
>
> Josef
>