Re: [PATCH 2/3] btrfs: qgroup: Try to flush qgroup space when we get -EDQUOT

On 2020/7/2 10:58 PM, Josef Bacik wrote:
> On 7/2/20 10:19 AM, Qu Wenruo wrote:
>>
>>
>> On 2020/7/2 9:57 PM, Josef Bacik wrote:
>>> On 7/2/20 9:54 AM, Qu Wenruo wrote:
>>>>
>>>>
>>>> On 2020/7/2 9:43 PM, Josef Bacik wrote:
>>>>> On 7/1/20 8:14 PM, Qu Wenruo wrote:
>>>>>> [PROBLEM]
>>>>>> There are known problems related to how btrfs handles qgroup reserved
>>>>>> space.
>>>>>> One of the most obvious cases is the test case btrfs/153, which does
>>>>>> fallocate, then writes into the preallocated range.
>>>>>>
>>>>>>      btrfs/153 1s ... - output mismatch (see
>>>>>> xfstests-dev/results//btrfs/153.out.bad)
>>>>>>          --- tests/btrfs/153.out     2019-10-22 15:18:14.068965341
>>>>>> +0800
>>>>>>          +++ xfstests-dev/results//btrfs/153.out.bad      2020-07-01
>>>>>> 20:24:40.730000089 +0800
>>>>>>          @@ -1,2 +1,5 @@
>>>>>>           QA output created by 153
>>>>>>          +pwrite: Disk quota exceeded
>>>>>>          +/mnt/scratch/testfile2: Disk quota exceeded
>>>>>>          +/mnt/scratch/testfile2: Disk quota exceeded
>>>>>>           Silence is golden
>>>>>>          ...
>>>>>>          (Run 'diff -u xfstests-dev/tests/btrfs/153.out
>>>>>> xfstests-dev/results//btrfs/153.out.bad'  to see the entire diff)
>>>>>>
>>>>>> [CAUSE]
>>>>>> Since commit c6887cd11149 ("Btrfs: don't do nocow check unless we
>>>>>> have to"), we always reserve space whether the write is COW or not.
>>>>>>
>>>>>> That behavior change was made mostly for performance, and reverting it
>>>>>> is not a good idea anyway.
>>>>>>
>>>>>> For a preallocated extent, we have already reserved qgroup data space
>>>>>> for it, and since we also reserve qgroup data space at buffered write
>>>>>> time, writing into the preallocated range needs twice the space.
>>>>>>
>>>>>> This leads to the -EDQUOT in the buffered write routine.
>>>>>>
>>>>>> And we can't follow the same solution: unlike the data/metadata space
>>>>>> checks, qgroup reserved space is shared between data and metadata.
>>>>>> The EDQUOT can happen at the metadata reservation, so doing the
>>>>>> NODATACOW check after a qgroup reservation failure is not a solution.
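
(To make the double reservation concrete, here is a minimal user-space
sketch of the btrfs/153 scenario, not the test itself; the mount point,
the 128M qgroup limit and the 120M sizes below are only illustrative
assumptions.)

    #define _GNU_SOURCE
    #include <errno.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
            /* Assumes /mnt/scratch is a btrfs subvolume that already has
             * something like "btrfs qgroup limit 128M" applied to it. */
            int fd = open("/mnt/scratch/testfile", O_RDWR | O_CREAT, 0644);
            char buf[4096];
            off_t off;

            if (fd < 0) {
                    perror("open");
                    return 1;
            }
            memset(buf, 0xaa, sizeof(buf));

            /* Qgroup data space for the whole 120M is reserved here... */
            if (fallocate(fd, 0, 0, 120 << 20)) {
                    perror("fallocate");
                    return 1;
            }

            /* ...and reserved again at buffered write time, so writing
             * back into the preallocated range hits EDQUOT long before
             * the file itself exceeds the 128M limit. */
            for (off = 0; off < (off_t)(120 << 20); off += sizeof(buf)) {
                    if (pwrite(fd, buf, sizeof(buf), off) < 0) {
                            perror("pwrite"); /* Disk quota exceeded */
                            break;
                    }
            }

            close(fd);
            return 0;
    }
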
>>>>>
>>>>> Why not?  I get that we don't know for sure how we failed, but in the
>>>>> case of a write we're way more likely to have failed for data reasons,
>>>>> right?
>>>>
>>>> Nope, mostly we fail at the metadata reservation, as that is what
>>>> returns EDQUOT to user space.
>>>>
>>>> We may have some cases which get EDQUOT at the data reservation part,
>>>> but that's what we expect.
>>>> (And already what we're doing.)
>>>>
>>>> The problem is when the metadata reservation fails with EDQUOT.
>>>>
>>>>>     So why not just fall back to the NODATACOW check and then do the
>>>>> metadata reservation?  Then if it fails again you know it's a real
>>>>> EDQUOT and you're done.
>>>>>
>>>>> Or if you want to get super fancy you could even break up the metadata
>>>>> and data reservations here so that we only fall through to the
>>>>> NODATACOW
>>>>> check if we fail the data reservation.  Thanks,
>>>>
>>>> The problem is, qgroup doesn't split metadata and data (yet).
>>>> Currently data and metadata share the same limit.
>>>>
>>>> So when we hit EDQUOT, there is no guarantee it happened in the
>>>> qgroup data reservation.
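
(A toy illustration of why the failure is ambiguous: both reservation
paths draw from one shared limit, so the caller can't tell which one
tripped. The counters below are made up for illustration, not the real
btrfs structures.)

    #include <errno.h>

    /* One shared qgroup limit covers both data and metadata. */
    static long qgroup_limit = 128L << 20;
    static long qgroup_reserved;

    static int qgroup_reserve(long bytes)
    {
            if (qgroup_reserved + bytes > qgroup_limit)
                    return -EDQUOT;         /* same error either way */
            qgroup_reserved += bytes;
            return 0;
    }

    /* Data and metadata reservations both end up here, so an -EDQUOT
     * seen by the write path doesn't tell us whether skipping the data
     * reservation (the NODATACOW case) would have been enough. */
    static int reserve_data(long bytes) { return qgroup_reserve(bytes); }
    static int reserve_meta(long bytes) { return qgroup_reserve(bytes); }
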
>>>>
>>>
>>> Sure, but if you are able to do the nocow thing, then presumably your
>>> quota reservation is less now?  So on failure you go do the complicated
>>> nocow check, and if it succeeds you retry your quota reservation with
>>> just the metadata portion, right?  Thanks,
>>
>> Then the metadata portion can still fail, even if we skipped the data
>> reservation.
>>
>> The metadata portion still needs some space, while the data rsv skip
>> only happens after we're already near the qgroup limit, which means
>> there is already not much space left.
>>
>> Consider this case: we have a 128M limit, we fallocated 120M, then we
>> have dirtied 7M of data, plus several KiB reserved for metadata.
>>
>> Then somewhere in the next 1M we run out of qgroup limit, at whatever
>> position. Even if we skip the current 4K for data, the next metadata
>> reservation may still not be met, and we still get EDQUOT at the
>> metadata reservation.
>>
>> Or some other open() call to create a new file would just get EDQUOT,
>> without any way to free any extra space.
>>
>>
>> Instead of trying to skip just a few 4K data reservations, we should
>> flush the existing 7M, to free at least 7M of data space plus several
>> KiB of metadata space.
>>
> 
> Right, so I'm not against flushing in general; I just think that we can
> greatly improve on this particular problem without flushing.  Changing
> how we do the NOCOW check with quota could be faster than doing the
> flushing.

Yep, but as mentioned, the uncertain timing of when we get the EDQUOT is
really annoying and tricky to solve, thus we have to go with the flushing
method.

The performance is definitely slower, but that's acceptable: since we're
near the limit, slowing down is pretty common.
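
To make that concrete, the retry is roughly shaped like the sketch below.
The helpers are illustrative stand-ins, not the real btrfs symbols; in the
real flush path we write back dirty data, wait for ordered extents and
commit the transaction so reserved space that is no longer needed gets
freed before retrying once.

    #include <errno.h>

    /* Stand-in: the qgroup data/metadata reservation, may hit the limit. */
    static int reserve_qgroup_space(long bytes)
    {
            (void)bytes;
            return -EDQUOT;         /* stub: pretend we are over the limit */
    }

    /* Stand-in: writeback + wait for ordered extents + commit. */
    static void flush_qgroup_space(void)
    {
    }

    /* Shape of the retry: flush once on -EDQUOT, then try again. */
    static int reserve_qgroup_space_with_flush(long bytes)
    {
            int ret = reserve_qgroup_space(bytes);

            if (ret != -EDQUOT)
                    return ret;

            flush_qgroup_space();
            return reserve_qgroup_space(bytes);
    }

    int main(void)
    {
            /* Example: ask for one 4K reservation. */
            return reserve_qgroup_space_with_flush(4096) == -EDQUOT;
    }
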

> 
> Now as for the flushing part itself, I'd rather hook into the existing
> flushing infrastructure we have.

That's the ultimate objective.

>  Obviously the ticketing is going to be
> different, but the flushing part is still the same, and with data
> reservations now moved over to that infrastructure we finally have it
> all in the same place.  Thanks,

Before the needed infrastructure gets merged, I'll keep the current small
retry code and look into what's needed to integrate qgroup rsv into the
ticketing system.

Thanks,
Qu

> 
> Josef
