Re: [PATCH 2/3] btrfs: qgroup: Try to flush qgroup space when we get -EDQUOT

On 2020/7/2 9:57 PM, Josef Bacik wrote:
> On 7/2/20 9:54 AM, Qu Wenruo wrote:
>>
>>
>>> On 2020/7/2 9:43 PM, Josef Bacik wrote:
>>> On 7/1/20 8:14 PM, Qu Wenruo wrote:
>>>> [PROBLEM]
>>>> There are known problems related to how btrfs handles qgroup reserved
>>>> space.
>>>> One of the most obvious cases is the test case btrfs/153, which does
>>>> fallocate, then writes into the preallocated range.
>>>>
>>>>     btrfs/153 1s ... - output mismatch (see
>>>> xfstests-dev/results//btrfs/153.out.bad)
>>>>         --- tests/btrfs/153.out     2019-10-22 15:18:14.068965341 +0800
>>>>         +++ xfstests-dev/results//btrfs/153.out.bad      2020-07-01
>>>> 20:24:40.730000089 +0800
>>>>         @@ -1,2 +1,5 @@
>>>>          QA output created by 153
>>>>         +pwrite: Disk quota exceeded
>>>>         +/mnt/scratch/testfile2: Disk quota exceeded
>>>>         +/mnt/scratch/testfile2: Disk quota exceeded
>>>>          Silence is golden
>>>>         ...
>>>>         (Run 'diff -u xfstests-dev/tests/btrfs/153.out
>>>> xfstests-dev/results//btrfs/153.out.bad'  to see the entire diff)
>>>>
>>>> [CAUSE]
>>>> Since commit c6887cd11149 ("Btrfs: don't do nocow check unless we have
>>>> to"), we always reserve space whether the write is COW or not.
>>>>
>>>> That behavior change was made mostly for performance, and reverting it
>>>> is not a good idea anyway.
>>>>
>>>> For a preallocated extent, we have already reserved qgroup data space
>>>> at fallocate time, and since we also reserve qgroup data space at
>>>> buffered write time, writing into the preallocated range needs twice
>>>> the space.
>>>>
>>>> This leads to the -EDQUOT in the buffered write routine.
>>>>
>>>> And we can't follow the same solution: unlike the data/metadata space
>>>> check, qgroup reserved space is shared between data and metadata.
>>>> The -EDQUOT can happen at the metadata reservation, so doing the
>>>> NODATACOW check after a qgroup reservation failure is not a solution.
>>>
>>> Why not?  I get that we don't know for sure how we failed, but in the
>>> case of a write we're way more likely to have failed for data reasons,
>>> right?
>>
>> Nope, mostly we fail at the metadata reservation, as that's what
>> returns EDQUOT to user space.
>>
>> We may have some cases which get EDQUOT at the data reservation part,
>> but that's what we expected.
>> (And that's already what we're doing.)
>>
>> The problem is when the metadata reservation failed with EDQUOT.
>>
>>>    So why not just fall back to the NODATACOW check and then do the
>>> metadata reservation?  Then if it fails again you know it's a real
>>> EDQUOT and you're done.
>>>
>>> Or if you want to get super fancy you could even break up the metadata
>>> and data reservations here so that we only fall through to the NODATACOW
>>> check if we fail the data reservation.  Thanks,
>>
>> The problem is, qgroup doesn't split metadata and data (yet).
>> Currently data and metadata share the same limit.
>>
>> So when we hit EDQUOT, there is no guarantee it happened in the qgroup
>> data reservation.
>>
> 
> Sure, but if you are able to do the nocow thing, then presumably your
> quota reservation is less now?  So on failure you go do the complicated
> nocow check, and if it succeeds you retry your quota reservation with
> just the metadata portion, right?  Thanks,

Then the metadata portion can still fail, even if we skipped the data
reservation.

The metadata portion still needs some space, while the data reservation
skip only happens after we're already near the qgroup limit, which means
there is really not much space left.

Consider this case: we have a 128M limit, we fallocated 120M, then we
have dirtied 7M of data, plus several KiB reserved for metadata.

Then within the next 1M, we run out of qgroup limit, at whatever
position.  Even if we skip the current 4K data reservation, the next
metadata reservation may still not be met, and we still get -EDQUOT at
the metadata reservation.
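
To put rough numbers on it (the metadata figure is illustrative, not
exact):

    qgroup limit:                                  128M
    fallocate (qgroup data rsv):                   120M
    buffered write into prealloc (reserved again):   7M
    metadata reservations:                  several KiB
    ---------------------------------------------------
    total reserved:                               ~127M
    headroom before -EDQUOT:                       < 1M

So skipping a single 4K data reservation buys almost nothing; the very
next metadata reservation can push us over the limit anyway.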

Or some other open() call creating a new file would just get -EDQUOT,
with no way to free any extra space.


Instead of trying to skip just a few 4K qgroup data reservations, we
should flush the existing 7M, to free at least 7M of data reservation
plus several KiB of metadata reservation.
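
A rough sketch of the idea (qgroup_do_reserve() below is a hypothetical
stand-in for whichever data/metadata reservation call returned -EDQUOT;
the flush helpers are the existing btrfs ones):

/*
 * Sketch only, not the exact patch: on -EDQUOT, write back dirty data
 * so completed writes release their qgroup data reservations, commit
 * the transaction so per-trans metadata reservations are freed, then
 * retry the reservation once.
 */
static int qgroup_reserve_with_flush(struct btrfs_root *root, u64 num_bytes)
{
	struct btrfs_trans_handle *trans;
	int ret;

	ret = qgroup_do_reserve(root, num_bytes);	/* hypothetical */
	if (ret != -EDQUOT)
		return ret;

	/* Flush delalloc and wait for ordered extents to finish. */
	ret = btrfs_start_delalloc_snapshot(root);
	if (ret < 0)
		return ret;
	btrfs_wait_ordered_extents(root, U64_MAX, 0, (u64)-1);

	/* Commit to release per-transaction metadata reservations. */
	trans = btrfs_join_transaction(root);
	if (IS_ERR(trans))
		return PTR_ERR(trans);
	ret = btrfs_commit_transaction(trans);
	if (ret < 0)
		return ret;

	/* A second -EDQUOT now means we are genuinely over the limit. */
	return qgroup_do_reserve(root, num_bytes);
}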

Thanks,
Qu

> 
> Josef
