Re: [PATCH v2 00/23] Rework btrfs qgroup reserved space framework

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Oct 9, 2015 at 6:45 AM, Qu Wenruo <quwenruo@xxxxxxxxxxxxxx> wrote:
>
>
> Josef Bacik wrote on 2015/10/08 21:36 -0700:
>>
>> On 10/08/2015 07:11 PM, Qu Wenruo wrote:
>>>
>>> In previous rework of qgroup, we succeeded in fixing qgroup accounting
>>> part, making the rfer/excl numbers accurate.
>>>
>>> But that's just part of qgroup work, another part of qgroup still has
>>> quite a lot problem, that's qgroup reserve space part which will lead to
>>> EQUOT even we are far from the limit.
>>>
>>> [[BUG]]
>>> The easiest way to trigger the bug is,
>>> 1) Enable quota
>>> 2) Limit excl of qgroup 5 to 16M
>>> 3) Write [0,2M) of a file inside subvol 5 10 times without sync
>>>
>>> EQUOT will be triggered at about the 8th write.
>>> But after remount, we can still write until about 15M.
>>>
>>> [[CAUSE]]
>>> The problem is caused by the fact that qgroup will reserve space even
>>> the data space is already reserved.
>>>
>>> In above reproducer, each time we buffered write [0,2M) qgroup will
>>> reserve 2M space, but in fact, at the 1st time, we have already reserved
>>> 2M and from then on, we don't need to reserved any data space as we are
>>> only writing [0,2M).
>>>
>>> Also, the reserved space will only be freed *ONCE* when its backref is
>>> run at commit_transaction() time.
>>>
>>> That's causing the reserved space leaking.
>>>
>>> [[FIX]]
>>> The fix is not a simple one, as currently btrfs_qgroup_reserve() will
>>> allocate whatever caller asked for.
>>>
>>> So for accurate qgroup reserve, we introduce a completely new framework
>>> for data and metadata.
>>> 1) Per-inode data reserve map
>>>     Now, each inode will have a data reserve map, recording which range
>>>     of data is already reserved.
>>>     If we are writing a range which is already reserved, we won't need to
>>>     reserve space again.
>>>
>>>     Also, for the fact that qgroup is only accounted at commit_trans(),
>>>     for data commit into disc and its metadata is also inserted into
>>>     current tree, we should free the data reserved range, but still keep
>>>     the reserved space until commit_trans().
>>>
>>>     So delayed_ref_head will have new members to record how much space is
>>>     reserved and free them at commit_trans() time.
>>
>>
>> This is already handled by setting DELALLOC in the io_tree, we do
>> similar sort of stuff for the normal enospc accounting, why not use
>> that?  Thanks,
>>
>> Josef
>
>
> Thanks for pointing this out.
>
> I was also searching for a existing facility, but didn't find one as I'm not
> familiar with io_tree.
>
> After a quick glance, it seems quite fit the need, but not completely sure.
>
> I'll keep investigating on it and try to use it.
>
> BTW, from what I understand, __btrfs_buffered_write() should cause the range
> to be DEALLOC, but I didn't find any call to set_extent_delalloc(),
> it that done in other place?

__btrfs_buffered_write() -> btrfs_dirty_pages() -> btrfs_set_extent_delalloc()

>
> Thanks,
> Qu
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
Filipe David Manana,

"Reasonable men adapt themselves to the world.
 Unreasonable men adapt the world to themselves.
 That's why all progress depends on unreasonable men."
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux Filesystem Development]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux