On 7/2/20 9:50 AM, Qu Wenruo wrote:
On 2020/7/2 下午9:40, Josef Bacik wrote:
On 7/1/20 8:14 PM, Qu Wenruo wrote:
[PROBLEM]
Before this patch, when btrfs_qgroup_reserve_data() fails, we free all
reserved space of the changeset.
This means the following call is not possible:
ret = btrfs_qgroup_reserve_data();
if (ret == -EDQUOT) {
/* Do something to free some qgroup space */
ret = btrfs_qgroup_reserve_data();
}
As if the first btrfs_qgroup_reserve_data() fails, it will free all
reserved qgroup space, so the next btrfs_qgroup_reserve_data() will
always success, and can go beyond qgroup limit.
[CAUSE]
This is mostly due to the fact that we don't have a good way to revert
changeset modification accurately.
Currently the changeset infrastructure is implemented using ulist, which
can only store two u64 values, used as start and length for each changed
extent range.
So we can't record which changed extent is modified in last
modification, thus unable to revert to previous status.
[FIX]
This patch will re-implement using pure rbtree, adding a new member,
changed_extent::seq, so we can remove changed extents which is
modified in previous modification.
This allows us to implement qgroup_revert(), which allow btrfs to revert
its modification to the io_tree.
I'm having a hard time groking what's going on here. These changesets
are limited to a [start, end] range correct?
Yes, but we may be only responsible for part of the changeset.
One example is we want to falloc range [0, 16K)
And [0, 4K), [8K, 12K) has already one existing file extent.
Then we succeeded in allocating space for [4K, 8K), but failed to
allocating space for [8K, 12K).
In that case, if we just return EDQUOT and clear the range for [4K, 8k)
and [8K, 12K), everything is completely fine.
But if we want to retry, then we should only clear the range for [8K,
12K), but not to clear [4K, 8K).
That's what we need for the next patch, just revert what we did in
previous set_extent_bit(), but not touching previously set bits.
Ok so how do we get to this case then? The changeset already has the range
we're responsible for correct? Why do we need the sequence number? Like we
should have a changeset for [4k, 8k) and one for [8k, 12k) right? Or are we
handed back the whole thing? If we fail _just_ for the [8k, 12k) part, do we
know that? Or do we just know our whole [0k, 16k) thing failed, and so we need
all this convoluted tracking to be able to retry for the range that we fucked
up, and this is what you are facilitating? Thanks,
Josef