On 7/3/20 2:19 AM, Qu Wenruo wrote:
[PROBLEM]
Before this patch, when btrfs_qgroup_reserve_data() fails, we free all
reserved space of the changeset.
This means the following call pattern is not possible:
ret = btrfs_qgroup_reserve_data();
if (ret == -EDQUOT) {
        /* Do something to free some qgroup space */
        ret = btrfs_qgroup_reserve_data();
}
If the first btrfs_qgroup_reserve_data() fails, it frees all reserved
qgroup space, so the next btrfs_qgroup_reserve_data() will always
succeed and can go beyond the qgroup limit.
[CAUSE]
This is caused by the fact that we didn't expect to call
btrfs_qgroup_reserve_data() again after an error, thus
btrfs_qgroup_reserve_data() frees all of its reserved space on failure.
[FIX]
This patch implements a new function, qgroup_revert(), which iterates
through the ulist nodes, finds any nodes in the failure range, removes
the EXTENT_QGROUP_RESERVED bits from the io_tree, and decreases
extent_changeset::bytes_changed, so that we can revert to the previous
status.
This allows later patches to retry btrfs_qgroup_reserve_data() if EDQUOT
happens.
Suggested-by: Josef Bacik <josef@xxxxxxxxxxxxxx>
Signed-off-by: Qu Wenruo <wqu@xxxxxxxx>
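From the [FIX] description above, the new helper presumably looks
something like the sketch below. This is only an illustration of the
idea, not the actual patch: the signature is a guess, error handling is
omitted, and ranges that only partially overlap the failed region are
glossed over.

/*
 * Rough sketch of the revert helper described in [FIX] -- illustrative
 * only; the parameter list is assumed, not taken from the patch.
 */
static void qgroup_revert(struct btrfs_inode *inode,
                          struct extent_changeset *reserved,
                          u64 start, u64 len)
{
        struct ulist_node *unode;
        struct ulist_iterator uiter;

        ULIST_ITER_INIT(&uiter);
        while ((unode = ulist_next(&reserved->range_changed, &uiter))) {
                u64 range_start = unode->val;   /* first byte of range */
                u64 range_end = unode->aux;     /* last byte, inclusive */

                /* Skip ranges entirely outside the failed region */
                if (range_end < start || range_start >= start + len)
                        continue;

                /*
                 * Revert just this range: drop its EXTENT_QGROUP_RESERVED
                 * bits and undo its contribution to bytes_changed.
                 * (Partial overlaps are glossed over in this sketch.)
                 */
                clear_extent_bits(&inode->io_tree, range_start, range_end,
                                  EXTENT_QGROUP_RESERVED);
                reserved->bytes_changed -= range_end - range_start + 1;
        }
}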
I spent like 10 minutes reading all this trying to figure out why we need this
new thing. What you are saying is that in the case of, say, fallocate,
where you have something like
while (cur_len < total_len) {
        ret = btrfs_qgroup_reserve_data(&changeset);
        if (ret == -EDQUOT) {
                /* Flush */
                ret = btrfs_qgroup_reserve_data(&changeset);
        }
}
then doing the free that was in btrfs_qgroup_reserve_data() could free up space
that we had already actually used. So the problem isn't the flushing +
subsequent retry, it's that we could have previous reservations on that
changeset that _were_ successful, and previously we would free all of them, which
would allow us to go over the limit. That's not clear at all from the commit
message. That being said the code is fine. Thanks,
Josef
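To make the scenario above concrete, here is a made-up walkthrough
assuming a hypothetical 1MiB qgroup limit; the call arguments are
schematic, following the pseudo-code above rather than the real
function signature:

ret = btrfs_qgroup_reserve_data(&changeset);  /* reserve 512K: ok */
ret = btrfs_qgroup_reserve_data(&changeset);  /* reserve 1M: -EDQUOT */

/*
 * Old behaviour: the failure path freed everything tracked by the
 * changeset, including the successful 512K reservation the caller may
 * already be using.  A retry of the 1M reservation then succeeds even
 * though nothing was really freed, so actual usage can reach
 * 512K + 1M against a 1MiB limit.
 *
 * With the per-range revert, only the failed 1M range is rolled back;
 * the 512K stays accounted for, and the retry is limited correctly.
 */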