Re: [PATCH 3/9] btrfs: qgroup: Fix qgroup corruption caused by inode_cache mount option

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 3/7/17 7:36 PM, Qu Wenruo wrote:
> 
> 
> At 03/08/2017 03:21 AM, Jeff Mahoney wrote:
>> On 2/27/17 2:10 AM, Qu Wenruo wrote:
>>> [BUG]
>>> The easist way to reproduce the bug is:
>>> ------
>>>  # mkfs.btrfs -f $dev -n 16K
>>>  # mount $dev $mnt -o inode_cache
>>>  # btrfs quota enable $mnt
>>>  # btrfs quota rescan -w $mnt
>>>  # btrfs qgroup show $mnt
>>> qgroupid         rfer         excl
>>> --------         ----         ----
>>> 0/5          32.00KiB     32.00KiB
>>>              ^^ Twice the correct value
>>> ------
>>>
>>> And fstests/btrfs qgroup test group can easily detect them with
>>> inode_cache mount option.
>>> Although some of them are false alerts since old test cases are using
>>> fixed golden output.
>>> While new test cases will use "btrfs check" to detect qgroup mismatch.
>>>
>>> [CAUSE]
>>> Inode_cache mount option will make commit_fs_roots() to call
>>> btrfs_save_ino_cache() to update fs/subvol trees, and generate new
>>> delayed refs.
>>>
>>> However we call btrfs_qgroup_prepare_account_extents() too early, before
>>> commit_fs_roots().
>>> This makes the "old_roots" for newly generated extents are always NULL.
>>> For freeing extent case, this makes both new_roots and old_roots to be
>>> empty, while correct old_roots should not be empty.
>>> This causing qgroup numbers not decreased correctly.
>>>
>>> [FIX]
>>> Modify the timing of calling btrfs_qgroup_prepare_account_extents() to
>>> just before btrfs_qgroup_account_extents(), and add needed delayed_refs
>>> handler.
>>> So qgroup can handle inode_map mount options correctly.
>>>
>>> Signed-off-by: Qu Wenruo <quwenruo@xxxxxxxxxxxxxx>
>>
>> Could we also fix this by excepting the free space inode from qgroups?
>> It seems like this is something we'd want to do anyway unless we want to
>> handle EDQUOT there too.
> 
> I'm afraid it's not possible here.

Not within the context of this patch set, but I was thinking we could
just never do the tracing up front so there'd be nothing to check later.
 I still think this is probably the right approach for non-fsroots but
it does seem like there's no way to do it easily on a per-inode basis.

> As qgroup accounts any tree and data blocks that belong to specified
> tree(fs and subvolumes), not caring the inode it belongs to.
> 
> And the design of free inode space cache is to restore it in
> fs/subvolume tree, which has no difference with normal inode, except it
> doesn't have INODE_REF and its ino is FREE_INO.
> (Much the same behavior for space cache inode)
> 
> The correct solution for caching free space and inode should be the new
> space cache tree, which puts all these info into their own tree, never
> affecting existing trees.

Agreed.

> The only good news is, inode_cache is not commonly used and IIRC has
> bugs. Maybe it will be good idea to depreciate the option?

I wouldn't object to that.

-Jeff

>>> ---
>>>  fs/btrfs/transaction.c | 25 ++++++++++++++++++-------
>>>  1 file changed, 18 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
>>> index 6b3e0fc2fe7a..1ff3ec797356 100644
>>> --- a/fs/btrfs/transaction.c
>>> +++ b/fs/btrfs/transaction.c
>>> @@ -2130,13 +2130,6 @@ int btrfs_commit_transaction(struct
>>> btrfs_trans_handle *trans)
>>>          goto scrub_continue;
>>>      }
>>>
>>> -    /* Reocrd old roots for later qgroup accounting */
>>> -    ret = btrfs_qgroup_prepare_account_extents(trans, fs_info);
>>> -    if (ret) {
>>> -        mutex_unlock(&fs_info->reloc_mutex);
>>> -        goto scrub_continue;
>>> -    }
>>> -
>>>      /*
>>>       * make sure none of the code above managed to slip in a
>>>       * delayed item
>>> @@ -2179,6 +2172,24 @@ int btrfs_commit_transaction(struct
>>> btrfs_trans_handle *trans)
>>>      btrfs_free_log_root_tree(trans, fs_info);
>>>
>>>      /*
>>> +     * commit_fs_roots() can call btrfs_save_ino_cache(), which
>>> generates
>>> +     * new delayed refs. Must handle them or qgroup can be wrong.
>>> +     */
>>> +    ret = btrfs_run_delayed_refs(trans, fs_info, (unsigned long)-1);
>>> +    if (ret) {
>>> +        mutex_unlock(&fs_info->tree_log_mutex);
>>> +        mutex_unlock(&fs_info->reloc_mutex);
>>> +        goto scrub_continue;
>>> +    }
>>> +
>>> +    ret = btrfs_qgroup_prepare_account_extents(trans, fs_info);
>>> +    if (ret) {
>>> +        mutex_unlock(&fs_info->tree_log_mutex);
>>> +        mutex_unlock(&fs_info->reloc_mutex);
>>> +        goto scrub_continue;
>>> +    }
>>> +
>>> +    /*
>>>       * Since fs roots are all committed, we can get a quite accurate
>>>       * new_roots. So let's do quota accounting.
>>>       */
>>>
>>
>>
> 
> 
> 


-- 
Jeff Mahoney
SUSE Labs


Attachment: signature.asc
Description: OpenPGP digital signature


[Index of Archives]     [Linux Filesystem Development]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux