On 5/3/18 2:23 AM, Nikolay Borisov wrote:
>
>
> On 3.05.2018 00:11, jeffm@xxxxxxxx wrote:
>> From: Jeff Mahoney <jeffm@xxxxxxxx>
>>
>> Hi Dave -
>>
>> Here's the updated patchset for the rescan races. This fixes the issue
>> where we'd try to start multiple workers. It introduces a new "ready"
>> bool that we set during initialization and clear while queuing the worker.
>> The queuer is also now responsible for most of the initialization.
>>
>> I have a separate patch set start that gets rid of the racy mess surrounding
>> the rescan worker startup. We can handle it in btrfs_run_qgroups and
>> just set a flag to start it everywhere else.
> I'd be interested in seeing those patches. Some time ago I did send a
> patch which cleaned up the way qgroup rescan was initiated. It was done
> from "btrfs_run_qgroups" and I think this is messy. Whatever we do we
> ought to really have well-defined semantics when qgroups rescan are run,
> preferably we shouldn't be conflating rescan + run (unless there is
> _really_ good reason to do). In the past the rescan from scan was used
> only during qgroup enabling.
I think btrfs_run_qgroups is the place to do it. Here's why:
2773 int
2774 btrfs_qgroup_rescan(struct btrfs_fs_info *fs_info)
2775 {
2776 int ret = 0;
2777 struct btrfs_trans_handle *trans;
2778
2779 ret = qgroup_rescan_init(fs_info, 0, 1);
2780 if (ret)
2781 return ret;
2782
2783 /*
2784 * We have set the rescan_progress to 0, which means no more
2785 * delayed refs will be accounted by btrfs_qgroup_account_ref.
2786 * However, btrfs_qgroup_account_ref may be right after its call
2787 * to btrfs_find_all_roots, in which case it would still do the
2788 * accounting.
2789 * To solve this, we're committing the transaction, which will
2790 * ensure we run all delayed refs and only after that, we are
2791 * going to clear all tracking information for a clean start.
2792 */
2793
2794 trans = btrfs_join_transaction(fs_info->fs_root);
2795 if (IS_ERR(trans)) {
2796 fs_info->qgroup_flags &= ~BTRFS_QGROUP_STATUS_FLAG_RESCAN;
2797 return PTR_ERR(trans);
2798 }
2799 ret = btrfs_commit_transaction(trans);
2800 if (ret) {
2801 fs_info->qgroup_flags &= ~BTRFS_QGROUP_STATUS_FLAG_RESCAN;
2802 return ret;
2803 }
2804
2805 qgroup_rescan_zero_tracking(fs_info);
2806
2807 queue_rescan_worker(fs_info);
2808 return 0;
2809 }
The delayed ref race should exist anywhere we initiate a rescan outside of
initially enabling qgroups. We already zero the tracking and queue the rescan
worker in btrfs_run_qgroups for when we enable qgroups. Why not just always
queue the worker there so the initialization and execution has a clear starting point?
There are a few other races I'd like to fix as well. We call btrfs_run_qgroups
directly from btrfs_ioctl_qgroup_assign, which is buggy since
btrfs_add_qgroup_relation only checks to see if the quota_root exists. It will
exist as soon as btrfs_quota_enable runs but we won't have committed the
transaction yet. The call will end up enabling quotas in the middle of a transaction.
-Jeff
--
Jeff Mahoney
SUSE Labs
Attachment:
signature.asc
Description: OpenPGP digital signature
