On Tue, Sep 24, 2019 at 3:07 PM Nikolay Borisov <nborisov@xxxxxxxx> wrote: > > > > On 24.09.19 г. 12:49 ч., fdmanana@xxxxxxxxxx wrote: > > From: Filipe Manana <fdmanana@xxxxxxxx> > > > > There is a race between setting up a qgroup rescan worker and completing > > a qgroup rescan worker that can lead to callers of the qgroup rescan wait > > ioctl to either not wait for the rescan worker to complete or to hang > > forever due to missing wake ups. The following diagram shows a sequence > > of steps that illustrates the race. > > > > CPU 1 CPU 2 CPU 3 > > > > btrfs_ioctl_quota_rescan() > > btrfs_qgroup_rescan() > > qgroup_rescan_init() > > mutex_lock(&fs_info->qgroup_rescan_lock) > > spin_lock(&fs_info->qgroup_lock) > > > > fs_info->qgroup_flags |= > > BTRFS_QGROUP_STATUS_FLAG_RESCAN > > > > init_completion( > > &fs_info->qgroup_rescan_completion) > > > > fs_info->qgroup_rescan_running = true > > > > mutex_unlock(&fs_info->qgroup_rescan_lock) > > spin_unlock(&fs_info->qgroup_lock) > > > > btrfs_init_work() > > --> starts the worker > > > > btrfs_qgroup_rescan_worker() > > mutex_lock(&fs_info->qgroup_rescan_lock) > > > > fs_info->qgroup_flags &= > > ~BTRFS_QGROUP_STATUS_FLAG_RESCAN > > > > mutex_unlock(&fs_info->qgroup_rescan_lock) > > > > starts transaction, updates qgroup status > > item, etc > > > > btrfs_ioctl_quota_rescan() > > btrfs_qgroup_rescan() > > qgroup_rescan_init() > > mutex_lock(&fs_info->qgroup_rescan_lock) > > spin_lock(&fs_info->qgroup_lock) > > > > fs_info->qgroup_flags |= > > BTRFS_QGROUP_STATUS_FLAG_RESCAN > > > > init_completion( > > &fs_info->qgroup_rescan_completion) > > > > fs_info->qgroup_rescan_running = true > > > > mutex_unlock(&fs_info->qgroup_rescan_lock) > > spin_unlock(&fs_info->qgroup_lock) > > > > btrfs_init_work() > > --> starts another worker > > > > mutex_lock(&fs_info->qgroup_rescan_lock) > > > > fs_info->qgroup_rescan_running = false > > > > mutex_unlock(&fs_info->qgroup_rescan_lock) > > > > complete_all(&fs_info->qgroup_rescan_completion) > > > > Before the rescan worker started by the task at CPU 3 completes, if another > > task calls btrfs_ioctl_quota_rescan(), it will get -EINPROGRESS because the > > flag BTRFS_QGROUP_STATUS_FLAG_RESCAN is set at fs_info->qgroup_flags, which > > is expected and correct behaviour. > > > > However if other task calls btrfs_ioctl_quota_rescan_wait() before the > > rescan worker started by the task at CPU 3 completes, it will return > > immediately without waiting for the new rescan worker to complete, > > because fs_info->qgroup_rescan_running is set to false by CPU 2. > > > > This race is making test case btrfs/171 (from fstests) to fail often: > > > > btrfs/171 9s ... - output mismatch (see /home/fdmanana/git/hub/xfstests/results//btrfs/171.out.bad) > > --- tests/btrfs/171.out 2018-09-16 21:30:48.505104287 +0100 > > +++ /home/fdmanana/git/hub/xfstests/results//btrfs/171.out.bad 2019-09-19 02:01:36.938486039 +0100 > > @@ -1,2 +1,3 @@ > > QA output created by 171 > > +ERROR: quota rescan failed: Operation now in progress > > Silence is golden > > ... > > (Run 'diff -u /home/fdmanana/git/hub/xfstests/tests/btrfs/171.out /home/fdmanana/git/hub/xfstests/results//btrfs/171.out.bad' to see the entire diff) > > > > That is because the test calls the btrfs-progs commands "qgroup quota > > rescan -w", "qgroup assign" and "qgroup remove" in a sequence that makes > > calls to the rescan start ioctl fail with -EINPROGRESS (note the "btrfs" > > commands 'qgroup assign' and 'qgroup remove' often call the rescan start > > ioctl after calling the qgroup assign ioctl, btrfs_ioctl_qgroup_assign()), > > since previous waits didn't actually wait for a rescan worker to complete. > > > > Another problem the race can cause is missing wake ups for waiters, since > > the call to complete_all() happens outside a critical section and after > > clearing the flag BTRFS_QGROUP_STATUS_FLAG_RESCAN. In the sequence diagram > > above, if we have a waiter for the first rescan task (executed by CPU 2), > > then fs_info->qgroup_rescan_completion.wait is not empty, and if after the > > rescan worker clears BTRFS_QGROUP_STATUS_FLAG_RESCAN and before it calls > > complete_all() against fs_info->qgroup_rescan_completion, the task at CPU 3 > > calls init_completion() against fs_info->qgroup_rescan_completion which > > re-initilizes its wait queue to an empty queue, therefore causing the > > rescan worker at CPU 2 to call complete_all() against an empty queue, never > > waking up the task waiting for that rescan worker. > > > > Fix this by clearing BTRFS_QGROUP_STATUS_FLAG_RESCAN and setting > > fs_info->qgroup_rescan_running to false in the same critical section, > > delimited by the mutex fs_info->qgroup_rescan_lock, as well as doing > > Why do we need both the RESCAN flag and qgroup_rescan_running having > them both and having btrfs_qgroup_wait_for_completion rely on > qgroup[_rescan_running just makes it easier to screw up. IMO it makes > sense to remove qgroup_rescan_running in this patch as well and the code > should only use RESCAN flag. No, they are both needed. Otherwise the issue fixed by the commit mentioned in the second "Fixes:" tag is re-introduced. >
