On Thu, Mar 08, 2018 at 05:47:20PM +0100, David Sterba wrote:
> On Thu, Mar 08, 2018 at 02:19:28PM +0200, Nikolay Borisov wrote:
> > > @@ -941,9 +941,7 @@ void btrfs_bio_counter_inc_noblocked(struct btrfs_fs_info *fs_info)
> > > void btrfs_bio_counter_sub(struct btrfs_fs_info *fs_info, s64 amount)
> > > {
> > > percpu_counter_sub(&fs_info->bio_counter, amount);
> > > -
> > > - if (waitqueue_active(&fs_info->replace_wait))
> > > - wake_up(&fs_info->replace_wait);
> > > + cond_wake_up_nomb(&fs_info->replace_wait);
> >
> > nit/offtopic:
> >
> > I think here the code requires comments since we have 2 types of
> > waiters for fs_info->replace_wait. One is dependent on the
> > percpu_counter_sum (i.e. the btrfs_rm_dev_replace_blocked). And then
> > there is another condition on the same wait entry - the
> > btrfs_bio_counter_inc_blocked i.e:
> >
> > wait_event(fs_info->replace_wait,
> > !test_bit(BTRFS_FS_STATE_DEV_REPLACING,
> > &fs_info->fs_state));
> >
> > geez, who would come up with such synchronization ...
>
> 'git blame' is the right tool, I'm not a fan of the dev-replace locking
> either.
>
> > > }
> > >
> > > void btrfs_bio_counter_inc_blocked(struct btrfs_fs_info *fs_info)
> > > @@ -3092,11 +3086,8 @@ int btrfs_sync_log(struct btrfs_trans_handle *trans,
> > > atomic_set(&log_root_tree->log_commit[index2], 0);
> > > mutex_unlock(&log_root_tree->log_mutex);
> > >
> > > - /*
> > > - * The barrier before waitqueue_active is implied by mutex_unlock
> > > - */
> > > - if (waitqueue_active(&log_root_tree->log_commit_wait[index2]))
> > > - wake_up(&log_root_tree->log_commit_wait[index2]);
> > > + /* The barrier is implied by mutex_unlock */
> > > + cond_wake_up_nomb(&log_root_tree->log_commit_wait[index2]);
> >
> > I think this is wrong (not your code) but the original assumption that
> > the RELEASE semantics provided by mutex_unlock is sufficient.
>
> That was added in 33a9eca7e4a4c2c17ae by me.
>
> > According to memory-barriers.txt:
> >
> > Section 'LOCK ACQUISITION FUNCTIONS' states:
> >
> >
> > (2) RELEASE operation implication:
> >
> > Memory operations issued before the RELEASE will be completed before the
> > RELEASE operation has completed.
> >
> > Memory operations issued after the RELEASE *may* be completed before the
> > RELEASE operation has completed.
> >
> > (I've bolded the may portion)
> >
> > The example given there:
> >
> > As an example, consider the following:
> >
> > *A = a;
> > *B = b;
> > ACQUIRE
> > *C = c;
> > *D = d;
> > RELEASE
> > *E = e;
> > *F = f;
> >
> > The following sequence of events is acceptable:
> >
> > ACQUIRE, {*F,*A}, *E, {*C,*D}, *B, RELEASE
> >
> > So if we assume that *C is modifying the flag which the waitqueue is checking,
> > and *E is the actual wakeup, then those accesses can be re-ordered...
> >
> > IMHO this code should be considered broken...
>
> Maybe, but the code has to be taken as a whole, the log_mutex and
> waiting does not follow a simple pattern that could be matched on the
> barriers example. In this case I think that the log_commit_wait is
> partially synchronized by the log_mutex and the bad scenario of missed
> wakeup will not happen. We'll have either empty waiter queue, or the
> waiters are blocked on the mutex (ie. the waitqueu is active and we'll
> always have someone to wake). But this is an observation made after a
> short time reading the code so there actually might be a tiny window
> where some of the assumptions do not hold.
>
> Point of the patch is to do the transformation to the helpers. If there
> are bugs that could be obscured by the patch, I can postpone it until
> the bugs are fixed.
For the record, the 2 patches will be postponed until the barrier is
resolved. We could add an explicit one even if it's not totally
necessary, but reasoning why it's not needed seems to be hard.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html