On 29.10.19 г. 19:51 ч., David Sterba wrote:
> On Wed, Oct 23, 2019 at 12:57:00PM +0300, Nikolay Borisov wrote:
>>
>>
>> On 17.10.19 г. 22:39 ч., David Sterba wrote:
>>> There's one potential memory ordering problem regarding
>>> eb::blocking_writers, although this hasn't been hit in practice. On TSO
>>> (total store ordering) architectures, the writes will always be seen in
>>> right order.
>>>
>>> In case the calls in btrfs_set_lock_blocking_write are seen in this
>>> (wrong) order on another CPU:
>>>
>>> 0: /* blocking_writers == 0 */
>>> 1: write_unlock()
>>> 2: blocking_writers = 1
>>>
>>> btrfs_try_tree_read_lock, btrfs_tree_read_lock_atomic or
>>> btrfs_try_tree_write_lock would have to squeeze all of this between 1:
>>> and 2:
>>
>> This is only a problem for unlocked (optimistic) accesses in those
>> functions. Simply because from its POV the eb->lock doesn't play any
>> role e.g. they don't know about it at all.
>
> No the lock is important here. Let's take btrfs_try_tree_read_lock as
> example for 2nd thread:
>
> T1: T2:
> write_unlock
> blocking_writers == 0
> try_readlock (success, locked)
> if (blocking_writers) return -> not taken
> blocking_writers = 1
>
> Same for btrfs_tree_read_lock_atomic (with read_lock) and
> btrfs_try_tree_write_lock (with write_lock)
>
> IMO it's the other way around than you say, the optimistic lock is
> irrelevant. We need the blocking_writers update sneak into a locked
> section.
But write_unlock is a release operation, as per memory-barriers.txt:
(6) RELEASE operations.
This also acts as a one-way permeable barrier. It guarantees that
all memory operations before the RELEASE operation will appear to
happen before the RELEASE operation with respect to the other
components of the system. RELEASE operations include UNLOCK operations
and smp_store_release() operations.
...
The use of ACQUIRE and RELEASE operations generally precludes the need
for other sorts of memory barrier (but note the exceptions mentioned in
the subsection "MMIO write barrier"). In addition, a RELEASE+ACQUIRE
pair is -not- guaranteed to act as a full memory barrier. However,
after an ACQUIRE on a given variable, all memory accesses preceding any
prior RELEASE on that same variable are guaranteed to be visible.
try_readlock is an ACQUIRE operation w.r.t the lock. So if it succeeds
then all previous accesses e.g. blocking_writer = 1 are guaranteed to be
visible because we have RELEASE (write_unlock ) + ACQUIRE (try_readlock
in case of success).
So the blocking_write check AFTER try_readlock has succeeded is
guaranteed to see the update to blocking_writers in
btrfs_set_lock_blocking_write.
>
>> This implies there needs to be yet another synchronization/ordering
>> mechanism only for blocking_writer. But then further down you say that
>> there are no read side barrier because observing the accesses in a
>> particular order is not important for correctness.
>>
>> Given this I fail to see what this smp_wmb affects ordering.
>
> So in the steps above:
>
> T1: T2:
> blocking_writers = 1
> smp_mb()
> write_unlock
> blocking_writers == 1
> -> take the fast path and return
>
>>> + /*
>>> + * Writers must be be updated before the lock is released so
>>> + * that other threads see that in order. Otherwise this could
>>> + * theoretically lead to blocking_writers be still set to 1 but
>>> + * this would be unexpected eg. for spinning locks.
>>> + *
>>> + * There are no pairing read barriers for unlocked access as we
>>> + * don't strictly need them for correctness (eg. in
>>> + * btrfs_try_tree_read_lock), and the unlock/lock is an implied
>>> + * barrier in other cases.
>>> + */
>>> + smp_wmb();
>>> write_unlock(&eb->lock);
>>
>> That comment sounds to me as if you only care about the readers of
>> blocking_writers _after_ they have acquired the eb::lock for reading. In
>> this case this smp_wmb is pointless because write_unlock/write_lock
>> imply release/acquire semantics.
>
> The pairing read side barrier would have to be before all reads of
> blocking_writers, eg.
>
> 180 int btrfs_try_tree_read_lock(struct extent_buffer *eb)
> 181 {
>
> smp_rmb();
>
> 182 if (eb->blocking_writers)
> 183 return 0;
>
> ant it won't be wrong, the value would be most up to date as it could be. And
> givent that this is an optimistic shortcut, lack of synchronized read might
> pessimize that. Ie. blocking_writers is already 0 but the update is not
> propagated and btrfs_try_tree_read_lock returns.
>
> This is in principle indistinguishable from a state where the lock is
> still locked. Hence the 'not needed for correctness' part. And then the
> barrier can be avoided.
Nod
>
> The write side needs the barrier to order write and unlock. The
> unlock/lock implied barrier won't help as the read-side check happens in
> the middle of that.
>
But the unlock (as explained earlier) won't allow the write to 'leak'
past it.