Re: [PATCH 4/5] btrfs: serialize blocking_writers updates

On 17.10.19 at 22:39, David Sterba wrote:
> There's one potential memory ordering problem regarding
> eb::blocking_writers, although this hasn't been hit in practice. On TSO
> (total store ordering) architectures, the writes will always be seen in
> the right order.
> 
> In case the calls in btrfs_set_lock_blocking_write are seen in this
> (wrong) order on another CPU:
> 
> 0:  /* blocking_writers == 0 */
> 1:  write_unlock()
> 2:  blocking_writers = 1
> 
> btrfs_try_tree_read_lock, btrfs_tree_read_lock_atomic or
> btrfs_try_tree_write_lock would have to squeeze all of this between 1:
> and 2:

This is only a problem for the unlocked (optimistic) accesses in those
functions, simply because from their point of view eb->lock doesn't
play any role, i.e. they don't know about it at all.

That implies there needs to be yet another synchronization/ordering
mechanism just for blocking_writers. But then further down you say
there are no read-side barriers because observing the accesses in a
particular order is not important for correctness.

Given this, I fail to see whose ordering this smp_wmb actually affects.
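
For reference, the optimistic pattern in question is roughly the
following (paraphrasing btrfs_try_tree_read_lock from memory, not the
exact source):

	int btrfs_try_tree_read_lock(struct extent_buffer *eb)
	{
		if (eb->blocking_writers)	/* unlocked check */
			return 0;

		if (!read_trylock(&eb->lock))
			return 0;

		if (eb->blocking_writers) {	/* re-check under the lock */
			read_unlock(&eb->lock);
			return 0;
		}
		/* ... spinning-reader bookkeeping ... */
		return 1;
	}

Only the first check runs without eb->lock held, so it is the only
access the smp_wmb could be aimed at, yet that is exactly the access
you later say doesn't need any ordering for correctness.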
> 
> - check if blocking_writers is 0 (it is, continue)
> - try read lock, read lock or write lock (succeeds after 1:)
> - check blocking_writers again (continue)
> 
> All of that assumes that the reordered writes can survive for quite some
> time (unlikely if they're still in the internal store buffer).
> 
> Another point against observing that in practice is that
> blocking_writers and the lock are on the same cacheline (64B), so it's
> more likely both values are stored in order, or some sort of pending
> write flush will update blocking_writers, rwlock before the try lock
> happens. Assuming the CPUs work like that and don't hide other
> surprises.
> 
> struct extent_buffer {
> 	u64                        start;                /*     0     8 */
> 	long unsigned int          len;                  /*     8     8 */
> 	long unsigned int          bflags;               /*    16     8 */
> 	struct btrfs_fs_info *     fs_info;              /*    24     8 */
> 	spinlock_t                 refs_lock;            /*    32     4 */
> 	atomic_t                   refs;                 /*    36     4 */
> 	atomic_t                   io_pages;             /*    40     4 */
> 	int                        read_mirror;          /*    44     4 */
> 	struct callback_head callback_head __attribute__((__aligned__(8))); /*    48    16 */
> 	/* --- cacheline 1 boundary (64 bytes) --- */
> 	pid_t                      lock_owner;           /*    64     4 */
> 	int                        blocking_writers;     /*    68     4 */
> 	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> 
> 	atomic_t                   blocking_readers;     /*    72     4 */
> 	bool                       lock_nested;          /*    76     1 */
> 
> 	/* XXX 1 byte hole, try to pack */
> 
> 	short int                  log_index;            /*    78     2 */
> 	rwlock_t                   lock;                 /*    80     8 */
>         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> 
> 	wait_queue_head_t          write_lock_wq;        /*    88    24 */
> 	wait_queue_head_t          read_lock_wq;         /*   112    24 */
> 	/* --- cacheline 2 boundary (128 bytes) was 8 bytes ago --- */
> 	struct page *              pages[16];            /*   136   128 */
> 
> 	/* size: 264, cachelines: 5, members: 18 */
> 	/* sum members: 263, holes: 1, sum holes: 1 */
> 	/* forced alignments: 1 */
> 	/* last cacheline: 8 bytes */
> } __attribute__((__aligned__(8)));
> 
> Add the barriers for correctness' sake.
> 
> Signed-off-by: David Sterba <dsterba@xxxxxxxx>
> ---
>  fs/btrfs/locking.c | 16 +++++++++++++++-
>  1 file changed, 15 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/locking.c b/fs/btrfs/locking.c
> index a12f3edc3505..e0e0430577aa 100644
> --- a/fs/btrfs/locking.c
> +++ b/fs/btrfs/locking.c
> @@ -110,6 +110,18 @@ void btrfs_set_lock_blocking_write(struct extent_buffer *eb)
>  		btrfs_assert_spinning_writers_put(eb);
>  		btrfs_assert_tree_locked(eb);
>  		WRITE_ONCE(eb->blocking_writers, 1);
> +		/*
> +		 * Writers must be updated before the lock is released so
> +		 * that other threads see that in order. Otherwise this could
> +		 * theoretically lead to blocking_writers still being set to 1 but
> +		 * this would be unexpected eg. for spinning locks.
> +		 *
> +		 * There are no pairing read barriers for unlocked access as we
> +		 * don't strictly need them for correctness (eg. in
> +		 * btrfs_try_tree_read_lock), and the unlock/lock is an implied
> +		 * barrier in other cases.
> +		 */
> +		smp_wmb();
>  		write_unlock(&eb->lock);

That comment sounds to me as if you only care about readers of
blocking_writers _after_ they have acquired eb::lock (for reading or
writing). In that case the smp_wmb is pointless, because
write_unlock/read_lock and write_unlock/write_lock already imply
release/acquire semantics.

Unlocked readers are not affected by this wmb.
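
To illustrate, for a reader that does take the lock the ordering is
already provided by the lock primitives themselves (rough sketch):

	writer					reader
	------					------
	WRITE_ONCE(eb->blocking_writers, 1);
	write_unlock(&eb->lock);  /* release */
						read_lock(&eb->lock);  /* acquire */
						/* sees blocking_writers == 1 */

The release/acquire pair guarantees the store is visible once the lock
is acquired, so the smp_wmb right before write_unlock adds nothing for
the locked path, and the unlocked path sits outside any lock-based
ordering anyway.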

>  	}
>  }
> @@ -316,7 +328,9 @@ void btrfs_tree_unlock(struct extent_buffer *eb)
>  		/*
>  		 * We need to order modifying blocking_writers above with
>  		 * actually waking up the sleepers to ensure they see the
> -		 * updated value of blocking_writers
> +		 * updated value of blocking_writers.
> +		 *
> +		 * cond_wake_up calls smp_mb
>  		 */
>  		cond_wake_up(&eb->write_lock_wq);
>  	} else {
> 
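
For completeness, the cond_wake_up helper referenced in the new comment
is roughly the following (paraphrased, check the btrfs tree for the
exact form):

	static inline void cond_wake_up(struct wait_queue_head *wq)
	{
		/*
		 * wq_has_sleeper contains a full smp_mb, which is what
		 * orders the blocking_writers store against checking the
		 * waitqueue for sleepers.
		 */
		if (wq_has_sleeper(wq))
			wake_up(wq);
	}

so the "cond_wake_up calls smp_mb" part of the comment does hold.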


