On Mon, May 20, 2019 at 10:13 AM Nikolay Borisov <nborisov@xxxxxxxx> wrote:
>
>
>
> On 20.05.19 г. 12:10 ч., Filipe Manana wrote:
> > On Mon, May 20, 2019 at 9:23 AM Nikolay Borisov <nborisov@xxxxxxxx> wrote:
> >>
> >>
> >>
> >> On 20.05.19 г. 11:11 ч., Nikolay Borisov wrote:
> >>> This patch removes stray smp_mb before root->log_root from join_running_log_trans
> >>> as well as the unlocked check for root->log_root. log_root is only set in
> >>> btrfs_add_log_tree, called from start_log_trans under root->log_mutex.
> >>> Furthermore, log_root is freed in btrfs_free_log, called from commit_fs_root,
> >>> which in turn is called from transaction's critical section (TRANS_COMMIT_DOING).
> >>> Those 2 locking invariants ensure join_running_log_trans don't see invalid
> >>> values of ->log_root.
> >>>
> >>> Additionally this results in around 26% improvement when deleting 500k files/dir.
> >>> All values are in seconds.
> >>>
> >>> With Patch (real) With patch (sys) Without patch (real) Without patch (sys)
> >>> 80 78 91 90
> >>> 63 62 93 91
> >>> 65 64 92 90
> >>> 67 65 93 90
> >>> 75 73 90 88
> >>> 75 73 91 89
> >>> 75 73 93 90
> >>> 74 73 89 87
> >>> 76 74 91 89
> >>> stddev 5.76146200581454 5.45690184791497 1.42400062421959 1.22474487139159
> >>> mean 72.2222222222222 70.5555555555556 91.4444444444444 89.3333333333333
> >>> median 75 73 91 90
> >>>
> >> With Patch (real) With patch (sys) Without patch (real) Without patch (sys)
> >> 80 78 91 90
> >> 63 62 93 91
> >> 65 64 92 90
> >> 67 65 93 90
> >> 75 73 90 88
> >> 75 73 91 89
> >> 75 73 93 90
> >> 74 73 89 87
> >> 76 74 91 89
> >> stddev 5.76146200581454 5.45690184791497 1.42400062421959 1.22474487139159
> >> mean 72.2222222222222 70.5555555555556 91.4444444444444 89.3333333333333
> >> median 75 73 91 90
> >
> > Great.
> >
> > How was that test done?
> > Simply deleting the files with nothing else running in parallel?
>
> Yes
>
> >
> > How does it behave if while the files are being deleted, there are
> > concurrent fsyncs on other files of the same subvolume, that is, while
> > the mutex is held?
> >
> > Because that check without holding the mutex, is likely there for
> > performance reasons.
>
> So I will repeat the test when there are concurrent fsyncs running and
> also with no concurrent fsyncs running but just removing the smp_mb, I
> think the performance gain should be from removing the smp_mb and not
> necessarily the check. In the worst case I can leave the check intact
> and just remove the smp_mb because it doesn't add any value
> correctness-wise.
So the barrier is likely there to make sure we see non-null log_root
after it was set, concurrently by someone calling start_log_trans().
It pairs, implicitly, with the mutex_unlock there, at
start_log_trans(), since that implies a memory barrier.
With your patch, we always end up doing a mutex_lock(), which also
implies a memory barrier.
Sorry I missed this before, but your test doesn't make any sense. How
can a memory barrier have such a big impact on a code path (unlink)
that does lot of much heavier stuff? Like btree searches/deletes,
deleting delayed items, etc.
Your test case is even flawed because join_running_log_trans() can
never be called.
Look at its only two callers, btrfs_del_inode_ref_in_log() and
btrfs_del_dir_entries_in_log(), they both do:
if (dir->logged_trans < trans->transid)
return 0;
ret = join_running_log_trans(root);
if (ret)
return 0;
and
if (inode->logged_trans < trans->transid)
return 0;
ret = join_running_log_trans(root);
if (ret)
return 0;
Since you never fsync the files nor the directory containing them in
that test, we never end up calling join_running_log_trans().
So I don't know where you got the 26%...
Thanks.
>
> >
> > Thanks.
> >
> >
> >>
> >>
> >> Here's the perf data without being butchered.
> >>
> >>> Signed-off-by: Nikolay Borisov <nborisov@xxxxxxxx>
> >>> ---
> >>>
> >>> This passed full xfstest run and the performance results were obtained with the
> >>> following testcase:
> >>>
> >>> #!/bin/bash
> >>> for i in {1..10}; do
> >>> echo "Testun run : %i"
> >>> ./ltp/fsstress -z -d /media/scratch/ -f creat=100 -n 500000
> >>> sync
> >>> time rm -rf /media/scratch/*
> >>> sync
> >>> done
> >>>
> >>> fs/btrfs/tree-log.c | 4 ----
> >>> 1 file changed, 4 deletions(-)
> >>>
> >>> diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
> >>> index 6adcd8a2c5c7..61744d8af106 100644
> >>> --- a/fs/btrfs/tree-log.c
> >>> +++ b/fs/btrfs/tree-log.c
> >>> @@ -188,10 +188,6 @@ static int join_running_log_trans(struct btrfs_root *root)
> >>> {
> >>> int ret = -ENOENT;
> >>>
> >>> - smp_mb();
> >>> - if (!root->log_root)
> >>> - return -ENOENT;
> >>> -
> >>> mutex_lock(&root->log_mutex);
> >>> if (root->log_root) {
> >>> ret = 0;
> >>>
> >
> >
> >
> > --
> > Filipe David Manana,
> >
> > “Whether you think you can, or you think you can't — you're right.”
> >
--
Filipe David Manana,
“Whether you think you can, or you think you can't — you're right.”