Re: [PATCH v2 0/5] btrfs: qgroup: Skip unrelated tree blocks for balance

On 2018/9/7 5:32 PM, Qu Wenruo wrote:
> This patchset can be fetched from github:
> https://github.com/adam900710/linux/tree/qgroup_balance_skip_trees
> The base commit is v4.19-rc1 tag.
> 
> There are many reports of system hangs during balance on quota-enabled
> filesystems.
> The hangs are most noticeable on large filesystems.
> 
> The hang is caused by tons of unmodified extents marked as qgroup dirty.
> Such unmodified/unrelated sources include:
> 1) Unmodified subtree
> 2) Subtree drop for reloc tree
> (BTW, other sources include unmodified file extent items)
> 
> E.g.
> OO = Old tree blocks from file tree
> NN = New tree blocks from reloc tree
> 
>         file tree                              reloc tree
>            OO (a)                                  NN (a)
>           /  \                                    /  \
>     (b) OO    OO (c)                        (b) NN    NN (c)
>        / \   / \                               / \   / \
>      OO  OO OO  OO                           OO  OO OO  NN
>     (d) (e) (f) (g)                         (d) (e) (f) (g)
> 
> In the above case, balance will modify the nodeptrs in OO(a) to point to
> NN(b) and NN(c), and modify NN(a) to point to OO(b) and OO(c).
> 
> Before this patch, quota would mark the whole subtree, from its parent
> down to the leaves, as dirty.
> So btrfs quota needs to trace all tree blocks from (a) to (g).
> 
> However tree blocks (d) (e) (f) are shared between both trees, thus
> there is no need to trace those 3 tree blocks.
> 
> This patchset changes how this works by only tracing modified tree
> blocks in the reloc tree, and their counterparts in the file tree.
> 
> Nodeptr swap will happen for tree blocks (b) and (c) in both trees.
> 
> For tree block (b), in the reloc tree we find that all its children's
> generations are smaller than last_snapshot, thus there is no need to
> trace them; we only need to trace NN(b) and its counterpart OO(b).
> 
> For tree block (c), in the reloc tree, we find that its child NN(g)
> needs tracing, and that NN(g) itself has no child that needs tracing.
> 
> So for the subtree starting at tree block NN(c), we need to trace NN(c)
> and NN(g), along with their counterparts OO(c) and OO(g).
> 
> With this patch, we can skip tree blocks OO(d)~OO(f) in the above example,
> thus reducing some of the overhead caused by qgroup.
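
A toy model of the generation check described above may make the walk easier to follow. It is only a sketch, not the kernel implementation: the Block class, new_blocks(), full_subtree(), and the generation numbers are all made up for illustration; only last_snapshot and the block names come from the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Block:
    """Toy tree block: a name, the generation it was written in, children."""
    name: str
    generation: int
    children: List["Block"] = field(default_factory=list)


def full_subtree(block: Block) -> List[str]:
    """Old behaviour in this model: every block of the subtree is traced."""
    traced = [block.name]
    for child in block.children:
        traced += full_subtree(child)
    return traced


def new_blocks(block: Block, last_snapshot: int) -> List[str]:
    """Generation-aware walk over a reloc-tree subtree.

    A block older than last_snapshot is treated as shared with the file
    tree, so it and everything below it is skipped.
    """
    if block.generation < last_snapshot:
        return []
    traced = [block.name]
    for child in block.children:
        traced += new_blocks(child, last_snapshot)
    return traced


LAST_SNAPSHOT = 10  # hypothetical value
OLD, NEW = 5, 10    # hypothetical generations for OO and NN blocks

# Reloc-tree subtrees below the two swapped node pointers in the diagram.
reloc_b = Block("b", NEW, [Block("d", OLD), Block("e", OLD)])
reloc_c = Block("c", NEW, [Block("f", OLD), Block("g", NEW)])

for root in (reloc_b, reloc_c):
    names = new_blocks(root, LAST_SNAPSHOT)
    # Each new reloc block is traced together with its file-tree counterpart.
    pairs = [f"NN({n})+OO({n})" for n in names]
    print(f"subtree ({root.name}): full walk {full_subtree(root)}, "
          f"generation-aware {pairs}")
```

In this model the generation-aware walk keeps only (b) for the first subtree and (c), (g) for the second, while the full walk visits all three blocks of each subtree.
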
> 
> The improvement is mostly related to metadata relocation.
> If some high-level tree blocks get relocated but their children are
> still unmodified, we could save a lot of time.
> 
> Even in the worst case, it should be no worse than the original full
> subtree marking method.
> 
> A real-world benchmark is under way.

I did a small-scale test (with the latest submitted patch "btrfs:
delayed-ref: Introduce new parameter for btrfs_add_delayed_tree_ref() to
reduce unnecessary qgroup tracing" applied).

The test fs uses 4K nodesize (to bump tree sizes), with around 4G of data
copied from /usr and /lib (so the number of files should be large enough).

The VM uses unsafe cache mode for its qcow2 file, and the backing device
is a Samsung 850 EVO SATA SSD. (The host has enough RAM, so most IO should
be as fast as RAM speed.)

Results for a metadata-only balance:

                     | Before          | After       | Diff
--------------------------------------------------------------------------
relocated extents    | 21112           | 22916       | +8.5%
qgroup dirty extents | 213831          | 140731      | -34.2%
time (sys)           | 7.828s          | 5.818s      | -25.7%
time (real)          | 10.004s         | 7.768s      | -22.3%

I'll report back with results from an even larger fs with more
subvolumes/snapshots.

Thanks,
Qu

> 
> Changelog:
> v2:
>   Rename "tree reloc tree" to "reloc tree".
>   Add patch "Don't trace subtree if we're dropping reloc tree" into the
>   patchset.
>   Fix wrong btrfs_bin_search() call, which leads to unexpected ENOENT
>   error for btrfs_qgroup_trace_extent_swap(). Now use dst_path->slots[]
>   directly.
> 
> Qu Wenruo (5):
>   btrfs: qgroup: Introduce trace event to analyse the number of dirty
>     extents accounted
>   btrfs: qgroup: Introduce function to trace two swaped extents
>   btrfs: qgroup: Introduce function to find all new tree blocks of reloc
>     tree
>   btrfs: qgroup: Use generation aware subtree swap to mark dirty extents
>   btrfs: qgroup: Don't trace subtree if we're dropping reloc tree
> 
>  fs/btrfs/extent-tree.c       |   8 +-
>  fs/btrfs/qgroup.c            | 338 +++++++++++++++++++++++++++++++++++
>  fs/btrfs/qgroup.h            |  10 ++
>  fs/btrfs/relocation.c        |  11 +-
>  include/trace/events/btrfs.h |  21 +++
>  5 files changed, 379 insertions(+), 9 deletions(-)
> 


