On 2018/9/7 5:32 PM, Qu Wenruo wrote:
> This patchset can be fetched from github:
> https://github.com/adam900710/linux/tree/qgroup_balance_skip_trees
> The base commit is v4.19-rc1 tag.
>
> There are a lot of reports of system hangs during balance on
> quota-enabled fs.
> It's most obvious on large fs.
>
> The hang is caused by tons of unmodified extents being marked as qgroup
> dirty.
> Such unmodified/unrelated sources include:
> 1) Unmodified subtree
> 2) Subtree drop for reloc tree
> (BTW, other sources include unmodified file extent items)
>
> E.g.
> OO = Old tree blocks from file tree
> NN = New tree blocks from reloc tree
>
>        file tree                    reloc tree
>          OO (a)                       NN (a)
>         /      \                     /      \
>   (b) OO        OO (c)         (b) NN        NN (c)
>      /  \      /  \                /  \      /  \
>     OO  OO    OO  OO              OO  OO    OO  NN
>    (d)  (e)  (f)  (g)            (d)  (e)  (f)  (g)
>
> In above case, balance will modify nodeptr in OO(a) to point to NN(b) and
> NN(c), and modify NN(a) to point to OO(b) and OO(c).
>
> Before this patch, quota will mark the whole subtree from its parent
> down to the leaves as dirty.
> So btrfs quota needs to trace all tree blocks from (a) to (g).
>
> However tree blocks (d) (e) (f) are shared between both trees, thus
> there is no need to trace those 3 tree blocks.
>
> This patchset will change how this works by only tracing modified tree
> blocks in reloc tree, and their counterparts in file tree.
>
> Nodeptr swap will happen for tree blocks (b) and (c) in both trees.
>
> For tree block (b), in reloc tree we could find that all its
> children's generations are smaller than last_snapshot, thus no need to
> trace them, only need to trace NN(b), and its counterpart OO(b).
>
> For tree block (c), in reloc tree, we find its child NN(g) needs
> tracing, and for tree block NN(g), there is no child that needs tracing.
>
> So for subtree starting at tree block NN(c), we need to trace NN(c) and
> NN(g), along with their counterparts OO(c) and OO(g).
>
> With this patch, we could skip tree blocks OO(d)~OO(f) in above example,
> thus reducing some overhead caused by qgroup.
>
> The improvement is mostly related to metadata relocation.
> If some high level tree blocks get relocated but their children are
> still unmodified, we could save a lot of time.
>
> Even for the worst case, it should be no worse than original full
> subtree marking method.
>
> Real world case benchmark is under way.

Did a small scale test.
(With latest submitted patch "btrfs: delayed-ref: Introduce new parameter
for btrfs_add_delayed_tree_ref() to reduce unnecessary qgroup tracing")

4K nodesize fs (to bump tree sizes), around 4G data copied from /usr and
/lib (so number of files should be large enough).

The VM has unsafe cache mode for its qcow2 file, and the backing device is
a SAMSUNG 850 evo sata SSD.
(Host has enough RAM so most IO should be as fast as RAM speed).

Then for metadata only balance:

                     | Before     | After      | Diff
------------------------------------------------------------
relocated extents    | 21112      | 22916      | +8.5%
qgroup dirty extents | 213831     | 140731     | -34.2%
time (sys)           | 7.828s     | 5.818s     | -25.7%
time (real)          | 10.004s    | 7.768s     | -22.3%

I'll report back with even larger fs with more subvolumes/snapshots.

Thanks,
Qu

>
> Changelog:
> v2:
>   Rename "tree reloc tree" to "reloc tree".
>   Add patch "Don't trace subtree if we're dropping reloc tree" into the
>   patchset.
>   Fix wrong btrfs_bin_search() call, which leads to unexpected ENOENT
>   error for btrfs_qgroup_trace_extent_swap(). Now use dst_path->slots[]
>   directly.
>
> Qu Wenruo (5):
>   btrfs: qgroup: Introduce trace event to analyse the number of dirty
>     extents accounted
>   btrfs: qgroup: Introduce function to trace two swaped extents
>   btrfs: qgroup: Introduce function to find all new tree blocks of reloc
>     tree
>   btrfs: qgroup: Use generation aware subtree swap to mark dirty extents
>   btrfs: qgroup: Don't trace subtree if we're dropping reloc tree
>
>  fs/btrfs/extent-tree.c       |   8 +-
>  fs/btrfs/qgroup.c            | 338 +++++++++++++++++++++++++++++++++++
>  fs/btrfs/qgroup.h            |  10 ++
>  fs/btrfs/relocation.c        |  11 +-
>  include/trace/events/btrfs.h |  21 +++
>  5 files changed, 379 insertions(+), 9 deletions(-)
>
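For readers who want the generation check from the cover letter in
concrete form, here is a rough userspace sketch. It is not the kernel
code: struct toy_block, toy_trace() and toy_example() are hypothetical
names, and the generation numbers (5 for OO blocks, 12 for NN blocks,
last_snapshot = 10) are made up for illustration. The only rule taken
from the text is that a child whose generation is smaller than
last_snapshot is not followed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_FANOUT 2

/* Made-up generations: OO blocks predate last_snapshot, NN blocks do not. */
enum { OLD_GEN = 5, NEW_GEN = 12, LAST_SNAPSHOT = 10 };

/* Toy stand-in for a tree block; NOT the kernel's struct extent_buffer. */
struct toy_block {
	char label;                               /* letter from the diagram */
	uint64_t generation;                      /* when the block was last written */
	const struct toy_block *child[TOY_FANOUT]; /* NULL for leaves */
};

/*
 * Record @blk as "needs tracing", then descend only into children whose
 * generation is not smaller than @last_snapshot. Labels are appended to
 * @out (NUL-terminated, at most @cap bytes) and counted in @n.
 */
static void toy_trace(const struct toy_block *blk, uint64_t last_snapshot,
		      char *out, size_t cap, size_t *n)
{
	assert(blk != NULL && out != NULL && n != NULL);

	if (*n + 1 < cap) {	/* keep room for the trailing NUL */
		out[*n] = blk->label;
		out[*n + 1] = '\0';
		(*n)++;
	}

	for (size_t i = 0; i < TOY_FANOUT; i++) {
		const struct toy_block *c = blk->child[i];

		/* Shared, unmodified child: skip it and its whole subtree. */
		if (c == NULL || c->generation < last_snapshot)
			continue;
		toy_trace(c, last_snapshot, out, cap, n);
	}
}

/*
 * Build the reloc tree side of the diagram and trace the subtree rooted
 * at NN(b) (root == 'b') or NN(c) (any other value). Returns the number
 * of blocks recorded in @out.
 */
static size_t toy_example(char root, char *out, size_t cap)
{
	const struct toy_block d = { 'd', OLD_GEN, { NULL, NULL } };
	const struct toy_block e = { 'e', OLD_GEN, { NULL, NULL } };
	const struct toy_block f = { 'f', OLD_GEN, { NULL, NULL } };
	const struct toy_block g = { 'g', NEW_GEN, { NULL, NULL } };
	const struct toy_block b = { 'b', NEW_GEN, { &d, &e } };
	const struct toy_block c = { 'c', NEW_GEN, { &f, &g } };
	size_t n = 0;

	if (cap > 0)
		out[0] = '\0';
	toy_trace(root == 'b' ? &b : &c, LAST_SNAPSHOT, out, cap, &n);
	return n;
}
```

With these made-up generations, toy_example('b', ...) records only "b"
and toy_example('c', ...) records "cg", matching the walk-through in the
cover letter. The counterpart blocks in the file tree would be traced
alongside each recorded block; the sketch leaves that part out.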
