On 20.01.2020 12:30 Martin Raiber wrote:
> On 20.01.2020 11:57 Filipe Manana wrote:
>> On Sat, Jan 18, 2020 at 6:42 PM Martin Raiber <martin@xxxxxxxxxxxx> wrote:
>>> On 26.11.2019 18:52 Martin Raiber wrote:
>>>> On 30.10.2019 13:23 fdmanana@xxxxxxxxxx wrote:
>>>>> From: Filipe Manana <fdmanana@xxxxxxxx>
>>>>>
>>>>> Backreference walking, which is used by send to figure out if it
>>>>> can issue clone operations instead of write operations, can be
>>>>> very slow and use too much memory when extents have many
>>>>> references. This change simply skips backreference walking when
>>>>> an extent has more than 64 references, in which case we fall back
>>>>> to a write operation instead of a clone operation. This limit is
>>>>> conservative and in practice I observed no significant slowdown
>>>>> with up to 100 references and still low memory usage up to that
>>>>> limit.
>>>>>
>>>>> This is a temporary workaround until there are speedups in the
>>>>> backref walking code, and as such it does not attempt to add
>>>>> extra interfaces or knobs to tweak the threshold.
>>>>
>>>> Thanks for the patch!
>>>>
>>>> Did some further deliberation on the problem, and for me the best
>>>> short-term solution (apart from your patch) for the send clone
>>>> source detection would be to offload it to userspace, i.e. a flag
>>>> like "--no-data"/BTRFS_SEND_FLAG_NO_FILE_DATA that disables clone
>>>> source detection.
>>>> Userspace can then take each BTRFS_SEND_C_UPDATE_EXTENT and check
>>>> whether it is in the clone sources/on the send target. If it is a
>>>> dedup tool it can use its dedup database, it can use a cache, or
>>>> it can use BTRFS_IOC_LOGICAL_INO. Another advantage is that user
>>>> space can do this in multiple threads.
>>>>
>>>> The only problem I can think of is that send prevents subvolume
>>>> removal and dedup while it runs, but user space may not be
>>>> finished with the actual send stream once send has finished
>>>> outputting the metadata. So there may be some issues there, but
>>>> not if one has control over when removal/dedup happens.
>>>
>>> I was still having performance issues even with this patch. It
>>> shouldn't be too hard to reproduce. E.g., I have a ~10GB sqlite db
>>> with 1236459 extents (as dbs tend to have on btrfs). Sending that
>>> (even with only two snapshots) causes it to be CPU limited.
>>>
>>> So I went ahead and continued to use the patch that disables
>>> backref walking in send (for the db above the backref walking
>>> doesn't do anything anyway), but use it in combination with
>>> "--no-data" to read the file data and then dedup in user space
>>> (only whole-file dedup for now). This is all a bit application
>>> specific. Is anyone interested in the code?
>>
>> You should only see a performance improvement if you have extents
>> that have more than 64 references.
>> You mention two snapshots, but you don't mention if there are
>> extents that are shared through reflinks (cloning and
>> deduplication).
>> Also, is compression enabled? That can have a very significant
>> impact, especially for zstd.
>>
>> Thanks.
>
> Yeah, sure. It's a database file. I can't see how there would be
> reflinks in it (initially during creation, but not afterwards). I
> back it up by creating a read-only snapshot and sending/receiving
> that to a separate backup volume. That snapshot and the last snapshot
> for incremental send are the only ones.
>
> There is another volume with a lot of files that are reflinked on the
> same machine, and those get sent/received as well; it should work for
> both volumes.
>
> Compression on the database file is enabled (lzo). I would guess that
> zstd should have the fastest decompression, though (lzo seems to have
> a little bit better compression speed, but not decompression speed)?
> The sqlite page size is 4096 bytes. That might be too low and cause
> too much fragmentation...
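Coming back to the userspace offload idea quoted further up: resolving
the owners of an extent from user space would go through
BTRFS_IOC_LOGICAL_INO. Here is a minimal, untested C sketch of that
lookup. The function name and parameters are mine, not from any patch;
fs_fd is assumed to be any open descriptor on the mounted filesystem
and disk_byte the extent's logical address taken from the send stream:

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/btrfs.h>

/* Print the (inode, offset, root) triples referencing the data extent
 * at logical address 'disk_byte'. */
static int resolve_extent_owners(int fs_fd, uint64_t disk_byte)
{
	const uint32_t buf_size = 65536; /* v1 of the ioctl caps the buffer at 64K */
	struct btrfs_data_container *refs = calloc(1, buf_size);
	struct btrfs_ioctl_logical_ino_args args;
	uint32_t i;

	if (!refs)
		return -1;

	memset(&args, 0, sizeof(args));
	args.logical = disk_byte;
	args.size = buf_size;
	args.inodes = (uintptr_t)refs;

	if (ioctl(fs_fd, BTRFS_IOC_LOGICAL_INO, &args) < 0) {
		perror("BTRFS_IOC_LOGICAL_INO");
		free(refs);
		return -1;
	}

	/* Results come back as (inode, offset, root) triples of u64s. */
	for (i = 0; i < refs->elem_cnt; i += 3)
		printf("inode %llu offset %llu root %llu\n",
		       (unsigned long long)refs->val[i],
		       (unsigned long long)refs->val[i + 1],
		       (unsigned long long)refs->val[i + 2]);

	free(refs);
	return 0;
}

A nice property for a dedup tool: if refs->bytes_missing comes back
non-zero, the extent has more references than the buffer holds. That is
exactly the many-references case the patch below special-cases in the
kernel, and user space can probably just treat it as "shared enough"
without resolving every reference.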
Created a script to reproduce the slowdown (albeit with a smaller
1.7GiB file).

Test results:
https://gist.github.com/uroni/5b601877ce146bb58657b4821ea15fca#file-btrfs_send_test_results
The script:
https://gist.github.com/uroni/4671b3b5215e230babcd7561c6619a82#file-btrfs_send_test-sh

Send, lzo compression, no fragmentation: 0m5.025s
Send, lzo compression, 114231 extents:   6m58.613s
Send, no compression, no fragmentation:  0m2.741s
Send, no compression, 103676 extents:    3m0.290s

Multiply those numbers by 10 and a send of 17GiB of data takes over an
hour. And yes, having no compression seems to double the speed.
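For anyone who doesn't want to click through, the setup boils down to
roughly the following. This is an illustrative C approximation, not the
linked shell script itself; file size, page size and write count are
placeholders in the same ballpark:

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define FILE_SIZE (1700ULL << 20) /* ~1.7GiB test file */
#define PAGE_SZ   4096            /* sqlite-like page size */
#define NWRITES   120000          /* enough for ~100k extents */

int main(int argc, char **argv)
{
	char page[PAGE_SZ];
	char *chunk;
	uint64_t written;
	long i, j;
	int fd;

	if (argc < 2)
		return 1;
	fd = open(argv[1], O_CREAT | O_WRONLY, 0644);
	chunk = malloc(1 << 20);
	if (fd < 0 || !chunk)
		return 1;

	/* Lay the file out contiguously first, with non-trivial data so
	 * compression has something to do. */
	srand(1);
	for (j = 0; j < (1 << 20); j++)
		chunk[j] = rand();
	for (written = 0; written < FILE_SIZE; written += 1 << 20)
		if (write(fd, chunk, 1 << 20) != 1 << 20)
			return 1;

	/* Then do random page-sized overwrites. With CoW (plus
	 * compression) each one tends to end up as its own small
	 * extent, mimicking a database workload. */
	for (i = 0; i < NWRITES; i++) {
		uint64_t off = ((uint64_t)rand() % (FILE_SIZE / PAGE_SZ)) * PAGE_SZ;
		memset(page, rand() & 0xff, sizeof(page));
		if (pwrite(fd, page, sizeof(page), off) != sizeof(page))
			return 1;
	}
	fsync(fd);
	free(chunk);
	close(fd);
	return 0;
}

Afterwards: snapshot the subvolume read-only and compare
"time btrfs send <snap> > /dev/null" before and after defragmenting the
file.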
>
>>
>>> Regards,
>>> Martin Raiber
>>>
>>>>> Reported-by: Atemu <atemu.main@xxxxxxxxx>
>>>>> Link: https://lore.kernel.org/linux-btrfs/CAE4GHgkvqVADtS4AzcQJxo0Q1jKQgKaW3JGp3SGdoinVo=C9eQ@xxxxxxxxxxxxxx/T/#me55dc0987f9cc2acaa54372ce0492c65782be3fa
>>>>> Signed-off-by: Filipe Manana <fdmanana@xxxxxxxx>
>>>>> ---
>>>>>  fs/btrfs/send.c | 25 ++++++++++++++++++++++++-
>>>>>  1 file changed, 24 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
>>>>> index 123ac54af071..518ec1265a0c 100644
>>>>> --- a/fs/btrfs/send.c
>>>>> +++ b/fs/btrfs/send.c
>>>>> @@ -25,6 +25,14 @@
>>>>>  #include "compression.h"
>>>>>
>>>>>  /*
>>>>> + * Maximum number of references an extent can have in order for us to attempt to
>>>>> + * issue clone operations instead of write operations. This currently exists to
>>>>> + * avoid hitting limitations of the backreference walking code (taking a lot of
>>>>> + * time and using too much memory for extents with large number of references).
>>>>> + */
>>>>> +#define SEND_MAX_EXTENT_REFS 64
>>>>> +
>>>>> +/*
>>>>>   * A fs_path is a helper to dynamically build path names with unknown size.
>>>>>   * It reallocates the internal buffer on demand.
>>>>>   * It allows fast adding of path elements on the right side (normal path) and
>>>>> @@ -1302,6 +1310,7 @@ static int find_extent_clone(struct send_ctx *sctx,
>>>>>  	struct clone_root *cur_clone_root;
>>>>>  	struct btrfs_key found_key;
>>>>>  	struct btrfs_path *tmp_path;
>>>>> +	struct btrfs_extent_item *ei;
>>>>>  	int compressed;
>>>>>  	u32 i;
>>>>>
>>>>> @@ -1349,7 +1358,6 @@ static int find_extent_clone(struct send_ctx *sctx,
>>>>>  	ret = extent_from_logical(fs_info, disk_byte, tmp_path,
>>>>>  				  &found_key, &flags);
>>>>>  	up_read(&fs_info->commit_root_sem);
>>>>> -	btrfs_release_path(tmp_path);
>>>>>
>>>>>  	if (ret < 0)
>>>>>  		goto out;
>>>>> @@ -1358,6 +1366,21 @@ static int find_extent_clone(struct send_ctx *sctx,
>>>>>  		goto out;
>>>>>  	}
>>>>>
>>>>> +	ei = btrfs_item_ptr(tmp_path->nodes[0], tmp_path->slots[0],
>>>>> +			    struct btrfs_extent_item);
>>>>> +	/*
>>>>> +	 * Backreference walking (iterate_extent_inodes() below) is currently
>>>>> +	 * too expensive when an extent has a large number of references, both
>>>>> +	 * in time spent and used memory. So for now just fallback to write
>>>>> +	 * operations instead of clone operations when an extent has more than
>>>>> +	 * a certain amount of references.
>>>>> +	 */
>>>>> +	if (btrfs_extent_refs(tmp_path->nodes[0], ei) > SEND_MAX_EXTENT_REFS) {
>>>>> +		ret = -ENOENT;
>>>>> +		goto out;
>>>>> +	}
>>>>> +	btrfs_release_path(tmp_path);
>>>>> +
>>>>>  	/*
>>>>>  	 * Setup the clone roots.
>>>>>  	 */
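P.S.: For completeness, the whole-file dedup in user space mentioned
above needs nothing send-specific; a FIDEDUPERANGE loop over the file
is enough. A rough, untested sketch (the helper is my own, not from
any patch); requests are chunked because btrfs historically clamps a
single dedup call to 16MiB:

#include <stdint.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <linux/fs.h>

/* Dedupe all of dst_fd against an identical src_fd on the same btrfs
 * filesystem. Returns 0 if the whole file was deduped. */
static int dedupe_whole_file(int src_fd, int dst_fd)
{
	struct file_dedupe_range *range;
	struct stat st;
	uint64_t off = 0;

	if (fstat(src_fd, &st) < 0)
		return -1;

	range = calloc(1, sizeof(*range) +
			  sizeof(struct file_dedupe_range_info));
	if (!range)
		return -1;
	range->dest_count = 1;
	range->info[0].dest_fd = dst_fd;

	while (off < (uint64_t)st.st_size) {
		uint64_t len = (uint64_t)st.st_size - off;

		if (len > 16ULL << 20)
			len = 16ULL << 20;
		range->src_offset = off;
		range->src_length = len;
		range->info[0].dest_offset = off;
		/* status != SAME means the contents differ or the
		 * kernel refused the request. */
		if (ioctl(src_fd, FIDEDUPERANGE, range) < 0 ||
		    range->info[0].status != FILE_DEDUPE_RANGE_SAME) {
			free(range);
			return -1;
		}
		/* Assumes the full chunk was deduped; a robust version
		 * would advance by info[0].bytes_deduped instead. */
		off += len;
	}
	free(range);
	return 0;
}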
