On 20.01.2020 11:57 Filipe Manana wrote: > On Sat, Jan 18, 2020 at 6:42 PM Martin Raiber <martin@xxxxxxxxxxxx> wrote: >> On 26.11.2019 18:52 Martin Raiber wrote: >>> On 30.10.2019 13:23 fdmanana@xxxxxxxxxx wrote: >>>> From: Filipe Manana <fdmanana@xxxxxxxx> >>>> >>>> Backreference walking, which is used by send to figure if it can issue >>>> clone operations instead of write operations, can be very slow and use too >>>> much memory when extents have many references. This change simply skips >>>> backreference walking when an extent has more than 64 references, in which >>>> case we fallback to a write operation instead of a clone operation. This >>>> limit is conservative and in practice I observed no signicant slowdown >>>> with up to 100 references and still low memory usage up to that limit. >>>> >>>> This is a temporary workaround until there are speedups in the backref >>>> walking code, and as such it does not attempt to add extra interfaces or >>>> knobs to tweak the threshold. >>> Thanks for the patch! >>> >>> Did some further deliberation on the problem, and for me the best short >>> term solution (apart from your patch) for the send clone source >>> detection would be to offload it to userspace. >>> I.e. a flag like "--no-data"/BTRFS_SEND_FLAG_NO_FILE_DATA that disables >>> clone source detection. >>> Userspace can then take each BTRFS_SEND_C_UPDATE_EXTENT and look if it >>> is in the clone sources/on the send target. If it is a dedup tool it can >>> use its dedup database, it can use a cache or it can use >>> BTRFS_IOC_LOGICAL_INO. Another advantage is that user space can do this >>> in multiple threads. >>> >>> Only problem I can think of is that send prevents subvol removal and >>> dedup during send, but user space may not be finished with the actual >>> send stream once send has finished outputting the metadata. So there may >>> be some issues there, but not if one has control over when removal/dedup >>> happens. >> I was still having performance issues even with this patch. Shouldn't be >> too hard to reproduce. E.g., I have a ~10GB sqlite db with 1236459 >> extents (as dbs tend to have on btrfs). Sending that (even with only two >> snapshots) causes it to be cpu limited. >> >> So I went ahead and continued to use the patch that disables backref >> walking in send (for the db above the backref walking doesn't do >> anything anyway), but use it in combination with "--no-data" to read >> file data and then dedup in user space (only whole file dedup for now). >> This is all a bit application specific. Is anyone interested in the code? > You should only see performance improvement if you have extents that > have more than 64 references. > You mention two snapshots, but you don't mention if there are extents > that are shared through reflinks (cloning and deduplication). > Also, is compression enabled? That can have a very significant impact, > specially for zstd. > > Thanks. Yeah, sure. It's a database file. I can't see how there would be reflinks in it (initially during creation, but afterwards not). I back it up by creating a read-only snapshot and sending/receiving that to a separate backup volume. That snapshot and the last snapshot for incremental send are the only ones. There is another volume with a lot of files that are reflinked on the same machine and those get send/received as well, and it should work for both volumes. Compression on the database file is enabled (lzo). I would guess that zstd should have the fastest decompression, though (lzo seems to have a little bit better compression speed, but not decompression speed)? The sqlite page size is 4096 bytes. That might be too low and cause too much fragmentation... > > >> Regards, >> Martin Raiber >> >>>> Reported-by: Atemu <atemu.main@xxxxxxxxx> >>>> Link: https://lore.kernel.org/linux-btrfs/CAE4GHgkvqVADtS4AzcQJxo0Q1jKQgKaW3JGp3SGdoinVo=C9eQ@xxxxxxxxxxxxxx/T/#me55dc0987f9cc2acaa54372ce0492c65782be3fa >>>> Signed-off-by: Filipe Manana <fdmanana@xxxxxxxx> >>>> --- >>>> fs/btrfs/send.c | 25 ++++++++++++++++++++++++- >>>> 1 file changed, 24 insertions(+), 1 deletion(-) >>>> >>>> diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c >>>> index 123ac54af071..518ec1265a0c 100644 >>>> --- a/fs/btrfs/send.c >>>> +++ b/fs/btrfs/send.c >>>> @@ -25,6 +25,14 @@ >>>> #include "compression.h" >>>> >>>> /* >>>> + * Maximum number of references an extent can have in order for us to attempt to >>>> + * issue clone operations instead of write operations. This currently exists to >>>> + * avoid hitting limitations of the backreference walking code (taking a lot of >>>> + * time and using too much memory for extents with large number of references). >>>> + */ >>>> +#define SEND_MAX_EXTENT_REFS 64 >>>> + >>>> +/* >>>> * A fs_path is a helper to dynamically build path names with unknown size. >>>> * It reallocates the internal buffer on demand. >>>> * It allows fast adding of path elements on the right side (normal path) and >>>> @@ -1302,6 +1310,7 @@ static int find_extent_clone(struct send_ctx *sctx, >>>> struct clone_root *cur_clone_root; >>>> struct btrfs_key found_key; >>>> struct btrfs_path *tmp_path; >>>> + struct btrfs_extent_item *ei; >>>> int compressed; >>>> u32 i; >>>> >>>> @@ -1349,7 +1358,6 @@ static int find_extent_clone(struct send_ctx *sctx, >>>> ret = extent_from_logical(fs_info, disk_byte, tmp_path, >>>> &found_key, &flags); >>>> up_read(&fs_info->commit_root_sem); >>>> - btrfs_release_path(tmp_path); >>>> >>>> if (ret < 0) >>>> goto out; >>>> @@ -1358,6 +1366,21 @@ static int find_extent_clone(struct send_ctx *sctx, >>>> goto out; >>>> } >>>> >>>> + ei = btrfs_item_ptr(tmp_path->nodes[0], tmp_path->slots[0], >>>> + struct btrfs_extent_item); >>>> + /* >>>> + * Backreference walking (iterate_extent_inodes() below) is currently >>>> + * too expensive when an extent has a large number of references, both >>>> + * in time spent and used memory. So for now just fallback to write >>>> + * operations instead of clone operations when an extent has more than >>>> + * a certain amount of references. >>>> + */ >>>> + if (btrfs_extent_refs(tmp_path->nodes[0], ei) > SEND_MAX_EXTENT_REFS) { >>>> + ret = -ENOENT; >>>> + goto out; >>>> + } >>>> + btrfs_release_path(tmp_path); >>>> + >>>> /* >>>> * Setup the clone roots. >>>> */ >
