Re: [PATCH] Btrfs: send, skip backreference walking for extents with many references

On 20.01.2020 12:30 Martin Raiber wrote:
> On 20.01.2020 11:57 Filipe Manana wrote:
>> On Sat, Jan 18, 2020 at 6:42 PM Martin Raiber <martin@xxxxxxxxxxxx> wrote:
>>> On 26.11.2019 18:52 Martin Raiber wrote:
>>>> On 30.10.2019 13:23 fdmanana@xxxxxxxxxx wrote:
>>>>> From: Filipe Manana <fdmanana@xxxxxxxx>
>>>>>
>>>>> Backreference walking, which is used by send to figure out whether it can
>>>>> issue clone operations instead of write operations, can be very slow and
>>>>> use too much memory when extents have many references. This change simply
>>>>> skips backreference walking when an extent has more than 64 references, in
>>>>> which case we fall back to a write operation instead of a clone operation.
>>>>> This limit is conservative, and in practice I observed no significant
>>>>> slowdown with up to 100 references and still low memory usage up to that
>>>>> limit.
>>>>>
>>>>> This is a temporary workaround until there are speedups in the backref
>>>>> walking code, and as such it does not attempt to add extra interfaces or
>>>>> knobs to tweak the threshold.
>>>> Thanks for the patch!
>>>>
>>>> I did some further deliberation on the problem, and for me the best
>>>> short-term solution (apart from your patch) for the send clone source
>>>> detection would be to offload it to userspace.
>>>> I.e. a flag like "--no-data"/BTRFS_SEND_FLAG_NO_FILE_DATA that disables
>>>> clone source detection.
>>>> Userspace can then take each BTRFS_SEND_C_UPDATE_EXTENT and check whether
>>>> it is in the clone sources/on the send target. If it is a dedup tool it
>>>> can use its dedup database, it can use a cache, or it can use
>>>> BTRFS_IOC_LOGICAL_INO. Another advantage is that userspace can do this
>>>> in multiple threads.
>>>>
>>>> The only problem I can think of is that send prevents subvolume removal
>>>> and dedup while it is running, but userspace may not be finished with the
>>>> actual send stream once send has finished outputting the metadata. So
>>>> there may be some issues there, though not if one has control over when
>>>> removal/dedup happens.
>>> I was still having performance issues even with this patch. It shouldn't
>>> be too hard to reproduce: e.g. I have a ~10GB SQLite database with
>>> 1236459 extents (as databases tend to accumulate on btrfs). Sending it
>>> (even with only two snapshots) is CPU limited.
>>>
>>> So I went ahead and continued to use the patch that disables backref
>>> walking in send (for the database above the backref walking doesn't
>>> accomplish anything anyway), but combined it with "--no-data" to read
>>> the file data and then dedup in userspace (only whole-file dedup for
>>> now). This is all a bit application specific. Is anyone interested in
>>> the code?
>> You should only see performance improvement if you have extents that
>> have more than 64 references.
>> You mention two snapshots, but you don't mention if there are extents
>> that are shared through reflinks (cloning and deduplication).
>> Also, is compression enabled? That can have a very significant impact,
>> especially for zstd.
>>
>> Thanks.
> Yeah, sure. It's a database file. I can't see how there would be
> reflinks in it (perhaps initially during creation, but not afterwards).
> I back it up by creating a read-only snapshot and sending/receiving it
> to a separate backup volume. That snapshot and the last snapshot used
> for the incremental send are the only ones.
>
> There is another volume on the same machine with a lot of files that
> are reflinked, and those get sent/received as well; it should work for
> both volumes.
>
> Compression on the database file is enabled (lzo). I would guess that
> zstd has the fastest decompression, though (lzo seems to have slightly
> better compression speed, but not decompression speed)? The SQLite page
> size is 4096 bytes. That might be too low and cause too much
> fragmentation...

Created a script to reproduce this (albeit with a smaller 1.7GiB file).

Test results:
https://gist.github.com/uroni/5b601877ce146bb58657b4821ea15fca#file-btrfs_send_test_results

The script:
https://gist.github.com/uroni/4671b3b5215e230babcd7561c6619a82#file-btrfs_send_test-sh

Send, lzo compression, no fragmentation: 0m5.025s
Send, lzo compression, 114231 extents: 6m58.613s

Send, no compression, no fragmentation: 0m2.741s
Send, no compression, 103676 extents: 3m0.290s

Multiply those numbers by 10 and a send of 17GiB of data takes over an
hour. And yes, having no compression seems to double the speed.

>
>>
>>> Regards,
>>> Martin Raiber
>>>
>>>>> Reported-by: Atemu <atemu.main@xxxxxxxxx>
>>>>> Link: https://lore.kernel.org/linux-btrfs/CAE4GHgkvqVADtS4AzcQJxo0Q1jKQgKaW3JGp3SGdoinVo=C9eQ@xxxxxxxxxxxxxx/T/#me55dc0987f9cc2acaa54372ce0492c65782be3fa
>>>>> Signed-off-by: Filipe Manana <fdmanana@xxxxxxxx>
>>>>> ---
>>>>>  fs/btrfs/send.c | 25 ++++++++++++++++++++++++-
>>>>>  1 file changed, 24 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
>>>>> index 123ac54af071..518ec1265a0c 100644
>>>>> --- a/fs/btrfs/send.c
>>>>> +++ b/fs/btrfs/send.c
>>>>> @@ -25,6 +25,14 @@
>>>>>  #include "compression.h"
>>>>>
>>>>>  /*
>>>>> + * Maximum number of references an extent can have in order for us to attempt to
>>>>> + * issue clone operations instead of write operations. This currently exists to
>>>>> + * avoid hitting limitations of the backreference walking code (taking a lot of
>>>>> + * time and using too much memory for extents with large number of references).
>>>>> + */
>>>>> +#define SEND_MAX_EXTENT_REFS        64
>>>>> +
>>>>> +/*
>>>>>   * A fs_path is a helper to dynamically build path names with unknown size.
>>>>>   * It reallocates the internal buffer on demand.
>>>>>   * It allows fast adding of path elements on the right side (normal path) and
>>>>> @@ -1302,6 +1310,7 @@ static int find_extent_clone(struct send_ctx *sctx,
>>>>>      struct clone_root *cur_clone_root;
>>>>>      struct btrfs_key found_key;
>>>>>      struct btrfs_path *tmp_path;
>>>>> +    struct btrfs_extent_item *ei;
>>>>>      int compressed;
>>>>>      u32 i;
>>>>>
>>>>> @@ -1349,7 +1358,6 @@ static int find_extent_clone(struct send_ctx *sctx,
>>>>>      ret = extent_from_logical(fs_info, disk_byte, tmp_path,
>>>>>                                &found_key, &flags);
>>>>>      up_read(&fs_info->commit_root_sem);
>>>>> -    btrfs_release_path(tmp_path);
>>>>>
>>>>>      if (ret < 0)
>>>>>              goto out;
>>>>> @@ -1358,6 +1366,21 @@ static int find_extent_clone(struct send_ctx *sctx,
>>>>>              goto out;
>>>>>      }
>>>>>
>>>>> +    ei = btrfs_item_ptr(tmp_path->nodes[0], tmp_path->slots[0],
>>>>> +                        struct btrfs_extent_item);
>>>>> +    /*
>>>>> +     * Backreference walking (iterate_extent_inodes() below) is currently
>>>>> +     * too expensive when an extent has a large number of references, both
>>>>> +     * in time spent and used memory. So for now just fallback to write
>>>>> +     * operations instead of clone operations when an extent has more than
>>>>> +     * a certain amount of references.
>>>>> +     */
>>>>> +    if (btrfs_extent_refs(tmp_path->nodes[0], ei) > SEND_MAX_EXTENT_REFS) {
>>>>> +            ret = -ENOENT;
>>>>> +            goto out;
>>>>> +    }
>>>>> +    btrfs_release_path(tmp_path);
>>>>> +
>>>>>      /*
>>>>>       * Setup the clone roots.
>>>>>       */




