Re: [PATCH] btrfs: Fix and enhance merge_extent_mapping() to insert best fitted extent map

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Oct 9, 2014 at 1:28 AM, Qu Wenruo <quwenruo@xxxxxxxxxxxxxx> wrote:
>
> -------- Original Message --------
> Subject: Re: [PATCH] btrfs: Fix and enhance merge_extent_mapping() to insert
> best fitted extent map
> From: Filipe David Manana <fdmanana@xxxxxxxxx>
> To: Qu Wenruo <quwenruo@xxxxxxxxxxxxxx>
> Date: 2014年10月08日 20:08
>>
>> On Fri, Sep 19, 2014 at 1:31 AM, Qu Wenruo <quwenruo@xxxxxxxxxxxxxx>
>> wrote:
>>>
>>> -------- Original Message --------
>>> Subject: Re: [PATCH] btrfs: Fix and enhance merge_extent_mapping() to
>>> insert
>>> best fitted extent map
>>> From: Filipe David Manana <fdmanana@xxxxxxxxx>
>>> To: Qu Wenruo <quwenruo@xxxxxxxxxxxxxx>
>>> Date: 2014年09月18日 21:16
>>>>
>>>> On Wed, Sep 17, 2014 at 4:53 AM, Qu Wenruo <quwenruo@xxxxxxxxxxxxxx>
>>>> wrote:
>>>>>
>>>>> The following commit enhanced the merge_extent_mapping() to reduce
>>>>> fragment in extent map tree, but it can't handle case which existing
>>>>> lies before map_start:
>>>>> 51f39 btrfs: Use right extent length when inserting overlap extent map.
>>>>>
>>>>> [BUG]
>>>>> When existing extent map's start is before map_start,
>>>>> the em->len will be minus, which will corrupt the extent map and fail
>>>>> to
>>>>> insert the new extent map.
>>>>> This will happen when someone get a large extent map, but when it is
>>>>> going to insert it into extent map tree, some one has already commit
>>>>> some write and split the huge extent into small parts.
>>>>
>>>> This sounds like very deterministic to me.
>>>> Any reason to not add tests to the sanity tests that exercise
>>>> this/these case/cases?
>>>
>>> Yes, thanks for the informing.
>>> Will add the test case for it soon.
>>
>> Hi Qu,
>>
>> Any progress on the test?
>>
>> This is a very important one IMHO, not only because of the bad
>> consequences of the bug (extent map corruption, leading to all sorts
>> of chaos), but also because this problem was not found by the full
>> xfstests suite on several developer machines.
>>
>> thanks
>
> Still trying to reproduce it under xfstest framework.

That's the problem, wasn't apparently reproducible (or detectable at
least) by anyone with xfstests.

> But even followiiing the FileBench randomrw behavior(1 thread random read 1
> thread random write on preallocated space),
> I still failed to reproduce it.
>
> Still investigating how to reproduce it.
> Worst case may be add a new C program into src of xfstests?

How about the sanity tests (fs/btrfs/tests/*.c)? Create an empty map
tree, add some extent maps, then try to merge some new extent maps
that used to fail before this fix. Seems simple, no?

thanks Qu


>
> Thanks,
> Qu
>
>>
>>> Thanks,
>>> Qu
>>>
>>>> Thanks
>>>>
>>>>> [REPRODUCER]
>>>>> It is very easy to tiger using filebench with randomrw personality.
>>>>> It is about 100% to reproduce when using 8G preallocated file in 60s
>>>>> randonrw test.
>>>>>
>>>>> [FIX]
>>>>> This patch can now handle any existing extent position.
>>>>> Since it does not directly use existing->start, now it will find the
>>>>> previous and next extent around map_start.
>>>>> So the old existing->start < map_start bug will never happen again.
>>>>>
>>>>> [ENHANCE]
>>>>> This patch will insert the best fitted extent map into extent map tree,
>>>>> other than the oldest [map_start, map_start + sectorsize) or the
>>>>> relatively newer but not perfect [map_start, existing->start).
>>>>>
>>>>> The patch will first search existing extent that does not intersects
>>>>> with
>>>>> the desired map range [map_start, map_start + len).
>>>>> The existing extent will be either before or behind map_start, and
>>>>> based
>>>>> on the existing extent, we can find out the previous and next extent
>>>>> around map_start.
>>>>>
>>>>> So the best fitted extent would be [prev->end, next->start).
>>>>> For prev or next is not found, em->start would be prev->end and em->end
>>>>> wold be next->start.
>>>>>
>>>>> With this patch, the fragment in extent map tree should be reduced much
>>>>> more than the 51f39 commit and reduce an unneeded extent map tree
>>>>> search.
>>>>>
>>>>> Reported-by: Tsutomu Itoh <t-itoh@xxxxxxxxxxxxxx>
>>>>> Signed-off-by: Qu Wenruo <quwenruo@xxxxxxxxxxxxxx>
>>>>> ---
>>>>>    fs/btrfs/inode.c | 79
>>>>> ++++++++++++++++++++++++++++++++++++++++----------------
>>>>>    1 file changed, 57 insertions(+), 22 deletions(-)
>>>>>
>>>>> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
>>>>> index 016c403..8039021 100644
>>>>> --- a/fs/btrfs/inode.c
>>>>> +++ b/fs/btrfs/inode.c
>>>>> @@ -6191,21 +6191,60 @@ out_fail_inode:
>>>>>           goto out_fail;
>>>>>    }
>>>>>
>>>>> +/* Find next extent map of a given extent map, caller needs to ensure
>>>>> locks */
>>>>> +static struct extent_map *next_extent_map(struct extent_map *em)
>>>>> +{
>>>>> +       struct rb_node *next;
>>>>> +
>>>>> +       next = rb_next(&em->rb_node);
>>>>> +       if (!next)
>>>>> +               return NULL;
>>>>> +       return container_of(next, struct extent_map, rb_node);
>>>>> +}
>>>>> +
>>>>> +static struct extent_map *prev_extent_map(struct extent_map *em)
>>>>> +{
>>>>> +       struct rb_node *prev;
>>>>> +
>>>>> +       prev = rb_prev(&em->rb_node);
>>>>> +       if (!prev)
>>>>> +               return NULL;
>>>>> +       return container_of(prev, struct extent_map, rb_node);
>>>>> +}
>>>>> +
>>>>>    /* helper for btfs_get_extent.  Given an existing extent in the
>>>>> tree,
>>>>> + * the existing extent is the nearest extent to map_start,
>>>>>     * and an extent that you want to insert, deal with overlap and
>>>>> insert
>>>>> - * the new extent into the tree.
>>>>> + * the best fitted new extent into the tree.
>>>>>     */
>>>>>    static int merge_extent_mapping(struct extent_map_tree *em_tree,
>>>>>                                   struct extent_map *existing,
>>>>>                                   struct extent_map *em,
>>>>>                                   u64 map_start)
>>>>>    {
>>>>> +       struct extent_map *prev;
>>>>> +       struct extent_map *next;
>>>>> +       u64 start;
>>>>> +       u64 end;
>>>>>           u64 start_diff;
>>>>>
>>>>>           BUG_ON(map_start < em->start || map_start >=
>>>>> extent_map_end(em));
>>>>> -       start_diff = map_start - em->start;
>>>>> -       em->start = map_start;
>>>>> -       em->len = existing->start - em->start;
>>>>> +
>>>>> +       if (existing->start > map_start) {
>>>>> +               next = existing;
>>>>> +               prev = prev_extent_map(next);
>>>>> +       } else {
>>>>> +               prev = existing;
>>>>> +               next = next_extent_map(prev);
>>>>> +       }
>>>>> +
>>>>> +       start = prev ? extent_map_end(prev) : em->start;
>>>>> +       start = max_t(u64, start, em->start);
>>>>> +       end = next ? next->start : extent_map_end(em);
>>>>> +       end = min_t(u64, end, extent_map_end(em));
>>>>> +       start_diff = start - em->start;
>>>>> +       em->start = start;
>>>>> +       em->len = end - start;
>>>>>           if (em->block_start < EXTENT_MAP_LAST_BYTE &&
>>>>>               !test_bit(EXTENT_FLAG_COMPRESSED, &em->flags)) {
>>>>>                   em->block_start += start_diff;
>>>>> @@ -6482,25 +6521,21 @@ insert:
>>>>>
>>>>>                   ret = 0;
>>>>>
>>>>> -               existing = lookup_extent_mapping(em_tree, start, len);
>>>>> -               if (existing && (existing->start > start ||
>>>>> -                   existing->start + existing->len <= start)) {
>>>>> +               existing = search_extent_mapping(em_tree, start, len);
>>>>> +               /*
>>>>> +                * existing will always be non-NULL, since there must
>>>>> be
>>>>> +                * extent causing the -EEXIST.
>>>>> +                */
>>>>> +               if (start >= extent_map_end(existing) ||
>>>>> +                   start + len <= existing->start) {
>>>>> +                       /*
>>>>> +                        * The existing extent map is the one nearest
>>>>> to
>>>>> +                        * the [start, start + len) range which
>>>>> overlaps
>>>>> +                        */
>>>>> +                       err = merge_extent_mapping(em_tree, existing,
>>>>> +                                                  em, start);
>>>>>                           free_extent_map(existing);
>>>>> -                       existing = NULL;
>>>>> -               }
>>>>> -               if (!existing) {
>>>>> -                       existing = lookup_extent_mapping(em_tree,
>>>>> em->start,
>>>>> -                                                        em->len);
>>>>> -                       if (existing) {
>>>>> -                               err = merge_extent_mapping(em_tree,
>>>>> existing,
>>>>> -                                                          em, start);
>>>>> -                               free_extent_map(existing);
>>>>> -                               if (err) {
>>>>> -                                       free_extent_map(em);
>>>>> -                                       em = NULL;
>>>>> -                               }
>>>>> -                       } else {
>>>>> -                               err = -EIO;
>>>>> +                       if (err) {
>>>>>                                   free_extent_map(em);
>>>>>                                   em = NULL;
>>>>>                           }
>>>>> --
>>>>> 2.1.0
>>>>>
>>>>> --
>>>>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs"
>>>>> in
>>>>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>
>>>>
>>>>
>>
>>
>



-- 
Filipe David Manana,

"Reasonable men adapt themselves to the world.
 Unreasonable men adapt the world to themselves.
 That's why all progress depends on unreasonable men."
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux Filesystem Development]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux