On Fri, Dec 22, 2017 at 02:10:45PM +0200, Nikolay Borisov wrote:
>
>
> On 22.12.2017 00:42, Liu Bo wrote:
> > This fixes a corner case that is caused by a race of dio write vs dio
> > read/write.
> >
> > dio write:
> > [0, 32k) -> [0, 8k) + [8k, 32k)
> >
> > dio read/write:
> >
> > While get_extent() with [0, 4k), [0, 8k) is found as existing em, even
> > though start == existing->start, em is [0, 32k),
> > extent_map_end(em) > extent_map_end(existing),
> > then it goes thru merge_extent_mapping() which tries to add a [8k, 8k)
> > (with a length 0), and get_extent ends up returning -EEXIST, and dio
> > read/write will get -EEXIST which is confusing applications.
>
> I think this paragraph should be expanded a bit since you've crammed a
> lot of information in few words.
>
OK, it's not easy to explain the problem, either.
Probably I should also attach the following as a extra explanation,
+ * Suppose that no extent map has been loaded into memory yet.
+ * There is a file extent [0, 32K), two jobs are running concurrently
+ * against it, t1 is doing dio write to [8K, 32K) and t2 is doing dio
+ * read from [0, 4K) or [4K, 8K).
+ *
+ * t1 goes ahead of t2 and splits em [0, 32K) to em [0K, 8K) and [8K 32K).
+ *
+ * t1 t2
+ * btrfs_get_blocks_direct() btrfs_get_blocks_direct()
+ * -> btrfs_get_extent() -> btrfs_get_extent()
+ * -> lookup_extent_mapping()
+ * -> add_extent_mapping() -> lookup_extent_mapping()
+ * # load [0, 32K)
+ * -> btrfs_new_extent_direct()
+ * -> btrfs_drop_extent_cache()
+ * # split [0, 32K)
+ * -> add_extent_mapping()
+ * # add [8K, 32K)
+ * -> add_extent_mapping()
+ * # handle -EEXIST when adding
+ * # [0, 32K)
> In the below graphs em should be extent we'd like to insert, no?
> So given that it always encompasses the existing (0, 8k given the
> above description) then em should be 0, 32k ?
Yes and yes.
thanks,
-liubo
> >
> > Here I concluded all the possible situations,
> > 1) start < existing->start
> >
> > +-----------+em+-----------+
> > +--prev---+ | +-------------+ |
> > | | | | | |
> > +---------+ + +---+existing++ ++
> > +
> > |
> > +
> > start
> >
> > 2) start == existing->start
> >
> > +------------em------------+
> > | +-------------+ |
> > | | | |
> > + +----existing-+ +
> > |
> > |
> > +
> > start
> >
> > 3) start > existing->start && start < (existing->start + existing->len)
> >
> > +------------em------------+
> > | +-------------+ |
> > | | | |
> > + +----existing-+ +
> > |
> > |
> > +
> > start
> >
> > 4) start >= (existing->start + existing->len)
> >
> > +-----------+em+-----------+
> > | +-------------+ | +--next---+
> > | | | | | |
> > + +---+existing++ + +---------+
> > +
> > |
> > +
> > start
> >
> > After going thru the above case by case, it turns out that if start is
> > within existing em (front inclusive), then the existing em should be
> > returned, otherwise, we try our best to merge candidate em with
> > sibling ems to form a larger em.
> >
> > Reported-by: David Vallender <david.vallender@xxxxxxxxxxxxxx>
> > Signed-off-by: Liu Bo <bo.li.liu@xxxxxxxxxx>
> > ---
> > fs/btrfs/extent_map.c | 25 ++++++++++---------------
> > 1 file changed, 10 insertions(+), 15 deletions(-)
> >
> > diff --git a/fs/btrfs/extent_map.c b/fs/btrfs/extent_map.c
> > index 6653b08..d386cfb 100644
> > --- a/fs/btrfs/extent_map.c
> > +++ b/fs/btrfs/extent_map.c
> > @@ -483,7 +483,7 @@ static struct extent_map *prev_extent_map(struct extent_map *em)
> > static int merge_extent_mapping(struct extent_map_tree *em_tree,
> > struct extent_map *existing,
> > struct extent_map *em,
> > - u64 map_start)
> > + u64 map_start, u64 map_len)
> > {
> > struct extent_map *prev;
> > struct extent_map *next;
> > @@ -496,9 +496,13 @@ static int merge_extent_mapping(struct extent_map_tree *em_tree,
> > if (existing->start > map_start) {
> > next = existing;
> > prev = prev_extent_map(next);
> > + if (prev)
> > + ASSERT(extent_map_end(prev) <= map_start);
> > } else {
> > prev = existing;
> > next = next_extent_map(prev);
> > + if (next)
> > + ASSERT(map_start + map_len <= next->start);
> > }
> >
> > start = prev ? extent_map_end(prev) : em->start;
> > @@ -540,35 +544,26 @@ int btrfs_add_extent_mapping(struct extent_map_tree *em_tree,
> > * existing will always be non-NULL, since there must be
> > * extent causing the -EEXIST.
> > */
> > - if (existing->start == em->start &&
> > - extent_map_end(existing) >= extent_map_end(em) &&
> > - em->block_start == existing->block_start) {
> > - /*
> > - * The existing extent map already encompasses the
> > - * entire extent map we tried to add.
> > - */
> > + if (start >= existing->start &&
> > + start < extent_map_end(existing)) {
> > free_extent_map(em);
> > *em_in = existing;
> > ret = 0;
> > - } else if (start >= extent_map_end(existing) ||
> > - start <= existing->start) {
> > + } else {
> > /*
> > * The existing extent map is the one nearest to
> > * the [start, start + len) range which overlaps
> > */
> > ret = merge_extent_mapping(em_tree, existing,
> > - em, start);
> > + em, start, len);
> > free_extent_map(existing);
> > if (ret) {
> > free_extent_map(em);
> > *em_in = NULL;
> > }
> > - } else {
> > - free_extent_map(em);
> > - *em_in = existing;
> > - ret = 0;
> > }
> > }
> > +
> > ASSERT(ret == 0 || ret == -EEXIST);
> > return ret;
> > }
> >
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html