Re: [PATCH] Btrfs: fix leaf corruption after __btrfs_drop_extents

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sun, Jun 8, 2014 at 12:11 PM, Filipe David Manana <fdmanana@xxxxxxxxx> wrote:
> On Sun, Jun 8, 2014 at 11:54 AM, Liu Bo <bo.li.liu@xxxxxxxxxx> wrote:
>> On Fri, Jun 06, 2014 at 01:11:17PM +0100, Filipe David Manana wrote:
>>> On Fri, Jun 6, 2014 at 7:25 AM, Liu Bo <bo.li.liu@xxxxxxxxxx> wrote:
>>> > Several reports about leaf corruption has been floating on the list, one of them
>>> > points to __btrfs_drop_extents(), and we find that the leaf becomes corrupted
>>> > after __btrfs_drop_extents(), it's really a rare case but it does exist.
>>> >
>>> > The problem turns out to be btrfs_next_leaf() called in __btrfs_drop_extents().
>>> >
>>> > So in btrfs_next_leaf(), we release the current path to re-search the last key of
>>> > the leaf for locating next leaf, and we've taken it into account that there might
>>> > be balance operations between leafs during this 'unlock and re-lock' dance, so
>>> > we check the path again and advance it if there are now more items available.
>>> > But things are a bit different if that last key happens to be removed and balance
>>> > gets a bigger key as the last one, and btrfs_search_slot will return it with
>>> > ret > 0,
>>>
>>> Hi Liu,
>>>
>>> That makes a lot of sense outside the context of btrfs_drop_extents().
>>>
>>> If I understand you correctly, you're saying that we have a file
>>> extent item that gets deleted after we release the path in
>>> btrfs_next_leaf and before btrfs_search_slot finishes for doing a
>>> lookup for that item.
>>
>> Hi Filipe,
>>
>> Not just "finishes", even before btrfs_search_slot "starts".
>
> Yes :)
>
>>
>>>
>>> But who calls btrfs_drop_extents(), is supposed to have locked the
>>> file range in the inode's io tree, right? Isn't the goal of locking
>>> ranges in the io tree to serialize manipulation of file extent items
>>> within a certain range? Unless there's some caller of _drop_extents()
>>> that isn't locking the range in the io tree, I'm not seeing how we can
>>> get the file extent item deleted or updated by another task.
>>
>> Oh, good question!  I must admit I forget about that locking, just check the log,
>> so here is the answer -- the locking range and the extent range is not
>> consistent.
>>
>> In __btrfs_drop_extents(), there is such a case,
>>
>>                 /*
>>                  *       | ---- range to drop ----- |
>>                  *  | -------- extent -------- |
>>                  */
>>                 if (start > key.offset && end >= extent_end) {
>>
>> we do actually lock the range, but in this case, the extent is not included in
>> our range, so this give a chance for others to hack in to do 'confusing-us' things.
>
> Hum, yes. I thought about the cases where others split our extent
> item, but that shouldn't remove the key.
>
> There is one case (I thought about after)  where it's obvious the
> issue you found in btrfs_next_leaf can affect btrfs_drop_extents: we
> have NO_HOLES set, our first tree seach returns us an extent that ends
> before our target range's start (therefore outside the range we locked
> in the io tree) and that extent is the last item in a leaf. And then
> the problem you described happens between the unlock and re-lock.
>
>>
>> I can add these to the commit log if you want, others may also feel confused.
>
> It is fine for me as it is.
>
> Thanks Liu
>
>>
>> thanks,
>> -liubo
>>
>>>
>>> Thanks
>>>
>>> > IOW, nothing change in this leaf except the new last key, then we think
>>> > we're okay because there is no more item balanced in, fine, we thinks we can
>>> > go to the next leaf.
>>> >
>>> > However, we should return that bigger key, otherwise we deserve leaf corruption,
>>> > for example, in endio, skipping that key means that __btrfs_drop_extents() thinks
>>> > it has dropped all extent matched the required range and finish_ordered_io can
>>> > safely insert a new extent, but it actually doesn't and ends up a leaf
>>> > corruption.
>>> >
>>> > This takes the special case into account.
>>> >
>>> > Signed-off-by: Liu Bo <bo.li.liu@xxxxxxxxxx>

Reviewed-by: Filipe Manana <fdmanana@xxxxxxxxx>

Can affect other users of btrfs_next_leaf too that aren't necessarily
modifying anything in tree (like _drop_extents), makes readers miss
items.

>>> > ---
>>> >  fs/btrfs/ctree.c | 18 ++++++++++++++++++
>>> >  1 file changed, 18 insertions(+)
>>> >
>>> > diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
>>> > index 1bcfcdb..bbb256c 100644
>>> > --- a/fs/btrfs/ctree.c
>>> > +++ b/fs/btrfs/ctree.c
>>> > @@ -5736,6 +5736,24 @@ again:
>>> >                 ret = 0;
>>> >                 goto done;
>>> >         }
>>> > +       /*
>>> > +        * So the above check misses one case:
>>> > +        * - after releasing the path above, someone has removed the item that
>>> > +        *   used to be at the very end of the block, and balance between leafs
>>> > +        *   gets another one with bigger key.offset to replace it.
>>> > +        *
>>> > +        * This one should be returned as well, or we can get leaf corruption
>>> > +        * later(esp. in __btrfs_drop_extents()).
>>> > +        *
>>> > +        * And a bit more explanation about this check,
>>> > +        * with ret > 0, the key isn't found, the path points to the slot
>>> > +        * where it should be inserted, so the path->slots[0] item must be the
>>> > +        * bigger one.
>>> > +        */
>>> > +       if (nritems > 0 && ret > 0 && path->slots[0] == nritems - 1) {
>>> > +               ret = 0;
>>> > +               goto done;
>>> > +       }
>>> >
>>> >         while (level < BTRFS_MAX_LEVEL) {
>>> >                 if (!path->nodes[level]) {
>>> > --
>>> > 1.8.1.4
>>> >
>>> > --
>>> > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>>> > the body of a message to majordomo@xxxxxxxxxxxxxxx
>>> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>
>>>
>>>
>>> --
>>> Filipe David Manana,
>>>
>>> "Reasonable men adapt themselves to the world.
>>>  Unreasonable men adapt the world to themselves.
>>>  That's why all progress depends on unreasonable men."
>
>
>
> --
> Filipe David Manana,
>
> "Reasonable men adapt themselves to the world.
>  Unreasonable men adapt the world to themselves.
>  That's why all progress depends on unreasonable men."



-- 
Filipe David Manana,

"Reasonable men adapt themselves to the world.
 Unreasonable men adapt the world to themselves.
 That's why all progress depends on unreasonable men."
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux Filesystem Development]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux